0. Basic information

0.1 Related paper links

YOLOX paper link:

YOLOX: Exceeding YOLO Series in 2021

0.2 Related code links

This article maps the code in MMDetection to the theory of the decoupled head in the YOLOX paper, and annotates the code accordingly.

1. Decoupled head

Decoupled head

In object detection, the conflict between classification and regression tasks is a well-known problem [27, 35]. Thus the decoupled head for classification and localization is widely used in the most of one-stage and two-stage detectors [16, 29, 34, 35]. However, as YOLO series’ backbones and feature pyramids ( e.g., FPN [13], PAN [19].) continuously evolving, their detection heads remain coupled as shown in Fig. 2.

Our two analytical experiments indicate that the coupled detection head may harm the performance. 1). Replacing YOLO’s head with a decoupled one greatly improves the converging speed as shown in Fig. 3. 2). The decoupled head is essential to the end-to-end version of YOLO (will be described next). One can tell from Tab. 1, the end-to-end property decreases by 4.2% AP with the coupled head, while the decreasing reduces to 0.8% AP for a decoupled head. We thus replace the YOLO detect head with a lite decoupled head as in Fig. 1. Concretely, it contains a 1×1 conv layer to reduce the channel dimension, followed by two parallel branches with two 3×3 conv layers respectively. We report the inference time with batch=1 on V100 in Tab. 2 and the lite decoupled head brings additional 1.1 ms (11.6 ms v.s. 10.5 ms).

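Decoupled head (paraphrased)

The paper starts from a well-known problem in object detection: the classification and regression tasks conflict with each other [27, 35]. Most one-stage and two-stage detectors therefore use a decoupled head for classification and localization [16, 29, 34, 35]. The YOLO series is the exception: its backbones and feature pyramids (e.g., FPN [13], PAN [19]) have kept evolving, but its detection heads have stayed coupled, as shown in Fig. 2.

Two analytical experiments support the claim that the coupled head may hurt performance. (1) Replacing YOLO's head with a decoupled one greatly speeds up convergence, as shown in Fig. 3. (2) The decoupled head is essential to the end-to-end version of YOLO: according to Tab. 1, going end-to-end costs 4.2% AP with the coupled head, but only 0.8% AP with the decoupled head. The authors therefore replace the YOLO detection head with a lite decoupled head, as in Fig. 1. Concretely, it consists of a 1×1 conv layer that reduces the channel dimension, followed by two parallel branches with two 3×3 conv layers each. With batch=1 on a V100 (Tab. 2), the lite decoupled head adds 1.1 ms of inference time (11.6 ms vs. 10.5 ms).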


Figure 1: Illustration of the difference between YOLOv3 head and the proposed decoupled head. For each level of FPN feature, we first adopt a 1×1 conv layer to reduce the feature channel to 256 and then add two parallel branches with two 3×3 conv layers each for classification and regression tasks respectively. IoU branch is added on the regression branch.
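
The structure described in the caption can be sketched in a few lines of PyTorch. This is only an illustration under stated assumptions, not the MMDetection `YOLOXHead` implementation: the names `DecoupledHeadSketch` and `conv_bn_act` are hypothetical, the BatchNorm + SiLU pairing inside each conv block is an assumption, and the final 1×1 prediction layers are added here only so that the sketch produces class, box, and IoU/objectness outputs.

```python
import torch
from torch import nn


def conv_bn_act(in_ch: int, out_ch: int, kernel_size: int) -> nn.Sequential:
    """Conv + BatchNorm + SiLU; padding keeps the spatial size unchanged.

    The BatchNorm + SiLU choice is an assumption of this sketch.
    """
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.SiLU(),
    )


class DecoupledHeadSketch(nn.Module):
    """Hypothetical decoupled head for ONE feature level, following Figure 1."""

    def __init__(self, in_channels: int, num_classes: int = 80,
                 feat_channels: int = 256) -> None:
        super().__init__()
        # 1x1 conv reduces the incoming feature to feat_channels.
        self.stem = conv_bn_act(in_channels, feat_channels, 1)
        # Two parallel branches, each with two 3x3 convs.
        self.cls_convs = nn.Sequential(
            conv_bn_act(feat_channels, feat_channels, 3),
            conv_bn_act(feat_channels, feat_channels, 3),
        )
        self.reg_convs = nn.Sequential(
            conv_bn_act(feat_channels, feat_channels, 3),
            conv_bn_act(feat_channels, feat_channels, 3),
        )
        # 1x1 prediction layers (assumed): class scores come from the
        # classification branch; box and IoU/objectness outputs share the
        # regression branch, as the caption describes.
        self.cls_pred = nn.Conv2d(feat_channels, num_classes, 1)
        self.reg_pred = nn.Conv2d(feat_channels, 4, 1)
        self.obj_pred = nn.Conv2d(feat_channels, 1, 1)

    def forward(self, x: torch.Tensor):
        x = self.stem(x)
        cls_feat = self.cls_convs(x)
        reg_feat = self.reg_convs(x)
        return self.cls_pred(cls_feat), self.reg_pred(reg_feat), self.obj_pred(reg_feat)


# Usage: one 20x20 feature map with 512 channels (illustrative values).
head = DecoupledHeadSketch(in_channels=512).eval()
with torch.no_grad():
    cls_score, bbox_pred, objectness = head(torch.randn(1, 512, 20, 20))
print(cls_score.shape, bbox_pred.shape, objectness.shape)
```

The point of the design is that classification and regression stop sharing their last convolutional features: after the common 1×1 stem, each task gets its own pair of 3×3 convs. In a coupled head, a single conv would instead emit all class, box, and objectness channels at once.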

Table 1: The effect of decoupled head for end-to-end YOLO in terms of AP (%) on COCO.

Models	Coupled Head	Decoupled Head
Vanilla YOLO	38.5	39.6
End-to-end YOLO	34.3 (-4.2)	38.8 (-0.8)
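
The parenthesised deltas in Table 1 can be checked with a little arithmetic. The snippet below only re-derives numbers already in the table; the dictionary layout and the function name `end_to_end_drop` are hypothetical.

```python
# AP (%) on COCO, copied from Table 1.
ap = {
    "coupled": {"vanilla": 38.5, "end_to_end": 34.3},
    "decoupled": {"vanilla": 39.6, "end_to_end": 38.8},
}


def end_to_end_drop(head: str) -> float:
    """AP lost when moving from vanilla to end-to-end YOLO for one head type."""
    return round(ap[head]["vanilla"] - ap[head]["end_to_end"], 1)


drops = {head: end_to_end_drop(head) for head in ap}
# Gain of the decoupled head over the coupled head in the vanilla setting.
vanilla_gain = round(ap["decoupled"]["vanilla"] - ap["coupled"]["vanilla"], 1)

print(drops)         # AP drop per head type
print(vanilla_gain)  # AP gain from decoupling, vanilla YOLO
```

Read this way, the table makes two separate points: decoupling already helps vanilla YOLO, and it shrinks the cost of going end-to-end from 4.2 to 0.8 AP.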

Code implementation

Looking at the YOLOX config file, the model is defined in the MMDetection framework as follows:

img_scale = (640, 640)  # height, width

# model settings
model = dict(
    type='YOLOX',
    input_size=img_scale,
    random_size_range=(15, 25),
    random_size_interval=10,
    backbone=dict(type='CSPDarknet', deepen_factor=0.33, widen_factor=0.5),
    neck=dict(
        type='YOLOXPAFPN',
        in_channels=[128, 256, 512],
        out_channels=128,
        num_csp_blocks=1),
    bbox_head=dict(
        type='YOLOXHead', num_classes=80, in_channels=128, feat_channels=128),
    train_cfg=dict(assigner=dict(type='SimOTAAssigner', center_radius=2.5)),
    # In order to align the source code, the threshold of the val phase is
    # 0.01, and the threshold of the test phase is 0.001.
    test_cfg=dict(score_thr=0.01, nms=dict(type='nms', iou_threshold=0.65)))
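
The 128-channel head in this config, against the 256 channels in the Figure 1 caption, is consistent with a width multiplier of 0.5. Here is a minimal sketch of that arithmetic. It assumes unscaled widths of (256, 512, 1024) for the three backbone outputs and 256 for the head, and strides of 8, 16 and 32 for the three feature levels. None of these values appears in the config, so treat them as assumptions.

```python
# Assumed unscaled channel widths (NOT stated in the config above).
BASE_BACKBONE_OUT = (256, 512, 1024)
BASE_HEAD_CHANNELS = 256  # the 256 channels named in the Figure 1 caption

widen_factor = 0.5  # taken from the config
scaled_backbone = [int(c * widen_factor) for c in BASE_BACKBONE_OUT]
scaled_head = int(BASE_HEAD_CHANNELS * widen_factor)

# Feature-map sizes for the 640x640 input, assuming strides 8/16/32.
img_scale = (640, 640)   # height, width, as in the config
strides = (8, 16, 32)    # assumption of this sketch
grids = [(img_scale[0] // s, img_scale[1] // s) for s in strides]
num_priors = sum(h * w for h, w in grids)

print(scaled_backbone, scaled_head)  # compare with in_channels / feat_channels
print(grids, num_priors)             # grid size per level, total locations
```

Under these assumptions, the scaled widths match `in_channels=[128, 256, 512]` on the neck and `feat_channels=128` on the head, and the head makes one prediction per grid cell on each of the three levels.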

As the config shows, YOLOX's bbox_head uses the class YOLOXHead. Searching the source code for this class shows that YOLOXHead is defined in: https://github.com/open-mmlab/mmdetection/blob/240d7a31c745578aa8c4df54c3074ce78b690c34/mmdet/models/dense_heads/yolox_head.py