File size: 7,615 Bytes
3d49622
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197

# Benchmarks

Here we benchmark the training speed of a Mask R-CNN in detectron2,
with some other popular open source Mask R-CNN implementations.


### Settings

* Hardware: 8 NVIDIA V100s with NVLink.
* Software: Python 3.7, CUDA 10.1, cuDNN 7.6.5, PyTorch 1.5,
  TensorFlow 1.15.0rc2, Keras 2.2.5, MxNet 1.6.0b20190820.
* Model: an end-to-end R-50-FPN Mask-RCNN model, using the same hyperparameter as the
  [Detectron baseline config](https://github.com/facebookresearch/Detectron/blob/master/configs/12_2017_baselines/e2e_mask_rcnn_R-50-FPN_1x.yaml)
	(it does no have scale augmentation).
* Metrics: We use the average throughput in iterations 100-500 to skip GPU warmup time.
  Note that for R-CNN-style models, the throughput of a model typically changes during training, because
  it depends on the predictions of the model. Therefore this metric is not directly comparable with
  "train speed" in model zoo, which is the average speed of the entire training run.


### Main Results

```eval_rst
+-------------------------------+--------------------+
| Implementation                | Throughput (img/s) |
+===============================+====================+
| |D2| |PT|                     | 62                 |
+-------------------------------+--------------------+
| mmdetection_  |PT|            | 53                 |
+-------------------------------+--------------------+
| maskrcnn-benchmark_  |PT|     | 53                 |
+-------------------------------+--------------------+
| tensorpack_ |TF|              | 50                 |
+-------------------------------+--------------------+
| simpledet_ |mxnet|            | 39                 |
+-------------------------------+--------------------+
| Detectron_  |C2|              | 19                 |
+-------------------------------+--------------------+
| `matterport/Mask_RCNN`__ |TF| | 14                 |
+-------------------------------+--------------------+

.. _maskrcnn-benchmark: https://github.com/facebookresearch/maskrcnn-benchmark/
.. _tensorpack: https://github.com/tensorpack/tensorpack/tree/master/examples/FasterRCNN
.. _mmdetection: https://github.com/open-mmlab/mmdetection/
.. _simpledet: https://github.com/TuSimple/simpledet/
.. _Detectron: https://github.com/facebookresearch/Detectron
__ https://github.com/matterport/Mask_RCNN/

.. |D2| image:: https://github.com/facebookresearch/detectron2/raw/master/.github/Detectron2-Logo-Horz.svg?sanitize=true
   :height: 15pt
   :target: https://github.com/facebookresearch/detectron2/
.. |PT| image:: https://pytorch.org/assets/images/logo-icon.svg
   :width: 15pt
   :height: 15pt
   :target: https://pytorch.org
.. |TF| image:: https://static.nvidiagrid.net/ngc/containers/tensorflow.png
   :width: 15pt
   :height: 15pt
   :target: https://tensorflow.org
.. |mxnet| image:: https://github.com/dmlc/web-data/raw/master/mxnet/image/mxnet_favicon.png
   :width: 15pt
   :height: 15pt
   :target: https://mxnet.apache.org/
.. |C2| image:: https://caffe2.ai/static/logo.svg
   :width: 15pt
   :height: 15pt
   :target: https://caffe2.ai
```


Details for each implementation:

* __Detectron2__: with release v0.1.2, run:
  ```
  python tools/train_net.py  --config-file configs/Detectron1-Comparisons/mask_rcnn_R_50_FPN_noaug_1x.yaml --num-gpus 8
  ```

* __mmdetection__: at commit `b0d845f`, run
  ```
  ./tools/dist_train.sh configs/mask_rcnn/mask_rcnn_r50_caffe_fpn_1x_coco.py 8
  ```

* __maskrcnn-benchmark__: use commit `0ce8f6f` with `sed -i ‘s/torch.uint8/torch.bool/g’ **/*.py; sed -i 's/AT_CHECK/TORCH_CHECK/g' **/*.cu`
	to make it compatible with PyTorch 1.5. Then, run training with
  ```
  python -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py --config-file configs/e2e_mask_rcnn_R_50_FPN_1x.yaml
  ```
  The speed we observed is faster than its model zoo, likely due to different software versions.

* __tensorpack__: at commit `caafda`, `export TF_CUDNN_USE_AUTOTUNE=0`, then run
  ```
  mpirun -np 8 ./train.py --config DATA.BASEDIR=/data/coco TRAINER=horovod BACKBONE.STRIDE_1X1=True TRAIN.STEPS_PER_EPOCH=50 --load ImageNet-R50-AlignPadding.npz
  ```

* __SimpleDet__: at commit `9187a1`, run
  ```
  python detection_train.py --config config/mask_r50v1_fpn_1x.py
  ```

* __Detectron__: run
  ```
  python tools/train_net.py --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-50-FPN_1x.yaml
  ```
  Note that many of its ops run on CPUs, therefore the performance is limited.

* __matterport/Mask_RCNN__: at commit `3deaec`, apply the following diff, `export TF_CUDNN_USE_AUTOTUNE=0`, then run
  ```
  python coco.py train --dataset=/data/coco/ --model=imagenet
  ```
  Note that many small details in this implementation might be different
  from Detectron's standards.

  <details>
  <summary>
  (diff to make it use the same hyperparameters - click to expand)
  </summary>

  ```diff
  diff --git i/mrcnn/model.py w/mrcnn/model.py
  index 62cb2b0..61d7779 100644
  --- i/mrcnn/model.py
  +++ w/mrcnn/model.py
  @@ -2367,8 +2367,8 @@ class MaskRCNN():
        epochs=epochs,
        steps_per_epoch=self.config.STEPS_PER_EPOCH,
        callbacks=callbacks,
  -            validation_data=val_generator,
  -            validation_steps=self.config.VALIDATION_STEPS,
  +            #validation_data=val_generator,
  +            #validation_steps=self.config.VALIDATION_STEPS,
        max_queue_size=100,
        workers=workers,
        use_multiprocessing=True,
  diff --git i/mrcnn/parallel_model.py w/mrcnn/parallel_model.py
  index d2bf53b..060172a 100644
  --- i/mrcnn/parallel_model.py
  +++ w/mrcnn/parallel_model.py
  @@ -32,6 +32,7 @@ class ParallelModel(KM.Model):
      keras_model: The Keras model to parallelize
      gpu_count: Number of GPUs. Must be > 1
      """
  +        super().__init__()
      self.inner_model = keras_model
      self.gpu_count = gpu_count
      merged_outputs = self.make_parallel()
  diff --git i/samples/coco/coco.py w/samples/coco/coco.py
  index 5d172b5..239ed75 100644
  --- i/samples/coco/coco.py
  +++ w/samples/coco/coco.py
  @@ -81,7 +81,10 @@ class CocoConfig(Config):
    IMAGES_PER_GPU = 2

    # Uncomment to train on 8 GPUs (default is 1)
  -    # GPU_COUNT = 8
  +    GPU_COUNT = 8
  +    BACKBONE = "resnet50"
  +    STEPS_PER_EPOCH = 50
  +    TRAIN_ROIS_PER_IMAGE = 512

    # Number of classes (including background)
    NUM_CLASSES = 1 + 80  # COCO has 80 classes
  @@ -496,29 +499,10 @@ if __name__ == '__main__':
      # *** This training schedule is an example. Update to your needs ***

      # Training - Stage 1
  -        print("Training network heads")
      model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=40,
  -                    layers='heads',
  -                    augmentation=augmentation)
  -
  -        # Training - Stage 2
  -        # Finetune layers from ResNet stage 4 and up
  -        print("Fine tune Resnet stage 4 and up")
  -        model.train(dataset_train, dataset_val,
  -                    learning_rate=config.LEARNING_RATE,
  -                    epochs=120,
  -                    layers='4+',
  -                    augmentation=augmentation)
  -
  -        # Training - Stage 3
  -        # Fine tune all layers
  -        print("Fine tune all layers")
  -        model.train(dataset_train, dataset_val,
  -                    learning_rate=config.LEARNING_RATE / 10,
  -                    epochs=160,
  -                    layers='all',
  +                    layers='3+',
            augmentation=augmentation)

    elif args.command == "evaluate":
  ```

  </details>