# GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond
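## Introduction
Non-local networks capture long-range dependencies with a self-attention mechanism. Through visualization, however, the authors of this paper found that the attention maps computed for different query positions in the same image are almost identical, which means the non-local computation spends substantial resources on redundant work. SENet recalibrates channel weights using global context at very low computational cost, but it does not make full use of the global context information. The paper combines the strengths of Non-local networks and SENet in the GCNet block, which integrates global context effectively while keeping the computational cost small.
Building on the observation that the attention maps differ very little, the paper designs a simplified non-local (SNL) block, shown in the figure below, which shares a single global attention map across all positions.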
<div align="center">
<img src="../../docs/images/models/gcnet_snl_module.png" width="300">
</div>
The output of the SNL block is computed as follows:
<div align="center">
<img src="../../docs/images/models/gcnet_snl_out.png" width="400">
</div>
To further reduce computation, $W_v$ is moved outside the attention pooling, giving:
<div align="center">
<img src="../../docs/images/models/gcnet_snl_out_simple.png" width="400">
</div>
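For readers who cannot view the images, the two SNL expressions can be written out in LaTeX. This is a transcription in the notation of the GCNet paper, on the assumption that the images reproduce the paper's equations: $x_i$ and $z_i$ are the input and output features at position $i$, $N_p = H \cdot W$ is the number of positions, and $W_k$, $W_v$ are linear transforms (1×1 convolutions).

$$
z_i = x_i + \sum_{j=1}^{N_p} \frac{\exp(W_k x_j)}{\sum_{m=1}^{N_p} \exp(W_k x_m)} \, (W_v \cdot x_j)
$$

$$
z_i = x_i + W_v \sum_{j=1}^{N_p} \frac{\exp(W_k x_j)}{\sum_{m=1}^{N_p} \exp(W_k x_m)} \, x_j
$$

The second form follows from the first because $W_v$ is linear and does not depend on $j$, so it can be factored out of the sum.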
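The corresponding structure is shown below. Because the attention map is shared, $W_v$ can be applied once to the pooled feature instead of at every position, so the cost of the $W_v$ transform drops to 1/(WH) of its previous value, where W and H are the width and height of the feature map.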
<div align="center">
<img src="../../docs/images/models/gcnet_snl_module_simple.png" width="250">
</div>
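The shared attention pooling and the effect of moving $W_v$ outside it can be checked numerically. The following is a minimal NumPy sketch, not the PaddleDetection implementation: the function name `attention_pool`, the tensor sizes, and the random weights are all illustrative assumptions, and biases are omitted.

```python
import numpy as np

def attention_pool(x, w_k):
    """Global attention pooling shared by all query positions.

    x:   (C, N) features, with N = H * W flattened positions
    w_k: (C,)   weights of the 1x1 conv that scores each position
    Returns the attention weights (N,) and the pooled context (C,).
    """
    logits = w_k @ x                    # one logit per position, shape (N,)
    logits = logits - logits.max()      # stabilise the softmax
    attn = np.exp(logits) / np.exp(logits).sum()
    return attn, x @ attn               # weighted sum over positions

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
C, H, W = 8, 4, 5
x = rng.standard_normal((C, H * W))
w_k = rng.standard_normal(C)
w_v = rng.standard_normal((C, C))

attn, context = attention_pool(x, w_k)

# SNL: transform every position with W_v, then pool.
snl = (w_v @ x) @ attn
# Simplified SNL: pool first, then apply W_v once.
snl_simple = w_v @ context

# The same context vector is added to every position.
z = x + snl_simple[:, None]

# Multiply-accumulate counts for the W_v transform alone.
macs_before = C * C * H * W
macs_after = C * C
print(np.allclose(snl, snl_simple), macs_before // macs_after)
```

Both orderings give the same context vector because $W_v$ is linear, while the count of multiply-accumulates for $W_v$ shrinks by a factor of H·W.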
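The SNL block can be abstracted into three parts: context modeling, feature transform, and feature aggregation. The transform part holds a large number of parameters, so it is redesigned along the lines of the SE block, which gives the final GC block shown below. A bottleneck of two 1×1 convolutions with channel reduction lowers the computation; because the two-layer bottleneck is harder to optimize, a layer normalization layer is added to ease optimization.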
<div align="center">
<img src="../../docs/images/models/gcnet_gcblock_module.png" width="300">
</div>
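The block can be easily inserted into a backbone network to strengthen its modeling of global context, which can improve model performance on detection and segmentation tasks.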
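The three parts of the GC block (context modeling, bottleneck transform with layer normalization, and fusion) can be sketched as a single forward pass. This is a simplified NumPy illustration under stated assumptions, not the code used by the configs in this directory: `gc_block` and its parameter names are hypothetical, biases are omitted, it handles one image at a time, and the `mul` branch assumes SE-style sigmoid gating.

```python
import numpy as np

def gc_block(x, w_k, w_v1, w_v2, gamma, beta, fusion="add", eps=1e-5):
    """Forward pass of one GC block on a single feature map (sketch).

    x:     (C, H, W)   input features
    w_k:   (C,)        1x1 conv giving one attention logit per position
    w_v1:  (C // r, C) first bottleneck 1x1 conv (channel reduction)
    w_v2:  (C, C // r) second bottleneck 1x1 conv (channel expansion)
    gamma, beta: (C // r,) LayerNorm scale and shift
    """
    C, H, W = x.shape
    flat = x.reshape(C, H * W)

    # 1) Context modeling: softmax attention pooling over all positions.
    logits = w_k @ flat
    attn = np.exp(logits - logits.max())
    attn /= attn.sum()
    context = flat @ attn                              # (C,)

    # 2) Transform: 1x1 conv -> LayerNorm -> ReLU -> 1x1 conv.
    hidden = w_v1 @ context                            # (C // r,)
    hidden = (hidden - hidden.mean()) / np.sqrt(hidden.var() + eps)
    hidden = np.maximum(gamma * hidden + beta, 0.0)
    term = w_v2 @ hidden                               # (C,)

    # 3) Fusion: broadcast the per-channel term to every position.
    if fusion == "add":
        return x + term[:, None, None]
    if fusion == "mul":
        # Assumption: sigmoid gate, as in SE-style channel scaling.
        gate = 1.0 / (1.0 + np.exp(-term))
        return x * gate[:, None, None]
    raise ValueError(f"unknown fusion mode: {fusion!r}")

# Hypothetical sizes; r = 16 matches the ratio listed in the model zoo.
rng = np.random.default_rng(0)
C, r, H, W = 32, 16, 6, 7
x = rng.standard_normal((C, H, W))
params = dict(
    w_k=rng.standard_normal(C),
    w_v1=rng.standard_normal((C // r, C)),
    w_v2=rng.standard_normal((C, C // r)),
    gamma=np.ones(C // r),
    beta=np.zeros(C // r),
)
out_add = gc_block(x, **params, fusion="add")
out_mul = gc_block(x, **params, fusion="mul")
print(out_add.shape, out_mul.shape)
```

In this sketch the block adds (or multiplies by) one value per channel, identical at every spatial position, and with `w_v2` set to zero the `add` variant reduces to the identity.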
## Model Zoo
| Backbone | Type | Context setting | Images per GPU | LR schedule | Inference speed (FPS) | Box AP | Mask AP | Download | Config |
| :---------------------- | :-------------: | :-------------: | :-------: | :-----: | :------------: | :----: | :-----: | :----------------------------------------------------------: | :-----: |
| ResNet50-vd-FPN | Mask | GC(c3-c5, r16, add) | 2 | 2x | 15.31 | 41.4 | 36.8 | [model](https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_r50_vd_fpn_gcb_add_r16_2x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/static/configs/gcnet/mask_rcnn_r50_vd_fpn_gcb_add_r16_2x.yml) |
| ResNet50-vd-FPN | Mask | GC(c3-c5, r16, mul) | 2 | 2x | 15.35 | 40.7 | 36.1 | [model](https://paddlemodels.bj.bcebos.com/object_detection/mask_rcnn_r50_vd_fpn_gcb_mul_r16_2x.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/static/configs/gcnet/mask_rcnn_r50_vd_fpn_gcb_mul_r16_2x.yml) |
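
The context setting column can be unpacked with a little arithmetic. The sketch below assumes the paper's notation, reading `GC(c3-c5, r16, add)` as GC blocks in stages c3 through c5, a bottleneck ratio of 16, and additive fusion. The helper names are hypothetical, and the stage widths are the standard output channels of ResNet-50 bottleneck stages rather than values read from the config files.

```python
import re

# Standard output channels of ResNet-50 bottleneck stages (assumed here,
# not parsed from the YAML configs).
STAGE_CHANNELS = {"c2": 256, "c3": 512, "c4": 1024, "c5": 2048}

def parse_context(setting):
    """Split a string such as 'GC(c3-c5, r16, add)' into its parts."""
    m = re.fullmatch(r"GC\(c(\d)-c(\d), r(\d+), (add|mul)\)", setting)
    if m is None:
        raise ValueError(f"unrecognised context setting: {setting!r}")
    first, last, ratio, fusion = int(m[1]), int(m[2]), int(m[3]), m[4]
    stages = [f"c{i}" for i in range(first, last + 1)]
    return stages, ratio, fusion

def bottleneck_summary(setting):
    """Per stage: (bottleneck width, weights in the two 1x1 convs)."""
    stages, ratio, _ = parse_context(setting)
    summary = {}
    for stage in stages:
        channels = STAGE_CHANNELS[stage]
        hidden = channels // ratio
        # Two 1x1 convs: C -> C/r and C/r -> C, biases not counted.
        summary[stage] = (hidden, 2 * channels * hidden)
    return summary

print(bottleneck_summary("GC(c3-c5, r16, add)"))
# {'c3': (32, 32768), 'c4': (64, 131072), 'c5': (128, 524288)}
```

With ratio 16, the transform in each block holds 2·C²/16 weights, compared with C² for a single full-width 1×1 convolution.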
## Citation
```
@article{DBLP:journals/corr/abs-1904-11492,
author = {Yue Cao and
Jiarui Xu and
Stephen Lin and
Fangyun Wei and
Han Hu},
title = {GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond},
journal = {CoRR},
volume = {abs/1904.11492},
year = {2019},
url = {http://arxiv.org/abs/1904.11492},
archivePrefix = {arXiv},
eprint = {1904.11492},
timestamp = {Tue, 09 Jul 2019 16:48:55 +0200},
biburl = {https://dblp.org/rec/bib/journals/corr/abs-1904-11492},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```