Gaussian Dynamic Convolution for Semantic Segmentation in Remote Sensing Images
Abstract
1. Introduction
- (1) We introduce the GDConv layer to semantic segmentation of high-resolution remote sensing images. It dynamically adjusts the size of the receptive field, which enriches the extracted multi-scale features.
- (2) We construct a Gaussian pyramid pooling (GPP) module and a Gaussian dynamic convolutional network (GDCN) to achieve high accuracy in multi-scale object segmentation.
- (3)
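The dynamic receptive field named in contribution (1) can be illustrated with a minimal sketch. It assumes that the "base" and "Σ" settings listed in the ablation table mean a base dilation rate plus a zero-mean Gaussian offset with standard deviation Σ, drawn anew on each forward pass. The sampling rule, the rounding, the lower clamp at 1, and the names `sample_dilation` and `receptive_field` are all hypothetical and are not taken from the paper.

```python
import random


def sample_dilation(base, sigma, rng):
    """Draw one dilation rate: base plus a rounded Gaussian offset.

    Assumed rule for illustration only; the clamp keeps the rate valid (>= 1).
    """
    offset = rng.gauss(0.0, sigma)
    return max(1, base + round(offset))


def receptive_field(kernel_size, dilation):
    """Span in pixels covered by one dilated convolution kernel."""
    return dilation * (kernel_size - 1) + 1


rng = random.Random(0)  # fixed seed so the sketch is repeatable
dilations = [sample_dilation(base=6, sigma=2.0, rng=rng) for _ in range(5)]
fields = [receptive_field(3, d) for d in dilations]
```

With a fixed dilation of 6, a 3 × 3 kernel always spans 13 pixels. Under the assumed sampling rule the span varies from pass to pass around that value, which is one way a single layer could be exposed to several object scales during training.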
2. Related Work
2.1. General Semantic Segmentation
2.2. Semantic Segmentation in Remote Sensing
3. Method
3.1. Gaussian Dynamic Convolution
3.2. Gaussian Dynamic Convolution Network
4. Experiments
4.1. Dataset and Metric
4.2. Experimental Setting
4.3. Performance
4.4. Ablation Study
5. Conclusions
Funding
Conflicts of Interest
References
- Zhang, M.; Hu, X.; Zhao, L.; Lv, Y.; Luo, M.; Pang, S. Learning dual multi-scale manifold ranking for semantic segmentation of high-resolution images. Remote Sens. 2017, 9, 500.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
- Chen, G.; Tan, X.; Guo, B.; Zhu, K.; Liao, P.; Wang, T.; Wang, Q.; Zhang, X. SDFCNv2: An Improved FCN Framework for Remote Sensing Images Semantic Segmentation. Remote Sens. 2021, 13, 4902.
- Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122.
- Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 764–773.
- Waqas Zamir, S.; Arora, A.; Gupta, A.; Khan, S.; Sun, G.; Shahbaz Khan, F.; Zhu, F.; Shao, L.; Xia, G.S.; Bai, X. iSAID: A large-scale dataset for instance segmentation in aerial images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seoul, Korea, 27–28 October 2019; pp. 28–37.
- Wang, J.; Zheng, Z.; Ma, A.; Lu, X.; Zhong, Y. LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation. arXiv 2021, arXiv:2110.08733.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Lin, G.; Milan, A.; Shen, C.; Reid, I. RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
- Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep high-resolution representation learning for human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5693–5703.
- Sun, X.; Chen, C.; Wang, X.; Dong, J.; Zhou, H.; Chen, S. Gaussian dynamic convolution for efficient single-image segmentation. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 2937–2948.
- Lv, Q.; Feng, M.; Sun, X.; Dong, J.; Chen, C.; Zhang, Y. Embedded Attention Network for Semantic Segmentation. IEEE Robot. Autom. Lett. 2021, 7, 326–333.
- Zhu, X.X.; Tuia, D.; Mou, L.; Xia, G.S.; Zhang, L.; Xu, F.; Fraundorfer, F. Deep learning in remote sensing: A comprehensive review and list of resources. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–36.
- Johnson, B.A.; Ma, L. Image segmentation and object-based image analysis for environmental monitoring: Recent areas of interest, researchers’ views on the future priorities. Remote Sens. 2020, 12, 1772.
- Li, Q.; Zorzi, S.; Shi, Y.; Fraundorfer, F.; Zhu, X.X. RegGAN: An End-to-End Network for Building Footprint Generation with Boundary Regularization. Remote Sens. 2022, 14, 1835.
- Chen, C.; Zhang, Y.; Lv, Q.; Wei, S.; Wang, X.; Sun, X.; Dong, J. RRNet: A hybrid detector for object detection in drone-captured images. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Korea, 27–28 October 2019.
- Nitze, I.; Heidler, K.; Barth, S.; Grosse, G. Developing and Testing a Deep Learning Approach for Mapping Retrogressive Thaw Slumps. Remote Sens. 2021, 13, 4294.
- Sun, X.; Zhang, M.; Dong, J.; Lguensat, R.; Yang, Y.; Lu, X. A deep framework for eddy detection and tracking from satellite sea surface height data. IEEE Trans. Geosci. Remote Sens. 2020, 59, 7224–7234.
- Guo, J.; Xu, Q.; Zeng, Y.; Liu, Z.; Zhu, X. Semi-Supervised Cloud Detection in Satellite Images by Considering the Domain Shift Problem. Remote Sens. 2022, 14, 2641.
- Mou, L.; Hua, Y.; Zhu, X.X. Relation matters: Relational context-aware fully convolutional network for semantic segmentation of high-resolution aerial images. IEEE Trans. Geosci. Remote Sens. 2020, 58, 7557–7569.
- Zheng, Z.; Zhong, Y.; Wang, J.; Ma, A. Foreground-aware relation network for geospatial object segmentation in high spatial resolution remote sensing imagery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 4096–4105.
- Ma, A.; Wang, J.; Zhong, Y.; Zheng, Z. FactSeg: Foreground Activation-Driven Small Object Semantic Segmentation in Large-Scale Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–16.
- Yang, F.; Yuan, X.; Ran, J.; Shu, W.; Zhao, Y.; Qin, A.; Gao, C. Accurate Instance Segmentation for Remote Sensing Images via Adaptive and Dynamic Feature Learning. Remote Sens. 2021, 13, 4774.
- Wang, L.; Zhang, C.; Li, R.; Duan, C.; Meng, X.; Atkinson, P.M. Scale-aware neural network for semantic segmentation of multi-resolution remote sensing images. Remote Sens. 2021, 13, 5015.
Method | mIoU(%) | Ship | ST | BD | TC | BC | GTF | Bridge |
---|---|---|---|---|---|---|---|---|
DeepLab v3+ [4] | 59.33 | 59.02 | 55.15 | 75.94 | 84.18 | 58.52 | 59.24 | 32.11 |
PSPNet [3] | 60.25 | 65.2 | 52.1 | 75.7 | 85.57 | 61.12 | 60.15 | 32.46 |
DCN [7] | 60.12 | 61.24 | 54.69 | 72.88 | 82.96 | 55.32 | 55.46 | 34.58 |
FarSeg [25] | 63.71 | 65.38 | 61.80 | 77.73 | 86.35 | 62.08 | 56.70 | 36.70 |
FactSeg [26] | 63.79 | 68.34 | 56.83 | 78.36 | 88.91 | 64.89 | 54.60 | 36.34 |
ours | 64.22 | 69.33 | 62.2 | 78.48 | 85.59 | 65.26 | 60.28 | 37.23 |

Method | LV | SV | HC | SP | RA | SBF | Plane | Harbor |
---|---|---|---|---|---|---|---|---|
DeepLab v3+ [4] | 54.54 | 33.79 | 31.14 | 44.24 | 67.51 | 73.78 | 75.70 | 45.76 |
PSPNet [3] | 58.03 | 42.96 | 40.89 | 46.78 | 68.6 | 71.9 | 79.5 | 54.26 |
DCN [7] | 56.25 | 39.25 | 33.36 | 47.77 | 69.81 | 70.33 | 76.21 | 49.85 |
FarSeg [25] | 60.59 | 46.34 | 35.82 | 51.21 | 71.35 | 72.53 | 82.03 | 53.91 |
FactSeg [26] | 62.65 | 49.53 | 42.72 | 51.47 | 69.42 | 73.55 | 84.13 | 55.74 |
ours | 61.67 | 49.85 | 42.85 | 52.01 | 69.65 | 73.25 | 84.29 | 56.71 |
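
The per-class scores and the mIoU column above follow the usual intersection-over-union convention, which can be sketched as below. This is the standard textbook definition computed from a confusion matrix, not necessarily the authors' exact evaluation protocol (for example, how a background class is handled is not stated here). The function names and the toy pixel counts are made up for illustration.

```python
def per_class_iou(confusion):
    """IoU per class from a square confusion matrix.

    Rows are ground-truth classes, columns are predicted classes.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp  # truth is c, predicted otherwise
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, truth otherwise
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious


def mean_iou(confusion):
    """Average IoU over classes that appear in truth or prediction."""
    valid = [v for v in per_class_iou(confusion) if v == v]  # v != v only for NaN
    return sum(valid) / len(valid)


# Toy 3-class pixel counts (hypothetical numbers, not from the paper).
toy = [
    [50, 5, 5],
    [10, 30, 0],
    [0, 10, 40],
]
ious = per_class_iou(toy)
miou = mean_iou(toy)
```

For class 0 in the toy matrix, 50 pixels are correct, 10 are missed, and 10 are false alarms, so its IoU is 50 / 70.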
Method | mIoU(%) |
---|---|
DeepLab v3+ [4] | 47.62 |
PSPNet [3] | 48.31 |
DCN [7] | 47.92 |
FarSeg [25] | 48.69 |
FactSeg [26] | 48.94 |
ours | 49.75 |
Methods | mIoU(%) |
---|---|
1 × Dilated Conv (dilation = 6) | 58.4 |
1 × Deformable Conv | 56.6 |
1 × GDConv (base = 6, Σ = 2) | 60.7 |
2 × Dilated Conv (dilation = 6, 12) | 61.3 |
2 × Deformable Conv | 59.3 |
2 × GDConv (base = 6, 12, Σ = 2) | 64.2 |
Σ | 1.0 | 1.5 | 2.0 | 2.5 |
---|---|---|---|---|
mIoU (%) | 63.5 | 64.4 | 64.9 | 64.2 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Feng, M.; Sun, X.; Dong, J.; Zhao, H. Gaussian Dynamic Convolution for Semantic Segmentation in Remote Sensing Images. Remote Sens. 2022, 14, 5736. https://doi.org/10.3390/rs14225736