MlTr: Multi-label Classification with Transformer

Cheng, Xing; Lin, Hezheng; Wu, Xiangyu; Yang, Fan; Shen, Dong; Wang, Zhongyuan; Shi, Nian; Liu, Honglin


arXiv:2106.06195 (cs)

[Submitted on 11 Jun 2021]


Abstract: The task of multi-label image classification is to recognize all the object labels present in an image. Despite years of progress, small objects, similar objects, and objects with high conditional probability remain the main bottlenecks of previous convolutional neural network (CNN) based models, which are limited by the representational capacity of convolutional kernels. Recent vision transformer networks use the self-attention mechanism to extract features at pixel granularity, which expresses richer local semantic information but is insufficient for mining global spatial dependence. In this paper, we point out the three crucial problems that CNN-based methods encounter and explore the possibility of designing specific transformer modules to address them. We put forward a Multi-label Transformer architecture (MlTr) constructed with window partitioning, in-window pixel attention, and cross-window attention, which particularly improves performance on multi-label image classification tasks. The proposed MlTr shows state-of-the-art results on various prevalent multi-label datasets such as MS-COCO, Pascal-VOC, and NUS-WIDE, with 88.5%, 95.8%, and 65.5% respectively. The code will be available soon at this https URL
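The three components named in the abstract (window partitioning, in-window attention, cross-window attention) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the identity query/key/value projections, the mean-pooled window summaries used for the cross-window step, the random classifier weights `w_cls`, and all sizes are assumptions chosen only to keep the example short. The sigmoid output reflects the multi-label setting, where each label is scored independently.

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) feature map into non-overlapping ws x ws windows.

    Returns an array of shape (num_windows, ws * ws, C).
    """
    H, W, C = x.shape
    assert H % ws == 0 and W % ws == 0, "H and W must be divisible by ws"
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    x = x.transpose(0, 2, 1, 3, 4)          # (H/ws, W/ws, ws, ws, C)
    return x.reshape(-1, ws * ws, C)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    """Scaled dot-product self-attention over the second-to-last axis.

    Assumption: Q = K = V = tokens (no learned projections).
    """
    d = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ tokens

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 16))       # hypothetical 8x8 map, 16 channels

windows = window_partition(feat, ws=4)       # (4, 16, 16): 4 windows of 16 pixels
in_win = self_attention(windows)             # attention among pixels in each window

# Assumption: summarize each window by its mean token, then attend across windows.
summaries = in_win.mean(axis=1)              # (4, 16)
cross = self_attention(summaries)            # (4, 16)

# Multi-label head: one independent sigmoid score per label (5 hypothetical labels).
w_cls = rng.standard_normal((16, 5))
logits = cross.mean(axis=0) @ w_cls
probs = 1.0 / (1.0 + np.exp(-logits))
predicted = probs > 0.5                      # several labels may be active at once
```

Because in-window attention only mixes pixels that share a window, some cross-window step is needed for information to travel between distant regions; the pooled-summary attention above is just one simple way to show that idea.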

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2106.06195 [cs.CV]
	(or arXiv:2106.06195v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2106.06195

Submission history

From: Xing Cheng [view email]
[v1] Fri, 11 Jun 2021 06:53:09 UTC (27,039 KB)
