Exploiting Unlabeled Data with Vision and Language Models for Object Detection

Zhao, Shiyu; Zhang, Zhixing; Schulter, Samuel; Zhao, Long; G, Vijay Kumar B.; Stathopoulos, Anastasis; Chandraker, Manmohan; Metaxas, Dimitris

Computer Science > Computer Vision and Pattern Recognition

arXiv:2207.08954 (cs)

[Submitted on 18 Jul 2022]


Authors: Shiyu Zhao, Zhixing Zhang, Samuel Schulter, Long Zhao, Vijay Kumar B.G, Anastasis Stathopoulos, Manmohan Chandraker, Dimitris Metaxas


Abstract: Building robust and generic object detection frameworks requires scaling to larger label spaces and bigger training datasets. However, acquiring annotations for thousands of categories at a large scale is prohibitively costly. We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images, effectively generating pseudo labels for object detection. Starting with a generic, class-agnostic region proposal mechanism, we use vision and language models to categorize each region of an image into any object category required by downstream tasks. We demonstrate the value of the generated pseudo labels on two specific tasks: open-vocabulary detection, where a model needs to generalize to unseen object categories, and semi-supervised object detection, where additional unlabeled images can be used to improve the model. Our empirical evaluation shows the effectiveness of the pseudo labels in both tasks, where we outperform competitive baselines and achieve a new state of the art for open-vocabulary object detection. Our code is available at this https URL.
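The pseudo-labeling idea in the abstract (class-agnostic proposals, then a vision and language model assigns each region to a category) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes region and category-name embeddings have already been produced by some vision and language model (for example, a CLIP-style image and text encoder, not shown here), and classifies each region by cosine similarity followed by a temperature softmax over the category names. The function `pseudo_label`, the geometric-mean fusion of objectness with classification confidence, and the `temperature` and `threshold` values are hypothetical choices made for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class PseudoLabel:
    box: tuple          # (x1, y1, x2, y2) of a class-agnostic proposal
    category: str       # category name chosen by the vision-language scores
    score: float        # fused confidence used for filtering


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def softmax(xs, temperature):
    """Numerically stable softmax with a temperature."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(proposals, region_embeddings, objectness, text_embeddings,
                 temperature=0.01, threshold=0.5):
    """Hypothetical sketch: label each proposal with its best-matching category.

    text_embeddings maps category name -> embedding; any vocabulary can be
    supplied, which is what makes the labeling open-vocabulary.
    """
    names = list(text_embeddings)
    labels = []
    for box, emb, obj in zip(proposals, region_embeddings, objectness):
        sims = [cosine(emb, text_embeddings[n]) for n in names]
        probs = softmax(sims, temperature)
        best = max(range(len(names)), key=probs.__getitem__)
        # Assumed fusion: geometric mean of class probability and objectness.
        score = math.sqrt(probs[best] * obj)
        if score >= threshold:
            labels.append(PseudoLabel(box, names[best], score))
    return labels


# Toy example with hand-made 3-d "embeddings" (stand-ins for real model outputs).
text_embeddings = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0], "car": [0.0, 0.0, 1.0]}
proposals = [(10, 10, 50, 50), (60, 20, 120, 90), (0, 0, 5, 5)]
region_embeddings = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.95], [0.5, 0.5, 0.5]]
objectness = [0.9, 0.8, 0.3]

labels = pseudo_label(proposals, region_embeddings, objectness, text_embeddings)
for lbl in labels:
    print(lbl.category, lbl.box, round(lbl.score, 3))
```

In this toy run the third proposal is ambiguous (equal similarity to every category) and has low objectness, so the fused score falls below the threshold and no pseudo label is emitted for it. Filtering low-confidence regions this way is one plausible design choice for keeping noisy pseudo labels out of downstream training.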

Comments:	Accepted to ECCV 2022 (with the supplementary document)
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2207.08954 [cs.CV]
	(or arXiv:2207.08954v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2207.08954

Submission history

From: Shiyu Zhao
[v1] Mon, 18 Jul 2022 21:47:15 UTC (10,972 KB)