LiT: Zero-Shot Transfer with Locked-image text Tuning

Zhai, Xiaohua; Wang, Xiao; Mustafa, Basil; Steiner, Andreas; Keysers, Daniel; Kolesnikov, Alexander; Beyer, Lucas

arXiv:2111.07991 (cs)

[Submitted on 15 Nov 2021 (v1), last revised 22 Jun 2022 (this version, v3)]

Abstract: This paper presents contrastive-tuning, a simple method employing contrastive training to align image and text models while still taking advantage of their pre-training. In our empirical study we find that locked pre-trained image models with unlocked text models work best. We call this instance of contrastive-tuning "Locked-image Tuning" (LiT), which just teaches a text model to read out good representations from a pre-trained image model for new tasks. A LiT model gains the capability of zero-shot transfer to new vision tasks, such as image classification or retrieval. The proposed LiT is widely applicable; it works reliably with multiple pre-training methods (supervised and unsupervised) and across diverse architectures (ResNet, Vision Transformers and MLP-Mixer) using three different image-text datasets. With the transformer-based pre-trained ViT-g/14 model, the LiT model achieves 85.2% zero-shot transfer accuracy on the ImageNet test set, and 82.5% on the challenging out-of-distribution ObjectNet test set.
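The locked-image idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a symmetric softmax contrastive loss with a fixed temperature of 0.1, stands in a small per-caption embedding table for the text model, and uses finite-difference gradients in place of backpropagation. All function names (`contrastive_loss`, `tune_text_only`, `zero_shot_classify`) and the example vectors are hypothetical.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def contrastive_loss(image_embs, text_embs, temperature=0.1):
    """Symmetric contrastive loss; pair k is (image k, text k). Assumed form."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    columns = [list(col) for col in zip(*logits)]
    image_to_text = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    text_to_image = -sum(log_softmax(columns[k])[k] for k in range(n)) / n
    return 0.5 * (image_to_text + text_to_image)

def tune_text_only(image_embs, text_embs, steps=20, lr=0.05, eps=1e-4):
    """Gradient descent on the text side only; image embeddings stay locked."""
    text = [list(v) for v in text_embs]  # copy so the caller's list is untouched
    for _ in range(steps):
        grads = [[0.0] * len(row) for row in text]
        for r, row in enumerate(text):
            for c, original in enumerate(row):
                # Central finite difference stands in for backpropagation.
                row[c] = original + eps
                up = contrastive_loss(image_embs, text)
                row[c] = original - eps
                down = contrastive_loss(image_embs, text)
                row[c] = original
                grads[r][c] = (up - down) / (2 * eps)
        for r, row in enumerate(text):
            for c in range(len(row)):
                row[c] -= lr * grads[r][c]
    return text

def zero_shot_classify(image_emb, class_text_embs):
    """Return the index of the class text with the highest cosine similarity."""
    img = l2_normalize(image_emb)
    scores = [sum(a * b for a, b in zip(img, l2_normalize(t)))
              for t in class_text_embs]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy data: two locked image embeddings and two initially misaligned texts.
images = [[1.0, 0.0], [0.0, 1.0]]
texts = [[0.6, 0.8], [0.8, 0.6]]
tuned = tune_text_only(images, texts)
```

In this sketch only `texts` receives updates, while `images` is read but never written, which mirrors the locked-image, unlocked-text configuration the abstract reports as working best. After tuning, `zero_shot_classify` matches each image to its own text using nothing but embedding similarity.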

Comments:	Xiaohua, Xiao, Basil, Andreas and Lucas contributed equally; CVPR 2022
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2111.07991 [cs.CV]
	(or arXiv:2111.07991v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2111.07991

Submission history

From: Xiaohua Zhai
[v1] Mon, 15 Nov 2021 18:53:48 UTC (251 KB)
[v2] Fri, 25 Mar 2022 16:24:53 UTC (1,669 KB)
[v3] Wed, 22 Jun 2022 14:43:02 UTC (1,713 KB)
