Learning Visual Conditioning Tokens to Correct Domain Shift for Fully Test-time Adaptation

Tang, Yushun; Chen, Shuoshuo; Kan, Zhehan; Zhang, Yi; Guo, Qinghai; He, Zhihai

arXiv:2406.19341 (cs)

[Submitted on 27 Jun 2024 (v1), last revised 17 Jul 2024 (this version, v3)]

Abstract: Fully test-time adaptation adapts a network model during inference, based on sequential analysis of input samples, to address the cross-domain performance degradation of deep neural networks. This work builds on the following finding: in transformer-based image classification, the class token at the first transformer encoder layer can be learned to capture the domain-specific characteristics of target samples during test-time adaptation. When combined with the input image patch embeddings, this learned token gradually removes domain-specific information from the feature representations of input samples during transformer encoding, thereby significantly improving the test-time adaptation performance of the source model across different domains. We refer to this class token as the visual conditioning token (VCT). To learn the VCT, we propose a bi-level learning approach that captures the long-term variations of domain-specific characteristics while accommodating local variations of instance-specific characteristics. Experimental results on benchmark datasets demonstrate that the proposed bi-level visual conditioning token learning method improves test-time adaptation performance by up to 1.9%.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2406.19341 [cs.CV]
	(or arXiv:2406.19341v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.19341

Submission history

From: Yushun Tang
[v1] Thu, 27 Jun 2024 17:16:23 UTC (4,076 KB)
[v2] Thu, 4 Jul 2024 10:54:44 UTC (4,076 KB)
[v3] Wed, 17 Jul 2024 01:57:44 UTC (4,076 KB)
