Temporal Transductive Inference for Few-Shot Video Object Segmentation

Siam, Mennatullah; Derpanis, Konstantinos G.; Wildes, Richard P.

arXiv:2203.14308 (cs)

[Submitted on 27 Mar 2022 (v1), last revised 16 Jul 2023 (this version, v2)]

Abstract: Few-shot video object segmentation (FS-VOS) aims to segment video frames using a few labelled examples of classes not seen during initial training. In this paper, we present a simple but effective temporal transductive inference (TTI) approach that leverages temporal consistency in the unlabelled video frames during few-shot inference. Key to our approach is the use of both global and local temporal constraints. The global constraint aims to learn consistent linear classifiers for novel classes across the image sequence, whereas the local constraint enforces the proportion of foreground/background regions in each frame to be coherent across a local temporal window. These constraints act as spatiotemporal regularizers during transductive inference to increase temporal coherence and reduce overfitting on the few-shot support set. Empirically, our model outperforms state-of-the-art meta-learning approaches in terms of mean intersection over union on YouTube-VIS by 2.8%. In addition, we introduce improved benchmarks that are exhaustively labelled (i.e., all object occurrences are labelled, unlike currently available benchmarks), and present a more realistic evaluation paradigm that targets data distribution shift between training and testing sets. Our empirical results and in-depth analysis confirm the added benefits of the proposed spatiotemporal regularizers in improving temporal coherence and overcoming certain overfitting scenarios.
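
The local constraint described in the abstract can be pictured with a small sketch. This is not the authors' formulation: the abstract does not give the loss, so the squared-deviation penalty, the function names `foreground_proportion` and `local_proportion_penalty`, and the `window` parameter are all assumptions made purely for illustration. The sketch only shows the general idea of penalising frames whose predicted foreground proportion drifts from that of their temporal neighbours.

```python
import numpy as np


def foreground_proportion(probs):
    """Mean predicted foreground probability per frame.

    probs: array of shape (T, H, W) with values in [0, 1].
    """
    return probs.reshape(probs.shape[0], -1).mean(axis=1)


def local_proportion_penalty(probs, window=3):
    """Hypothetical local temporal regularizer (an assumption, not the paper's loss).

    For each frame, take the squared deviation of its foreground proportion
    from the mean proportion over a centred temporal window, then average
    over all frames.
    """
    props = foreground_proportion(probs)
    num_frames = props.shape[0]
    half = window // 2
    penalty = 0.0
    for t in range(num_frames):
        # Clip the window at the sequence boundaries.
        lo, hi = max(0, t - half), min(num_frames, t + half + 1)
        penalty += (props[t] - props[lo:hi].mean()) ** 2
    return penalty / num_frames


# A temporally steady prediction versus one that flickers between frames.
steady = np.full((5, 4, 4), 0.3)
flicker = np.stack([np.full((4, 4), 0.1 if t % 2 == 0 else 0.9) for t in range(5)])

steady_penalty = local_proportion_penalty(steady)
flicker_penalty = local_proportion_penalty(flicker)
```

Under these assumptions, the steady sequence incurs essentially no penalty while the flickering one is penalised, which is the kind of temporal coherence the abstract attributes to the local constraint.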

Comments:	IJCV submission under review
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2203.14308 [cs.CV]
	(or arXiv:2203.14308v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2203.14308

Submission history

From: Mennatullah Siam M.S.
[v1] Sun, 27 Mar 2022 14:08:30 UTC (11,737 KB)
[v2] Sun, 16 Jul 2023 13:31:17 UTC (15,994 KB)
