CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?

Alabdulmohsin, Ibrahim; Wang, Xiao; Steiner, Andreas; Goyal, Priya; D'Amour, Alexander; Zhai, Xiaohua

arXiv:2403.04547 (cs)

[Submitted on 7 Mar 2024]

Abstract: We study the effectiveness of data balancing for mitigating biases in contrastive language-image pretraining (CLIP), identifying areas of strength and limitation. First, we reaffirm prior conclusions that CLIP models can inadvertently absorb societal stereotypes. To counter this, we present a novel algorithm, called Multi-Modal Moment Matching (M4), designed to reduce both representation and association biases (i.e., in first- and second-order statistics) in multimodal data. We use M4 to conduct an in-depth analysis taking into account various factors, such as the model, representation, and data size. Our study also explores the dynamic nature of how CLIP learns and unlearns biases. In particular, we find that fine-tuning is effective in countering representation biases, though its impact diminishes for association biases. Also, data balancing has a mixed impact on quality: it tends to improve classification but can hurt retrieval. Interestingly, data and architectural improvements seem to mitigate the negative impact of data balancing on performance; e.g., applying M4 to SigLIP-B/16 with data quality filters improves COCO image-to-text retrieval @5 from 86% (without data balancing) to 87% and ImageNet 0-shot classification from 77% to 77.5%! Finally, we conclude with recommendations for improving the efficacy of data balancing in multimodal systems.
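
As a rough illustration of the distinction between first- and second-order statistics drawn in the abstract, the sketch below measures a representation gap (the weighted mean of a binary attribute against a target rate) and an association gap (the weighted covariance between that attribute and a binary label) on toy data. It is not the M4 algorithm: the function names, the 0.5 target rate, the toy data, and the hand-picked weights are all assumptions made for this example.

```python
# Toy illustration only. These definitions and names are assumptions for
# this sketch; they are not taken from the paper's M4 implementation.

def weighted_mean(values, weights):
    """Mean of `values` under non-negative example weights."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def representation_bias(s, weights, target=0.5):
    """First-order gap: weighted rate of attribute `s` minus a target rate."""
    return weighted_mean(s, weights) - target

def association_bias(s, y, weights):
    """Second-order gap: weighted covariance between attribute `s` and label `y`."""
    mean_s = weighted_mean(s, weights)
    mean_y = weighted_mean(y, weights)
    centered = [(si - mean_s) * (yi - mean_y) for si, yi in zip(s, y)]
    return weighted_mean(centered, weights)

# Six hypothetical image-text pairs: s marks an attribute, y marks a label.
s = [1, 1, 1, 1, 0, 0]
y = [1, 1, 1, 0, 1, 0]

uniform = [1.0] * 6                      # the data as collected
balanced = [1 / 3, 1 / 3, 1 / 3, 1, 1, 1]  # hand-picked: equal mass per (s, y) cell

for name, w in [("uniform", uniform), ("balanced", balanced)]:
    print(name, representation_bias(s, w), association_bias(s, y, w))
```

Under the uniform weights both gaps are positive, because the attribute is over-represented and co-occurs with the label more often than independence would predict. The hand-picked weights give each (s, y) combination equal mass, which drives both gaps to approximately zero; a data-balancing method would have to find such weights (or a subsample) automatically.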

Comments:	32 pages, 20 figures, 7 tables
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2403.04547 [cs.LG]
	(or arXiv:2403.04547v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2403.04547
Journal reference:	ICLR 2024

Submission history

From: Ibrahim Alabdulmohsin
[v1] Thu, 7 Mar 2024 14:43:17 UTC (8,643 KB)
