Multi-Prototype Space Learning for Commonsense-Based Scene Graph Generation

Authors

  • Lianggangxu Chen East China Normal University
  • Youqi Song East China Normal University
  • Yiqing Cai East China Normal University
  • Jiale Lu East China Normal University
  • Yang Li East China Normal University
  • Yuan Xie East China Normal University
  • Changbo Wang East China Normal University
  • Gaoqi He East China Normal University; Chongqing Key Laboratory of Precision Optics

DOI:

https://doi.org/10.1609/aaai.v38i2.27874

Keywords:

CV: Scene Analysis & Understanding, CV: Language and Vision

Abstract

In scene graph generation, commonsense is typically modeled as a single-prototype representation to facilitate the recognition of infrequent predicates. However, a fundamental challenge lies in the large intra-class variation in the visual appearance of predicates, which gives rise to subclasses within a predicate class. This variation often causes diverse predicates to be misclassified, because the predicate space is clustered too coarsely. In this paper, inspired by cognitive science, we maintain multi-prototype representations for each predicate class, which can accurately locate the multiple class centers of the predicate space. Technically, we propose a novel multi-prototype learning framework consisting of three main steps: prototype-predicate matching, prototype updating, and prototype space optimization. We first design a triple-level optimal transport to match each predicate feature within the same class to a specific prototype. The prototypes are then updated with momentum updating, according to the matching results, to find the class centers. Finally, we iteratively optimize the prototype space with an inter-class separability loss and an intra-class compactness loss, which enhances its inter-class separability. Extensive evaluations demonstrate that our approach significantly outperforms state-of-the-art methods on the Visual Genome dataset.
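The three steps named in the abstract can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the abstract does not specify the triple-level optimal transport or the exact loss forms, so the sketch substitutes a plain entropic (Sinkhorn) optimal transport with uniform marginals, a hard-assignment momentum update, and simple squared-distance and hinge losses. All function names, parameters (eps, n_iters, momentum, margin), and default values are assumptions chosen for illustration.

```python
import numpy as np


def sinkhorn_assign(features, prototypes, eps=0.05, n_iters=50):
    """Match N features of one class to its K prototypes (assumed: plain Sinkhorn OT)."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    cost = 1.0 - f @ p.T                      # cosine distance, shape (N, K)
    kernel = np.exp(-cost / eps)
    n, k = cost.shape
    row = np.full(n, 1.0 / n)                 # each feature carries equal mass
    col = np.full(k, 1.0 / k)                 # assumed: prototypes are used equally
    v = np.ones(k)
    for _ in range(n_iters):
        u = row / (kernel @ v)
        v = col / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]   # transport plan, shape (N, K)


def momentum_update(prototypes, features, plan, momentum=0.9):
    """Move each prototype toward the mean of the features matched to it."""
    hard = plan.argmax(axis=1)                # hard assignment from the plan
    updated = prototypes.copy()
    for j in range(prototypes.shape[0]):
        members = features[hard == j]
        if len(members) > 0:                  # leave unmatched prototypes unchanged
            updated[j] = momentum * prototypes[j] + (1.0 - momentum) * members.mean(axis=0)
    return updated


def compactness_loss(features, prototypes, plan):
    """Mean squared distance between each feature and its matched prototype."""
    hard = plan.argmax(axis=1)
    return float(np.mean(np.sum((features - prototypes[hard]) ** 2, axis=1)))


def separability_loss(protos_a, protos_b, margin=1.0):
    """Hinge penalty on prototypes of two different classes closer than `margin`."""
    dists = np.linalg.norm(protos_a[:, None, :] - protos_b[None, :, :], axis=2)
    return float(np.mean(np.maximum(0.0, margin - dists)))
```

In a training loop one would, under these assumptions, call sinkhorn_assign per predicate class, apply momentum_update with the resulting plan, and add the two losses to the classification objective. The uniform column marginal forces every prototype to receive an equal share of features, which is one simple way to keep prototypes from collapsing onto a single subclass; the paper may balance assignments differently.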

Published

2024-03-24

How to Cite

Chen, L., Song, Y., Cai, Y., Lu, J., Li, Y., Xie, Y., Wang, C., & He, G. (2024). Multi-Prototype Space Learning for Commonsense-Based Scene Graph Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(2), 1129-1137. https://doi.org/10.1609/aaai.v38i2.27874

Issue

Vol. 38 No. 2 (2024)

Section

AAAI Technical Track on Computer Vision I