Multi-Prototype Space Learning for Commonsense-Based Scene Graph Generation

Lianggangxu Chen; Youqi Song; Yiqing Cai; Jiale Lu; Yang Li; Yuan Xie; Changbo Wang; Gaoqi He

doi:10.1609/aaai.v38i2.27874

Authors

Lianggangxu Chen East China Normal University
Youqi Song East China Normal University
Yiqing Cai East China Normal University
Jiale Lu East China Normal University
Yang Li East China Normal University
Yuan Xie East China Normal University
Changbo Wang East China Normal University
Gaoqi He East China Normal University; Chongqing Key Laboratory of Precision Optics

DOI:

https://doi.org/10.1609/aaai.v38i2.27874

Keywords:

CV: Scene Analysis & Understanding, CV: Language and Vision

Abstract

In the domain of scene graph generation, modeling commonsense as a single-prototype representation has been typically employed to facilitate the recognition of infrequent predicates. However, a fundamental challenge lies in the large intra-class variations of the visual appearance of predicates, resulting in subclasses within a predicate class. Such a challenge typically leads to the problem of misclassifying diverse predicates due to the rough predicate space clustering. In this paper, inspired by cognitive science, we maintain multi-prototype representations for each predicate class, which can accurately find the multiple class centers of the predicate space. Technically, we propose a novel multi-prototype learning framework consisting of three main steps: prototype-predicate matching, prototype updating, and prototype space optimization. We first design a triple-level optimal transport to match each predicate feature within the same class to a specific prototype. In addition, the prototypes are updated using momentum updating to find the class centers according to the matching results. Finally, we enhance the inter-class separability of the prototype space through iterations of the inter-class separability loss and intra-class compactness loss. Extensive evaluations demonstrate that our approach significantly outperforms state-of-the-art methods on the Visual Genome dataset.
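
The matching and updating steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the triple-level optimal transport, so plain entropic optimal transport (Sinkhorn iterations) over cosine distance is swapped in, and the function names `sinkhorn_assign` and `momentum_update`, the uniform marginals, and the values of `eps`, `n_iters`, and `momentum` are all assumptions made for illustration. The two loss terms are not covered.

```python
import numpy as np


def sinkhorn_assign(features, prototypes, eps=0.05, n_iters=50):
    """Soft-match features to prototypes with entropic optimal transport.

    Assumption: uniform marginals, i.e. each prototype of a class is
    expected to receive an equal share of that class's features.
    Returns a transport plan of shape (n_features, n_prototypes).
    """
    # Cosine distance between L2-normalised features and prototypes.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    cost = 1.0 - f @ p.T

    kernel = np.exp(-cost / eps)
    n, k = kernel.shape
    row_marginal = np.full(n, 1.0 / n)
    col_marginal = np.full(k, 1.0 / k)

    # Sinkhorn iterations: alternately rescale rows and columns.
    v = np.ones(k)
    for _ in range(n_iters):
        u = row_marginal / (kernel @ v)
        v = col_marginal / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]


def momentum_update(prototypes, features, assignment, momentum=0.9):
    """Move each prototype toward the mean of the features matched to it."""
    updated = prototypes.copy()
    for j in range(prototypes.shape[0]):
        members = features[assignment == j]
        if len(members) > 0:  # leave unmatched prototypes unchanged
            updated[j] = (momentum * prototypes[j]
                          + (1.0 - momentum) * members.mean(axis=0))
    return updated


# Toy data (hypothetical): four predicate features and two prototypes
# belonging to one predicate class.
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
prototypes = np.array([[1.0, 0.2], [0.2, 1.0]])

plan = sinkhorn_assign(features, prototypes)
assignment = plan.argmax(axis=1)  # hard match: one prototype per feature
prototypes = momentum_update(prototypes, features, assignment)
```

The uniform column marginal is one simple way to keep every prototype in use, so that a class does not collapse back to a single center; whether the paper imposes the same constraint is not stated in the abstract.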
