Kumaraswamy Wavelet for Heterophilic Scene Graph Generation

Authors

  • Lianggangxu Chen East China Normal University
  • Youqi Song East China Normal University
  • Shaohui Lin East China Normal University
  • Changbo Wang East China Normal University
  • Gaoqi He East China Normal University, Chongqing Key Laboratory of Precision Optics

DOI:

https://doi.org/10.1609/aaai.v38i2.27875

Keywords:

CV: Scene Analysis & Understanding, CV: Language and Vision

Abstract

Graph neural networks (GNNs) have demonstrated their capabilities in scene graph generation (SGG) by updating node representations from neighboring nodes. This update can be viewed as a form of low-pass filter in the spatial domain, which smooths node feature representations and retains the commonalities among nodes. However, spatial GNNs do not work well for heterophilic SGG, in which fine-grained predicates are always connected to a large number of coarse-grained predicates. Blind smoothing erodes the discriminative information of the fine-grained predicates, so they cannot be predicted accurately. To address heterophily, our key idea is to design tailored filters by wavelet transform in the spectral domain. First, we rigorously prove that as the heterophily of a scene graph increases, its spectral energy gradually shifts toward the high-frequency part. Inspired by this observation, we propose the Kumaraswamy Wavelet Graph Neural Network (KWGNN). KWGNN leverages complementary multi-group Kumaraswamy wavelets to cover all frequency bands. Finally, KWGNN adaptively generates band-pass filters and integrates the filtering results to better accommodate varying levels of smoothness on the graph. Comprehensive experiments on the Visual Genome and Open Images datasets show that our method achieves state-of-the-art performance.
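The link between heterophily, high-frequency spectral energy, and spatial smoothing can be illustrated on a toy graph. This is a minimal sketch, not the authors' proof or code. It assumes the symmetric normalized Laplacian (eigenvalues in [0, 2]), treats eigenvalues above a hypothetical cutoff of 1 as the "high-frequency part", and uses a 10-node ring carrying either a smooth two-block signal (homophilic) or an alternating signal (heterophilic). The helper names `high_freq_energy_ratio` and `mean_aggregate` are hypothetical; `mean_aggregate` stands in for the spatial, low-pass neighbor averaging that the abstract attributes to GNNs.

```python
import numpy as np


def normalized_laplacian(adj):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    # Isolated nodes (degree 0) get a zero scaling factor instead of a division by zero.
    d_inv_sqrt = np.power(deg, -0.5, where=deg > 0, out=np.zeros_like(deg))
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def high_freq_energy_ratio(adj, x, cutoff=1.0):
    """Share of the signal's spectral energy on eigenvalues above `cutoff` (assumed split)."""
    eigvals, eigvecs = np.linalg.eigh(normalized_laplacian(adj))
    x_hat = eigvecs.T @ x  # graph Fourier transform
    energy = x_hat ** 2
    return energy[eigvals > cutoff].sum() / energy.sum()


def mean_aggregate(adj, x):
    """One spatial message-passing step: average each node with its neighbors."""
    a_hat = np.asarray(adj, dtype=float) + np.eye(len(adj))  # add self-loops
    return a_hat @ x / a_hat.sum(axis=1)


# Toy 10-node ring graph (hypothetical example, not scene-graph data).
n = 10
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0

homophilic = np.array([1.0] * 5 + [-1.0] * 5)              # neighbors mostly agree
heterophilic = np.array([(-1.0) ** i for i in range(n)])  # every neighbor disagrees

for name, x in [("homophilic", homophilic), ("heterophilic", heterophilic)]:
    ratio = high_freq_energy_ratio(adj, x)
    kept = np.linalg.norm(mean_aggregate(adj, x)) / np.linalg.norm(x)
    print(f"{name}: high-frequency energy {ratio:.2f}, norm kept after smoothing {kept:.2f}")
# homophilic: high-frequency energy 0.16, norm kept after smoothing 0.80
# heterophilic: high-frequency energy 1.00, norm kept after smoothing 0.33
```

On this toy graph the heterophilic signal places all of its energy at the top of the spectrum, and a single averaging step shrinks it to a third of its norm, which mirrors the "blind smoothing" concern in the abstract.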
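A bank of spectral filters shaped by the Kumaraswamy distribution can be sketched as follows. The Kumaraswamy density on [0, 1] is f(u; a, b) = a b u^(a-1) (1 - u^a)^(b-1). Everything else is an assumption for illustration and not KWGNN's actual parameterization: Laplacian eigenvalues λ are rescaled to u = λ/2, the (a, b) pairs are hypothetical, filtering uses an explicit eigendecomposition (practical only for small graphs), and the learned, adaptive fusion of band outputs is replaced by a fixed, hypothetical weight vector `mix`.

```python
import numpy as np


def kumaraswamy_pdf(u, a, b):
    """Kumaraswamy density a*b*u^(a-1)*(1-u^a)^(b-1), assuming u in [0, 1] and a, b >= 1."""
    return a * b * u ** (a - 1) * (1.0 - u ** a) ** (b - 1)


def normalized_laplacian(adj):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.power(deg, -0.5, where=deg > 0, out=np.zeros_like(deg))
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def kumaraswamy_filter_bank(adj, x, params):
    """Apply one spectral filter per (a, b) pair to node features x of shape (n_nodes, n_feats).

    Returns an array of shape (len(params), n_nodes, n_feats).
    """
    eigvals, eigvecs = np.linalg.eigh(normalized_laplacian(adj))
    # Normalized-Laplacian eigenvalues lie in [0, 2]; rescale to [0, 1] and clip
    # tiny numerical overshoot so the density is evaluated on its support.
    u = np.clip(eigvals / 2.0, 0.0, 1.0)
    x_hat = eigvecs.T @ x  # graph Fourier transform
    outputs = []
    for a, b in params:
        response = kumaraswamy_pdf(u, a, b)  # frequency response of this band
        outputs.append(eigvecs @ (response[:, None] * x_hat))  # inverse transform
    return np.stack(outputs)


# Hypothetical example: 10-node ring with an alternating (heterophilic) scalar feature.
n = 10
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
x = np.array([(-1.0) ** i for i in range(n)])[:, None]

# Hypothetical (a, b) pairs:
#   (1, 3) -> 3(1-u)^2,   peaks at u = 0          (low-pass)
#   (2, 2) -> 4u(1-u^2),  peaks at u = 1/sqrt(3)  (band-pass)
#   (3, 1) -> 3u^2,       peaks at u = 1          (high-pass)
params = [(1, 3), (2, 2), (3, 1)]
bands = kumaraswamy_filter_bank(adj, x, params)

# Stand-in for adaptive fusion: a fixed convex combination of the band outputs.
mix = np.array([0.2, 0.3, 0.5])
fused = np.tensordot(mix, bands, axes=1)  # shape (n_nodes, n_feats)
```

Because the alternating signal sits entirely at λ = 2 (u = 1), the low-pass and band-pass responses vanish there and only the high-pass band passes it, scaled by 3. With integer a, b ≥ 1 every response is a polynomial in λ, so filters of this shape could in principle be applied with powers of the Laplacian instead of an eigendecomposition; the abstract does not say how KWGNN evaluates them.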

Published

2024-03-24

How to Cite

Chen, L., Song, Y., Lin, S., Wang, C., & He, G. (2024). Kumaraswamy Wavelet for Heterophilic Scene Graph Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(2), 1138-1146. https://doi.org/10.1609/aaai.v38i2.27875

Issue

Vol. 38 No. 2

Section

AAAI Technical Track on Computer Vision I