Human interaction understanding with joint graph decomposition and node labeling
IEEE Transactions on Image Processing, 2021•ieeexplore.ieee.org
The task of human interaction understanding involves both recognizing the action of each
individual in the scene and decoding the interaction relationship among people, which is
useful to a series of vision applications such as camera surveillance, video-based sports
analysis and event retrieval. This paper divides the task into two problems including
grouping people into clusters and assigning labels to each of them, and presents an
approach to solving these problems in a joint manner. Our method does not assume the …
individual in the scene and decoding the interaction relationship among people, which is
useful to a series of vision applications such as camera surveillance, video-based sports
analysis and event retrieval. This paper divides the task into two problems including
grouping people into clusters and assigning labels to each of them, and presents an
approach to solving these problems in a joint manner. Our method does not assume the …
The task of human interaction understanding involves both recognizing the action of each individual in the scene and decoding the interaction relationship among people, which is useful to a series of vision applications such as camera surveillance, video-based sports analysis and event retrieval. This paper divides the task into two problems including grouping people into clusters and assigning labels to each of them, and presents an approach to solving these problems in a joint manner. Our method does not assume the number of groups is known beforehand as this will substantially restrict its application. With the observation that the two challenges are highly correlated, the key idea is to model the pairwise interacting relations among people via a complete graph and its associated energy function such that the labeling and grouping problems are translated into the minimization of the energy function. We implement this joint framework by fusing both deep features and rich contextual cues, and learn the fusion parameters from data. An alternating search algorithm is developed in order to efficiently solve the associated inference problem. By combining the grouping and labeling results obtained with our method, we are able to achieve the semantic-level understanding of human interactions. Extensive experiments are performed to qualitatively and quantitatively evaluate the effectiveness of our approach, which outperforms state-of-the-art methods on several important benchmarks. An ablation study is also performed to verify the effectiveness of different modules within our approach.
ieeexplore.ieee.org
Showing the best result for this search. See all results