17 December 2020 Multi-scale temporal feature-based dense convolutional network for action recognition
Xiaoqiang Li, Miao Xie, Yin Zhang, Jide Li
Abstract

We propose a network structure for action recognition that extracts multi-scale temporal representations of actions. The network, called the multi-scale temporal pooling dense convolutional network (MTPDNet), combines a multi-scale temporal pooling module with a dense connection module. The multi-scale temporal pooling module consists of multiple temporal scale levels. At each scale level, the video frames are divided into several segments, and a pooling operation is performed on each segment to obtain temporal pooling information. The number of segments differs across scale levels, so that pooling information is obtained at multiple temporal scales. In addition, at each scale level, a redesigned dense connection module learns motion representations from the temporal pooling information. Finally, predictions are made independently at each scale level, and the class scores of all scale levels are fused to obtain the final prediction scores. Experimental results on two standard datasets, UCF101 and HMDB51, show that MTPDNet achieves results comparable to or better than those of leading methods, which supports the effectiveness of combining multi-scale temporal pooling with dense connections.
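The segment-wise pooling and score fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the pooling operator, the number of segments per level, or the fusion rule, so max/average pooling, the segment counts (1, 2, 4), and weighted-average fusion are all assumptions made here for illustration. The function names are hypothetical, and the dense connection module that would sit between pooling and prediction is omitted.

```python
from statistics import fmean


def split_segments(num_frames, num_segments):
    """Split frame indices 0..num_frames-1 into contiguous, near-equal segments.

    Returns a list of (start, end) pairs with `end` exclusive.
    Assumes 1 <= num_segments <= num_frames so that no segment is empty.
    """
    if not 1 <= num_segments <= num_frames:
        raise ValueError("need 1 <= num_segments <= num_frames")
    bounds = [i * num_frames // num_segments for i in range(num_segments + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(num_segments)]


def temporal_pool(frame_features, num_segments, mode="max"):
    """Pool per-frame feature vectors inside each segment, dimension by dimension.

    `mode` is an assumption: "max" or "avg"; the abstract only says "a pooling operation".
    """
    reduce = max if mode == "max" else fmean
    pooled = []
    for start, end in split_segments(len(frame_features), num_segments):
        segment = frame_features[start:end]
        # zip(*segment) iterates over feature dimensions across the segment's frames.
        pooled.append([reduce(dim) for dim in zip(*segment)])
    return pooled


def multi_scale_pool(frame_features, segments_per_level=(1, 2, 4)):
    """Apply temporal pooling once per scale level (segment counts are assumed)."""
    return {k: temporal_pool(frame_features, k) for k in segments_per_level}


def fuse_scores(level_scores, weights=None):
    """Fuse per-level class scores with a weighted average (assumed fusion rule)."""
    if weights is None:
        weights = [1.0] * len(level_scores)
    total = sum(weights)
    return [
        sum(w * s for w, s in zip(weights, per_class)) / total
        for per_class in zip(*level_scores)
    ]


if __name__ == "__main__":
    # Four frames, each with a 2-dimensional feature vector (toy values).
    frames = [[1.0, 0.0], [2.0, 5.0], [3.0, 1.0], [4.0, 2.0]]
    print(multi_scale_pool(frames, segments_per_level=(1, 2)))
    # Toy class scores from two scale levels, fused with equal weights.
    print(fuse_scores([[0.25, 0.75], [0.75, 0.25]]))
```

In this sketch a level with one segment summarizes the whole clip, while levels with more segments keep coarse temporal order; in the paper each level's pooled features would feed a dense connection module before per-level class scores are produced and fused.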

© 2020 SPIE and IS&T 1017-9909/2020/$28.00
Xiaoqiang Li, Miao Xie, Yin Zhang, and Jide Li "Multi-scale temporal feature-based dense convolutional network for action recognition," Journal of Electronic Imaging 29(6), 063013 (17 December 2020). https://doi.org/10.1117/1.JEI.29.6.063013
Received: 28 June 2020; Accepted: 2 December 2020; Published: 17 December 2020
CITATIONS
Cited by 1 scholarly publication and 1 patent.
KEYWORDS
Video
RGB color model
Optical flow
Video acceleration
Convolution
Network architectures
3D modeling