Apr 1, 2022 · A multiple cross-attention network is proposed to facilitate query-video and query-subtitle matching, yielding a performance improvement in video-subtitle moment retrieval.
MMCDA contains three parts: the shared feature encoders to extract the video and query features in each domain, the cross-modal attention to capture the video-query interactions, ...
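As a hedged sketch of how the first two of these parts could be wired together, assuming a linear video projection, a GRU query encoder, and standard multi-head attention (all module and tensor names below are illustrative, not MMCDA's actual implementation):

```python
import torch
import torch.nn as nn

class SharedEncoderWithCrossAttention(nn.Module):
    """Illustrative wiring only: one encoder pair shared across domains,
    followed by cross-modal attention (not the actual MMCDA code)."""
    def __init__(self, video_dim=1024, word_dim=300, hidden=256, heads=4):
        super().__init__()
        # shared feature encoders: the same weights process data from either domain
        self.video_enc = nn.Linear(video_dim, hidden)
        self.query_enc = nn.GRU(word_dim, hidden, batch_first=True)
        # cross-modal attention: video positions attend over query tokens
        self.cross_attn = nn.MultiheadAttention(hidden, num_heads=heads, batch_first=True)

    def forward(self, video_feats, query_embeds):
        v = self.video_enc(video_feats)        # (B, N_v, hidden)
        q, _ = self.query_enc(query_embeds)    # (B, N_q, hidden)
        fused, _ = self.cross_attn(query=v, key=q, value=q)
        return fused                           # query-aware video features
```

Sharing the encoder and attention weights across domains keeps the per-domain features in a comparable space, which is presumably the point of the "shared" encoders.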
The video moment retrieval task aims to fetch a target moment in an untrimmed video, which best matches the semantics of a sentence query.
Video Moment Retrieval (VMR) aims to retrieve temporal segments in untrimmed videos corresponding to a given language query by constructing cross-modal interactions between the video and the query.
through a cross-transformer encoder with two layers. The cross-attention between video and text embeddings can be formulated as CrossAtt(Q_v, K_t, V_t) = softmax(Q_v K_t^T / sqrt(d)) V_t, where Q_v is projected from the video embeddings, K_t and V_t from the text embeddings, and d is the embedding dimension.
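As a minimal sketch of that formula in PyTorch (the tensor names and the shared dimension d are assumptions for illustration, not taken from the paper):

```python
import torch
import torch.nn.functional as F

def cross_attention(q_video: torch.Tensor,
                    k_text: torch.Tensor,
                    v_text: torch.Tensor) -> torch.Tensor:
    """CrossAtt(Q_v, K_t, V_t): video queries attend over text keys/values.

    q_video: (B, N_v, d) queries projected from video embeddings
    k_text:  (B, N_t, d) keys projected from text embeddings
    v_text:  (B, N_t, d) values projected from text embeddings
    returns: (B, N_v, d) text-conditioned video features
    """
    d = q_video.size(-1)
    scores = torch.matmul(q_video, k_text.transpose(-2, -1)) / d ** 0.5  # (B, N_v, N_t)
    attn = F.softmax(scores, dim=-1)  # each video position distributes attention over text tokens
    return torch.matmul(attn, v_text)
```

Stacking two such layers, with the usual residual connections and feed-forward blocks around them, would correspond to the two-layer cross-transformer encoder mentioned above.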
[PDF] A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
Illustration of the intrinsic characteristics of Moment Retrieval and Highlight Detection: we visualize the attention map of the same video under the two tasks.
Therefore, the two proposed attention sub-networks can recognize the most relevant objects and interactions in the video, and simultaneously highlight the ...
In particular, we design a memory attention mechanism to emphasize the visual features mentioned in the query and simultaneously incorporate their context.
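One plausible reading of such a memory attention, sketched under the assumption that "context" means a small temporal neighborhood around each frame (an illustration, not the authors' actual design):

```python
import torch
import torch.nn.functional as F

def memory_attention(frame_feats: torch.Tensor,
                     query_vec: torch.Tensor,
                     context_window: int = 2) -> torch.Tensor:
    """Emphasize frames relevant to the query and mix in their local context.

    frame_feats: (B, T, d) per-frame visual features
    query_vec:   (B, d)    sentence-level query embedding
    returns:     (B, T, d) query-weighted, context-augmented frame features
    """
    # relevance of each frame to the query (scaled dot product)
    scores = torch.einsum('btd,bd->bt', frame_feats, query_vec) / frame_feats.size(-1) ** 0.5
    weights = F.softmax(scores, dim=-1).unsqueeze(-1)   # (B, T, 1)
    emphasized = frame_feats * weights                  # soft re-weighting by query relevance

    # incorporate context: average the emphasized features over a small neighborhood
    kernel = 2 * context_window + 1
    context = F.avg_pool1d(emphasized.transpose(1, 2), kernel_size=kernel,
                           stride=1, padding=context_window).transpose(1, 2)
    return emphasized + context
```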
Oct 28, 2024 · Our approach introduces a new retention mechanism into the multimodal Transformer architecture, incorporating modality-specific attention modes.
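The snippet does not describe the retention mechanism itself, but a generic parallel-form retention with a modality-dependent decay rate gives a feel for the idea; everything below (the function name, the single scalar decay per modality, the toy shapes) is an assumption, not the paper's architecture:

```python
import torch

def parallel_retention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                       gamma: float) -> torch.Tensor:
    """Generic parallel-form retention: (Q K^T * D) V, where
    D[n, m] = gamma**(n - m) for n >= m and 0 otherwise.

    q, k, v: (B, T, d) token features of one modality.
    gamma:   decay in (0, 1); giving each modality its own decay is one
             possible reading of "modality-specific attention modes".
    """
    B, T, d = q.shape
    n = torch.arange(T, device=q.device).unsqueeze(1)   # (T, 1)
    m = torch.arange(T, device=q.device).unsqueeze(0)   # (1, T)
    # causal decay matrix: recent tokens contribute more than distant ones
    D = (gamma ** (n - m).clamp(min=0).float()) * (n >= m).float()
    scores = q @ k.transpose(-2, -1) / d ** 0.5          # (B, T, T)
    return (scores * D) @ v                              # (B, T, d)

if __name__ == "__main__":
    qv = kv = vv = torch.randn(2, 64, 256)   # toy video tokens
    qt = kt = vt = torch.randn(2, 20, 256)   # toy text tokens
    video_out = parallel_retention(qv, kv, vv, gamma=0.95)  # faster decay for dense frames
    text_out = parallel_retention(qt, kt, vt, gamma=0.99)   # slower decay for sparse text
```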