Video Captioning

[TOC] Title: Hierarchical Temporal Aware Video Language Pre Training Author: Qinghao Ye, Fei Huang et. al. Publish Year: 30 Dec 2022 Review Date: Thu, Apr 6, 2023 url: https://arxiv.org/pdf/2212.14546.pdf Summary of paper Motivation most previous methods directly inherit or adapt typical image-language pre-training paradigms to video-language pretraining, thus not fully exploiting the unique characteristic of video, i.e., temporal. Contribution this paper, the two novel pretraining tasks for modeling cross-modal alignment between moments and texts as well as the temporal relations of video-text pairs. specifically, we propose a cross-modal moment exploration task to explore moments in videos, which results in detailed video moment representations besides, the inherent temporal relations are capture by alignment video-text pairs as a whole in different time resolutions with multimodal temporal relation exploration tasks Some key terms limitation of previous work ...