overall architecture

Yuetian_weng an Efficient Spatio Temporal Pyramid Transformer for Action Detection 2022

[TOC] Title: An Efficient Spatio-Temporal Pyramid Transformer for Action Detection Author: Yuetian Weng et. al. Publish Year: Jul 2022 Review Date: Thu, Oct 20, 2022 Summary of paper Motivation the task of action detection aims at deducing both the action category and localisation of the start and end moment for each action instance in a long, untrimmed video. it is non-trivial to design an efficient architecture for action detection due to the prohibitively expensive self-attentions over a long sequence of video clips To this end, they present an efficient hierarchical spatial temporal transformer for action detection Building upon the fact that the early self-attention layer in Transformer still focus on local patterns....

<span title='2022-10-20 19:06:41 +1100 AEDT'>October 20, 2022</span>&nbsp;·&nbsp;4 min&nbsp;·&nbsp;649 words&nbsp;·&nbsp;Sukai Huang