[TOC]
- Title: On the Integration of Self-Attention and Convolution
- Author: Xuran Pan et al.
- Publish Year: 2022 (CVPR, IEEE/CVF)
- Review Date: Thu, Apr 25, 2024
- url: https://arxiv.org/abs/2111.14556
Summary of paper
Motivation
- There exists a strong underlying relation between convolution and self-attention: both are built on 1×1-convolution-style projections and differ mainly in how the projected features are aggregated (see the sketch below).
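Concretely, the paper observes that a k×k convolution can be decomposed into k² 1×1 convolutions followed by shift-and-sum, while self-attention's query/key/value maps are themselves 1×1 convolutions. A minimal PyTorch sketch (toy shapes of my own choosing, not the authors' code) verifying the convolution side of that decomposition:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 4, 8, 8)            # (N, C_in, H, W), toy sizes
w = torch.randn(6, 4, 3, 3)            # one 3x3 kernel, C_out = 6
H, W = x.shape[-2:]

# Reference: a standard 3x3 convolution (padding=1 keeps spatial size).
ref = F.conv2d(x, w, padding=1)

# Equivalent view: nine 1x1 convolutions, each shifted by its kernel offset.
out = torch.zeros_like(ref)
for i in range(3):
    for j in range(3):
        w_ij = w[:, :, i, j][..., None, None]   # slice out a 1x1 kernel
        y = F.conv2d(x, w_ij)                   # projection stage (1x1 conv)
        y = F.pad(y, (1, 1, 1, 1))              # aggregation stage: shift...
        out += y[:, :, i:i + H, j:j + W]        # ...and sum
print(torch.allclose(ref, out, atol=1e-5))      # True

# Self-attention's q/k/v projections are the same 1x1-conv projection stage:
wq = torch.randn(6, 4, 1, 1)
q = F.conv2d(x, wq)                             # == a per-pixel linear map
```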
Related work
Convolution NN
- Convolutional networks use convolution kernels to extract local features and have become the most powerful and conventional technique for a wide range of vision tasks.
Self-attention only
- Recently, vision transformers have shown that, given enough data, we can treat an image as a sequence of patch tokens (e.g., 16×16-pixel patches) and leverage Transformer models to achieve competitive results in image recognition (see the tokenization sketch below).
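As a concrete illustration of the tokenization step (toy sizes of my own choosing, not taken from the paper):

```python
import torch

img = torch.randn(1, 3, 224, 224)              # (N, C, H, W) toy image
p = 16                                         # patch size
# Cut the image into non-overlapping 16x16 patches...
patches = img.unfold(2, p, p).unfold(3, p, p)  # (1, 3, 14, 14, 16, 16)
# ...and flatten each patch into one token vector.
tokens = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, -1, 3 * p * p)
print(tokens.shape)                            # torch.Size([1, 196, 768])
# A learned linear projection then maps each token to the model width.
```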
Attention enhanced convolution
- Multiple previously proposed attention mechanisms over images suggest that attention can overcome the locality limitation of convolutional networks; a common pattern is sketched below.
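A hedged sketch of one such pattern, a convolutional branch concatenated with a self-attention branch (class name and dimensions are mine, not any specific cited method):

```python
import torch
import torch.nn as nn

class AugmentedConv(nn.Module):
    """Toy conv branch + attention branch, outputs concatenated."""
    def __init__(self, c_in, c_conv, c_attn, heads=4):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_conv, 3, padding=1)   # local branch
        self.proj = nn.Conv2d(c_in, c_attn, 1)              # 1x1 to attn width
        self.attn = nn.MultiheadAttention(c_attn, heads, batch_first=True)

    def forward(self, x):                                   # x: (N, C, H, W)
        n, _, h, w = x.shape
        local = self.conv(x)
        t = self.proj(x).flatten(2).transpose(1, 2)         # (N, H*W, c_attn)
        glob, _ = self.attn(t, t, t)                        # global branch
        glob = glob.transpose(1, 2).reshape(n, -1, h, w)
        return torch.cat([local, glob], dim=1)              # widen channels

# e.g. AugmentedConv(64, 48, 16)(torch.randn(2, 64, 14, 14)) -> (2, 64, 14, 14)
```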
Convolution enhanced Attention
- Several works complement transformer models with convolution operations to introduce additional inductive biases.
- For example, adding convolutions at the early stages yields more stable training (see the stem sketch below).
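A minimal sketch of that idea, with dimensions of my own choosing: a small convolutional stem with overall stride 16 standing in for the usual 16×16 patchify step before the transformer blocks.

```python
import torch
import torch.nn as nn

# Four stride-2 3x3 convs give overall stride 16, like a 16x16 patchify,
# but with smoother optimization in the early layers.
conv_stem = nn.Sequential(
    nn.Conv2d(3, 48, 3, stride=2, padding=1), nn.BatchNorm2d(48), nn.ReLU(),
    nn.Conv2d(48, 96, 3, stride=2, padding=1), nn.BatchNorm2d(96), nn.ReLU(),
    nn.Conv2d(96, 192, 3, stride=2, padding=1), nn.BatchNorm2d(192), nn.ReLU(),
    nn.Conv2d(192, 384, 3, stride=2, padding=1),
)
feat = conv_stem(torch.randn(1, 3, 224, 224))   # (1, 384, 14, 14)
tokens = feat.flatten(2).transpose(1, 2)        # (1, 196, 384) token sequence
```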