1. Title: On the Integration of Self-Attention and Convolution
  2. Author: Xuran Pan et al.
  3. Publish Year: 2022 IEEE
  4. Review Date: Thu, Apr 25, 2024
  5. url: https://arxiv.org/abs/2111.14556

Summary of paper


  • There exists a strong underlying relation between convolution and self-attention: the bulk of the computation in both paradigms is done with the same operation.
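One way to see this relation on the convolution side is that a 3×3 convolution can be rewritten as nine 1×1 convolutions followed by a shift-and-sum step. The sketch below is a minimal NumPy illustration of that identity, not the paper's implementation; the function names, tensor layout, zero padding, and stride of 1 are assumptions chosen for the example.

```python
import numpy as np

def conv3x3_direct(x, w):
    """Reference 3x3 convolution (cross-correlation), zero padding 1, stride 1.

    x: (C_in, H, W), w: (C_out, C_in, 3, 3) -> (C_out, H, W)
    """
    c_out = w.shape[0]
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            window = xp[:, i:i + 3, j:j + 3]
            out[:, i, j] = np.tensordot(w, window, axes=([1, 2, 3], [0, 1, 2]))
    return out

def conv3x3_shift_sum(x, w):
    """Same result, computed in two stages.

    Stage 1: one 1x1 convolution per kernel position (p, q).
    Stage 2: shift each result by (p - 1, q - 1) with zero fill, then sum.
    """
    c_out = w.shape[0]
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    for p in range(3):
        for q in range(3):
            # Stage 1: 1x1 convolution using the weights at kernel position (p, q).
            feat = np.einsum("oc,chw->ohw", w[:, :, p, q], x)
            # Stage 2: shifted read from a zero-padded copy, accumulated into out.
            fp = np.pad(feat, ((0, 0), (1, 1), (1, 1)))
            out += fp[:, p:p + h, q:q + wd]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6, 5))
w = rng.standard_normal((8, 4, 3, 3))
same = np.allclose(conv3x3_direct(x, w), conv3x3_shift_sum(x, w))
```

Stage 1 is the only part that mixes channels; stage 2 is a cheap aggregation. Self-attention also begins with 1×1 projections (queries, keys, values), so that projection step is the operation the two paradigms share.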


Convolution NN

  • Convolutional networks use convolution kernels to extract local features and have become the most powerful and conventional technique for various vision tasks.

Self-attention only

  • Recently, the Vision Transformer (ViT) showed that, given enough data, an image can be treated as a sequence of 256 tokens and fed to a Transformer model to achieve competitive results in image recognition.
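As a rough illustration of treating an image as a token sequence, the sketch below cuts an image into non-overlapping patches and runs one single-head self-attention step over them. The 256×256 input with 16×16 patches is an assumption that happens to give 256 tokens; the notes do not state the resolution. The helper names, the projection width `d`, and the random weights are likewise hypothetical.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into flattened non-overlapping patches.

    Returns an array of shape (num_tokens, patch * patch * C).
    """
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch, patch, C)
    return x.reshape(gh * gw, patch * patch * c)

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over all tokens."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
image = rng.standard_normal((256, 256, 3))   # assumed input size
tokens = patchify(image, 16)                 # 16 x 16 grid of patches -> 256 tokens
d = 64                                       # assumed projection width
wq, wk, wv = (rng.standard_normal((tokens.shape[1], d)) * 0.02 for _ in range(3))
out, weights = self_attention(tokens, wq, wk, wv)
```

Every token attends to every other token, so the receptive field is global from the first layer. A real ViT also adds a learned linear patch embedding, position embeddings, and multiple heads, all omitted here.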

Attention enhanced convolution

  • Multiple previously proposed attention mechanisms over images suggest that attention can overcome the locality limitation of convolutional networks.

Convolution enhanced attention

  • Another line of research complements Transformer models with convolution operations to introduce additional inductive biases.
    • For example, adding convolutions at the early stage achieves more stable training.
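A convolutional stem of this kind might look like the sketch below, which replaces a single large patchify step with a short stack of stride-2 3×3 convolutions before the tokens reach the Transformer. The layer count, channel widths, ReLU activation, and input size are illustrative assumptions and are not taken from the paper or from the work it refers to.

```python
import numpy as np

def conv2d(x, w, stride, pad):
    """Naive 2D convolution (cross-correlation) with zero padding.

    x: (C_in, H, W), w: (C_out, C_in, k, k) -> (C_out, H_out, W_out)
    """
    c_out, _, k, _ = w.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h_out = (x.shape[1] + 2 * pad - k) // stride + 1
    w_out = (x.shape[2] + 2 * pad - k) // stride + 1
    out = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            window = xp[:, i * stride:i * stride + k, j * stride:j * stride + k]
            out[:, i, j] = np.tensordot(w, window, axes=([1, 2, 3], [0, 1, 2]))
    return out

def conv_stem(image, weights):
    """Apply stride-2 3x3 convolutions with ReLU, then flatten to tokens."""
    x = image
    for w in weights:
        x = np.maximum(conv2d(x, w, stride=2, pad=1), 0.0)  # ReLU
    c, h, wd = x.shape
    return x.reshape(c, h * wd).T  # (num_tokens, channels)

rng = np.random.default_rng(0)
image = rng.standard_normal((3, 64, 64))     # assumed small input
channels = [3, 8, 16, 32, 64]                # assumed channel widths
weights = [rng.standard_normal((co, ci, 3, 3)) * 0.1
           for ci, co in zip(channels[:-1], channels[1:])]
tokens = conv_stem(image, weights)           # 64 -> 32 -> 16 -> 8 -> 4 per side
```

Four stride-2 layers downsample by 16 in total, the same factor as a 16×16 patchify, but each token is built from overlapping local windows, which is where the extra inductive bias comes from.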