[TOC]

  1. Title: MPNet: Masked and Permuted Pre-training for Language Understanding
  2. Author: Kaitao Song et al.
  3. Publish Year: 2020
  4. Review Date: Thu, Aug 25, 2022

Summary of paper

Motivation

(figure omitted)

Contribution

Some key terms

Comparison between BERT and XLNet

(BERT vs. XLNet comparison figures omitted)

Limitation (output dependency) in autoencoding (BERT)

MLM assumes the masked tokens are independent of each other and predicts them separately, which is not sufficient to model the complex context dependencies in natural language.

In contrast, PLM factorizes the predicted tokens with the product rule in a permuted order, which avoids the independence assumption of MLM and can better model the dependency among predicted tokens.
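
As a rough sketch of the two objectives (my notation, loosely following the paper): MLM predicts each masked token independently given the rest of the masked sentence, while PLM factorizes the predicted tokens autoregressively over a permuted order z, predicting the last n - c tokens of the permutation:

```latex
% MLM (BERT): each masked token x_t (t in the masked set K) is predicted
% independently, given only the unmasked part of the sentence
\mathcal{L}_{\mathrm{MLM}} \approx \sum_{t \in \mathcal{K}} \log P\big(x_t \mid x_{\setminus \mathcal{K}}; \theta\big)

% PLM (XLNet): the predicted tokens are factorized autoregressively over a
% permuted order z, so each prediction conditions on the previously predicted tokens
\mathcal{L}_{\mathrm{PLM}} = \sum_{t=c+1}^{n} \log P\big(x_{z_t} \mid x_{z_{<t}}; \theta\big)
```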

Limitation (position discrepancy) in permuted autoregression (XLNet)

Since the model can see the full input sentence in downstream tasks, to keep pre-training consistent with fine-tuning the model should see as much information about the full sentence as possible during pre-training. In MLM, the positions of the masked tokens are still visible to the model, which (partially) represents the full sentence (e.g., how many tokens it contains, i.e., the sentence length). In PLM, when predicting a token the model only sees the tokens and positions that come earlier in the permuted order, so it does not know the length of the full sentence, which creates a discrepancy between pre-training and fine-tuning.
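
As a toy illustration of this point (my own example, not from the paper), consider a 6-token sentence where the tokens at positions 4 and 5 are selected for prediction, and compare which tokens and positions are visible when the model predicts the token at position 5:

```python
# Toy example: what each objective can see when predicting the token at position 5.
tokens = ["the", "task", "is", "sentence", "classification", "[SEP]"]
predicted = [4, 5]                 # 1-indexed positions of the predicted tokens
permutation = [1, 2, 3, 6, 4, 5]   # a permuted order (non-predicted part first)

# MLM (BERT): the input keeps every position; predicted positions hold [M],
# so the model always knows how long the full sentence is.
mlm_input = [(p, tok if p not in predicted else "[M]") for p, tok in enumerate(tokens, 1)]
print("MLM sees:  ", mlm_input)

# PLM (XLNet): when predicting x_{z_t}, only tokens earlier in the permutation
# (and their positions) are visible, so the full sentence length is unknown.
t = permutation.index(5)
plm_visible = [(p, tokens[p - 1]) for p in permutation[:t]]
print("PLM sees:  ", plm_visible)

# MPNet: like PLM it conditions on the previously predicted tokens, but mask
# tokens [M] at the positions of all predicted tokens remain visible as well,
# so the positions of the full sentence are always known (position compensation).
mpnet_visible = plm_visible + [(p, "[M]") for p in predicted]
print("MPNet sees:", mpnet_visible)
```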

Proposed method

Architecture

(architecture figure omitted)
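
Roughly, MPNet combines the two objectives: each predicted token is conditioned on the previously predicted tokens (output dependency, as in PLM) and on mask tokens placed at the positions of all predicted tokens (full position information, as in MLM). In notation loosely following the paper, with permutation z and the last n - c tokens of the permuted order being predicted:

```latex
\mathcal{L}_{\mathrm{MPNet}} = \mathbb{E}_{z}
  \sum_{t=c+1}^{n} \log P\big(x_{z_t} \mid x_{z_{<t}},\, M_{z_{>c}};\, \theta\big)
```

where M_{z_{>c}} denotes the mask tokens [M] placed at the positions z_{>c} of the predicted tokens.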

Two-stream attention diagram (not very clear, actually)

(two-stream attention figures omitted)
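
Since the diagram is hard to follow, here is a minimal NumPy sketch of the XLNet-style two-stream attention masks that MPNet builds on (a simplification: it ignores MPNet's appended [M] tokens and position compensation). The content stream of a token may attend to itself and to everything earlier in the permuted order, while the query stream used to predict that token may attend only to earlier tokens:

```python
import numpy as np

n = 6                     # sequence length
z = [0, 1, 2, 5, 3, 4]    # a permuted order of the positions (0-indexed)

# step[pos] = the step of the permutation at which position `pos` is revealed
step = {pos: t for t, pos in enumerate(z)}

content_mask = np.zeros((n, n), dtype=bool)  # content stream: self + earlier tokens
query_mask = np.zeros((n, n), dtype=bool)    # query stream: earlier tokens only
for i in range(n):
    for j in range(n):
        content_mask[i, j] = step[j] <= step[i]
        query_mask[i, j] = step[j] < step[i]

print(content_mask.astype(int))
print(query_mask.astype(int))
```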

Results

(results figures omitted)

Good things about the paper (one paragraph)

Quite good: it combines the strengths of masked language modeling (BERT) and permuted language modeling (XLNet) into a single pre-training objective.

Potential future work

We may use this model to capture temporal information from video input.