
  1. Title: DiffCSE: Difference Based Contrastive Learning for Sentence Embeddings
  2. Author: Yung-Sung Chuang et al.
  3. Publish Year: 21 Apr 2022
  4. Review Date: Sat, Aug 27, 2022

Summary of paper

Motivation

  • DiffCSE learns sentence embeddings that are sensitive to the difference between the original sentence and an edited sentence.

Contribution

  • We propose DiffCSE, an unsupervised contrastive learning framework for learning sentence embeddings.

Some key terms

DiffCSE

  • This is an unsupervised contrastive learning framework rather than a model architecture.

Contrastive learning on single-modality data

  1. use multiple augmentations on a single datum to construct positive pairs whose representations are trained to be more similar to one another than negative pairs
    1. e.g., random cropping, color jitter, rotation for vision models
    2. dropout for NLP
  2. Contrastive learning encourages representations to be insensitive to these transformations
    1. i.e. the encoder is trained to be invariant to a set of manually chosen transformations
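The objective described above can be sketched in plain Python. Each anchor is pulled toward its own positive and pushed away from the other positives in the batch, which act as negatives. This is InfoNCE with in-batch negatives. The function names and the temperature `tau=0.05` are my own illustrative choices and do not come from the paper's code.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors: List[Vector], positives: List[Vector], tau: float = 0.05) -> float:
    """Mean InfoNCE loss: anchors[i] should be closer to positives[i]
    than to every positives[j] (j != i), which serve as in-batch negatives."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / tau for j in range(n)]
        # log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]  # -log softmax at the positive index
    return total / n


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # positive i matches anchor i
swapped = [[0.1, 0.9], [0.9, 0.1]]  # positives are mismatched
print(info_nce(anchors, aligned) < info_nce(anchors, swapped))  # True
```

The loss is small only when each anchor's most similar item in the batch is its own positive. This is why the encoder ends up invariant to whatever transformation produced the positives.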

Limitation of existing contrastive learning data augmentation for language data

  • Gao et al. found that constructing positive pairs via a simple dropout-based augmentation works much better than more complex augmentations such as word deletions or replacements based on synonyms or masked language models.

Insight into direct augmentation in language

  • while the training objective in contrastive learning encourages representations to be invariant to augmentation transformations, direct augmentations on the input (e.g., deletion, replacement) often change the meaning of the sentence.

Methodology

We propose to learn sentence representations that are aware of, but not necessarily invariant to, such direct surface-level augmentations:

  • we operationalise equivariant contrastive learning on sentences by using dropout-based augmentation as the insensitive transformation
  • and MLM-based word replacement as the sensitive transformation
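Here is a hedged sketch of the sensitive transformation (MLM-based word replacement). It masks a fraction of the tokens, lets a generator fill them in, and records which positions actually changed. As I understand the paper, these replaced/original labels are what DiffCSE's discriminator learns to predict, in ELECTRA-style replaced token detection conditioned on the sentence embedding. `mlm_edit`, `toy_generator`, and `mask_ratio=0.3` are hypothetical. A real setup would use a pretrained masked language model as the generator.

```python
import random
from typing import Callable, List, Tuple


def mlm_edit(
    tokens: List[str],
    generator: Callable[[List[str], int], str],
    mask_ratio: float = 0.3,
    seed: int = 0,
) -> Tuple[List[str], List[int]]:
    """Mask a random subset of tokens, let `generator` fill each mask, and
    return the edited tokens plus labels (1 = replaced, 0 = original)."""
    if not tokens:
        return [], []
    rng = random.Random(seed)
    n_mask = max(1, round(mask_ratio * len(tokens)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = ["[MASK]" if i in positions else t for i, t in enumerate(tokens)]

    edited, labels = [], []
    for i, tok in enumerate(tokens):
        new_tok = generator(masked, i) if i in positions else tok
        edited.append(new_tok)
        # As in replaced token detection, a masked position whose sampled
        # token equals the original counts as "original", not "replaced".
        labels.append(int(new_tok != tok))
    return edited, labels


# Hypothetical stand-in for a pretrained MLM generator: it ignores the
# surrounding words and just picks from a tiny fixed vocabulary.
VOCAB = ["cat", "dog", "sat", "ran", "the", "mat"]


def toy_generator(masked: List[str], position: int) -> str:
    return VOCAB[(len(masked) + position) % len(VOCAB)]


tokens = "the cat sat on the mat".split()
edited, labels = mlm_edit(tokens, toy_generator)
print(edited, labels)
```

Dropout leaves the sentence itself untouched, so the encoder can safely be invariant to it. An MLM edit can change the meaning, so the embedding is asked to carry enough information to tell which tokens were edited.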


SimCSE

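For reference, here is my transcription of the standard unsupervised SimCSE objective for a mini-batch of $N$ sentences. $\mathrm{sim}$ is cosine similarity and $\tau$ is a temperature hyperparameter:

$$
\ell_i = -\log \frac{e^{\mathrm{sim}(h_i,\, h_i^+)/\tau}}{\sum_{j=1}^{N} e^{\mathrm{sim}(h_i,\, h_j^+)/\tau}}
$$

Here $h_i$ and $h_i^+$ are two encodings of the same sentence under different dropout masks. The other $h_j^+$ in the batch act as negatives.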

Meaning of $x_i^+ = x_i$

This means the positive example is the original sentence itself: no edit is applied to the input text.

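They obtain $h_i^+$ only through the random dropout layers: encoding $x_i$ a second time samples a different dropout mask, so $h_i^+$ differs slightly from $h_i$.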
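The toy example below shows why $x_i^+ = x_i$ still yields a useful positive pair. If the same sentence is encoded twice with dropout active, the two output vectors differ slightly. `features`, `encode`, and the 64-dimensional output are hypothetical stand-ins. A real transformer applies dropout inside every layer (BERT uses p = 0.1), whereas this sketch applies it once, at the output.

```python
import random
from typing import List


def dropout(x: List[float], p: float, rng: random.Random) -> List[float]:
    """Inverted dropout: zero each unit with probability p and scale the
    survivors by 1 / (1 - p) so the expected value is unchanged."""
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in x]


def features(sentence: str, dim: int = 64) -> List[float]:
    """Hypothetical deterministic encoder body: maps a sentence to a fixed
    vector (a stand-in for the transformer's output before dropout)."""
    return [float((ord(sentence[k % len(sentence)]) * (k + 1)) % 7 + 1)
            for k in range(dim)]


def encode(sentence: str, rng: random.Random, p: float = 0.1) -> List[float]:
    """Encoder in training mode: dropout stays on, so every call draws a new mask."""
    return dropout(features(sentence), p, rng)


rng = random.Random(0)
x_i = "A woman is reading."
h_i = encode(x_i, rng)      # first forward pass
h_i_pos = encode(x_i, rng)  # second pass over the same input, x_i^+ = x_i
# h_i and h_i_pos almost surely differ, yet both come from x_i,
# so they form the positive pair for the contrastive loss.
```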

Potential future work

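```python
from diffcse import DiffCSE  # assumes the `diffcse` package from the DiffCSE repo is installed
model_bert_trans = DiffCSE("voidism/diffcse-bert-base-uncased-trans")
```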