Yung-Sung Chuang, DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings (2022)

[TOC]

Summary of paper

DiffCSE learns sentence embeddings that are sensitive to the difference between an original sentence and an edited version of it.

We propose DiffCSE, an unsupervised contrastive learning framework for learning sentence embeddings.

DiffCSE

This is an unsupervised contrastive learning framework rather than a model architecture.

Contrastive learning on single-modality data

Use multiple augmentations of a single datum to construct positive pairs, whose representations are trained to be more similar to one another than those of negative pairs:
1. e.g., random cropping, color jitter, and rotation for vision models
2. dropout for NLP
Contrastive learning encourages representations to be insensitive to these transformations:
1. i.e., the encoder is trained to be invariant to a set of manually chosen transformations
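A minimal NumPy sketch of the contrastive objective described above (InfoNCE with in-batch negatives), assuming two views per datum are already encoded. Row i of `h` and row i of `h_pos` form a positive pair, and every other row of `h_pos` acts as a negative for row i. The function name and the toy data are illustrative assumptions. The default temperature of 0.05 is the value SimCSE uses, but it is a tunable hyperparameter.

```python
import numpy as np


def info_nce_loss(h: np.ndarray, h_pos: np.ndarray, temperature: float = 0.05) -> float:
    """InfoNCE loss with in-batch negatives.

    Row i of `h` and row i of `h_pos` are two views of the same datum (positive pair);
    every other row of `h_pos` serves as a negative for row i.
    """
    # L2-normalize so that dot products are cosine similarities.
    h = h / np.linalg.norm(h, axis=1, keepdims=True)
    h_pos = h_pos / np.linalg.norm(h_pos, axis=1, keepdims=True)
    # logits[i, j] = cos(h_i, h_pos_j) / temperature
    logits = h @ h_pos.T / temperature
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive for row i sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 16))
# Views that are small perturbations of the anchors (aligned positives) ...
aligned = anchors + 0.01 * rng.normal(size=anchors.shape)
# ... versus views that are unrelated to the anchors.
unrelated = rng.normal(size=anchors.shape)
print(info_nce_loss(anchors, aligned) < info_nce_loss(anchors, unrelated))  # True
```

Aligned views give a much lower loss than unrelated ones. Minimizing this loss is what pushes the encoder to map every augmented view of a datum to nearly the same point, i.e. to become invariant to the chosen augmentations.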

Limitations of existing data augmentation for contrastive learning on language data

Gao et al. (SimCSE) find that constructing positive pairs via a simple dropout-based augmentation works much better than more complex augmentations such as word deletion or word replacement based on synonyms or masked language models.

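Intuition behind why direct augmentation fails for language

While the training objective in contrastive learning encourages representations to be invariant to augmentation transformations, direct augmentations of the input (e.g., deletion, replacement) often change the meaning of the sentence. For example, replacing a single word can turn "I love this movie" into "I hate this movie", yet an invariance objective would pull the two embeddings together.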

We propose to learn sentence representations that are aware of, but not necessarily invariant to, such direct surface-level augmentations.

We operationalise equivariant contrastive learning on sentences by using dropout-based augmentation as the insensitive transformation and MLM-based word replacement as the sensitive transformation.
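A minimal pure-Python sketch of how the sensitive transformation can enter the objective. The assumed combination is L = L_contrast + λ·L_RTD, where the contrastive term is computed on dropout-based positives and the RTD (replaced token detection) term asks a discriminator, conditioned on the sentence embedding h of x, to tell which tokens of the edited sentence x'' were replaced. In the paper x'' comes from a pretrained masked language model. Here `toy_generator`, `make_edited_sentence`, `rtd_loss`, `diffcse_objective`, the default `mask_ratio=0.3`, and `lam` are hypothetical illustrations, not the authors' code or settings.

```python
import math
import random
from typing import Callable, List, Tuple

MASK = "[MASK]"


def make_edited_sentence(
    tokens: List[str],
    generator: Callable[[List[str], int], str],
    mask_ratio: float = 0.3,  # hypothetical default, not necessarily the paper's value
    seed: int = 0,
) -> Tuple[List[str], List[int]]:
    """Sensitive transformation: mask some tokens of x, then refill them with a generator.

    Returns the edited sentence x'' and per-token RTD labels
    (1 = token differs from the original, 0 = token unchanged).
    """
    rng = random.Random(seed)
    n_mask = max(1, round(mask_ratio * len(tokens)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in positions else t for i, t in enumerate(tokens)]
    edited = [generator(masked, i) if i in positions else t for i, t in enumerate(tokens)]
    # A refilled position whose sampled token equals the original counts as unchanged.
    labels = [int(e != t) for e, t in zip(edited, tokens)]
    return edited, labels


def rtd_loss(p_replaced: List[float], labels: List[int]) -> float:
    """Binary cross-entropy summed over tokens.

    p_replaced[t] is the discriminator's probability that token t was replaced;
    in DiffCSE the discriminator also sees the sentence embedding h of the original x.
    """
    eps = 1e-12  # guards against log(0)
    return -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
        for p, y in zip(p_replaced, labels)
    )


def diffcse_objective(l_contrast: float, l_rtd: float, lam: float) -> float:
    """Total loss: contrastive term (dropout positives) plus weighted RTD term."""
    return l_contrast + lam * l_rtd


def toy_generator(masked: List[str], i: int) -> str:
    """Hypothetical stand-in for the MLM generator: always proposes 'the'."""
    return "the"


tokens = "the cat sat on the mat".split()
edited, labels = make_edited_sentence(tokens, toy_generator)
print(edited, labels)
```

Because `toy_generator` always proposes "the", a masked position that originally held "the" is refilled with the same token and gets label 0. Only genuine edits count as replaced, which is why the embedding has to encode enough of the original sentence for the discriminator to spot the difference.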

SimCSE

Meaning of $x_i^+ = x_i$

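This means that no edit is applied to the input: the positive example for $x_i$ is the original sentence itself.

$h_i^+$ differs from $h_i$ only because the same sentence is passed through the encoder a second time with a different random dropout mask.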
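To make $x_i^+ = x_i$ concrete, here is a toy sketch assuming a one-layer linear encoder with inverted dropout. `encode`, `W`, and the 128 to 16 dimensions are hypothetical, not SimCSE's BERT encoder. Feeding the same input twice draws two independent dropout masks, so $h_i$ and $h_i^+$ differ slightly while staying far more similar to each other than to the embedding of an unrelated input. The rate `p=0.1` mirrors BERT's default dropout.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(128, 16))  # weights of a hypothetical one-layer encoder


def encode(x: np.ndarray, p: float, rng: np.random.Generator) -> np.ndarray:
    """Toy encoder: inverted dropout on the input features, then a linear map."""
    keep = rng.random(x.shape) >= p  # a fresh dropout mask is drawn on every call
    return (x * keep / (1.0 - p)) @ W


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


x_i = rng.normal(size=128)              # stands in for the input representation of x_i
h_i = encode(x_i, p=0.1, rng=rng)       # first forward pass
h_i_pos = encode(x_i, p=0.1, rng=rng)   # second pass on x_i^+ = x_i with a new dropout mask
h_other = encode(rng.normal(size=128), p=0.1, rng=rng)  # an unrelated input

print(cosine(h_i, h_i_pos), cosine(h_i, h_other))
```

In SimCSE, the pair ($h_i$, $h_i^+$) then enters a contrastive loss in which the $h_j^+$ of the other sentences in the batch serve as negatives.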

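```python
from diffcse import DiffCSE  # assumes the authors' diffcse package is installed
model_bert_trans = DiffCSE("voidism/diffcse-bert-base-uncased-trans")  # "-trans": checkpoint released for transfer tasks
```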

However, these sentence embeddings are not trained for binary classification of sentence pairs (deciding whether two inputs match).
For Transformer approaches that do perform such binary classification, see: Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval