[TOC]

  1. Title: Secrets of RLHF in Large Language Models Part II: Reward Modelling
  2. Author: Binghai Wang et al.
  3. Publish Date: 12 Jan 2024
  4. Review Date: Wed, Jan 24, 2024
  5. url: arXiv:2401.06080v2

Summary of paper

Motivation

Some key terms

noisy data

preference strength

flip, margin, soft label for contrastive learning
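
To make the key terms above concrete, here is a minimal sketch, not the paper's implementation. It assumes preference strength is estimated as the mean (and spread) of the chosen-minus-rejected reward gap across an ensemble of reward models, and that flip, margin and soft label are modifications of the usual pairwise ranking loss. The function names, the parameters, and the way the margin and soft label are combined are all hypothetical choices made for illustration.

```python
import math
from statistics import mean, stdev


def preference_strength(chosen_rewards, rejected_rewards):
    """Mean and sample std of the reward gap (chosen - rejected).

    Each list holds one score per reward model in a hypothetical ensemble.
    A gap near zero or below suggests an ambiguous or mislabeled pair.
    """
    diffs = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    return mean(diffs), stdev(diffs)


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pairwise_loss(r_chosen, r_rejected, margin=0.0, soft_label=1.0, flip=False):
    """Pairwise ranking loss with three optional adjustments.

    flip:       swap the labels of a pair believed to be mislabeled.
    margin:     require the chosen reward to exceed the rejected one by a gap.
    soft_label: weight in [0, 1] on the stated preference; the remainder
                goes to the opposite ordering.
    Applying the margin only to the stated-preference term is an assumption
    of this sketch.
    """
    if flip:
        r_chosen, r_rejected = r_rejected, r_chosen
    delta = r_chosen - r_rejected
    stated = soft_label * log_sigmoid(delta - margin)
    opposite = (1.0 - soft_label) * log_sigmoid(-delta)
    return -(stated + opposite)
```

For example, `preference_strength([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])` gives the mean and spread of the gaps 1, 2 and 3. A pair with a clearly negative mean would be a candidate for `flip=True`, a near-zero mean for a `soft_label` below 1, and a large mean for a positive `margin`.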
