Weak-to-Strong Generalization: Eliciting Strong Capabilities with Weak Supervision

[TOC] Title: Weak-to-Strong Generalization: Eliciting Strong Capabilities with Weak Supervision Author: Collin Burns et al. Publish Year: 14 Dec 2023 Review Date: Mon, Jan 29, 2024 url: arXiv:2312.09390v1 Summary of paper Motivation Superalignment: OpenAI believes that RLHF essentially uses humans to supervise the model (the reward model is trained on human annotations). One day, when superhuman models arrive, humans will no longer be able to annotate whether the model's outputs are good or bad....

January 29, 2024 · 2 min · 377 words · Sukai Huang