Weak-to-Strong Generalization: Eliciting Strong Capabilities with Weak Supervision

[TOC]

Title: Weak-to-Strong Generalization: Eliciting Strong Capabilities with Weak Supervision
Author: Collin Burns et al.
Publish Year: 14 Dec 2023
Review Date: Mon, Jan 29, 2024
url: arXiv:2312.09390v1

## Summary of paper

### Motivation

Superalignment: OpenAI argues that RLHF essentially uses humans to supervise the model (the reward model is trained on human annotations). Once superhuman models arrive, humans will no longer be able to judge whether a model's output is good or bad; e.g., a superhuman model may generate a million lines of complex code that no human can review. How do we do alignment in that case? The research question is therefore: can we use a weak teacher model to improve a strong student model?

### Contribution

They used a weak model to generate annotations, fine-tuned the strong model on those weak labels, and ran extensive empirical experiments (a minimal sketch of this pipeline follows the key terms below). Note: although they use the terms teacher and student, the alignment task is not about "teaching"; alignment aims to elicit what the strong foundation model has already learned (something like fine-tuning), rather than asking the strong model to imitate the weak teacher model.

### Some key terms

Bootstrapping ...
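To make the setup concrete, below is a minimal sketch of the weak-to-strong pipeline on a toy classification task in plain PyTorch. The toy task, model widths, and hyperparameters are my own illustrative assumptions, not the paper's configuration (the paper fine-tunes GPT-series language models and also studies additions such as an auxiliary confidence loss and bootstrapping through intermediate model sizes, none of which are shown here); the sketch only traces the basic recipe: train a weak supervisor, let it annotate data, fine-tune a strong student on those weak labels, and compare against a strong ceiling via the performance gap recovered (PGR).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dim = 32
a, b = torch.randn(dim), torch.randn(dim)

def make_data(n):
    # Toy nonlinear task: label is the XOR of two halfspaces.
    x = torch.randn(n, dim)
    y = ((x @ a > 0) ^ (x @ b > 0)).float()
    return x, y

def make_model(hidden):
    # "Weak" vs. "strong" is emulated here purely by capacity (hidden width).
    return nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

def train(model, x, y, epochs=300):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(x).squeeze(-1), y).backward()
        opt.step()
    return model

def accuracy(model, x, y):
    with torch.no_grad():
        return ((model(x).squeeze(-1) > 0).float() == y).float().mean().item()

x_weak, y_weak = make_data(2000)          # ground truth, seen only by the weak supervisor
x_transfer, y_transfer = make_data(2000)  # strong student sees only weak labels here
x_test, y_test = make_data(2000)

# 1) Weak supervisor trained on ground-truth labels (low capacity, so imperfect).
weak = train(make_model(hidden=2), x_weak, y_weak)

# 2) Weak supervisor annotates the transfer set; its labels are noisy.
with torch.no_grad():
    weak_labels = (weak(x_transfer).squeeze(-1) > 0).float()

# 3) Weak-to-strong: strong student fine-tuned only on the weak labels.
w2s = train(make_model(hidden=256), x_transfer, weak_labels)

# 4) Strong ceiling: same capacity trained directly on ground truth, for reference.
ceiling = train(make_model(hidden=256), x_transfer, y_transfer)

acc_weak, acc_w2s, acc_ceiling = (accuracy(m, x_test, y_test) for m in (weak, w2s, ceiling))
# Performance gap recovered (PGR): how much of the weak-to-ceiling gap the student closes.
pgr = (acc_w2s - acc_weak) / (acc_ceiling - acc_weak)
print(f"weak {acc_weak:.3f} | weak-to-strong {acc_w2s:.3f} | ceiling {acc_ceiling:.3f} | PGR {pgr:.2f}")
```

The key design point the sketch mirrors is that the strong student never sees ground truth, only the weak supervisor's annotations; any test accuracy it achieves beyond the weak model's own accuracy is elicited from its greater capacity rather than taught by the supervisor.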

January 29, 2024 · 2 min · 377 words · Sukai Huang