Tianjun_zhang the Wisdom of Hindsight Makes Language Models Better Instruction Followers 2023

[TOC] Title: The Wisdom of Hindsight Makes Language Models Better Instruction Followers Author: Tianjun Zhang et. al. Publish Year: 10 Feb 2023 Review Date: Thu, Mar 2, 2023 url: https://arxiv.org/pdf/2302.05206.pdf Summary of paper Motivation Reinforcement learning with Human Feedback (RLHF) demonstrates impressive performance on the GPT series models. However, the pipeline for reward and value networks Contribution in this paper, we consider an alternative approach: converting feedback to instruction by relabeling the original one and training the model for better alignment in a supervised manner....

<span title='2023-03-02 19:06:35 +1100 AEDT'>March 2, 2023</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;427 words&nbsp;·&nbsp;Sukai Huang