Revision

[TOC]

Title: Recent Language Model Technique 2024
Review Date: Thu, Apr 25, 2024
url: https://www.youtube.com/watch?v=kzB23CoZG30
url2: https://www.youtube.com/watch?v=iH-wmtxHunk
url3: https://www.youtube.com/watch?v=o68RRGxAtDo

Llama 3

key modification: grouped query attention (GQA)

key instruction-tuning process: Their approach to post-training is a combination of supervised fine-tuning (SFT), rejection sampling, proximal policy optimization (PPO), and direct preference optimization (DPO). The quality of the prompts used in SFT and of the preference rankings used in PPO and DPO has an outsized influence on the performance of aligned models.

fine-tuning tool: torchtune

...
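To make the grouped query attention (GQA) note above concrete, here is a minimal pure-Python sketch of the idea: the query heads are split into groups, and every query head in a group attends over the same shared key/value head. This is an illustration under simplifying assumptions, not Llama 3's actual implementation. It has no causal mask, no learned projections, and no batching, and the names `grouped_query_attention`, `softmax`, and `dot` are hypothetical helpers made up for this example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def grouped_query_attention(q, k, v):
    """Unmasked GQA sketch (hypothetical helper, not a library API).

    q:    [n_q_heads][seq_len][d]   one entry per query head
    k, v: [n_kv_heads][seq_len][d]  fewer heads, shared across groups
    Returns a list shaped [n_q_heads][seq_len][d_v].
    """
    n_q_heads, n_kv_heads = len(q), len(k)
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    scale = 1.0 / math.sqrt(len(q[0][0]))

    out = []
    for h in range(n_q_heads):
        kv = h // group_size  # query head h reads the K/V head of its group
        d_v = len(v[kv][0])
        head_out = []
        for q_vec in q[h]:
            scores = [dot(q_vec, k_vec) * scale for k_vec in k[kv]]
            weights = softmax(scores)
            head_out.append(
                [sum(w * v_vec[j] for w, v_vec in zip(weights, v[kv]))
                 for j in range(d_v)]
            )
        out.append(head_out)
    return out


# Toy shapes (assumed for illustration): 4 query heads share 2 K/V heads.
q = [[[0.1 * (h + t + j) for j in range(2)] for t in range(3)] for h in range(4)]
k = [[[0.2 * (g + t - j) for j in range(2)] for t in range(3)] for g in range(2)]
v = [[[1.0 * (g + t + j) for j in range(2)] for t in range(3)] for g in range(2)]
out = grouped_query_attention(q, k, v)
print(len(out), len(out[0]), len(out[0][0]))
```

With `n_kv_heads == n_q_heads` this reduces to ordinary multi-head attention, and with `n_kv_heads == 1` it becomes multi-query attention. The practical benefit of the grouping is that only `n_kv_heads` key/value heads need to be stored in the KV cache during decoding.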