Damai Dai Deepseekmoe 2024

Title: DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Author: Damai Dai et al.
Publish Year: 11 Jan 2024
Review Date: Sat, Jun 22, 2024
url: https://arxiv.org/pdf/2401.06066

Summary of paper

Motivation

- conventional MoE architectures like GShard, which activate the top-k out of N experts, face challenges in ensuring expert specialization, i.e., that each expert acquires non-overlapping and focused knowledge
- in response, we propose the DeepSeekMoE architecture, aimed at ultimate expert specialization

Contribution

- segmenting each of the N experts into m finer-grained ones (mN in total) and activating mK of them (see the sketch below)
- isolating K_s experts as shared ones, aiming to capture common knowledge and mitigate redundancy among the routed experts

Some key terms

MoE architecture ...
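To make the fine-grained + shared-expert routing concrete, here is a minimal PyTorch sketch (my own illustration, not the paper's code): every routed expert is computed densely for clarity instead of dispatching tokens, and the paper's load-balancing losses are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepSeekMoESketch(nn.Module):
    """Dense reference sketch of DeepSeekMoE-style routing."""

    def __init__(self, d_model, d_expert, n_routed, n_shared, top_k):
        super().__init__()
        def ffn():
            return nn.Sequential(nn.Linear(d_model, d_expert), nn.GELU(),
                                 nn.Linear(d_expert, d_model))
        # mN fine-grained routed experts (each smaller than a conventional one)
        self.routed = nn.ModuleList(ffn() for _ in range(n_routed))
        # K_s shared experts: always active, intended to absorb common knowledge
        self.shared = nn.ModuleList(ffn() for _ in range(n_shared))
        self.gate = nn.Linear(d_model, n_routed, bias=False)
        self.top_k = top_k  # mK routed experts activated per token

    def forward(self, x):                          # x: (num_tokens, d_model)
        out = torch.zeros_like(x)
        for expert in self.shared:                 # shared experts bypass the router
            out = out + expert(x)
        scores = F.softmax(self.gate(x), dim=-1)   # (num_tokens, n_routed)
        weights, idx = scores.topk(self.top_k, dim=-1)
        for e, expert in enumerate(self.routed):
            # gate weight of expert e per token; zero unless e is in the top-k
            w = (weights * (idx == e)).sum(dim=-1, keepdim=True)
            if w.any():
                out = out + w * expert(x)
        return out
```

A real implementation would dispatch each token only to its selected experts; the dense loop above just makes the routing arithmetic explicit.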

June 22, 2024 · 3 min · 582 words · Sukai Huang

Jessy Lin Learning to Model the World With Language 2024

Title: Learning to Model the World With Language
Author: Jessy Lin et al.
Publish Year: ICML 2024
Review Date: Fri, Jun 21, 2024
url: https://arxiv.org/abs/2308.01399

Summary of paper

Motivation

- in this work, we propose that agents can ground diverse kinds of language by using it to predict the future
- in contrast to directly predicting what to do with a language-conditioned policy, Dynalang decouples learning to model the world with language (supervised learning with prediction objectives) from learning to act given that model (RL with task rewards); see the sketch after this list
- future prediction provides a rich grounding signal for learning what language utterances mean, which in turn equips the agent with a richer understanding of the world for solving complex tasks

Contribution

- investigate whether learning language-conditioned world models enables agents to scale to more diverse language use, compared to language-conditioned policies

Some key terms

related work ...
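A schematic of the decoupling in code. All interfaces here (`wm.encode`, `wm.predict_next`, `wm.decode`, the batch keys) are hypothetical names for illustration, not Dynalang's actual API:

```python
import torch
import torch.nn.functional as F

def world_model_step(wm, batch, wm_opt):
    """Supervised future prediction: compress image + language token into one
    latent and predict the next observation; this is the grounding signal."""
    z = wm.encode(batch["image"], batch["token"])     # joint multimodal latent
    z_next = wm.predict_next(z, batch["action"])
    pred = wm.decode(z_next)                          # dict of predictions
    loss = (F.mse_loss(pred["image"], batch["next_image"])
            + F.cross_entropy(pred["token_logits"], batch["next_token"])
            + F.mse_loss(pred["reward"], batch["reward"]))
    wm_opt.zero_grad()
    loss.backward()
    wm_opt.step()

def policy_step(wm, policy, z0, horizon, pi_opt):
    """RL with task rewards: train the policy on latent rollouts imagined by
    the world model (pi_opt should hold only the policy's parameters)."""
    z, ret = z0.detach(), 0.0
    for _ in range(horizon):
        z = wm.predict_next(z, policy(z))
        ret = ret + wm.decode(z)["reward"].mean()
    loss = -ret                                       # maximize imagined return
    pi_opt.zero_grad()
    loss.backward()
    pi_opt.step()
```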

June 21, 2024 · 2 min · 381 words · Sukai Huang

Jiuzhou Reward Engineering for Generating Semi Structured Explanation 2023

Title: Reward Engineering for Generating Semi-Structured Explanation
Author: Jiuzhou Han et al.
Publish Year: EACL 2024
Review Date: Thu, Jun 20, 2024
url: https://github.com/Jiuzhouh/Reward-Engineering-for-Generating-SEG

Summary of paper

Motivation

Contribution

- the objective is to equip moderately-sized LMs with the ability to not only provide answers but also generate structured explanations

Some key terms

Intro

- the author gives some background: Cui et al. incorporate a generative pre-training mechanism over synthetic graphs by aligning input pairs of text and graph, to improve the model's capability in generating semi-structured explanations. ...

June 20, 2024 · 1 min · 162 words · Sukai Huang

Jia Li Structured Cot Prompting for Code Generation 2023

Title: Structured Chain-of-Thought Prompting for Code Generation
Author: Jia Li et al.
Publish Year: 7 Sep 2023
Review Date: Wed, Feb 28, 2024
url: https://arxiv.org/pdf/2305.06599.pdf

Summary of paper

Contribution

The paper introduces Structured CoTs (SCoTs) and a novel prompting technique called SCoT prompting for improving code generation with large language models (LLMs) such as ChatGPT and Codex. Unlike previous Chain-of-Thought (CoT) prompting, which focuses on natural-language reasoning steps, SCoT prompting leverages the structural information inherent in source code: by incorporating program structures (sequence, branch, and loop) into the intermediate reasoning steps, LLMs are guided to generate more structured and accurate code (see the sketch below). Evaluation on three benchmarks demonstrates that SCoT prompting outperforms CoT prompting by up to 13.79% in Pass@1, is preferred by human developers in terms of program quality, and is robust to the choice of examples, leading to substantial improvements in code generation performance. ...
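As an illustration, here is how an SCoT-style few-shot prompt could be assembled; the instruction wording is my paraphrase, not the paper's exact template:

```python
SCOT_INSTRUCTION = (
    "You will be given a programming requirement.\n"
    "First write a structured chain of thought: outline the solution using\n"
    "only three program structures: sequence, branch (if/else), and loop\n"
    "(for/while). Then implement the final code following that outline.\n"
)

def build_scot_prompt(few_shot, requirement):
    """few_shot: list of (requirement, scot, code) demonstration triples."""
    parts = [SCOT_INSTRUCTION]
    for req, scot, code in few_shot:
        parts.append(f"Requirement:\n{req}\n\nStructured CoT:\n{scot}\n\nCode:\n{code}\n")
    # The model continues from here: first the SCoT, then the code.
    parts.append(f"Requirement:\n{requirement}\n\nStructured CoT:\n")
    return "\n".join(parts)
```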

February 28, 2024 · 2 min · 381 words · Sukai Huang

Stephanie Teaching Models to Express Their Uncertainty in Words 2022

Title: Teaching Models to Express Their Uncertainty in Words
Author: Stephanie Lin et al.
Publish Year: 13 Jun 2022
Review Date: Wed, Feb 28, 2024
url: https://arxiv.org/pdf/2205.14334.pdf

Summary of paper

Motivation

The study demonstrates that a GPT-3 model can articulate uncertainty about its own answers in natural language, without relying on model logits. It generates both an answer and a confidence level (e.g., "90% confidence" or "high confidence"), and these verbalized confidences map to well-calibrated probabilities. The model maintains moderate calibration even under distribution shift, and it is sensitive to the uncertainty in its own answers rather than merely imitating human examples. ...
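A small helper (my own sketch, not the paper's evaluation code) showing how verbalized confidence might be parsed and checked for calibration; the word-to-probability mapping is an assumed stand-in:

```python
import re

WORDS_TO_PROB = {"low": 0.25, "medium": 0.5, "high": 0.75}  # assumed mapping

def parse_confidence(text):
    """Extract a probability from outputs like '90% confidence' or 'high confidence'."""
    m = re.search(r"(\d{1,3})\s*%", text)
    if m:
        return min(int(m.group(1)), 100) / 100.0
    for word, prob in WORDS_TO_PROB.items():
        if word in text.lower():
            return prob
    return None

def expected_calibration_error(confs, corrects, n_bins=10):
    """Bin answers by stated confidence and compare stated confidence to accuracy."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confs, corrects):
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, ok))
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(1 for _, ok in b if ok) / len(b)
            ece += len(b) / len(confs) * abs(avg_conf - accuracy)
    return ece
```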

February 28, 2024 · 2 min · 327 words · Sukai Huang

Ziwei Xu Hallucination Is Inevitable an Innate Limitation Llm 2024

Title: Hallucination Is Inevitable: An Innate Limitation of Large Language Models
Author: Ziwei Xu et al.
Publish Year: 22 Jan 2024
Review Date: Sun, Jan 28, 2024
url: arXiv:2401.11817v1

Summary of paper

Contribution

The paper formalizes the issue of hallucination in large language models (LLMs) and argues that it is impossible to eliminate completely. It defines hallucination as inconsistency between a computable LLM and a computable ground-truth function. Drawing on learning theory, the paper demonstrates that LLMs cannot learn all computable functions and are therefore always prone to hallucination. Since the formal world is a simplified representation of the real world, hallucination is inevitable for real-world LLMs as well. Additionally, for real-world LLMs with provable time-complexity constraints, the paper identifies tasks prone to hallucination and provides empirical validation. Finally, the paper evaluates existing hallucination mitigators using the formal-world framework and discusses practical implications for the safe deployment of LLMs. ...
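Paraphrasing the formal setup in my own notation (a sketch of the style of argument, not the paper's exact statements or proof):

```latex
% Hallucination as inconsistency with a computable ground truth f:
\[
  h \text{ hallucinates on } s \;\text{w.r.t.}\; f
  \quad\iff\quad h(s) \neq f(s).
\]
% Diagonalization sketch: given a computable enumeration h_1, h_2, \dots of
% the LLMs under consideration and an enumeration s_1, s_2, \dots of inputs,
% define
\[
  f(s_i) \;=\; \text{some value} \neq h_i(s_i),
\]
% so every h_i hallucinates on s_i: no LLM in the enumeration is consistent
% with this ground truth on all inputs.
```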

January 28, 2024 · 3 min · 543 words · Sukai Huang