Posts

Daniel Hierarchies of Reward Machines 2023

Title: Hierarchies of Reward Machines
Author: Daniel Furelos-Blanco et al.
Publish Year: 4 Jun 2023
Review Date: Fri, Apr 12, 2024
url: https://arxiv.org/abs/2205.15752
Summary of paper
Motivation
Finite-state machines are a simple yet powerful formalism for abstractly representing temporal tasks in a structured manner.
Contribution
The work introduces Hierarchies of Reward Machines (HRMs) to enhance the abstraction power of existing reward machines. Key contributions include:
HRM abstraction power: HRMs allow the creation of hierarchies of reward machines (RMs), in which constituent RMs can call other RMs. The authors prove that any HRM can be converted into an equivalent flat HRM with identical behavior. However, the equivalent flat HRM can have significantly more states and edges, especially under specific conditions. ...
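To make "RMs calling other RMs" concrete, here is a toy sketch. The encoding is my own simplification, not the paper's formal definition: edges are `(src, condition, dst, reward)` tuples, a call edge is written `("call", sub_rm_name)`, the first matching edge wins, and a call stack tracks which sub-RM is active. When a called RM reaches its accepting state, control returns to the caller, which then takes the call edge.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Toy RM. Each edge is (src, condition, dst, reward); condition is a
    proposition name or a hypothetical call edge ("call", sub_rm_name)."""
    name: str
    initial: str
    accepting: str
    edges: list = field(default_factory=list)


class HRMRunner:
    """Runs a hierarchy of RMs with an explicit call stack. Each frame is
    (rm_name, state, return_info); return_info = (caller_dst, reward) is
    applied when that RM reaches its accepting state."""

    def __init__(self, machines: dict[str, RewardMachine], root: str):
        self.machines = machines
        self.stack = [(root, machines[root].initial, None)]

    def step(self, label: set[str]) -> float:
        """Consume one set of true propositions; return the reward collected."""
        name, state, ret = self.stack[-1]
        reward = 0.0
        for src, cond, dst, r in self.machines[name].edges:
            if src != state:
                continue
            if isinstance(cond, tuple):  # call edge: descend into the sub-RM
                callee = self.machines[cond[1]]
                self.stack.append((callee.name, callee.initial, (dst, r)))
                return self.step(label)
            if cond in label:  # proposition edge (first match wins)
                self.stack[-1] = (name, dst, ret)
                reward += r
                break
        # Return from every sub-RM that has just reached its accepting state.
        while len(self.stack) > 1:
            name, state, ret = self.stack[-1]
            if state != self.machines[name].accepting:
                break
            self.stack.pop()
            caller, _, caller_ret = self.stack[-1]
            self.stack[-1] = (caller, ret[0], caller_ret)
            reward += ret[1]
        return reward


get_wood = RewardMachine("get_wood", "v0", "v_acc", [("v0", "wood", "v_acc", 0.0)])
root = RewardMachine("root", "u0", "u_acc", [
    ("u0", ("call", "get_wood"), "u1", 0.0),
    ("u1", "at_workbench", "u_acc", 1.0),
])
runner = HRMRunner({"get_wood": get_wood, "root": root}, "root")
print(runner.step({"wood"}))          # 0.0: sub-RM finishes, root moves to u1
print(runner.step({"at_workbench"}))  # 1.0: root reaches u_acc
```

Flattening this hierarchy would mean inlining `get_wood`'s states into `root`; with many shared sub-tasks that inlining is where the extra states and edges of the flat machine come from. The sketch assumes the call graph is acyclic.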

Shanchuan Efficient N Robust Exploration Through Discriminative Ir 2023

Title: DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards
Author: Shanchuan Wan et al.
Publish Year: 18 May 2023
Review Date: Fri, Apr 12, 2024
url: https://arxiv.org/abs/2304.10770
Summary of paper
Motivation
Recent studies have shown the effectiveness of encouraging exploration with intrinsic rewards estimated from novelty in observations. However, there is a gap between the novelty of an observation and actual exploration, as both the stochasticity of the environment and the agent’s behaviour can affect the observation.
Contribution
The authors propose DEIR, a novel method in which they theoretically derive an intrinsic reward with a conditional mutual information term that principally scales with the novelty contributed by agent explorations, and then implement the reward with a discriminative forward model. The goal is an intrinsic reward that considers not only the observed novelty but also the effective contribution brought by the agent.
Some key terms
Internal rewards ...
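For context, the family of rewards DEIR refines is the episodic novelty bonus. The sketch below is a generic version, not DEIR's reward: the intrinsic term is the distance from the current observation embedding to its nearest neighbour in this episode's memory, added to the extrinsic reward with a hypothetical weight `beta`. DEIR's conditional mutual information term and discriminative forward model (which would learn the embeddings) are not reproduced here.

```python
from __future__ import annotations

import numpy as np


class EpisodicNoveltyBonus:
    """Generic episodic novelty bonus (NOT DEIR's exact formula).
    Embeddings are assumed to come from some learned encoder."""

    def __init__(self, beta: float = 0.1):
        self.beta = beta  # weight of the intrinsic term (hypothetical default)
        self.memory: list[np.ndarray] = []

    def reset(self) -> None:
        """Clear episodic memory at the start of each episode."""
        self.memory.clear()

    def reward(self, embedding, extrinsic: float) -> float:
        e = np.asarray(embedding, dtype=float)
        if self.memory:
            # Nearest-neighbour distance to everything seen this episode.
            dists = np.linalg.norm(np.stack(self.memory) - e, axis=1)
            intrinsic = float(dists.min())
        else:
            intrinsic = 0.0  # first observation: nothing to compare against
        self.memory.append(e)
        return extrinsic + self.beta * intrinsic
```

A purely observation-based bonus like this is exactly what can be fooled by environment stochasticity (a noisy screen is always "novel"), which is the gap the paper's motivation describes.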

Discover Hierarchical Achieve in Rl via Cl 2023

Title: Discovering Hierarchical Achievements in Reinforcement Learning via Contrastive Learning
Author: Seungyong Moon et al.
Publish Year: 2 Nov 2023
Review Date: Tue, Apr 2, 2024
url: https://arxiv.org/abs/2307.03486
Summary of paper
Contribution
PPO agents demonstrate some ability to predict future achievements. Leveraging this observation, the authors introduce a novel contrastive learning method called achievement distillation, which strengthens the agent’s predictive abilities. This approach excels at discovering hierarchical achievements.
Some key terms
Model-based methods and explicit modules used in previous studies do not work that well ...
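Achievement distillation is built on a contrastive objective. As a reference point, here is the generic InfoNCE loss in NumPy: row `i` of `anchors` should match row `i` of `positives`, and the other rows in the batch serve as negatives. How the paper pairs states with achievement representations, and its temperature, are not reproduced; the pairing here is an assumption for illustration.

```python
import numpy as np


def info_nce(anchors, positives, temperature: float = 0.1) -> float:
    """Generic InfoNCE loss over a batch of (anchor, positive) pairs."""
    a = np.asarray(anchors, dtype=float)
    p = np.asarray(positives, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (N, N) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))     # matching pairs lie on the diagonal


aligned = np.eye(4)
print(info_nce(aligned, aligned))                # small: every anchor finds its positive
print(info_nce(aligned, aligned[[1, 2, 3, 0]]))  # large: positives are mismatched
```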

Jia Li Structured Cot Prompting for Code Generation 2023

Title: Structured Chain-of-Thought Prompting for Code Generation
Author: Jia Li et al.
Publish Year: 7 Sep 2023
Review Date: Wed, Feb 28, 2024
url: https://arxiv.org/pdf/2305.06599.pdf
Summary of paper
Contribution
The paper introduces Structured CoTs (SCoTs) and a novel prompting technique called SCoT prompting for improving code generation with Large Language Models (LLMs) such as ChatGPT and Codex. Unlike standard Chain-of-Thought (CoT) prompting, which focuses on natural-language reasoning steps, SCoT prompting leverages the structural information inherent in source code. By incorporating program structures (sequence, branch, and loop structures) into the intermediate reasoning steps (SCoTs), LLMs are guided to generate more structured and accurate code. Evaluation on three benchmarks shows that SCoT prompting outperforms CoT prompting by up to 13.79% in Pass@1, is preferred by human developers in terms of program quality, and is robust to the choice of examples, leading to substantial improvements in code generation performance. ...
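Pass@k numbers like the Pass@1 result above are commonly computed with the unbiased estimator introduced with Codex (Chen et al., 2021): sample `n` programs per problem, count the `c` that pass the tests, and compute 1 - C(n-c, k) / C(n, k), then average over problems. Whether this paper uses exactly this estimator is an assumption; the sketch just shows the formula.

```python
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


# With k = 1 the estimator reduces to the fraction of correct samples.
print(pass_at_k(n=10, c=3, k=1))
```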

Stephanie Teaching Models to Express Their Uncertainty in Words 2022

Title: Teaching Models to Express Their Uncertainty in Words
Author: Stephanie Lin et al.
Publish Year: 13 Jun 2022
Review Date: Wed, Feb 28, 2024
url: https://arxiv.org/pdf/2205.14334.pdf
Summary of paper
Motivation
The study demonstrates that a GPT-3 model can express uncertainty about its answers in natural language without relying on model logits. It generates both an answer and a confidence level (e.g., “90% confidence” or “high confidence”), and these map to well-calibrated probabilities. The model remains moderately calibrated under distribution shift and is sensitive to the uncertainty in its own answers rather than merely imitating human examples. ...
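The claim that verbalized confidences "map to well-calibrated probabilities" can be checked by comparing stated confidence with empirical accuracy. A minimal sketch, assuming numeric verbalized confidences such as "90%": `parse_confidence` extracts them, and `expected_calibration_error` computes ECE, a standard binned calibration metric. Both helpers are hypothetical; the paper reports its own calibration metrics, and word-level answers like "high confidence" would need a separate mapping.

```python
import re

import numpy as np


def parse_confidence(text: str) -> float:
    """Extract a percentage such as '90%' from model output -> 0.9."""
    m = re.search(r"(\d+(?:\.\d+)?)\s*%", text)
    if m is None:
        raise ValueError(f"no percentage found in {text!r}")
    return float(m.group(1)) / 100.0


def expected_calibration_error(conf, correct, n_bins: int = 10) -> float:
    """Weighted average gap between mean confidence and accuracy per bin."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)  # 1.0 -> last bin
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            gap = abs(conf[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in the bin
    return float(ece)
```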

Gwenyth Estimating Confidence of Llm by Prompt Agreement 2023

Title: Strength in Numbers: Estimating Confidence of Large Language Models by Prompt Agreement
Author: Gwenyth Portillo Wightman et al.
Publish Year: TrustNLP 2023
Review Date: Tue, Feb 27, 2024
url: https://aclanthology.org/2023.trustnlp-1.28.pdf
Summary of paper
Motivation
While traditional classifiers produce a score for each label, language models instead produce scores for the generation, which may not be well calibrated. The authors propose comparing generated outputs across diverse prompts to create a confidence score. By using multiple prompts, they aim to obtain more precise confidence estimates, using response diversity as a measure of confidence.
Contribution
The results show that this method produces better-calibrated confidence estimates than the log probability of the answer to a single prompt, which could be valuable for users who rely on prediction confidence in larger systems or decision-making processes.
In one sentence: query with multiple prompts and take the mean; the mean is more robust and consistent.
Some key terms
Calibrated confidence score ...
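One simple way to turn "try multiple prompts" into a number is an agreement score: ask the same question through several prompt variants, take the majority answer, and use the fraction of prompts that agree as its confidence. This is a sketch of the idea, not the paper's exact scoring; the normalization (strip and lowercase) and the helper name are assumptions.

```python
from __future__ import annotations

from collections import Counter


def agreement_confidence(answers: list[str]) -> tuple[str, float]:
    """Majority answer across prompt variants and the fraction agreeing with it."""
    normalized = [a.strip().lower() for a in answers]
    answer, count = Counter(normalized).most_common(1)[0]
    return answer, count / len(normalized)


# Four paraphrased prompts, three of which yield the same answer.
print(agreement_confidence(["Paris", "paris ", "Lyon", "Paris"]))
```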

Sudhir Agarwal Translate Infer Compile for Accurate Text to Plan 2024

Title: TIC: Translate-Infer-Compile for accurate “text to plan” using LLMs and logical intermediate representations
Author: Sudhir Agarwal et al.
Publish Year: Jan 2024
Review Date: Sat, Feb 17, 2024
url: https://arxiv.org/pdf/2402.06608.pdf
Summary of paper
Motivation
Using an LLM to generate the task PDDL from a natural-language description of a planning task is challenging. One of the primary reasons for failure is that the LLM often makes errors when generating information that must abide by the constraints specified in the domain knowledge or the task description ...

Philip Cohen Intention Is Choice With Commitment 1990

Title: Intention Is Choice With Commitment
Author: Philip Cohen et al.
Publish Year: 1990
Review Date: Tue, Jan 30, 2024
url: https://www.sciencedirect.com/science/article/pii/0004370290900555
Summary of paper
Contribution
This paper examines the principles governing the rational balance between an agent’s beliefs, goals, actions, and intentions, offering insights both for artificial agents and for a theory of human action. It focuses on clarifying when an agent can abandon its goals and how strongly it is committed to them. The formalism captures several crucial aspects of intentions, including an analysis of Bratman’s three characteristic functional roles of intentions and of how agents can avoid intending all the unintended consequences of their actions. The paper also discusses how intentions can be shaped by an agent’s relevant beliefs and its other intentions or goals. Finally, it introduces a preliminary notion of interpersonal commitment by relating one agent’s intentions to its beliefs about another agent’s intentions or beliefs. ...
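The "when can an agent abandon its goals" question is answered through the persistent goal, on which the paper builds intention. As I recall the definition (notation lightly adapted, so check it against the original), agent x has a persistent goal p when:

```latex
\mathrm{P\text{-}GOAL}(x, p) \;\triangleq\;
  \mathrm{BEL}(x, \neg p)
  \;\wedge\; \mathrm{GOAL}(x, \mathrm{LATER}\, p)
  \;\wedge\; \mathrm{BEFORE}\Big(\mathrm{BEL}(x, p) \vee \mathrm{BEL}(x, \Box \neg p),\;
      \neg \mathrm{GOAL}(x, \mathrm{LATER}\, p)\Big)
```

Read informally: x believes p is currently false, wants p to hold later, and will not drop that goal until it believes p has been achieved or believes p can never be achieved. Intending to act is then defined in terms of a persistent goal to have done the action.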

Christian Muise Planning for Goal Oriented Dialogue Systems 2019

Title: Planning for Goal-Oriented Dialogue Systems
Author: Christian Muise et al.
Publish Year: 2019
Review Date: Tue, Jan 30, 2024
url: arXiv:1910.08137v1
Summary of paper
Motivation
There is increasing demand for dialogue agents capable of handling specific tasks and interactions in a business context.
Contribution
The authors propose a new approach that eliminates the need for manual specification of dialogue trees, a common practice in existing systems. They suggest using a declarative representation of the dialogue agent, which can be processed by state-of-the-art planning techniques (replacing hand-built trees with planning).
The paper introduces a paradigm shift in specifying complex dialogue agents by recognizing that many parts of these agents share similar or identical underlying processes. Instead of manually creating and maintaining entire dialogue graphs, the authors propose a declarative approach in which behavior is specified compactly and the complete implicit graph is generated from this specification.
Some key terms
Limitations of end-to-end trained machine learning architectures ...
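The "compact specification, implicit graph" idea can be illustrated with a toy: three declarative actions, each with required and added facts, are expanded breadth-first into the full dialogue graph. The restaurant domain, the action names, and the fact sets are hypothetical, and a real system would also have to handle non-deterministic user responses, which this sketch ignores.

```python
from __future__ import annotations

from collections import deque

# Hypothetical declarative spec: action name -> (facts required, facts added).
ACTIONS = {
    "ask_cuisine": (frozenset(), frozenset({"have_cuisine"})),
    "ask_location": (frozenset(), frozenset({"have_location"})),
    "recommend": (frozenset({"have_cuisine", "have_location"}),
                  frozenset({"recommended"})),
}


def expand_dialogue_graph(actions, start=frozenset()):
    """Breadth-first expansion of the implicit graph; returns (states, edges)."""
    seen = {start}
    edges = []
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for name, (pre, add) in actions.items():
            # Applicable if preconditions hold and the action still adds something.
            if pre <= state and not add <= state:
                nxt = state | add
                edges.append((state, name, nxt))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen, edges


states, edges = expand_dialogue_graph(ACTIONS)
print(len(states), len(edges))  # 5 5: three actions describe the whole graph
```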

Vishal Pallagani Llm N Planning Survey 2024

Title: On the Prospects of Incorporating Large Language Models (LLMs) in Automated Planning and Scheduling (APS)
Author: Vishal Pallagani et al.
Publish Year: 2024 (arXiv preprint arXiv:2401.02500)
Review Date: Mon, Jan 29, 2024
url: arXiv:2401.02500
Summary of paper
Contribution
The paper provides a comprehensive review of 126 papers on integrating Large Language Models (LLMs) into Automated Planning and Scheduling, a growing area of Artificial Intelligence (AI). It identifies eight categories in which LLMs are applied to various aspects of planning problems: ...

Ishika Singh Progprompt Program Generation for Robot Task Planning 2023

Title: ProgPrompt: program generation for situated robot task planning using large language models
Author: Ishika Singh et al.
Publish Year: 28 August 2023
Review Date: Mon, Jan 29, 2024
url: https://progprompt.github.io/
Summary of paper
Motivation
Classical task planning requires extensive domain knowledge, has a large search space that is hard to scale, is domain specific, and requires a concrete goal specification.
Planning with LLMs has its own problems: the LLM is not situated in the scene, plan steps may use unavailable actions and objects, the text-to-robot action mapping may not be trivial, and the admissible action space is combinatorial.
Contribution
The authors present a programmatic LLM prompt structure that enables plan generation to function across situated environments, robot capabilities, and tasks ...
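A sketch of how such a programmatic prompt can be assembled: the available actions appear as an import, the objects in the scene as a Python list, example plans as Python functions with step comments, and finally an open function header the LLM is asked to complete. This follows the overall shape of ProgPrompt's prompts, but `build_progprompt`, the example plan, and the exact formatting are hypothetical. The `<obj>` placeholders mean the prompt is text for the LLM, not code meant to be executed.

```python
from __future__ import annotations


def build_progprompt(actions: list[str], objects: list[str],
                     examples: list[str], task: str) -> str:
    """Assemble a ProgPrompt-style prompt (hypothetical helper)."""
    parts = [f"from actions import {', '.join(actions)}", ""]
    parts.append(f"objects = {objects!r}")  # only objects actually in the scene
    parts.append("")
    for example in examples:  # example plans written as Python functions
        parts.append(example.strip())
        parts.append("")
    # Open header for the new task: the LLM writes the body.
    parts.append(f"def {task.strip().lower().replace(' ', '_')}():")
    return "\n".join(parts)


example_plan = '''
def throw_away_banana():
    # 1: grab banana
    grab('banana')
    # 2: put banana in trashcan
    putin('banana', 'trashcan')
'''

prompt = build_progprompt(
    actions=["grab <obj>", "putin <obj> <obj>"],
    objects=["lime", "trashcan"],
    examples=[example_plan],
    task="Throw away lime",
)
print(prompt)
```

Listing the objects and actions explicitly is what addresses the "not situated" problem: the LLM sees what exists in the scene and what the robot can do.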

Avichai Levy Understanding Natural Language in Context 2023

Title: Understanding Natural Language in Context
Author: Avichai Levy et al.
Publish Year: ICAPS 2023
Review Date: Mon, Jan 29, 2024
url: https://ojs.aaai.org/index.php/ICAPS/article/view/27248
Summary of paper
Contribution
The paper discusses the increasing prevalence of applications with natural language interfaces, such as chatbots and personal assistants like Alexa, Google Assistant, Siri, and Cortana. While current dialogue systems mainly involve static robots, the challenge intensifies with cognitive robots capable of moving and manipulating objects in home environments. The focus is on cognitive robots equipped with knowledge-based models of the world, which enable reasoning and planning. The paper proposes an approach to translate natural language directives into the robot’s formalism, leveraging state-of-the-art large language models, planning tools, and the robot’s knowledge of the world and of itself. This approach improves the interpretation of natural-language directives, facilitating the completion of complex household tasks. ...

Mingyu Jin the Impact of Reasoning Steps Length on Llm 2024

Title: The Impact of Reasoning Step Length on Large Language Models
Author: Mingyu Jin et al.
Publish Year: 20 Jan 2024
Review Date: Mon, Jan 29, 2024
url: arXiv:2401.04925v3
Summary of paper
Contribution
The study investigates how the length of the reasoning steps in prompts affects the reasoning abilities of Large Language Models (LLMs), focusing on Chain of Thought (CoT). The key findings are:
Effect of Reasoning Step Length: ...

Weak-to-Strong Generalization: Eliciting Strong Capabilities with Weak Supervision

Title: Weak-to-Strong Generalization: Eliciting Strong Capabilities with Weak Supervision
Author: Collin Burns et al.
Publish Year: 14 Dec 2023
Review Date: Mon, Jan 29, 2024
url: arXiv:2312.09390v1
Summary of paper
Motivation
Superalignment: OpenAI's view is that RLHF essentially uses humans to supervise the model (the reward model is trained on human annotations). Once superhuman models arrive, humans will no longer be able to judge whether the model’s outputs are good or bad; e.g., a superhuman model may generate a million lines of complex code that no human can review. How can alignment be done in this case? The research question is therefore whether a weak teacher model can be used to improve a strong student model.
Contribution
The authors use a weak model to generate labels, fine-tune the strong model on them, and run extensive experiments.
Note: although they use the terms teacher and student, the alignment task is not about “teaching”. Alignment means eliciting what the strong foundation model has already learned (something like fine-tuning), rather than asking the strong model to imitate the weak teacher.
Some key terms
Bootstrapping ...
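The paper summarizes results with performance gap recovered (PGR): the fraction of the gap between the weak supervisor and the strong model's ground-truth ceiling that the weakly supervised strong model closes. A direct transcription of the formula (the function name and the example numbers are mine):

```python
def performance_gap_recovered(weak: float, weak_to_strong: float,
                              strong_ceiling: float) -> float:
    """PGR = (weak_to_strong - weak) / (strong_ceiling - weak).

    weak:           performance of the weak supervisor
    weak_to_strong: strong model fine-tuned on the weak model's labels
    strong_ceiling: strong model fine-tuned on ground-truth labels
    1.0 = the whole gap is recovered; 0.0 = no better than the weak supervisor.
    """
    gap = strong_ceiling - weak
    if gap == 0:
        raise ValueError("PGR is undefined when the ceiling equals weak performance")
    return (weak_to_strong - weak) / gap


# Hypothetical accuracies (%): weak 60, weak-to-strong 70, ceiling 80.
print(performance_gap_recovered(60.0, 70.0, 80.0))  # 0.5
```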

Ziwei Xu Hallucination Is Inevitable an Innate Limitation Llm 2024

Title: Hallucination is Inevitable: An Innate Limitation of Large Language Models
Author: Ziwei Xu et al.
Publish Year: 22 Jan 2024
Review Date: Sun, Jan 28, 2024
url: arXiv:2401.11817v1
Summary of paper
Contribution
The paper formalizes the issue of hallucination in large language models (LLMs) and argues that it is impossible to eliminate hallucination completely. It defines hallucination as inconsistency between a computable LLM and a computable ground-truth function. Drawing on learning theory, the paper shows that LLMs cannot learn all computable functions and are therefore always prone to hallucination. Since the formal world is a simplified part of the real world, hallucination is also inevitable for real-world LLMs. Additionally, for real-world LLMs constrained by provable time complexity, the paper identifies tasks prone to hallucination and validates them empirically. Finally, the paper evaluates existing hallucination mitigators using the formal-world framework and discusses practical implications for the safe deployment of LLMs. ...
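The core of the impossibility argument is diagonalization. A simplified paraphrase of the first result (notation mine, not the paper's exact statement): let h_0, h_1, h_2, ... be any computably enumerable set of LLMs and s_0, s_1, s_2, ... a computable enumeration of all input strings. Define the ground truth f on the diagonal so that it disagrees with each model:

```latex
f(s_i) \neq h_i(s_i) \quad \text{for every } i \in \mathbb{N}
\qquad \Longrightarrow \qquad
\forall i \;\, \exists s :\; h_i(s) \neq f(s)
```

Because the enumeration is computable and every h_i halts, f can be computed by running h_i on s_i and outputting anything different. So f is a legitimate computable ground truth, and every model in the set hallucinates on at least one input with respect to it.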

Zhiwei He Improving Machine Translation Use Quality Estimation as a Reward Model 2024

Title: Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model
Author: Zhiwei He et al.
Publish Year: 23 Jan 2024
Review Date: Sun, Jan 28, 2024
url: arXiv:2401.12873v1
Summary of paper
Contribution
In this research, the authors explore using Quality Estimation (QE) models as reward models for improving translation quality through human feedback. They note that while QE has shown promise in aligning with human evaluations, there is a risk of overoptimization, where translations receive high rewards despite declining quality. The study addresses this by introducing heuristic rules that identify and penalize incorrect translations, resulting in improved training outcomes. Experimental results demonstrate consistent improvements across various setups, validated by human preference studies. Additionally, the approach proves highly data-efficient: with only a small amount of monolingual data it outperforms systems that rely on larger parallel corpora. ...

Krishan Rana Sayplan Grounding Llm for Scalable Task Planning 2023

Title: SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning
Author: Krishan Rana et al.
Publish Year: CoRL 2023
Review Date: Sun, Jan 28, 2024
url: https://arxiv.org/abs/2307.06135
Summary of paper
Motivation
This paper introduces a pipeline.
Contribution
Hierarchical exploration: SayPlan leverages the hierarchical structure of 3D scene graphs (3DSGs) to let LLMs conduct semantic searches for task-relevant subgraphs starting from a condensed representation of the full graph.
Path planning integration: it integrates a classical path planner to shorten the planning horizon for the LLM, improving efficiency.
Iterative replanning pipeline: an iterative replanning pipeline refines initial plans using feedback from a scene graph simulator, correcting infeasible actions and preventing planning failures.
Some key terms ...

Luigi Bonassi Planning With Qualitative Constraints Pddl3 2022

Title: Planning with Qualitative Action-Trajectory Constraints in PDDL
Author: Luigi Bonassi et al.
Publish Year: IJCAI 2022
Review Date: Sun, Jan 28, 2024
url: https://www.ijcai.org/proceedings/2022/0639.pdf
Summary of paper
The paper introduces a formalism for expressing trajectory constraints over the actions in a plan, complementing the state-trajectory constraints of PDDL3. The new formalism retains PDDL3’s temporal modal operators and adds two modalities. The authors then study compilation-based methods for handling action-trajectory constraints in propositional planning and propose a new, simple, and effective method. Experimental results show that action-trajectory constraints are useful for expressing control knowledge: classical planners perform significantly better when they leverage knowledge expressed as action constraints. Conversely, the same knowledge specified as state constraints and handled by two state-of-the-art systems yields smaller benefits. ...
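To make "trajectory constraints over actions" concrete, the sketch below checks a few PDDL3-style modal operators (`always`, `sometime`, `at-most-once`, `sometime-before`) against a finished plan, treating each action as a step. This is only an illustration of what the constraints mean; the paper's contribution is compiling such constraints into the planning problem, which is not shown. Reading `at-most-once` as "at most one matching action" is my simplification of the state-based PDDL3 semantics.

```python
from __future__ import annotations

from typing import Callable, Sequence

Pred = Callable[[str], bool]


def always(plan: Sequence[str], p: Pred) -> bool:
    return all(p(a) for a in plan)


def sometime(plan: Sequence[str], p: Pred) -> bool:
    return any(p(a) for a in plan)


def at_most_once(plan: Sequence[str], p: Pred) -> bool:
    return sum(1 for a in plan if p(a)) <= 1


def sometime_before(plan: Sequence[str], p: Pred, q: Pred) -> bool:
    """Every action satisfying p is preceded by an earlier action satisfying q."""
    seen_q = False
    for a in plan:
        if p(a) and not seen_q:  # check p first: q must be strictly earlier
            return False
        if q(a):
            seen_q = True
    return True


def is_pick(a: str) -> bool:
    return a.startswith("pick")


def is_drop(a: str) -> bool:
    return a.startswith("drop")


plan = ["pick box", "move l1 l2", "drop box"]
print(sometime_before(plan, is_drop, is_pick))  # True: the drop comes after a pick
print(at_most_once(plan, is_pick))              # True: only one pick
```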

Parsa Mahmoudieh Zero Shot Reward Specification via Grounded Natural Language 2022

Title: Zero-Shot Reward Specification via Grounded Natural Language
Author: Parsa Mahmoudieh et al.
Publish Year: PMLR 2022
Review Date: Sun, Jan 28, 2024
url:
Summary of paper
Motivation
Reward signals in RL are expensive to design and often require access to the true state. Common alternatives are demonstrations or goal images, which can be label intensive. Text descriptions, on the other hand, provide a general, low-effort way of communicating a task.
Previous work relies on the true state or on matching labelled expert demonstrations; this work instead uses CLIP directly to convert observations into semantic embeddings.
Contribution
Some key terms
Difference ...
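A minimal sketch of a CLIP-style reward from text, assuming the observation and the text descriptions have already been embedded by CLIP's image and text encoders (not called here). The reward is the softmax probability that the observation matches the goal description rather than some contrasting descriptions. The use of negative descriptions, the temperature, and the function names are assumptions, not necessarily the paper's exact formulation.

```python
from __future__ import annotations

import numpy as np


def cosine(u, v) -> float:
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def text_goal_reward(image_emb, goal_emb, negative_embs, temperature: float = 0.01) -> float:
    """Probability that the observation matches the goal text rather than the negatives."""
    sims = np.array([cosine(image_emb, goal_emb)]
                    + [cosine(image_emb, n) for n in negative_embs])
    logits = sims / temperature
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(probs[0])  # index 0 is the goal description


# Toy 2-D "embeddings": the observation points the same way as the goal text.
print(text_goal_reward([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]]))
```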

Allen Z Ren Robots That Ask for Help Uncertainty Alignment 2023

Title: Robots That Ask for Help: Uncertainty Alignment for Large Language Model Planners
Author: Allen Z. Ren et al.
Publish Year: 4 Sep 2023
Review Date: Fri, Jan 26, 2024
url: arXiv:2307.01928v2
Summary of paper
Motivation
LLMs have various capabilities but often make overly confident yet incorrect predictions. KnowNo aims to measure and align this uncertainty, enabling LLM-based planners to recognize their limitations and request assistance when necessary.
Contribution
The method is built on the theory of conformal prediction.
Some key terms
Ambiguity in NL ...
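The conformal-prediction step behind KnowNo can be sketched as split conformal calibration. On a held-out calibration set, the nonconformity score is 1 minus the LLM's probability of the correct option; the threshold `qhat` is the ceil((n+1)(1-ε))/n empirical quantile of those scores. At test time, every option whose score is within `qhat` enters the prediction set, and the robot asks for help unless the set is a single option. Function names and the toy numbers are mine.

```python
from __future__ import annotations

import math

import numpy as np


def calibrate_threshold(true_option_probs, epsilon: float) -> float:
    """qhat = ceil((n+1)(1-epsilon))/n empirical quantile of 1 - p(true option)."""
    scores = np.sort(1.0 - np.asarray(true_option_probs, dtype=float))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - epsilon))  # 1-indexed rank
    if rank > n:
        return 1.0  # too little calibration data: include every option
    return float(scores[rank - 1])


def prediction_set(option_probs: dict[str, float], qhat: float) -> list[str]:
    """Keep every option whose nonconformity score 1 - p is within qhat."""
    return [opt for opt, p in option_probs.items() if 1.0 - p <= qhat]


def needs_help(pred_set: list[str]) -> bool:
    """Ask a human unless exactly one option remains."""
    return len(pred_set) != 1


calib = [0.9, 0.8, 0.7, 0.95, 0.6, 0.85, 0.9, 0.75, 0.8, 0.65]
qhat = calibrate_threshold(calib, epsilon=0.2)
print(prediction_set({"A": 0.7, "B": 0.25, "C": 0.05}, qhat))  # ['A']: act
```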

Marta Skreta Replan Robotic Replanning 2024

Title: RePlan: Robotic Replanning with Perception and Language Models
Author: Marta Skreta et al.
Publish Year: 8 Jan 2024
Review Date: Thu, Jan 25, 2024
url: arXiv:2401.04157v1
Summary of paper
Motivation
Even with syntactically correct plans, robots can still fail to achieve their intended goals. This failure can be attributed to imperfect plans proposed by LLMs or to unforeseeable environmental circumstances that hinder the execution of planned subtasks because of erroneous assumptions about the state of objects.
Contribution
RePlan (Robotic Replanning with Perception and Language Models), a framework that enables real-time replanning for long-horizon tasks.
Some key terms
Addressing the challenge of multi-stage long-horizon tasks ...

Binghai Wang Secrets of Rlhf Reward Modelling 2024

Title: Secrets of RLHF in Large Language Models Part II: Reward Modelling
Author: Binghai Wang et al.
Publish Year: 12 Jan 2024
Review Date: Wed, Jan 24, 2024
url: arXiv:2401.06080v2
Summary of paper
Motivation
RLHF is a crucial technology for aligning language models with human values. Two main issues are tackled: (1) incorrect and ambiguous preference pairs in the dataset, which hinder reward model accuracy, and (2) poor generalization of reward models trained on specific distributions.
For the first issue, the authors propose measuring preference strength within the data using a voting mechanism over multiple reward models, together with techniques that mitigate the impact of incorrect preferences and make better use of high-quality preference data. For the second issue, contrastive learning is introduced to improve the reward models’ ability to distinguish between chosen and rejected responses, improving generalization.
Some key terms
Noisy data ...
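A sketch of the voting idea: score every preference pair with an ensemble of reward models and look at the mean and spread of r(chosen) - r(rejected) across the ensemble. A strongly negative mean suggests a likely mislabeled pair; a mean near zero with high spread suggests an ambiguous one. The ensemble size, the numbers, and the "mean < 0" flagging rule are illustrative assumptions; how the paper then uses these statistics (e.g. for label handling or adaptive margins) is beyond this sketch.

```python
import numpy as np


def preference_strength(chosen_rewards, rejected_rewards):
    """Rows = reward models, columns = preference pairs.
    Returns per-pair mean and std of r(chosen) - r(rejected)."""
    diff = (np.asarray(chosen_rewards, dtype=float)
            - np.asarray(rejected_rewards, dtype=float))
    return diff.mean(axis=0), diff.std(axis=0)


# Three hypothetical reward models scoring two preference pairs.
chosen = [[1.0, 0.2], [0.8, 0.1], [1.2, 0.0]]
rejected = [[0.0, 0.5], [0.2, 0.6], [0.1, 0.4]]
mean, std = preference_strength(chosen, rejected)
suspicious = np.flatnonzero(mean < 0)  # ensemble disagrees with the label
print(mean, std, suspicious)
```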

Rui Zheng Secrets of Rlhf in Llm Part Ppo 2023

Title: Secrets of RLHF in Large Language Models Part I: PPO
Author: Rui Zheng et al.
Publish Year: 18 Jul 2023
Review Date: Mon, Jan 22, 2024
url: arXiv:2307.04964v2
Summary of paper
Motivation
Current approaches involve training reward models to measure human preferences, using Proximal Policy Optimization (PPO) to improve policy models, and enhancing step-by-step reasoning through process supervision. However, challenges in reward design, environment interaction, and agent training, along with the high trial-and-error cost of LLMs, make it difficult for researchers to develop technically aligned and safe LLMs.
Contribution
The authors find that LLMs trained with their algorithm better understand the meaning of queries and give responses that resonate with people. They introduce a new PPO variant called PPO-max, which incorporates effective implementation choices and addresses stability issues.
Some key terms
RLHF limitation ...
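For orientation, the two standard ingredients that PPO-based RLHF builds on are sketched below: the PPO clipped surrogate loss, and a reward-model score penalized by the KL divergence from a reference policy (estimated from sampled-token log-probabilities). This is vanilla PPO/RLHF, not PPO-max; the extra implementation tricks PPO-max adds are not shown, and `beta` and `clip_eps` values are placeholders.

```python
from __future__ import annotations

import numpy as np


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps: float = 0.2) -> float:
    """Negative PPO clipped surrogate objective (to be minimized)."""
    ratio = np.exp(np.asarray(logp_new, dtype=float) - np.asarray(logp_old, dtype=float))
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * adv
    return float(-np.mean(np.minimum(unclipped, clipped)))


def kl_shaped_reward(rm_score: float, logp_policy, logp_ref, beta: float = 0.05) -> float:
    """Sequence reward = RM score - beta * sum of per-token log-ratio (KL estimate)."""
    log_ratio = np.asarray(logp_policy, dtype=float) - np.asarray(logp_ref, dtype=float)
    return float(rm_score - beta * log_ratio.sum())


# A ratio of 2 with positive advantage is clipped to 1.2, limiting the update.
print(ppo_clip_loss([np.log(2.0)], [0.0], [1.0]))
```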

Zhiting Hu Language Agent and World Models 2023

Title: Language Models, Agent Models, and World Models: The LAW for Machine Reasoning and Planning
Author: Zhiting Hu et al.
Publish Year: 2023
Review Date: Mon, Jan 22, 2024
url: arXiv:2312.05230v1
Summary of paper
Motivation
LAW proposes that world and agent models, which encompass beliefs about the world, anticipation of consequences, goals/rewards, and strategic planning, provide a better abstraction of reasoning. In this framework, language models play a crucial role as the backend.
Some key terms
Limitation of language ...

Gautier Dagan Dynamic Planning With a Llm 2023

Title: Dynamic Planning with a LLM
Author: Gautier Dagan et al.
Publish Year: 11 Aug 2023
Review Date: Sun, Jan 21, 2024
url: arXiv:2308.06391v1
Summary of paper
Motivation
Traditional symbolic planners can find optimal solutions quickly but need complete and accurate problem representations. In contrast, LLMs can handle noisy data and uncertainty but struggle with planning tasks. The LLM-DP framework combines LLMs and traditional planners to solve embodied tasks efficiently. Traditional planners need maximal information.
Some key terms
Hallucination ...