Mingyu Jin The Impact of Reasoning Steps Length on LLM 2024

[TOC] Title: The Impact of Reasoning Steps Length on Large Language Models Author: Mingyu Jin et al. Publish Year: 20 Jan 2024 Review Date: Mon, Jan 29, 2024 url: arXiv:2401.04925v3 Summary of paper Contribution The study investigates the impact of the length of reasoning steps in prompts on the reasoning abilities of Large Language Models (LLMs), focusing on Chain of Thought (CoT). Here are the key findings: Effect of Reasoning Step Length:...
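
The key experimental knob is simply how many intermediate steps the CoT exemplars contain. A minimal sketch of that manipulation, assuming a toy exemplar and a naive "restate the total" expansion strategy (the paper tests several expansion strategies; this one is illustrative only):

```python
# Sketch: lengthening a chain-of-thought exemplar without changing its answer.
# The exemplar and the expansion strategy are illustrative, not the paper's.

BASE_EXEMPLAR = [
    "Q: Roger has 5 balls and buys 2 cans of 3 balls each. How many balls?",
    "A: Step 1: 2 cans * 3 balls = 6 new balls.",
    "Step 2: 5 + 6 = 11.",
    "The answer is 11.",
]

def expand_steps(lines, extra_steps):
    """Insert redundant restatement steps before the final answer line,
    so the chain gets longer but its conclusion stays fixed."""
    padding = [f"Step {2 + i}: Restating: the running total is 11 balls."
               for i in range(1, extra_steps + 1)]
    return "\n".join(lines[:-1] + padding + lines[-1:])

# Prompts whose exemplar has 2, 4, or 6 reasoning steps, same final answer.
for k in (0, 2, 4):
    prompt = expand_steps(BASE_EXEMPLAR, k) + "\nQ: <new question>\nA:"
    print(prompt)
```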

<span title='2024-01-29 17:44:10 +1100 AEDT'>January 29, 2024</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;568 words&nbsp;·&nbsp;Sukai Huang

Weak-to-Strong Generalization: Eliciting Strong Capabilities with Weak Supervision

[TOC] Title: Weak-to-Strong Generalization: Eliciting Strong Capabilities with Weak Supervision Author: Collin Burns et al. Publish Year: 14 Dec 2023 Review Date: Mon, Jan 29, 2024 url: arXiv:2312.09390v1 Summary of paper Motivation Superalignment: OpenAI believes that RLHF essentially uses humans to supervise the model (the reward model is trained on human annotations). One day, when superhuman models emerge, humans will no longer be able to annotate the model's outputs as good or bad....
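
The experimental protocol is easy to state in code. A minimal sketch, assuming a classification task and hypothetical `predict`/`finetune`/`eval_acc` helpers; the PGR metric is the paper's headline measurement:

```python
# Sketch of the weak-to-strong protocol (helper names are hypothetical).

def weak_to_strong_experiment(weak, strong, finetune, unlabeled_x, eval_acc):
    weak_labels = [weak.predict(x) for x in unlabeled_x]   # noisy supervision
    student = finetune(strong, unlabeled_x, weak_labels)   # strong model, weak labels
    return eval_acc(student)

def performance_gap_recovered(weak_acc, w2s_acc, ceiling_acc):
    """PGR: fraction of the gap between the weak supervisor and the strong
    ceiling (the strong model trained on ground-truth labels) that
    weak-to-strong training recovers."""
    return (w2s_acc - weak_acc) / (ceiling_acc - weak_acc)
```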

<span title='2024-01-29 15:32:21 +1100 AEDT'>January 29, 2024</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;377 words&nbsp;·&nbsp;Sukai Huang

Ziwei Xu Hallucination Is Inevitable: An Innate Limitation of LLMs 2024

[TOC] Title: Hallucination Is Inevitable: An Innate Limitation of Large Language Models Author: Ziwei Xu et al. Publish Year: 22 Jan 2024 Review Date: Sun, Jan 28, 2024 url: arXiv:2401.11817v1 Summary of paper Contribution The paper formalizes the issue of hallucination in large language models (LLMs) and argues that it is impossible to completely eliminate hallucination. It defines hallucination as inconsistencies between a computable LLM and a computable ground truth function. Drawing on learning theory, the paper demonstrates that LLMs cannot learn all computable functions and are thus always prone to hallucination....
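
Roughly, the formal setup can be written as follows (a paraphrase of the paper's definitions, not its exact notation):

```latex
% h is a computable LLM, f a computable ground-truth function over strings.
% h hallucinates with respect to f if it disagrees with f on some input:
\[
  \exists s :\; h(s) \neq f(s).
\]
% The diagonalization-style result: for any computably enumerable family of
% LLMs, a computable ground truth f can be constructed on which every
% member of the family hallucinates:
\[
  \forall h \in \{h_1, h_2, \dots\} \;\; \exists s :\; h(s) \neq f(s).
\]
```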

<span title='2024-01-28 23:11:28 +1100 AEDT'>January 28, 2024</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;543 words&nbsp;·&nbsp;Sukai Huang

Zhiwei He Improving Machine Translation Using Quality Estimation as a Reward Model 2024

[TOC] Title: Improving Machine Translation Using Quality Estimation as a Reward Model 2024 Author: Zhiwei He et al. Publish Year: 23 Jan 2024 Review Date: Sun, Jan 28, 2024 url: arXiv:2401.12873v1 Summary of paper Contribution In this research, the authors explore using Quality Estimation (QE) models as a basis for reward models when improving translation quality through human feedback. They note that while QE has shown promise in aligning with human evaluations, there's a risk of overoptimization, where translations receive high rewards despite declining quality....
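
At its core the idea is to use a reference-free QE model as the reward signal. A minimal sketch, with `qe_score` standing in for a QE model (hypothetical wrapper) and the length heuristic only gesturing at the kind of guard the paper motivates, not its exact mitigation:

```python
# Sketch: quality estimation as a reference-free reward for MT feedback
# training. `qe_score` is a hypothetical wrapper around a QE model; the
# length heuristic below is illustrative, not the paper's method.

def qe_reward(qe_score, source: str, hypothesis: str, penalty_weight: float = 0.0):
    score = qe_score(source, hypothesis)  # higher = predicted better quality
    # Optional guard against degenerate outputs that fool the QE model,
    # e.g. suspicious length ratios between source and hypothesis.
    length_ratio = len(hypothesis.split()) / max(1, len(source.split()))
    return score - penalty_weight * abs(length_ratio - 1.0)
```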

<span title='2024-01-28 22:53:41 +1100 AEDT'>January 28, 2024</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;285 words&nbsp;·&nbsp;Sukai Huang

Krishan Rana SayPlan Grounding LLMs for Scalable Task Planning 2023

[TOC] Title: SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning Author: Krishan Rana et al. Publish Year: CoRL 2023 Review Date: Sun, Jan 28, 2024 url: https://arxiv.org/abs/2307.06135 Summary of paper Motivation This is a pipeline introduction paper. Contribution Hierarchical Exploration: SayPlan leverages the hierarchical structure of 3DSGs to enable LLMs to conduct semantic searches for task-relevant subgraphs from a condensed representation of the full graph. Path Planning Integration: It integrates a classical path planner to reduce the planning horizon for the LLM, thus improving efficiency....
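
A rough sketch of those two mechanisms over a hypothetical scene-graph API (all helper names are assumed, not from the paper's codebase):

```python
# Sketch of SayPlan's pipeline: (1) the LLM semantically searches a collapsed
# 3D scene graph, expanding only task-relevant nodes; (2) a classical path
# planner fills in navigation, shortening the LLM's planning horizon.

def sayplan(llm, scene_graph, task, path_planner):
    view = scene_graph.collapsed()                   # floors/rooms only
    while not llm.has_enough_context(view, task):
        node = llm.pick_node_to_expand(view, task)   # semantic search
        view = scene_graph.expand(view, node)        # reveal objects/assets
    abstract_plan = llm.plan(view, task)             # high-level steps
    return [path_planner.route(step) for step in abstract_plan]
```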

<span title='2024-01-28 21:37:21 +1100 AEDT'>January 28, 2024</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;388 words&nbsp;·&nbsp;Sukai Huang

Luigi Bonassi Planning With Qualitative Constraints PDDL3 2022

[TOC] Title: Planning With Qualitative Constraints PDDL3 2022 Author: Luigi Bonassi et al. Publish Year: Review Date: Sun, Jan 28, 2024 url: https://www.ijcai.org/proceedings/2022/0639.pdf Summary of paper The paper introduces a formalism to express trajectory constraints over actions in plans, complementing the state-trajectory constraints of PDDL3. This new formalism retains PDDL3's temporal modal operators and adds two modalities. The authors then explore compilation-based methods for dealing with action-trajectory constraints in propositional planning, proposing a new, simple, and effective method....

<span title='2024-01-28 21:28:51 +1100 AEDT'>January 28, 2024</span>&nbsp;·&nbsp;1 min&nbsp;·&nbsp;125 words&nbsp;·&nbsp;Sukai Huang

Parsa Mahmoudieh Zero Shot Reward Specification via Grounded Natural Language 2022

[TOC] Title: Zero Shot Reward Specification via Grounded Natural Language Author: Parsa Mahmoudieh et al. Publish Year: PMLR 2022 Review Date: Sun, Jan 28, 2024 url: Summary of paper Motivation Reward signals in RL are expensive to design and often require access to the true state. Common alternatives are demonstrations or goal images, which can be label-intensive; on the other hand, text descriptions provide a general, low-effort way of communicating....
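
The mechanism boils down to scoring observations against the goal text with a vision-language model. A minimal sketch, assuming a CLIP-style encoder wrapper (`clip` here is hypothetical, not the paper's exact model):

```python
import numpy as np

# Sketch: a language-conditioned reward as image-text alignment. `clip` is
# a hypothetical CLIP-style encoder returning unit-normalized embeddings.

def text_conditioned_reward(clip, observation_image, goal_text: str) -> float:
    img = clip.encode_image(observation_image)
    txt = clip.encode_text(goal_text)
    return float(np.dot(img, txt))   # cosine similarity serves as the reward
```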

<span title='2024-01-28 09:31:05 +1100 AEDT'>January 28, 2024</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;538 words&nbsp;·&nbsp;Sukai Huang

Allen Z Ren Robots That Ask for Help Uncertainty Alignment 2023

[TOC] Title: Robots That Ask for Help: Uncertainty Alignment for Large Language Model Planners Author: Allen Z. Ren et al. Publish Year: 4 Sep 2023 Review Date: Fri, Jan 26, 2024 url: arXiv:2307.01928v2 Summary of paper Motivation LLMs have various capabilities but often make overly confident yet incorrect predictions. KNOWNO aims to measure and align this uncertainty, enabling LLM-based planners to recognize their limitations and request assistance when necessary. Contribution Built on the theory of conformal prediction. Some key terms Ambiguity in NL...
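
The conformal-prediction core is standard split conformal over multiple-choice action options. A minimal sketch, assuming per-option confidences from the LLM; the calibration data and scoring function are placeholders, not the paper's exact pipeline:

```python
import numpy as np

# Sketch of split conformal prediction as used for "asking for help":
# calibrate a threshold on held-out examples, then build prediction sets
# over candidate actions; a non-singleton set triggers a help request.

def calibrate(cal_scores: np.ndarray, alpha: float = 0.1) -> float:
    """cal_scores[i] = nonconformity (e.g. 1 - confidence) of the TRUE option
    on calibration example i. Returns the (1 - alpha)-coverage threshold."""
    n = len(cal_scores)
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return float(np.quantile(cal_scores, q, method="higher"))

def prediction_set(option_confidences: np.ndarray, qhat: float) -> np.ndarray:
    """Indices of options whose nonconformity falls below the threshold."""
    return np.where(1.0 - option_confidences <= qhat)[0]

# If len(prediction_set(confs, qhat)) > 1, the planner is uncertain: ask.
```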

<span title='2024-01-26 17:29:29 +1100 AEDT'>January 26, 2024</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;510 words&nbsp;·&nbsp;Sukai Huang

Marta Skreta RePlan Robotic Replanning 2024

[TOC] Title: RePlan: Robotic Replanning with Perception and Language Models Author: Marta Skreta et al. Publish Year: 8 Jan 2024 Review Date: Thu, Jan 25, 2024 url: arXiv:2401.04157v1 Summary of paper Motivation However, the challenge remains that even with syntactically correct plans, robots can still fail to achieve their intended goals. This failure can be attributed to imperfect plans proposed by LLMs or to unforeseeable environmental circumstances that hinder the execution of planned subtasks due to erroneous assumptions about the state of objects....

<span title='2024-01-25 00:55:05 +1100 AEDT'>January 25, 2024</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;261 words&nbsp;·&nbsp;Sukai Huang

Binghai Wang Secrets of RLHF Reward Modelling 2024

[TOC] Title: Secrets of RLHF in Large Language Models Part II: Reward Modelling Author: Binghai Wang et al. Publish Year: 12 Jan 2024 Review Date: Wed, Jan 24, 2024 url: arXiv:2401.06080v2 Summary of paper Motivation RLHF is a crucial technology for aligning language models with human values. Two main issues are tackled: (1) incorrect and ambiguous preference pairs in the dataset hindering reward model accuracy, and (2) difficulty in generalization for reward models trained on specific distributions....
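
For reference, the preference-pair objective that such reward models are trained with (the standard Bradley-Terry-style loss the paper builds on; the paper's own contribution concerns denoising the pairs and generalization, not this loss itself):

```python
import torch
import torch.nn.functional as F

# Standard reward-model objective on preference pairs: the chosen response
# should out-score the rejected one.

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """r_chosen / r_rejected: scalar rewards for a batch of preference pairs."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```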

<span title='2024-01-24 23:31:28 +1100 AEDT'>January 24, 2024</span>&nbsp;·&nbsp;1 min&nbsp;·&nbsp;144 words&nbsp;·&nbsp;Sukai Huang

Rui Zheng Secrets of RLHF in LLM Part PPO 2023

[TOC] Title: Secrets of RLHF in Large Language Models Part I: PPO Author: Rui Zheng et al. Publish Year: 18 Jul 2023 Review Date: Mon, Jan 22, 2024 url: arXiv:2307.04964v2 Summary of paper Motivation Current approaches involve creating reward models to measure human preferences, using Proximal Policy Optimization (PPO) to improve policy models, and enhancing step-by-step reasoning through process supervision. However, challenges in reward design, interaction with the environment, and agent training, along with the high trial-and-error costs of LLMs, make it difficult for researchers to develop technically aligned and safe LLMs....
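
For orientation, the usual RLHF-PPO objective that the paper dissects (the standard formulation, not the paper's exact notation): a KL-shaped reward plus the clipped surrogate:

```latex
% Per-token reward: the RM score at the final token T, minus a KL penalty
% that keeps the policy close to the frozen SFT reference model:
\[
  r_t \;=\; \mathbb{1}[t = T]\, r_{\mathrm{RM}}(x, y)
        \;-\; \beta \log \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\mathrm{SFT}}(a_t \mid s_t)}
\]
% Clipped PPO surrogate with probability ratio rho_t and advantage \hat A_t:
\[
  L^{\mathrm{CLIP}}(\theta) \;=\; \mathbb{E}_t\Big[
    \min\big( \rho_t \hat A_t,\;
    \operatorname{clip}(\rho_t,\, 1-\varepsilon,\, 1+\varepsilon)\, \hat A_t \big)
  \Big],
  \qquad
  \rho_t = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
\]
```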

<span title='2024-01-22 20:26:18 +1100 AEDT'>January 22, 2024</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;465 words&nbsp;·&nbsp;Sukai Huang

Zhiting Hu Language Agent and World Models 2023

[TOC] Title: Zhiting Hu Language Agent and World Models 2023 Author: Zhiting Hu et al. Publish Year: Review Date: Mon, Jan 22, 2024 url: arXiv:2312.05230v1 Summary of paper Motivation LAW proposes that world and agent models, which encompass beliefs about the world, anticipation of consequences, goals/rewards, and strategic planning, provide a better abstraction for reasoning. In this framework, language models play a crucial role as a backend. Some key terms Limitation of Language Ambiguity and Imprecision: LLMs struggle with natural language's ambiguity and imprecision because they lack the rich context that humans use when producing text....

<span title='2024-01-22 16:01:20 +1100 AEDT'>January 22, 2024</span>&nbsp;·&nbsp;4 min&nbsp;·&nbsp;749 words&nbsp;·&nbsp;Sukai Huang

Gautier Dagan Dynamic Planning With a LLM 2023

[TOC] Title: Dynamic Planning With a LLM Author: Gautier Dagan et al. Publish Year: 11 Aug 2023 Review Date: Sun, Jan 21, 2024 url: arXiv:2308.06391v1 Summary of paper Motivation Traditional symbolic planners can find optimal solutions quickly but need complete and accurate problem representations. In contrast, LLMs can handle noisy data and uncertainty but struggle with planning tasks. The LLM-DP framework combines LLMs and traditional planners to solve embodied tasks efficiently....
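
That division of labour can be sketched in a few lines (all helpers hypothetical): the LLM handles language and belief updates, the symbolic planner handles search:

```python
# Sketch of the LLM-DP loop (helper names hypothetical): the LLM turns the
# task description into a PDDL problem, a classical planner searches for a
# plan, and unexpected observations trigger belief revision plus replanning.

def llm_dp(llm, planner, domain_pddl, task, env):
    problem = llm.to_pddl_problem(task)            # language -> PDDL problem
    while not env.goal_reached():
        plan = planner.solve(domain_pddl, problem) # symbolic search
        for action in plan:
            obs = env.execute(action)
            if obs.unexpected:                     # plan invalidated
                problem = llm.revise(problem, obs) # update beliefs, replan
                break
```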

<span title='2024-01-21 01:42:23 +1100 AEDT'>January 21, 2024</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;384 words&nbsp;·&nbsp;Sukai Huang

Jun Wang Conformal Temporal Logic Planning Using LLMs 2023

[TOC] Title: Conformal Temporal Logic Planning Using LLMs 2023 Author: Jun Wang et al. Publish Year: 19 Dec 2023 Review Date: Sun, Jan 21, 2024 url: arXiv:2309.10092v2 Summary of paper Motivation Unlike previous methods that focus on low-level system configurations, this approach focuses on NL-based atomic propositions: the LTL tasks are now defined over NL-based atomic propositions. Robots are required to perform high-level subtasks specified in natural language. To formally define the overarching mission, the authors leverage LTL defined over atomic predicates modelling these NL-based subtasks....

<span title='2024-01-21 00:34:56 +1100 AEDT'>January 21, 2024</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;357 words&nbsp;·&nbsp;Sukai Huang

Gerevini Plan Constraints and Preferences in PDDL3 2005

[TOC] Title: Gerevini Plan Constraints and Preferences in PDDL3 Author: Alfonso Gerevini, Derek Long Publish Year: 2005 Review Date: Thu, Jan 11, 2024 url: http://www.cs.yale.edu/~dvm/papers/pddl-ipc5.pdf Summary of paper Motivation The notion of plan quality in automated planning is a practically very important issue. It is important to generate plans of good or optimal quality, and we need ways to express plan quality. The proposed extended language allows us to express strong and soft constraints on plan trajectories, i....
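
For reference, the semantics of some representative PDDL3 modal operators over a plan's state trajectory (an informal rendering, not the paper's exact notation):

```latex
% Trajectory semantics of representative PDDL3 modal operators over the
% state sequence \langle s_0, \dots, s_n \rangle induced by a plan:
\begin{align*}
  (\texttt{always}\ \phi)   &\;\equiv\; \forall i \le n:\ s_i \models \phi \\
  (\texttt{sometime}\ \phi) &\;\equiv\; \exists i \le n:\ s_i \models \phi \\
  (\texttt{at-end}\ \phi)   &\;\equiv\; s_n \models \phi \\
  (\texttt{sometime-after}\ \phi\ \psi) &\;\equiv\;
      \forall i \,\big(s_i \models \phi \Rightarrow \exists j \ge i:\ s_j \models \psi\big)
\end{align*}
```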

<span title='2024-01-11 19:54:29 +1100 AEDT'>January 11, 2024</span>&nbsp;·&nbsp;1 min&nbsp;·&nbsp;122 words&nbsp;·&nbsp;Sukai Huang

Nir Lipo Planning With Perspectives Using Functional STRIPS 2022

[TOC] Title: Planning With Perspectives – Decomposing Epistemic Planning using Functional STRIPS Author: Guang Hu, Nir Lipovetzky Publish Year: 2022 Review Date: Thu, Jan 11, 2024 url: https://nirlipo.github.io/publication/hu-2022-planning/ Summary of paper Motivation We present a novel approach to epistemic planning called planning with perspectives (PWP) that is both more expressive and computationally more efficient than existing state-of-the-art epistemic planning tools. Contribution In this paper, we decompose epistemic planning by delegating reasoning about epistemic formulae to an external solver, i....

<span title='2024-01-11 19:41:55 +1100 AEDT'>January 11, 2024</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;267 words&nbsp;·&nbsp;Sukai Huang

Alex Coulter Theory Alignment via a Classical Encoding of Regular Bisimulation 2022

[TOC] Title: Theory Alignment via a Classical Encoding of Regular Bisimulation 2022 Author: Alex Coulter et al. Publish Year: KEPS 2022 Review Date: Wed, Nov 29, 2023 url: https://icaps22.icaps-conference.org/workshops/KEPS/KEPS-22_paper_7781.pdf Summary of paper Motivation The main question we seek to answer is how we can test whether two models align (where the fluents and action implementations may differ), and if not, where that misalignment occurs. Contribution The work is built on a foundation of regular bisimulation. The authors found that the proposed alignment was not only viable, with many submissions having "solutions" to the merged model showing where a modelling error occurs, but several cases also demonstrated errors in the submitted domains that were subtle and detected only by this added approach....

<span title='2023-11-29 17:24:08 +1100 AEDT'>November 29, 2023</span>&nbsp;·&nbsp;6 min&nbsp;·&nbsp;1083 words&nbsp;·&nbsp;Sukai Huang

Pascal Bercher Detecting AI Planning Modelling Mistakes Potential Errors and Benchmark Domains 2023

[TOC] Title: Detecting AI Planning Modelling Mistakes Potential Errors and Benchmark Domains Author: Pascal Bercher et al. Publish Year: 2023 Review Date: Mon, Nov 13, 2023 url: https://bercher.net/publications/2023/Sleath2023PossibleModelingErrors.pdf Summary of paper Contribution The authors provide a compilation of potential modelling errors, supply a public repository of 56 (flawed) benchmark domains, and conduct an evaluation of well-known AI planning tools for their ability to diagnose those errors, showing that not a single tool is able to spot all errors, with no tool being strictly stronger than another....

<span title='2023-11-13 22:33:14 +1100 AEDT'>November 13, 2023</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;408 words&nbsp;·&nbsp;Sukai Huang

Yecheng Jason Ma Eureka Human Level Reward Design via Coding Large Language Models 2023

[TOC] Title: Eureka Human Level Reward Design via Coding Large Language Models 2023 Author: Yecheng Jason Ma et al. Publish Year: 19 Oct 2023 Review Date: Fri, Oct 27, 2023 url: https://arxiv.org/pdf/2310.12931.pdf Summary of paper Motivation Harnessing LLMs to learn complex low-level manipulation tasks remains an open problem. We bridge this fundamental gap by using LLMs to produce rewards that can be used to acquire complex skills via reinforcement learning. Contribution Eureka generates reward functions that outperform expert human-engineered rewards....
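
The outer loop is an evolutionary search over reward functions written as code. A minimal sketch with hypothetical helpers (`llm.generate` and `train_rl` stand in for the paper's GPT-4 reward generation and GPU-accelerated RL training):

```python
# Sketch of the Eureka outer loop (all helpers hypothetical): sample reward
# functions as code from an LLM, score each by RL training, and feed a
# textual "reward reflection" back into the next round's prompt.

def eureka_loop(llm, env_source, train_rl, iterations=5, samples=16):
    best_code, best_fitness, feedback = None, float("-inf"), ""
    for _ in range(iterations):
        candidates = [llm.generate(env_source, feedback) for _ in range(samples)]
        for code in candidates:
            fitness = train_rl(env_source, reward_code=code)  # task metric
            if fitness > best_fitness:
                best_code, best_fitness = code, fitness
        feedback = f"Best fitness so far: {best_fitness:.3f}"  # reward reflection
    return best_code
```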

<span title='2023-10-27 16:44:22 +1100 AEDT'>October 27, 2023</span>&nbsp;·&nbsp;6 min&nbsp;·&nbsp;1163 words&nbsp;·&nbsp;Sukai Huang

Mark Chen Evaluating Large Language Models Trained on Code 2021

[TOC] Title: Evaluating Large Language Models Trained on Code Author: Mark Chen et al. (OpenAI) Publish Year: 14 Jul 2021 Review Date: Mon, Oct 16, 2023 url: https://arxiv.org/pdf/2107.03374.pdf Summary of paper Motivation This is the research paper behind the GitHub Copilot technology. More recently, language models have also fueled progress toward the longstanding challenge of program synthesis. Contribution We find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts....
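
The repeated-sampling finding is quantified with the paper's unbiased pass@k estimator: generate n samples per problem, count c correct, and estimate the chance that at least one of k drawn samples passes. The numerically stable form below follows the paper's own formulation:

```python
import numpy as np

# Unbiased pass@k estimator from the paper: with n samples and c correct,
# pass@k = 1 - C(n-c, k) / C(n, k), computed as a stable running product.

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: some draw must pass
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 200 samples with 30 passing gives pass@1 = 0.15.
assert abs(pass_at_k(200, 30, 1) - 0.15) < 1e-9
```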

<span title='2023-10-16 07:24:26 +1100 AEDT'>October 16, 2023</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;298 words&nbsp;·&nbsp;Sukai Huang

Baptiste Roziere Code Llama Open Foundation Model for Code 2023

[TOC] Title: Code Llama Open Foundation Model for Code Author: Baptiste Roziere et al. (Meta AI) Publish Year: 2023 Review Date: Mon, Oct 16, 2023 url: https://scontent.fmel13-1.fna.fbcdn.net/v/t39.2365-6/369856151_1754812304950972_1159666448927483931_n.pdf?_nc_cat=107&ccb=1-7&_nc_sid=3c67a6&_nc_ohc=Hcg6QsYJx1wAX_okEZO&_nc_ht=scontent.fmel13-1.fna&oh=00_AfAYtfHJfYeomAQWiMUTRo96iP8d4sZrlIfD_KAeYlYaDQ&oe=6531E8CF Summary of paper Motivation Code Llama offers support for large input contexts and zero-shot instruction-following ability for programming tasks. Contribution Code Llama reaches SOTA performance among open models on several code benchmarks. Some key terms By training on domain-specific datasets, LLMs have proved effective more broadly on applications that require advanced natural language understanding....

<span title='2023-10-16 02:58:20 +1100 AEDT'>October 16, 2023</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;284 words&nbsp;·&nbsp;Sukai Huang

Haotian Liu Improved Baselines With Visual Instruction Tuning 2023

[TOC] Title: Improved Baselines With Visual Instruction Tuning Author: Haotian Liu et al. Publish Year: Oct 5 2023 Review Date: Sun, Oct 8, 2023 url: https://arxiv.org/pdf/2310.03744.pdf Summary of paper Motivation We show that the fully-connected vision-language cross-modal connector in LLaVA is surprisingly powerful and data-efficient. Contribution With simple modifications to LLaVA, namely using CLIP-ViT with an MLP projection and adding academic-task-oriented VQA data with simple response formatting prompts, they establish a stronger baseline....
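
The connector change itself is tiny. A sketch of the before/after, assuming a 1024-d CLIP-ViT feature width and a 4096-d LLM embedding (dimensions are illustrative, not taken from the paper's configs):

```python
import torch.nn as nn

# Sketch of the LLaVA cross-modal connector upgrade: a single linear
# projection from vision features to the LLM token-embedding space is
# replaced by a two-layer MLP with GELU.

vision_dim, llm_dim = 1024, 4096

linear_connector = nn.Linear(vision_dim, llm_dim)   # original LLaVA

mlp_connector = nn.Sequential(                      # LLaVA-1.5
    nn.Linear(vision_dim, llm_dim),
    nn.GELU(),
    nn.Linear(llm_dim, llm_dim),
)
```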

<span title='2023-10-08 10:37:37 +1100 AEDT'>October 8, 2023</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;240 words&nbsp;·&nbsp;Sukai Huang

Christabel Wayllace Goal Recognition Design With Stochastic Agent Action Outcomes 2016

[TOC] Title: Christabel Wayllace Goal Recognition Design With Stochastic Agent Action Outcomes 2016 Author: Christabel Wayllace et al. Publish Year: IJCAI 2016 Review Date: Fri, Oct 6, 2023 url: https://www.ijcai.org/Proceedings/16/Papers/464.pdf Summary of paper Motivation In this paper, they generalize the Goal Recognition Design (GRD) problem to Stochastic GRD (S-GRD) problems, which handle stochastic action outcomes. Some key terms Plan and goal recognition problem: it aims to identify the actual plan or goal of an agent given its behaviour....

<span title='2023-10-06 18:16:28 +1100 AEDT'>October 6, 2023</span>&nbsp;·&nbsp;1 min&nbsp;·&nbsp;191 words&nbsp;·&nbsp;Sukai Huang

Alba Gragera Pddl Domain Repair Fixing Domains With Incomplete Action Effects 2023

[TOC] Title: PDDL Domain Repair Fixing Domains With Incomplete Action Effects Author: Alba Gragera et al. Publish Year: ICAPS 2023 Review Date: Wed, Sep 20, 2023 url: https://icaps23.icaps-conference.org/demos/papers/2791_paper.pdf Summary of paper Contribution In this paper, they present a tool to repair planning models in which the effects of some actions are incomplete. The received input is compiled into a new extended planning task in which actions are permitted to insert possible missing effects....

<span title='2023-09-20 23:17:51 +1000 AEST'>September 20, 2023</span>&nbsp;·&nbsp;1 min&nbsp;·&nbsp;153 words&nbsp;·&nbsp;Sukai Huang

Alba Gragera Exploring the Limitations of Using LLMs to Fix Planning Tasks 2023

[TOC] Title: Exploring the Limitations of Using LLMs to Fix Planning Tasks Author: Alba Gragera et al. Publish Year: ICAPS 2023 (KEPS workshop) Review Date: Wed, Sep 20, 2023 url: https://icaps23.icaps-conference.org/program/workshops/keps/KEPS-23_paper_3645.pdf Summary of paper Motivation In this work, the authors present ongoing efforts to explore the limitations of LLMs in tasks requiring reasoning and planning competences: that of assisting humans in the process of fixing planning tasks. Contribution Investigate how good LLMs are at repairing planning tasks when the prompt is given in PDDL and when it is given in natural language....

<span title='2023-09-20 20:22:32 +1000 AEST'>September 20, 2023</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;403 words&nbsp;·&nbsp;Sukai Huang