Mingyu Jin The Impact of Reasoning Steps Length on LLM 2024

[TOC] Title: The Impact of Reasoning Steps Length on Large Language Models Author: Mingyu Jin et al. Publish Year: 20 Jan 2024 Review Date: Mon, Jan 29, 2024 url: arXiv:2401.04925v3 Summary of paper Contribution The study investigates the impact of the length of reasoning steps in prompts on the reasoning abilities of Large Language Models (LLMs), focusing on Chain of Thought (CoT). Here are the key findings: Effect of Reasoning Step Length:...
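
The key experimental knob is simply how many intermediate steps the CoT exemplars contain. A minimal sketch of that manipulation, assuming a toy exemplar and a naive "restate the total" expansion strategy (the paper tests several expansion strategies; this one is illustrative only):

```python
# Sketch: lengthening a chain-of-thought exemplar without changing its answer.
# The exemplar and the expansion strategy are illustrative, not the paper's.

BASE_EXEMPLAR = [
    "Q: Roger has 5 balls and buys 2 cans of 3 balls each. How many balls?",
    "A: Step 1: 2 cans * 3 balls = 6 new balls.",
    "Step 2: 5 + 6 = 11.",
    "The answer is 11.",
]

def expand_steps(lines, extra_steps):
    """Insert redundant restatement steps before the final answer line,
    so the chain gets longer but its conclusion stays fixed."""
    padding = [f"Step {2 + i}: Restating: the running total is 11 balls."
               for i in range(1, extra_steps + 1)]
    return "\n".join(lines[:-1] + padding + lines[-1:])

# Prompts whose exemplar has 2, 4, or 6 reasoning steps, same final answer.
for k in (0, 2, 4):
    prompt = expand_steps(BASE_EXEMPLAR, k) + "\nQ: <new question>\nA:"
    print(prompt)
```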

<span title='2024-01-29 17:44:10 +1100 AEDT'>January 29, 2024</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;568 words&nbsp;·&nbsp;Sukai Huang

Weak-to-Strong Generalization: Eliciting Strong Capabilities with Weak Supervision

[TOC] Title: Weak-to-Strong Generalization: Eliciting Strong Capabilities with Weak Supervision Author: Collin Burns et al. Publish Year: 14 Dec 2023 Review Date: Mon, Jan 29, 2024 url: arXiv:2312.09390v1 Summary of paper Motivation Superalignment: OpenAI believes that RLHF essentially uses humans to supervise the model (the reward model is trained on human annotations). One day, when superhuman models emerge, humans will no longer be able to annotate the model's outputs as good or bad....
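
The experimental protocol is easy to state in code. A minimal sketch, assuming a classification task and hypothetical `predict`/`finetune`/`eval_acc` helpers; the PGR metric is the paper's headline measurement:

```python
# Sketch of the weak-to-strong protocol (helper names are hypothetical).

def weak_to_strong_experiment(weak, strong, finetune, unlabeled_x, eval_acc):
    weak_labels = [weak.predict(x) for x in unlabeled_x]   # noisy supervision
    student = finetune(strong, unlabeled_x, weak_labels)   # strong model, weak labels
    return eval_acc(student)

def performance_gap_recovered(weak_acc, w2s_acc, ceiling_acc):
    """PGR: fraction of the gap between the weak supervisor and the strong
    ceiling (the strong model trained on ground-truth labels) that
    weak-to-strong training recovers."""
    return (w2s_acc - weak_acc) / (ceiling_acc - weak_acc)
```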

<span title='2024-01-29 15:32:21 +1100 AEDT'>January 29, 2024</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;377 words&nbsp;·&nbsp;Sukai Huang

Ziwei Xu Hallucination Is Inevitable: An Innate Limitation of LLMs 2024

[TOC] Title: Hallucination Is Inevitable: An Innate Limitation of Large Language Models Author: Ziwei Xu et al. Publish Year: 22 Jan 2024 Review Date: Sun, Jan 28, 2024 url: arXiv:2401.11817v1 Summary of paper Contribution The paper formalizes the issue of hallucination in large language models (LLMs) and argues that it is impossible to completely eliminate hallucination. It defines hallucination as inconsistencies between a computable LLM and a computable ground truth function. Drawing on learning theory, the paper demonstrates that LLMs cannot learn all computable functions and are thus always prone to hallucination....
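
Roughly, the formal setup can be written as follows (a paraphrase of the paper's definitions, not its exact notation):

```latex
% h is a computable LLM, f a computable ground-truth function over strings.
% h hallucinates with respect to f if it disagrees with f on some input:
\[
  \exists s :\; h(s) \neq f(s).
\]
% The diagonalization-style result: for any computably enumerable family of
% LLMs, a computable ground truth f can be constructed on which every
% member of the family hallucinates:
\[
  \forall h \in \{h_1, h_2, \dots\} \;\; \exists s :\; h(s) \neq f(s).
\]
```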

<span title='2024-01-28 23:11:28 +1100 AEDT'>January 28, 2024</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;543 words&nbsp;·&nbsp;Sukai Huang

Zhiwei He Improving Machine Translation Using Quality Estimation as a Reward Model 2024

[TOC] Title: Improving Machine Translation Using Quality Estimation as a Reward Model 2024 Author: Zhiwei He et al. Publish Year: 23 Jan 2024 Review Date: Sun, Jan 28, 2024 url: arXiv:2401.12873v1 Summary of paper Contribution In this research, the authors explore using Quality Estimation (QE) models as a basis for reward models when improving translation quality through human feedback. They note that while QE has shown promise in aligning with human evaluations, there's a risk of overoptimization, where translations receive high rewards despite declining quality....
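
At its core the idea is to use a reference-free QE model as the reward signal. A minimal sketch, with `qe_score` standing in for a QE model (hypothetical wrapper) and the length heuristic only gesturing at the kind of guard the paper motivates, not its exact mitigation:

```python
# Sketch: quality estimation as a reference-free reward for MT feedback
# training. `qe_score` is a hypothetical wrapper around a QE model; the
# length heuristic below is illustrative, not the paper's method.

def qe_reward(qe_score, source: str, hypothesis: str, penalty_weight: float = 0.0):
    score = qe_score(source, hypothesis)  # higher = predicted better quality
    # Optional guard against degenerate outputs that fool the QE model,
    # e.g. suspicious length ratios between source and hypothesis.
    length_ratio = len(hypothesis.split()) / max(1, len(source.split()))
    return score - penalty_weight * abs(length_ratio - 1.0)
```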

<span title='2024-01-28 22:53:41 +1100 AEDT'>January 28, 2024</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;285 words&nbsp;·&nbsp;Sukai Huang

Krishan Rana SayPlan Grounding LLMs for Scalable Task Planning 2023

[TOC] Title: SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning Author: Krishan Rana et al. Publish Year: CoRL 2023 Review Date: Sun, Jan 28, 2024 url: https://arxiv.org/abs/2307.06135 Summary of paper Motivation This is a pipeline introduction paper. Contribution Hierarchical Exploration: SayPlan leverages the hierarchical structure of 3DSGs to enable LLMs to conduct semantic searches for task-relevant subgraphs from a condensed representation of the full graph. Path Planning Integration: It integrates a classical path planner to reduce the planning horizon for the LLM, thus improving efficiency....
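
A rough sketch of those two mechanisms over a hypothetical scene-graph API (all helper names are assumed, not from the paper's codebase):

```python
# Sketch of SayPlan's pipeline: (1) the LLM semantically searches a collapsed
# 3D scene graph, expanding only task-relevant nodes; (2) a classical path
# planner fills in navigation, shortening the LLM's planning horizon.

def sayplan(llm, scene_graph, task, path_planner):
    view = scene_graph.collapsed()                   # floors/rooms only
    while not llm.has_enough_context(view, task):
        node = llm.pick_node_to_expand(view, task)   # semantic search
        view = scene_graph.expand(view, node)        # reveal objects/assets
    abstract_plan = llm.plan(view, task)             # high-level steps
    return [path_planner.route(step) for step in abstract_plan]
```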

<span title='2024-01-28 21:37:21 +1100 AEDT'>January 28, 2024</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;388 words&nbsp;·&nbsp;Sukai Huang

Luigi Bonassi Planning With Qualitative Constraints PDDL3 2022

[TOC] Title: Planning With Qualitative Constraints PDDL3 2022 Author: Luigi Bonassi et al. Publish Year: Review Date: Sun, Jan 28, 2024 url: https://www.ijcai.org/proceedings/2022/0639.pdf Summary of paper The paper introduces a formalism to express trajectory constraints over actions in plans, complementing the state-trajectory constraints of PDDL3. This new formalism retains PDDL3's temporal modal operators and adds two modalities. The authors then explore compilation-based methods for dealing with action-trajectory constraints in propositional planning, proposing a new, simple, and effective method....

<span title='2024-01-28 21:28:51 +1100 AEDT'>January 28, 2024</span>&nbsp;·&nbsp;1 min&nbsp;·&nbsp;125 words&nbsp;·&nbsp;Sukai Huang

Parsa Mahmoudieh Zero Shot Reward Specification via Grounded Natural Language 2022

[TOC] Title: Zero Shot Reward Specification via Grounded Natural Language Author: Parsa Mahmoudieh et al. Publish Year: PMLR 2022 Review Date: Sun, Jan 28, 2024 url: Summary of paper Motivation Reward signals in RL are expensive to design and often require access to the true state. Common alternatives are demonstrations or goal images, which can be label-intensive; on the other hand, text descriptions provide a general, low-effort way of communicating....
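
The mechanism boils down to scoring observations against the goal text with a vision-language model. A minimal sketch, assuming a CLIP-style encoder wrapper (`clip` here is hypothetical, not the paper's exact model):

```python
import numpy as np

# Sketch: a language-conditioned reward as image-text alignment. `clip` is
# a hypothetical CLIP-style encoder returning unit-normalized embeddings.

def text_conditioned_reward(clip, observation_image, goal_text: str) -> float:
    img = clip.encode_image(observation_image)
    txt = clip.encode_text(goal_text)
    return float(np.dot(img, txt))   # cosine similarity serves as the reward
```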

<span title='2024-01-28 09:31:05 +1100 AEDT'>January 28, 2024</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;538 words&nbsp;·&nbsp;Sukai Huang

Allen Z Ren Robots That Ask for Help Uncertainty Alignment 2023

[TOC] Title: Robots That Ask for Help: Uncertainty Alignment for Large Language Model Planners Author: Allen Z. Ren et al. Publish Year: 4 Sep 2023 Review Date: Fri, Jan 26, 2024 url: arXiv:2307.01928v2 Summary of paper Motivation LLMs have various capabilities but often make overly confident yet incorrect predictions. KNOWNO aims to measure and align this uncertainty, enabling LLM-based planners to recognize their limitations and request assistance when necessary. Contribution Built on the theory of conformal prediction. Some key terms Ambiguity in NL...
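
The conformal-prediction core is standard split conformal over multiple-choice action options. A minimal sketch, assuming per-option confidences from the LLM; the calibration data and scoring function are placeholders, not the paper's exact pipeline:

```python
import numpy as np

# Sketch of split conformal prediction as used for "asking for help":
# calibrate a threshold on held-out examples, then build prediction sets
# over candidate actions; a non-singleton set triggers a help request.

def calibrate(cal_scores: np.ndarray, alpha: float = 0.1) -> float:
    """cal_scores[i] = nonconformity (e.g. 1 - confidence) of the TRUE option
    on calibration example i. Returns the (1 - alpha)-coverage threshold."""
    n = len(cal_scores)
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return float(np.quantile(cal_scores, q, method="higher"))

def prediction_set(option_confidences: np.ndarray, qhat: float) -> np.ndarray:
    """Indices of options whose nonconformity falls below the threshold."""
    return np.where(1.0 - option_confidences <= qhat)[0]

# If len(prediction_set(confs, qhat)) > 1, the planner is uncertain: ask.
```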

<span title='2024-01-26 17:29:29 +1100 AEDT'>January 26, 2024</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;510 words&nbsp;·&nbsp;Sukai Huang

Marta Skreta RePlan Robotic Replanning 2024

[TOC] Title: RePlan: Robotic Replanning with Perception and Language Models Author: Marta Skreta et al. Publish Year: 8 Jan 2024 Review Date: Thu, Jan 25, 2024 url: arXiv:2401.04157v1 Summary of paper Motivation However, the challenge remains that even with syntactically correct plans, robots can still fail to achieve their intended goals. This failure can be attributed to imperfect plans proposed by LLMs or to unforeseeable environmental circumstances that hinder the execution of planned subtasks due to erroneous assumptions about the state of objects....

<span title='2024-01-25 00:55:05 +1100 AEDT'>January 25, 2024</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;261 words&nbsp;·&nbsp;Sukai Huang

Binghai Wang Secrets of RLHF Reward Modelling 2024

[TOC] Title: Secrets of RLHF in Large Language Models Part II: Reward Modelling Author: Binghai Wang et al. Publish Year: 12 Jan 2024 Review Date: Wed, Jan 24, 2024 url: arXiv:2401.06080v2 Summary of paper Motivation RLHF is a crucial technology for aligning language models with human values. Two main issues are tackled: (1) incorrect and ambiguous preference pairs in the dataset hindering reward model accuracy, and (2) difficulty in generalization for reward models trained on specific distributions....
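
For reference, the preference-pair objective that such reward models are trained with (the standard Bradley-Terry-style loss the paper builds on; the paper's own contribution concerns denoising the pairs and generalization, not this loss itself):

```python
import torch
import torch.nn.functional as F

# Standard reward-model objective on preference pairs: the chosen response
# should out-score the rejected one.

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """r_chosen / r_rejected: scalar rewards for a batch of preference pairs."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```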

<span title='2024-01-24 23:31:28 +1100 AEDT'>January 24, 2024</span>&nbsp;·&nbsp;1 min&nbsp;·&nbsp;144 words&nbsp;·&nbsp;Sukai Huang

Rui Zheng Secrets of RLHF in LLM Part PPO 2023

[TOC] Title: Secrets of RLHF in Large Language Models Part I: PPO Author: Rui Zheng et al. Publish Year: 18 Jul 2023 Review Date: Mon, Jan 22, 2024 url: arXiv:2307.04964v2 Summary of paper Motivation Current approaches involve creating reward models to measure human preferences, using Proximal Policy Optimization (PPO) to improve policy models, and enhancing step-by-step reasoning through process supervision. However, challenges in reward design, interaction with the environment, and agent training, along with the high trial-and-error costs of LLMs, make it difficult for researchers to develop technically aligned and safe LLMs....
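
For orientation, the usual RLHF-PPO objective that the paper dissects (the standard formulation, not the paper's exact notation): a KL-shaped reward plus the clipped surrogate:

```latex
% Per-token reward: the RM score at the final token T, minus a KL penalty
% that keeps the policy close to the frozen SFT reference model:
\[
  r_t \;=\; \mathbb{1}[t = T]\, r_{\mathrm{RM}}(x, y)
        \;-\; \beta \log \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\mathrm{SFT}}(a_t \mid s_t)}
\]
% Clipped PPO surrogate with probability ratio rho_t and advantage \hat A_t:
\[
  L^{\mathrm{CLIP}}(\theta) \;=\; \mathbb{E}_t\Big[
    \min\big( \rho_t \hat A_t,\;
    \operatorname{clip}(\rho_t,\, 1-\varepsilon,\, 1+\varepsilon)\, \hat A_t \big)
  \Big],
  \qquad
  \rho_t = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
\]
```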

<span title='2024-01-22 20:26:18 +1100 AEDT'>January 22, 2024</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;465 words&nbsp;·&nbsp;Sukai Huang

Zhiting Hu Language Agent and World Models 2023

[TOC] Title: Zhiting Hu Language Agent and World Models 2023 Author: Zhiting Hu et al. Publish Year: Review Date: Mon, Jan 22, 2024 url: arXiv:2312.05230v1 Summary of paper Motivation LAW proposes that world and agent models, which encompass beliefs about the world, anticipation of consequences, goals/rewards, and strategic planning, provide a better abstraction for reasoning. In this framework, language models play a crucial role as a backend. Some key terms Limitation of Language Ambiguity and Imprecision: LLMs struggle with natural language's ambiguity and imprecision because they lack the rich context that humans use when producing text....

<span title='2024-01-22 16:01:20 +1100 AEDT'>January 22, 2024</span>&nbsp;·&nbsp;4 min&nbsp;·&nbsp;749 words&nbsp;·&nbsp;Sukai Huang

Gautier Dagan Dynamic Planning With a LLM 2023

[TOC] Title: Dynamic Planning With a LLM Author: Gautier Dagan et al. Publish Year: 11 Aug 2023 Review Date: Sun, Jan 21, 2024 url: arXiv:2308.06391v1 Summary of paper Motivation Traditional symbolic planners can find optimal solutions quickly but need complete and accurate problem representations. In contrast, LLMs can handle noisy data and uncertainty but struggle with planning tasks. The LLM-DP framework combines LLMs and traditional planners to solve embodied tasks efficiently....
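
That division of labour can be sketched in a few lines (all helpers hypothetical): the LLM handles language and belief updates, the symbolic planner handles search:

```python
# Sketch of the LLM-DP loop (helper names hypothetical): the LLM turns the
# task description into a PDDL problem, a classical planner searches for a
# plan, and unexpected observations trigger belief revision plus replanning.

def llm_dp(llm, planner, domain_pddl, task, env):
    problem = llm.to_pddl_problem(task)            # language -> PDDL problem
    while not env.goal_reached():
        plan = planner.solve(domain_pddl, problem) # symbolic search
        for action in plan:
            obs = env.execute(action)
            if obs.unexpected:                     # plan invalidated
                problem = llm.revise(problem, obs) # update beliefs, replan
                break
```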

<span title='2024-01-21 01:42:23 +1100 AEDT'>January 21, 2024</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;384 words&nbsp;·&nbsp;Sukai Huang

Jun Wang Conformal Temporal Logic Planning Using LLMs 2023

[TOC] Title: Conformal Temporal Logic Planning Using LLMs 2023 Author: Jun Wang et al. Publish Year: 19 Dec 2023 Review Date: Sun, Jan 21, 2024 url: arXiv:2309.10092v2 Summary of paper Motivation Unlike previous methods that focus on low-level system configurations, this approach focuses on NL-based atomic propositions: the LTL tasks are now defined over NL-based atomic propositions. Robots are required to perform high-level subtasks specified in natural language. To formally define the overarching mission, the authors leverage LTL defined over atomic predicates modelling these NL-based subtasks....

<span title='2024-01-21 00:34:56 +1100 AEDT'>January 21, 2024</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;357 words&nbsp;·&nbsp;Sukai Huang

Gerevini Plan Constraints and Preferences in PDDL3 2005

[TOC] Title: Gerevini Plan Constraints and Preferences in PDDL3 Author: Alfonso Gerevini, Derek Long Publish Year: 2005 Review Date: Thu, Jan 11, 2024 url: http://www.cs.yale.edu/~dvm/papers/pddl-ipc5.pdf Summary of paper Motivation The notion of plan quality in automated planning is a practically very important issue. It is important to generate plans of good or optimal quality, and we need ways to express plan quality. The proposed extended language allows us to express strong and soft constraints on plan trajectories, i....
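
For reference, the semantics of some representative PDDL3 modal operators over a plan's state trajectory (an informal rendering, not the paper's exact notation):

```latex
% Trajectory semantics of representative PDDL3 modal operators over the
% state sequence \langle s_0, \dots, s_n \rangle induced by a plan:
\begin{align*}
  (\texttt{always}\ \phi)   &\;\equiv\; \forall i \le n:\ s_i \models \phi \\
  (\texttt{sometime}\ \phi) &\;\equiv\; \exists i \le n:\ s_i \models \phi \\
  (\texttt{at-end}\ \phi)   &\;\equiv\; s_n \models \phi \\
  (\texttt{sometime-after}\ \phi\ \psi) &\;\equiv\;
      \forall i \,\big(s_i \models \phi \Rightarrow \exists j \ge i:\ s_j \models \psi\big)
\end{align*}
```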

<span title='2024-01-11 19:54:29 +1100 AEDT'>January 11, 2024</span>&nbsp;·&nbsp;1 min&nbsp;·&nbsp;122 words&nbsp;·&nbsp;Sukai Huang

Nir Lipo Planning With Perspectives Using Functional STRIPS 2022

[TOC] Title: Planning With Perspectives – Decomposing Epistemic Planning using Functional STRIPS Author: Guang Hu, Nir Lipovetzky Publish Year: 2022 Review Date: Thu, Jan 11, 2024 url: https://nirlipo.github.io/publication/hu-2022-planning/ Summary of paper Motivation We present a novel approach to epistemic planning called planning with perspectives (PWP) that is both more expressive and computationally more efficient than existing state-of-the-art epistemic planning tools. Contribution In this paper, we decompose epistemic planning by delegating reasoning about epistemic formulae to an external solver, i....

<span title='2024-01-11 19:41:55 +1100 AEDT'>January 11, 2024</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;267 words&nbsp;·&nbsp;Sukai Huang

Alex Coulter Theory Alignment via a Classical Encoding of Regular Bisimulation 2022

[TOC] Title: Theory Alignment via a Classical Encoding of Regular Bisimulation 2022 Author: Alex Coulter et al. Publish Year: KEPS 2022 Review Date: Wed, Nov 29, 2023 url: https://icaps22.icaps-conference.org/workshops/KEPS/KEPS-22_paper_7781.pdf Summary of paper Motivation The main question we seek to answer is how we can test whether two models align (where the fluents and action implementations may differ), and if not, where that misalignment occurs. Contribution The work is built on a foundation of regular bisimulation. The authors found that the proposed alignment was not only viable, with many submissions having "solutions" to the merged model showing where a modelling error occurs, but several cases also demonstrated errors in the submitted domains that were subtle and detected only by this added approach....

<span title='2023-11-29 17:24:08 +1100 AEDT'>November 29, 2023</span>&nbsp;·&nbsp;6 min&nbsp;·&nbsp;1083 words&nbsp;·&nbsp;Sukai Huang

Pascal Bercher Detecting AI Planning Modelling Mistakes Potential Errors and Benchmark Domains 2023

[TOC] Title: Detecting AI Planning Modelling Mistakes Potential Errors and Benchmark Domains Author: Pascal Bercher et al. Publish Year: 2023 Review Date: Mon, Nov 13, 2023 url: https://bercher.net/publications/2023/Sleath2023PossibleModelingErrors.pdf Summary of paper Contribution The authors provide a compilation of potential modelling errors, supply a public repository of 56 (flawed) benchmark domains, and conduct an evaluation of well-known AI planning tools for their ability to diagnose those errors, showing that not a single tool is able to spot all errors, with no tool being strictly stronger than another....

<span title='2023-11-13 22:33:14 +1100 AEDT'>November 13, 2023</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;408 words&nbsp;·&nbsp;Sukai Huang

Yecheng Jason Ma Eureka Human Level Reward Design via Coding Large Language Models 2023

[TOC] Title: Eureka Human Level Reward Design via Coding Large Language Models 2023 Author: Yecheng Jason Ma et al. Publish Year: 19 Oct 2023 Review Date: Fri, Oct 27, 2023 url: https://arxiv.org/pdf/2310.12931.pdf Summary of paper Motivation Harnessing LLMs to learn complex low-level manipulation tasks remains an open problem. We bridge this fundamental gap by using LLMs to produce rewards that can be used to acquire complex skills via reinforcement learning. Contribution Eureka generates reward functions that outperform expert human-engineered rewards....
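
The outer loop is an evolutionary search over reward functions written as code. A minimal sketch with hypothetical helpers (`llm.generate` and `train_rl` stand in for the paper's GPT-4 reward generation and GPU-accelerated RL training):

```python
# Sketch of the Eureka outer loop (all helpers hypothetical): sample reward
# functions as code from an LLM, score each by RL training, and feed a
# textual "reward reflection" back into the next round's prompt.

def eureka_loop(llm, env_source, train_rl, iterations=5, samples=16):
    best_code, best_fitness, feedback = None, float("-inf"), ""
    for _ in range(iterations):
        candidates = [llm.generate(env_source, feedback) for _ in range(samples)]
        for code in candidates:
            fitness = train_rl(env_source, reward_code=code)  # task metric
            if fitness > best_fitness:
                best_code, best_fitness = code, fitness
        feedback = f"Best fitness so far: {best_fitness:.3f}"  # reward reflection
    return best_code
```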

<span title='2023-10-27 16:44:22 +1100 AEDT'>October 27, 2023</span>&nbsp;·&nbsp;6 min&nbsp;·&nbsp;1163 words&nbsp;·&nbsp;Sukai Huang

Mark Chen Evaluating Large Language Models Trained on Code 2021

[TOC] Title: Evaluating Large Language Models Trained on Code Author: Mark Chen et al. (OpenAI) Publish Year: 14 Jul 2021 Review Date: Mon, Oct 16, 2023 url: https://arxiv.org/pdf/2107.03374.pdf Summary of paper Motivation This is the research paper behind the GitHub Copilot technology. More recently, language models have also fueled progress toward the longstanding challenge of program synthesis. Contribution We find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts....
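
The repeated-sampling finding is quantified with the paper's unbiased pass@k estimator: generate n samples per problem, count c correct, and estimate the chance that at least one of k drawn samples passes. The numerically stable form below follows the paper's own formulation:

```python
import numpy as np

# Unbiased pass@k estimator from the paper: with n samples and c correct,
# pass@k = 1 - C(n-c, k) / C(n, k), computed as a stable running product.

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: some draw must pass
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 200 samples with 30 passing gives pass@1 = 0.15.
assert abs(pass_at_k(200, 30, 1) - 0.15) < 1e-9
```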

<span title='2023-10-16 07:24:26 +1100 AEDT'>October 16, 2023</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;298 words&nbsp;·&nbsp;Sukai Huang

Baptiste Roziere Code Llama Open Foundation Model for Code 2023

[TOC] Title: Code Llama Open Foundation Model for Code Author: Baptiste Roziere et al. (Meta AI) Publish Year: 2023 Review Date: Mon, Oct 16, 2023 url: https://scontent.fmel13-1.fna.fbcdn.net/v/t39.2365-6/369856151_1754812304950972_1159666448927483931_n.pdf?_nc_cat=107&ccb=1-7&_nc_sid=3c67a6&_nc_ohc=Hcg6QsYJx1wAX_okEZO&_nc_ht=scontent.fmel13-1.fna&oh=00_AfAYtfHJfYeomAQWiMUTRo96iP8d4sZrlIfD_KAeYlYaDQ&oe=6531E8CF Summary of paper Motivation Code Llama offers support for large input contexts and zero-shot instruction-following ability for programming tasks. Contribution Code Llama reaches SOTA performance among open models on several code benchmarks. Some key terms By training on domain-specific datasets, LLMs have proved effective more broadly on applications that require advanced natural language understanding....

<span title='2023-10-16 02:58:20 +1100 AEDT'>October 16, 2023</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;284 words&nbsp;·&nbsp;Sukai Huang

Haotian Liu Improved Baselines With Visual Instruction Tuning 2023

[TOC] Title: Improved Baselines With Visual Instruction Tuning Author: Haotian Liu et al. Publish Year: Oct 5 2023 Review Date: Sun, Oct 8, 2023 url: https://arxiv.org/pdf/2310.03744.pdf Summary of paper Motivation We show that the fully-connected vision-language cross-modal connector in LLaVA is surprisingly powerful and data-efficient. Contribution With simple modifications to LLaVA, namely using CLIP-ViT with an MLP projection and adding academic-task-oriented VQA data with simple response formatting prompts, they establish a stronger baseline....
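
The connector change itself is tiny. A sketch of the before/after, assuming a 1024-d CLIP-ViT feature width and a 4096-d LLM embedding (dimensions are illustrative, not taken from the paper's configs):

```python
import torch.nn as nn

# Sketch of the LLaVA cross-modal connector upgrade: a single linear
# projection from vision features to the LLM token-embedding space is
# replaced by a two-layer MLP with GELU.

vision_dim, llm_dim = 1024, 4096

linear_connector = nn.Linear(vision_dim, llm_dim)   # original LLaVA

mlp_connector = nn.Sequential(                      # LLaVA-1.5
    nn.Linear(vision_dim, llm_dim),
    nn.GELU(),
    nn.Linear(llm_dim, llm_dim),
)
```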

<span title='2023-10-08 10:37:37 +1100 AEDT'>October 8, 2023</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;240 words&nbsp;·&nbsp;Sukai Huang

Christabel Wayllace Goal Recognition Design With Stochastic Agent Action Outcomes 2016

[TOC] Title: Christabel Wayllace Goal Recognition Design With Stochastic Agent Action Outcomes 2016 Author: Christabel Wayllace et al. Publish Year: IJCAI 2016 Review Date: Fri, Oct 6, 2023 url: https://www.ijcai.org/Proceedings/16/Papers/464.pdf Summary of paper Motivation In this paper, they generalize the Goal Recognition Design (GRD) problem to Stochastic GRD (S-GRD) problems, which handle stochastic action outcomes. Some key terms Plan and goal recognition problem: it aims to identify the actual plan or goal of an agent given its behaviour....

<span title='2023-10-06 18:16:28 +1100 AEDT'>October 6, 2023</span>&nbsp;·&nbsp;1 min&nbsp;·&nbsp;191 words&nbsp;·&nbsp;Sukai Huang

Alba Gragera Pddl Domain Repair Fixing Domains With Incomplete Action Effects 2023

[TOC] Title: PDDL Domain Repair Fixing Domains With Incomplete Action Effects Author: Alba Gragera et al. Publish Year: ICAPS 2023 Review Date: Wed, Sep 20, 2023 url: https://icaps23.icaps-conference.org/demos/papers/2791_paper.pdf Summary of paper Contribution In this paper, they present a tool to repair planning models in which the effects of some actions are incomplete. The received input is compiled into a new extended planning task in which actions are permitted to insert possible missing effects....

<span title='2023-09-20 23:17:51 +1000 AEST'>September 20, 2023</span>&nbsp;·&nbsp;1 min&nbsp;·&nbsp;153 words&nbsp;·&nbsp;Sukai Huang

Alba Gragera Exploring the Limitations of Using LLMs to Fix Planning Tasks 2023

[TOC] Title: Exploring the Limitations of Using LLMs to Fix Planning Tasks Author: Alba Gragera et al. Publish Year: ICAPS 2023 (KEPS workshop) Review Date: Wed, Sep 20, 2023 url: https://icaps23.icaps-conference.org/program/workshops/keps/KEPS-23_paper_3645.pdf Summary of paper Motivation In this work, the authors present ongoing efforts to explore the limitations of LLMs in tasks requiring reasoning and planning competences: that of assisting humans in the process of fixing planning tasks. Contribution Investigate how good LLMs are at repairing planning tasks when the prompt is given in PDDL and when it is given in natural language....

<span title='2023-09-20 20:22:32 +1000 AEST'>September 20, 2023</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;403 words&nbsp;·&nbsp;Sukai Huang