Gerevini Plan Constraints and Preferences in Pddl3 2005

[TOC] Title: Gerevini Plan Constraints and Preferences in PDDL3 Author: Alfonso Gerevini, Derek Long Publish Year: 2005 Review Date: Thu, Jan 11, 2024 url: http://www.cs.yale.edu/~dvm/papers/pddl-ipc5.pdf Summary of paper Motivation the notion of plan quality in automated planning is a practically important issue: we want to generate plans of good or optimal quality, and we need a way to express plan quality. The proposed extended language allows us to express strong and soft constraints on plan trajectories, i.e., constraints over possible actions in the plan and intermediate states reached by the plan, as well as strong and soft problem goals. Some key terms some scenarios ...
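
To make the strong/soft constraint distinction concrete, here is a minimal PDDL3 sketch embedded as a Python string (the predicates and the weight `10` are hypothetical, not taken from the paper; only the `always`, `preference`, `sometime`, and `is-violated` constructs are PDDL3 syntax):

```python
# Minimal PDDL3 sketch: a hard trajectory constraint plus a soft
# preference whose violation is priced into the plan metric.
PDDL3_PROBLEM_SNIPPET = """
(:constraints (and
  ;; strong (hard) constraint: fragile packages must always stay clear
  (always (forall (?p - package) (implies (fragile ?p) (clear ?p))))
  ;; soft constraint (preference): package1 eventually visits london
  (preference p1 (sometime (at package1 london)))))

;; soft goals are priced in the metric: each violated preference
;; adds a penalty to the plan cost
(:metric minimize (+ (total-time) (* 10 (is-violated p1))))
"""
print(PDDL3_PROBLEM_SNIPPET)
```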

January 11, 2024 · 1 min · 122 words · Sukai Huang

Nir Lipo Planning With Perspectives Using Functional Strips 2022

[TOC] Title: Planning With Perspectives – Decomposing Epistemic Planning using Functional STRIPS Author: Guang Hu, Nir Lipovetzky Publish Year: 2022 Review Date: Thu, Jan 11, 2024 url: https://nirlipo.github.io/publication/hu-2022-planning/ Summary of paper Motivation we present a novel approach to epistemic planning called planning with perspectives (PWP) that is both more expressive and computationally more efficient than existing state-of-the-art epistemic planning tools. Contribution in this paper, we decompose epistemic planning by delegating reasoning about epistemic formulae to an external solver, implemented via Functional STRIPS (F-STRIPS). F-STRIPS supports the use of external, black-box functions within action models. Building on recent work that demonstrates the relationship between what an agent ‘sees’ and what it knows, we define the perspective of each agent using an external function, and build a solver for epistemic logic around this. Some key terms external functions (black-box) ...
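
To make the external-function idea concrete, here is a minimal Python sketch (the function name `sees`, its signature, and the geometry are my own illustration, not the paper's code): the planner treats the function as a black box evaluated on states, instead of axiomatising visibility inside the PDDL model.

```python
import math

# Hypothetical external (black-box) function in the spirit of F-STRIPS:
# the planner calls it to decide whether agent `a` sees object `o`,
# rather than encoding perspectives as logical axioms in the model.
def sees(state, a, o, fov_deg=90.0):
    ax, ay, heading = state[a]          # agent pose (x, y, heading in degrees)
    ox, oy = state[o]                   # object position
    angle = math.degrees(math.atan2(oy - ay, ox - ax))
    diff = (angle - heading + 180) % 360 - 180   # signed angular difference
    return abs(diff) <= fov_deg / 2     # object inside the field of view?

state = {"agent1": (0.0, 0.0, 0.0), "ball": (5.0, 1.0)}
print(sees(state, "agent1", "ball"))    # True: the ball is ahead of agent1
```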

January 11, 2024 · 2 min · 267 words · Sukai Huang

Alex_coulter Theory Alignment via a Classical Encoding of Regular Bisimulation 2022

[TOC] Title: Theory Alignment via a Classical Encoding of Regular Bisimulation 2022 Author: Alex Coulter et al. Publish Year: KEPS 2022 Review Date: Wed, Nov 29, 2023 url: https://icaps22.icaps-conference.org/workshops/KEPS/KEPS-22_paper_7781.pdf Summary of paper Motivation the main question we seek to answer is how we can test whether two models align (where the fluents and action implementations may differ), and if not, where that misalignment occurs. Contribution the work is built on a foundation of regular bisimulation. They found that the proposed alignment was not only viable, with many submissions having “solutions” to the merged model that show where a modelling error occurs, but that several cases also demonstrated errors in the submitted domains that were subtle and detected only by this added approach. Some key terms Bisimulation ...
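
As a reminder of what bisimulation requires, here is a naive fixpoint check over two labelled transition systems (my own sketch, not the paper's classical encoding): a pair of states survives only if every move on one side can be matched by a related move on the other.

```python
def bisimulation(states1, states2, trans1, trans2):
    """Greatest bisimulation between two labelled transition systems.
    trans maps (state, action) -> set of successor states."""
    actions = {a for (_, a) in list(trans1) + list(trans2)}
    rel = {(s, t) for s in states1 for t in states2}  # start full, refine down
    changed = True
    while changed:
        changed = False
        for (s, t) in list(rel):
            for a in actions:
                succ_s = trans1.get((s, a), set())
                succ_t = trans2.get((t, a), set())
                # every successor on either side must be matched by a
                # related successor on the other side (transfer condition)
                ok = all(any((s2, t2) in rel for t2 in succ_t) for s2 in succ_s) \
                 and all(any((s2, t2) in rel for s2 in succ_s) for t2 in succ_t)
                if not ok:
                    rel.discard((s, t))
                    changed = True
                    break
    return rel

# two one-step systems that match each other move for move
t1 = {("p", "go"): {"q"}}
t2 = {("x", "go"): {"y"}}
print(bisimulation({"p", "q"}, {"x", "y"}, t1, t2))  # {('p','x'), ('q','y')}
```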

November 29, 2023 · 6 min · 1083 words · Sukai Huang

Pascal Bercher Detecting Ai Planning Modelling Mistakes Potential Errors and Benchmark Domains 2023

[TOC] Title: Detecting AI Planning Modelling Mistakes: Potential Errors and Benchmark Domains Author: Pascal Bercher et al. Publish Year: 2023 Review Date: Mon, Nov 13, 2023 url: https://bercher.net/publications/2023/Sleath2023PossibleModelingErrors.pdf Summary of paper Contribution the authors provide a compilation of potential modelling errors; they supply a public repository of 56 (flawed) benchmark domains; they conducted an evaluation of well-known AI planning tools for their ability to diagnose those errors, showing that not a single tool is able to spot all errors, with no tool being strictly stronger than another. Some key terms list of errors ...

November 13, 2023 · 2 min · 408 words · Sukai Huang

Yecheng Jason Ma Eureka Human Level Reward Design via Coding Large Language Models 2023

[TOC] Title: Eureka: Human-Level Reward Design via Coding Large Language Models 2023 Author: Yecheng Jason Ma et al. Publish Year: 19 Oct 2023 Review Date: Fri, Oct 27, 2023 url: https://arxiv.org/pdf/2310.12931.pdf Summary of paper Motivation harnessing LLMs to learn complex low-level manipulation tasks remains an open problem. We bridge this fundamental gap by using LLMs to produce rewards that can be used to acquire complex skills via reinforcement learning. Contribution Eureka generates reward functions that outperform expert human-engineered rewards. The generality of Eureka also enables a new gradient-free in-context learning approach to reinforcement learning from human feedback (RLHF). Some key terms given detailed environment code and a natural language description of the task, the LLM can generate reward function candidates by sampling. As many real-world RL tasks admit sparse rewards that are difficult for learning, reward shaping that provides incremental learning signals is necessary in practice. reward design problem ...
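
A hedged sketch of the Eureka-style evolutionary loop (the helpers `llm_propose_rewards` and `train_and_score` are stand-ins, not the paper's API): sample several executable reward-function candidates from the LLM, train a policy against each, and feed a textual "reward reflection" of the best candidate back into the next round.

```python
import random

# Hypothetical stand-ins (NOT the paper's API): a real system would
# prompt an LLM with the environment source and task text, then train
# an RL policy under each candidate reward function.
def llm_propose_rewards(env_source, task_desc, feedback, n):
    return [f"def reward_v{random.randint(0, 999)}(state): ..." for _ in range(n)]

def train_and_score(reward_code):
    return random.random(), {"episodes": 100}   # (fitness, training stats)

def eureka_search(env_source, task_desc, iterations=5, samples=4):
    """Evolutionary reward design in the spirit of Eureka (sketch)."""
    best_code, best_score, feedback = None, float("-inf"), ""
    for _ in range(iterations):
        candidates = llm_propose_rewards(env_source, task_desc, feedback, samples)
        for code in candidates:
            score, stats = train_and_score(code)
            if score > best_score:
                best_code, best_score = code, score
                # "reward reflection": textual training summary fed back
                feedback = f"best candidate scored {score:.2f}; stats: {stats}"
    return best_code, best_score

print(eureka_search("<env source code>", "make the robot spin a pen"))
```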

October 27, 2023 · 6 min · 1163 words · Sukai Huang

Mark Chen Evaluating Large Language Models Trained on Code 2021

[TOC] Title: Evaluating Large Language Models Trained on Code Author: Mark Chen et al., OpenAI Publish Year: 14 Jul 2021 Review Date: Mon, Oct 16, 2023 url: https://arxiv.org/pdf/2107.03374.pdf Summary of paper Motivation this is the research paper behind the GitHub Copilot technology. More recently, language models have also fueled progress towards the longstanding challenge of program synthesis. Contribution we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts. Limitation difficulty with docstrings describing long chains of operations and with binding operations to variables. Some key terms HumanEval ...
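
The paper's headline metric for "repeated sampling" is pass@k, estimated without bias from n samples of which c pass. The numerically stable estimator from the paper is:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    computed in a numerically stable product form."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

print(pass_at_k(n=200, c=10, k=1))   # ~0.05, i.e. c/n for k=1
```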

October 16, 2023 · 2 min · 298 words · Sukai Huang

Baptiste Roziere Code Llama Open Foundation Model for Code 2023

[TOC] Title: Code Llama: Open Foundation Models for Code Author: Baptiste Roziere et al., Meta AI Publish Year: 2023 Review Date: Mon, Oct 16, 2023 url: https://scontent.fmel13-1.fna.fbcdn.net/v/t39.2365-6/369856151_1754812304950972_1159666448927483931_n.pdf?_nc_cat=107&ccb=1-7&_nc_sid=3c67a6&_nc_ohc=Hcg6QsYJx1wAX_okEZO&_nc_ht=scontent.fmel13-1.fna&oh=00_AfAYtfHJfYeomAQWiMUTRo96iP8d4sZrlIfD_KAeYlYaDQ&oe=6531E8CF Summary of paper Motivation Code Llama offers support for large input contexts and zero-shot instruction following ability for programming tasks. Contribution Code Llama reaches SOTA performance among open models on several code benchmarks. Some key terms by training on domain-specific datasets, LLMs have proved effective more broadly on applications that require advanced natural language understanding. ...

October 16, 2023 · 2 min · 284 words · Sukai Huang

Haotian Liu Improved Baselines With Visual Instruction Tuning 2023

[TOC] Title: Improved Baselines With Visual Instruction Tuning Author: Haotian Liu et al. Publish Year: Oct 5 2023 Review Date: Sun, Oct 8, 2023 url: https://arxiv.org/pdf/2310.03744.pdf Summary of paper Motivation we show that the fully-connected vision-language cross-modal connector in LLaVA is surprisingly powerful and data-efficient. Contribution with simple modifications to LLaVA, namely using CLIP-ViT with an MLP projection and adding academic-task-oriented VQA data with simple response formatting prompts, they establish stronger baselines. Some key terms Improvement one: MLP cross-modal connector ...
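
The first modification is tiny in code terms: replace the single linear projection with a two-layer MLP between the frozen CLIP-ViT patch features and the LLM embedding space. A minimal PyTorch sketch (the dimensions are illustrative):

```python
import torch
import torch.nn as nn

class MLPProjector(nn.Module):
    """Vision-language connector in the spirit of LLaVA-1.5:
    a 2-layer MLP with GELU instead of a single linear layer."""
    def __init__(self, vision_dim=1024, llm_dim=4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features):      # (B, num_patches, vision_dim)
        return self.proj(patch_features)    # (B, num_patches, llm_dim)

tokens = MLPProjector()(torch.randn(2, 576, 1024))
print(tokens.shape)   # torch.Size([2, 576, 4096]) -> fed to the LLM
```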

October 8, 2023 · 2 min · 240 words · Sukai Huang

Christabel Wayllace Goal Recognition Design With Stochastic Agent Action Outcomes 2016

[TOC] Title: Christabel Wayllace Goal Recognition Design With Stochastic Agent Action Outcomes 2016 Author: Christabel Wayllace et al. Publish Year: IJCAI 2016 Review Date: Fri, Oct 6, 2023 url: https://www.ijcai.org/Proceedings/16/Papers/464.pdf Summary of paper Motivation in this paper, they generalize the Goal Recognition Design (GRD) problem to Stochastic GRD (S-GRD) problems, which handle stochastic action outcomes. Some key terms Plan and goal recognition problem it aims to identify the actual plan or goal of an agent given its behaviour. Goal Recognition Design ...

October 6, 2023 · 1 min · 191 words · Sukai Huang

Alba Gragera Pddl Domain Repair Fixing Domains With Incomplete Action Effects 2023

[TOC] Title: PDDL Domain Repair: Fixing Domains With Incomplete Action Effects Author: Alba Gragera et al. Publish Year: ICAPS 2023 Review Date: Wed, Sep 20, 2023 url: https://icaps23.icaps-conference.org/demos/papers/2791_paper.pdf Summary of paper Contribution in this paper, they present a tool to repair planning models where the effects of some actions are incomplete. The received input is compiled into a new extended planning task, in which actions are permitted to insert possible missing effects. The solution is a plan that achieves the goals of the original problem while also alerting users to the modifications made. ...
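
A hedged sketch of the compilation idea (my own simplification, not the tool's actual encoding): for each action and each candidate missing effect, emit a copy of the action that additionally asserts that effect, with a cost penalty so the planner prefers the unmodified model whenever possible.

```python
def repair_variants(action_name, effects, candidate_effects):
    """For each candidate missing effect, emit a penalised variant of the
    action that additionally asserts that effect (illustrative sketch;
    the action and predicate names are hypothetical)."""
    variants = []
    for i, extra in enumerate(candidate_effects):
        variants.append(f"""
(:action {action_name}-repair-{i}
  ;; same parameters and preconditions as {action_name} (omitted here)
  :effect (and {' '.join(effects + [extra])}
               (increase (total-cost) 1)))   ; penalise using a repair
""")
    return variants

print(repair_variants("open-door", ["(door-open ?d)"],
                      ["(not (door-locked ?d))"])[0])
```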

September 20, 2023 · 1 min · 153 words · Sukai Huang

Alba Gragera Exploring the Limitations of Using LLMs to Fix Planning Tasks 2023

[TOC] Title: Exploring the Limitations of Using LLMs to Fix Planning Tasks Author: Alba Gragera et al. Publish Year: ICAPS 2023 (KEPS workshop) Review Date: Wed, Sep 20, 2023 url: https://icaps23.icaps-conference.org/program/workshops/keps/KEPS-23_paper_3645.pdf Summary of paper Motivation in this work, the authors present ongoing efforts on exploring the limitations of LLMs in tasks requiring reasoning and planning competences: that of assisting humans in the process of fixing planning tasks. Contribution they investigate how good LLMs are at repairing planning tasks when the prompt is given in PDDL and when it is given in natural language. They also tested on incomplete initial states as well as incomplete domains which lack a necessary action effect to achieve the goals. In all cases, LLMs are used stand-alone, and the authors directly assess the correctness of the solutions they generate. Conclusion: they demonstrate that although LLMs can in principle facilitate iterative refinement of PDDL models through user interaction, their limited reasoning abilities render them insufficient for identifying meaningful changes to ill-defined planning models that result in solvable planning tasks. ...

September 20, 2023 · 2 min · 403 words · Sukai Huang

Tathagata Chakraborti Plan Explanations as Model Reconciliation 2017

[TOC] Title: Plan Explanations as Model Reconciliation: Moving Beyond Explanation as Soliloquy Author: Tathagata Chakraborti Publish Year: 30 May 2017 Review Date: Tue, Sep 19, 2023 url: https://arxiv.org/pdf/1701.08317.pdf Summary of paper Motivation past work on plan explanations primarily involved the AI system explaining the correctness of its plan and the rationale for its decision in terms of its own model. Such soliloquy is inadequate (think about the case where GPT-4 cannot find errors in a PDDL domain file due to overconfidence). In this work, the author argues that, because the human and the AI system differ in their domain and task models, soliloquy is inadequate. Contribution they show how explanation can be seen as a “model reconciliation problem” (MRP), where the AI system in effect suggests changes to the human’s model, so as to make its plan optimal with respect to that changed human model. In other words, they need to update the human’s mindset about the domain and task model such that the plan generated by the AI system matches the human’s expectations. Some key terms Definition of a classical planning problem ...
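
The excerpt cuts off at the definition the paper builds on; the standard STRIPS-style formulation (paraphrased and simplified, not quoted from the paper) is:

```latex
\[
\Pi = \langle D, \mathcal{I}, \mathcal{G} \rangle,\qquad
D = \langle F, A \rangle,\qquad
a = \langle \mathrm{pre}(a), \mathrm{add}(a), \mathrm{del}(a) \rangle \ \ \forall a \in A
\]
% F: fluents; \mathcal{I} \subseteq F: initial state; \mathcal{G} \subseteq F: goal.
% Model reconciliation then compares the robot's model \Pi^R with the
% human's mental model \Pi^H and searches for a minimal update to \Pi^H
% under which the robot's plan is optimal.
```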

September 19, 2023 · 3 min · 630 words · Sukai Huang

Vishal Pallagani Plansformer Tool Demonstrating Generation of Symbolic Plans Using Transformers 2023

[TOC] Title: Plansformer – Tool Demonstrating Generation of Symbolic Plans Using Transformers Author: Vishal Pallagani et al. Publish Year: IJCAI-23 Review Date: Sat, Sep 16, 2023 url: https://www.ijcai.org/proceedings/2023/0839.pdf Summary of paper Motivation building a bridge between planning with LLMs and planning with traditional automated planners Design of Plansformer in the evaluation phase, planner testing helps to validate the plan (both syntax validation and plan optimality validation), while model testing helps to enforce linguistic consistency (in this case it supervises the semantics). Function of this Plansformer Plansformer operates as an AI planner designed for plan generation, not for creating PDDL from natural language descriptions. ...

September 16, 2023 · 1 min · 105 words · Sukai Huang

Junnan_li Blip2 Bootstrapping Language Image Pretraining 2023

[TOC] Title: BLIP-2: Bootstrapping Language-Image Pretraining 2023 Author: Junnan Li et al. Publish Year: 15 Jun 2023 Review Date: Mon, Aug 28, 2023 url: https://arxiv.org/pdf/2301.12597.pdf Summary of paper The paper titled “BLIP-2” proposes a new and efficient pre-training strategy for vision-and-language models. The cost of training such models has been increasingly prohibitive due to the large scale of the models. BLIP-2 aims to address this issue by leveraging off-the-shelf, pre-trained image encoders and large language models (LLMs) that are kept frozen during the pre-training process. ...

August 28, 2023 · 2 min · 327 words · Sukai Huang

Peng_gao Llama Adapter V2 2023

[TOC] Title: Llama Adapter V2 Author: Peng Gao et al. Publish Year: 28 Apr 2023 Review Date: Mon, Aug 28, 2023 url: https://arxiv.org/pdf/2304.15010.pdf Summary of paper The paper presents LLaMA-Adapter V2, an enhanced version of the original LLaMA-Adapter designed for multi-modal reasoning and instruction following. The paper aims to address the limitations of the original LLaMA-Adapter, which could not generalize well to open-ended visual instructions and lagged behind GPT-4 in performance. ...

August 28, 2023 · 2 min · 246 words · Sukai Huang

Rodrigo Reward Machines Exploiting Reward Function Structure in Rl 2022

[TOC] Title: Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning 2022 Author: Rodrigo Toro Icarte et al. Publish Year: 2022, AI Access Foundation Review Date: Thu, Aug 17, 2023 url: https://arxiv.org/abs/2010.03950 Summary of paper Motivation in most RL applications, however, users have to program the reward function; hence, there is an opportunity to make the reward function visible, so that the RL agent can exploit the function’s internal structure to learn optimal policies in a more sample-efficient manner. Contribution a different methodology of RL for reward machines: compared to their previous studies, this work tests a collection of RL methods that can exploit a reward machine’s internal structure to improve sample efficiency. Some key terms counterfactual experiences for reward machines (CRM) ...
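
A minimal sketch of CRM (tabular flavour; the names are mine, not the paper's code): from a single environment transition, synthesise one experience per reward-machine state, which is possible because the RM's transition function δ and reward function σ are both known to the learner.

```python
def crm_experiences(s, a, s_next, rm_states, delta, sigma, labeler):
    """Counterfactual experiences for reward machines (sketch).
    delta: (u, label) -> u'; sigma: (u, label) -> reward;
    labeler: state -> set of propositions that hold in it."""
    label = labeler(s_next)
    batch = []
    for u in rm_states:                  # counterfactual RM states
        u_next = delta(u, label)
        r = sigma(u, label)
        batch.append(((s, u), a, r, (s_next, u_next)))
    return batch                         # feed all of these to Q-learning

# toy RM: u0 --{"goal"}--> u1 with reward 1, otherwise stay with reward 0
delta = lambda u, l: "u1" if (u == "u0" and "goal" in l) else u
sigma = lambda u, l: 1.0 if (u == "u0" and "goal" in l) else 0.0
print(crm_experiences("s3", "right", "s4", ["u0", "u1"],
                      delta, sigma, labeler=lambda s: {"goal"}))
```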

August 17, 2023 · 2 min · 321 words · Sukai Huang

Rodrigo Using Reward Machines for High Level Task Specification and Decomposition in Rl 2018

[TOC] Title: Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning Author: Rodrigo Toro Icarte et al. Publish Year: PMLR 2018 Review Date: Thu, Aug 17, 2023 url: http://proceedings.mlr.press/v80/icarte18a/icarte18a.pdf Summary of paper Motivation proposing reward machines, which expose reward function structure to the learner and support decomposition. Contribution in contrast to hierarchical RL methods, which might converge to suboptimal policies, QRM is proven to converge to an optimal policy in the tabular case. Some key terms intro ...
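
To pin down the object being discussed: a reward machine is a finite-state machine over propositional labels, with a state-transition function and a reward output. A toy encoding (mine, not the paper's code) of the classic "get coffee, then reach the office" task:

```python
class RewardMachine:
    """Toy reward machine: fetch coffee (label 'c'), then office ('o')."""
    def __init__(self):
        self.u0 = 0                     # initial RM state

    def step(self, u, label):
        """Return (next RM state, reward) for the observed label set."""
        if u == 0 and "c" in label:
            return 1, 0.0               # got coffee, no reward yet
        if u == 1 and "o" in label:
            return 2, 1.0               # delivered: task complete
        return u, 0.0                   # otherwise: stay, zero reward

rm = RewardMachine()
print(rm.step(0, {"c"}))   # (1, 0.0)
print(rm.step(1, {"o"}))   # (2, 1.0)
# QRM then learns a separate Q-function Q_u per RM state u, so the
# policy can condition on how much of the task is already done.
```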

August 17, 2023 · 2 min · 360 words · Sukai Huang

William_berrios Towards Language Models That Can See 2023

[TOC] Title: Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language Author: William Berrios et al. Publish Year: 28 Jun 2023 Review Date: Mon, Jul 3, 2023 url: https://arxiv.org/pdf/2306.16410.pdf Summary of paper Contribution proposing LENS, a modular approach that addresses computer vision tasks by harnessing the few-shot, in-context learning abilities of language models through natural language descriptions of visual inputs. LENS enables any off-the-shelf LLM to have visual capabilities without auxiliary training or data. LENS framework ...
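
A hedged sketch of the LENS idea (the module names and callables are stand-ins, not the paper's code): independent vision modules emit text, which is concatenated into a prompt for an unmodified, frozen LLM.

```python
def lens_answer(image, question, llm, vision_modules):
    """LENS-style pipeline (sketch): vision -> text -> frozen LLM.
    vision_modules is a dict of hypothetical callables returning text."""
    descriptions = [f"{name}: {fn(image)}" for name, fn in vision_modules.items()]
    prompt = "\n".join(descriptions + [f"Question: {question}", "Answer:"])
    return llm(prompt)          # no auxiliary multimodal training needed

# toy stand-ins for the vision modules and the LLM
modules = {"tags": lambda im: "dog, frisbee, park",
           "caption": lambda im: "a dog catching a frisbee on grass"}
print(lens_answer(None, "What is the dog doing?",
                  llm=lambda p: "(LLM output here)", vision_modules=modules))
```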

July 3, 2023 · 1 min · 152 words · Sukai Huang

Lionel_wong From Word Models to World Models 2023

[TOC] Title: From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought Author: Lionel Wong et al. Publish Year: 23 Jun 2023 Review Date: Sun, Jul 2, 2023 url: https://arxiv.org/pdf/2306.12672.pdf Summary of paper Motivation leverage a theory of linguistic meaning to build machines that think in more human-like ways. We frame linguistic meaning as a context-sensitive mapping from NL into a probabilistic language of thought (PLoT) – a general-purpose symbolic substrate for probabilistic, generative world modelling ...

July 2, 2023 · 3 min · 460 words · Sukai Huang

Jianning_wang Boosting Language Models Reasoning With Chain of Knowledge Prompting 2023

[TOC] Title: Boosting Language Models Reasoning With Chain-of-Knowledge Prompting Author: Jianing Wang et al. Publish Year: 10 Jun 2023 Review Date: Sun, Jul 2, 2023 url: https://arxiv.org/pdf/2306.06427.pdf Summary of paper Motivation “Chain of Thought (CoT)” prompting designs a simple prompt like “Let’s think step by step”; however, the generated rationales often come with mistakes, resulting in unfactual and unfaithful reasoning chains. To mitigate this brittleness, we propose novel Chain-of-Knowledge (CoK) prompting, where the knowledge evidence takes the form of structured triples. Contribution benefiting from CoK, we additionally introduce an F²-Verification method to estimate the reliability of a response; wrong evidence can be flagged to prompt the LLM to rethink. Some key terms ...
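
A sketch of what a chain-of-knowledge prompt might look like (the idea of triple-formatted evidence is the paper's; this particular rendering and template are mine):

```python
def cok_prompt(question, triples):
    """Render structured knowledge triples as explicit evidence ahead
    of the question (chain-of-knowledge style sketch)."""
    evidence = "\n".join(f"({s}, {r}, {o})" for s, r, o in triples)
    return (f"Evidence triples:\n{evidence}\n"
            f"Question: {question}\n"
            "Answer step by step, citing the triples you rely on:")

print(cok_prompt("Where was the author of Hamlet born?",
                 [("Hamlet", "written_by", "William Shakespeare"),
                  ("William Shakespeare", "born_in", "Stratford-upon-Avon")]))
```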

July 2, 2023 · 2 min · 264 words · Sukai Huang

Lin_guan Leveraging Pretrained Llm to Construct and Utilise World Models for Model Based Task Planning 2023

[TOC] Title: Leveraging Pretrained Large Language Models to Construct and Utilise World Models for Model Based Task Planning Author: Lin Guan et al. Publish Year: 24 May 2023 Review Date: Sun, Jun 4, 2023 url: https://arxiv.org/pdf/2305.14909.pdf Summary of paper Motivation however, methods that use LLMs directly as planners are currently impractical due to several factors, including limited correctness of plans, strong reliance on feedback from interactions with simulators or even the actual environment, and the inefficiency in utilizing human feedback. Contribution they introduce an alternative paradigm that constructs an explicit world (domain) model in the planning domain definition language (PDDL) and then uses it to plan with sound domain-independent planners. Users can correct the PDDL before the actual planning. Findings GPT-4 can readily correct all the errors according to natural language feedback from PDDL validators and humans. Some key terms approach ...
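
A hedged sketch of the construct-then-correct loop (the helper names are hypothetical; the paper pairs GPT-4 with PDDL validators and human feedback):

```python
def build_pddl_domain(llm, task_description, validator, max_rounds=3):
    """Sketch: an LLM drafts a PDDL domain, a validator (or human)
    returns natural-language error feedback, and the LLM revises."""
    domain = llm(f"Write a PDDL domain for: {task_description}")
    for _ in range(max_rounds):
        errors = validator(domain)       # e.g., syntax or semantic issues
        if not errors:
            break                        # hand off to a sound planner
        domain = llm(f"Fix these problems in the PDDL domain:\n"
                     f"{errors}\n---\n{domain}")
    return domain

# toy stand-ins so the sketch runs end to end
drafts = iter(["(define (domain d) BROKEN", "(define (domain d))"])
llm = lambda prompt: next(drafts)
validator = lambda dom: "unbalanced parentheses" if "BROKEN" in dom else ""
print(build_pddl_domain(llm, "stack blocks", validator))
```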

June 4, 2023 · 3 min · 499 words · Sukai Huang

Dharma_kc Neural Machine Translation for Code Generation 2023

[TOC] Title: Neural Machine Translation for Code Generation Author: Dharma KC et al. Publish Year: 22 May 2023 Review Date: Sun, May 28, 2023 url: https://arxiv.org/pdf/2305.13504.pdf Summary of paper Motivation recently, NMT methods have been adapted to the generation of program code. In NMT for code generation, the task is to generate output source code that satisfies constraints expressed in the input. Conclusion NMT-based architectures are getting quite popular for source code generation from various inputs. NMT-based code generation is useful in multiple domains such as code generation from input binary or assembly (decompilation), code-to-code translation, code repair, bug fixing, and many more. Some open problems: source code has long-range dependencies in multiple places, and the next-token prediction technique may lose this dependency information; methods that can break a problem down into smaller problems, generate code for such subprograms, and evaluate them are a promising research direction; sample efficiency; current code generation does not combine code abstractions into higher-level abstractions as humans do; execution-guided synthesis currently works with DSLs, but extending it to real-world source code generation is a research direction; the Retrieve-and-Edit framework.

May 28, 2023 · 1 min · 181 words · Sukai Huang

Jiannan_xiang Language Models Meet World Models 2023

[TOC] Title: Language Models Meet World Models: Embodied Experiences Enhance Language Models Author: Jiannan Xiang et al. Publish Year: 22 May 2023 Review Date: Fri, May 26, 2023 url: https://arxiv.org/pdf/2305.10626v2.pdf Summary of paper Motivation LLMs often struggle with simple reasoning and planning in physical environments; the limitation arises from the fact that LMs are trained only on written text and miss essential embodied knowledge and skills. Contribution we propose a new paradigm of enhancing LMs by finetuning them with world models, to gain diverse embodied knowledge while retaining their general language capabilities. Experiences in a virtual physical-world simulation environment are used to finetune LMs to teach diverse abilities of reasoning and acting in the physical world, e.g., planning and completing goals, object permanence and tracking, etc. To preserve the generalisation ability of LMs, we use elastic weight consolidation (EWC) for selective weight updates, combined with low-rank adapters (LoRA) for training efficiency. Some key terms ...
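
The EWC part is easy to state concretely: penalise movement of parameters that were important for the original language-modelling objective, weighted by the diagonal Fisher information. A PyTorch-flavoured sketch (mine, simplified; a real Fisher estimate would come from gradients on the pretraining data):

```python
import torch
import torch.nn as nn

def ewc_penalty(model, fisher, theta_star, lam=1.0):
    """Elastic weight consolidation penalty:
    (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2."""
    loss = torch.tensor(0.0)
    for name, p in model.named_parameters():
        if name in fisher:
            loss = loss + (fisher[name] * (p - theta_star[name]) ** 2).sum()
    return 0.5 * lam * loss

model = nn.Linear(4, 4)   # stand-in for the LM
theta_star = {n: p.detach().clone() for n, p in model.named_parameters()}
fisher = {n: torch.ones_like(p) for n, p in model.named_parameters()}  # stub
print(ewc_penalty(model, fisher, theta_star))  # tensor(0.) before any update
# during finetuning: total_loss = task_loss + ewc_penalty(...), with LoRA
# adapters carrying the trainable low-rank updates
```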

May 26, 2023 · 2 min · 357 words · Sukai Huang

Ryan_yang PG3 Policy Guided Planning for Generalised Policy Generation 2022

[TOC] Title: PG3 Policy Guided Planning for Generalised Policy Generation Author: Ryan Yang et al. Publish Year: 21 Apr 2022 Review Date: Wed, May 24, 2023 url: https://arxiv.org/pdf/2204.10420.pdf Summary of paper Motivation a longstanding objective in classical planning is to synthesise policies that generalise across multiple problems from the same domain. In this work, we study generalised policy search-based methods with a focus on the score function used to guide the search over policies. Contribution we study a specific instantiation of policy search where planning problems are PDDL-based and policies are lifted decision lists. Some key terms what is generalised planning and generalised policy search (GPS) ...
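
A hedged sketch of score-guided policy search (PG3's distinctive score function uses the candidate policy to guide planning on the training problems; the helpers here are generic stand-ins):

```python
def policy_search(initial_policy, neighbors, score, iterations=100):
    """Greedy generalised policy search (sketch): hill-climb over
    candidate policies, e.g. lifted decision lists, using a score
    computed on a set of training problems."""
    policy, best = initial_policy, score(initial_policy)
    for _ in range(iterations):
        improved = False
        for cand in neighbors(policy):   # e.g., add or modify a rule
            s = score(cand)              # PG3: plan with the candidate
            if s > best:                 # policy to score it
                policy, best = cand, s
                improved = True
                break
        if not improved:
            break                        # local optimum reached
    return policy

# toy stand-in: policies are integers, the score peaks at 7
print(policy_search(0, neighbors=lambda p: [p - 1, p + 1],
                    score=lambda p: -(p - 7) ** 2))   # -> 7
```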

May 24, 2023 · 2 min · 304 words · Sukai Huang

Shunyu_yao Tree of Thoughts 2023

[TOC] Title: Tree of Thoughts: Deliberate Problem Solving with LLMs Author: Shunyu Yao et al. Publish Year: 17 May 2023 Review Date: Wed, May 24, 2023 url: https://arxiv.org/pdf/2305.10601.pdf Summary of paper Motivation LM inference might benefit from augmentation by a more deliberate “System 2” planning process that (1) maintains and explores diverse alternatives for current choices instead of just picking one, and (2) evaluates its current status and actively looks ahead or backtracks to make more global decisions. The idea is to search through a combinatorial problem space, represented as a tree; we thus propose the Tree of Thoughts (ToT) framework for general problem solving with language models. Contribution limitation ...
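
The deliberate search is easy to sketch. Here is a breadth-first ToT loop with beam width b (a minimal sketch; in the paper, `propose` and `value` are both LLM calls, here replaced by toy stand-ins):

```python
def tot_bfs(root, propose, value, depth=3, beam=2):
    """Tree of Thoughts, BFS variant (sketch): expand each partial
    'thought', score candidates with a value function, keep the top-b."""
    frontier = [root]
    for _ in range(depth):
        candidates = [t for s in frontier for t in propose(s)]
        frontier = sorted(candidates, key=value, reverse=True)[:beam]
    return max(frontier, key=value)

# toy stand-ins: thoughts are strings, more 'a's score higher
propose = lambda s: [s + "a", s + "b"]
value = lambda s: s.count("a")
print(tot_bfs("", propose, value))   # 'aaa'
```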

May 24, 2023 · 1 min · 104 words · Sukai Huang