Publications
-
preprint
Pacmand: A Robust PDDL Action Constructor for Modelling Ambiguous Descriptions using LLM
Sukai Huang, Nir Lipovetzky and Trevor Cohn
arXiv ePrint 2023
Hybrid planning systems that use the capabilities of Large Language Models (LLMs) to translate natural language directives directly into Planning Domain Definition Language (PDDL) problem files have made notable progress, but a significant gap remains: they still depend on domain experts to supply the PDDL domain models, a requirement that limits the scalability and accessibility of such hybrid planning solutions. In this work, we tackle this challenge by focusing on a crucial part of PDDL domain modelling: action modelling. We introduce Pacmand (PDDL Action Constructor for Modelling Ambiguous Natural language Descriptions), a pipeline that uses LLMs to autonomously translate ambiguous natural language descriptions from non-expert users into formal PDDL action definitions. By integrating task-agnostic Chain-of-Thought (CoT) prompting with Conformal Prediction (CP) techniques, Pacmand manages the ambiguity inherent in non-expert descriptions and generates accurate PDDL action definitions. This work contributes to the vision of more inclusive and intuitive interfaces for planning systems, and marks a significant step towards the broader integration of LLMs into automated planning and scheduling.
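The abstract names Conformal Prediction as the tool for handling ambiguity but does not spell out the mechanics. As a rough, generic illustration, the sketch below implements standard split conformal prediction over a handful of candidate outputs. It is not Pacmand's pipeline: it assumes each candidate PDDL action definition comes with a model-assigned probability, and it assumes a hypothetical rule that a prediction set with more than one surviving candidate marks the description as ambiguous. The candidate names (`pick-up`, `grasp`, `lift`), the calibration probabilities and `alpha` are all made up for illustration.

```python
import math


def conformal_threshold(cal_true_probs, alpha):
    """Split conformal prediction: calibrate a score threshold q_hat.

    cal_true_probs: model probability assigned to the *correct* candidate on
    each held-out calibration example. Nonconformity score = 1 - probability.
    """
    scores = sorted(1.0 - p for p in cal_true_probs)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return math.inf  # too little calibration data for this alpha: keep everything
    return scores[k - 1]


def prediction_set(candidate_probs, q_hat):
    """Return every candidate whose nonconformity score is within q_hat."""
    return [c for c, p in candidate_probs.items() if 1.0 - p <= q_hat]


# Hypothetical calibration data (illustrative numbers only).
calibration = [0.9, 0.4, 0.8, 0.5, 0.35, 0.7, 0.45, 0.6, 0.3, 0.85]
q_hat = conformal_threshold(calibration, alpha=0.2)

# Hypothetical candidate action definitions for two user descriptions.
ambiguous = {"pick-up": 0.5, "grasp": 0.4, "lift": 0.1}
clear = {"pick-up": 0.9, "grasp": 0.08, "lift": 0.02}

for name, cands in [("ambiguous", ambiguous), ("clear", clear)]:
    kept = prediction_set(cands, q_hat)
    # Assumed rule: more than one surviving candidate -> ask the user to clarify.
    status = "ask user" if len(kept) > 1 else "accept"
    print(name, kept, status)
# → ambiguous ['pick-up', 'grasp'] ask user
# → clear ['pick-up'] accept
```

Under the usual exchangeability assumption, split conformal prediction guarantees that the set contains the correct candidate with probability at least 1 − alpha; the price of that guarantee is that uncertain inputs yield larger sets, which is what makes set size a natural ambiguity signal.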
-
preprint
A Reminder of its Brittleness: Language Reward Shaping May Hinder Learning for Instruction Following Agents
Sukai Huang, Nir Lipovetzky and Trevor Cohn
arXiv ePrint 2023
Teaching agents to follow complex written instructions has been an important yet elusive goal. One technique for improving learning efficiency is language reward shaping (LRS), which is used in reinforcement learning (RL) to reward actions that represent progress towards a sparse reward. We argue that the apparent success of LRS is brittle and that prior positive findings can be attributed to weak RL baselines. Specifically, we identify suboptimal LRS designs that reward partially matched trajectories, and we characterise a novel type of reward perturbation, based on the concept of loosening task constraints, that addresses this issue. We provide theoretical and empirical evidence that agents trained with LRS rewards converge more slowly than pure RL agents.
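One way a bonus for partially matched trajectories can mislead a learner is shown by a toy discounted-return calculation. This is a generic sketch, not the paper's reward design, its analysis, or its proposed perturbation: it assumes a naive shaping scheme that adds a fixed bonus of 0.5 (an arbitrary choice) whenever a step partially matches the instruction, a discount factor of 0.9, and two hand-made trajectories. The helper names are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite trajectory."""
    return sum(r * gamma**t for t, r in enumerate(rewards))


def add_bonus(env_rewards, bonuses):
    """Naive language reward shaping: add a per-step bonus to the environment reward."""
    return [r + b for r, b in zip(env_rewards, bonuses)]


GAMMA = 0.9

# Hypothetical trajectory that completes the instruction at its third step,
# earning one partial-match bonus along the way.
finish_env = [0.0, 0.0, 1.0]
finish_bonus = [0.0, 0.5, 0.0]

# Hypothetical trajectory that never completes the instruction but keeps
# re-triggering the partial-match bonus every other step.
loop_env = [0.0] * 10
loop_bonus = [0.0, 0.5] * 5

env_only = (discounted_return(finish_env, GAMMA), discounted_return(loop_env, GAMMA))
shaped = (
    discounted_return(add_bonus(finish_env, finish_bonus), GAMMA),
    discounted_return(add_bonus(loop_env, loop_bonus), GAMMA),
)
print("env reward only (finish, loop):", [round(x, 3) for x in env_only])
print("with naive bonus (finish, loop):", [round(x, 3) for x in shaped])
# → env reward only (finish, loop): [0.81, 0.0]
# → with naive bonus (finish, loop): [1.26, 1.543]
```

Under the environment reward alone, finishing the task is clearly preferred; once the naive bonus is added, the trajectory that never finishes earns the higher discounted return, so a return-maximising agent is pulled towards collecting partial matches instead of completing the task.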
-
honours thesis
Angry Birds Level Generation Using Walkthrough Descriptions
Sukai Huang
For the degree of Bachelor of Advanced Computing (Honours) at The Australian National University
Angry Birds is a well-known environment in which agents learn physical reasoning. However, deep reinforcement learning agents often underperform because there are too few game levels to train on. To address this, procedural level generation is used to synthesise new Angry Birds game levels. However, the current rule-based Angry Birds procedural level generator cannot produce levels that help agents learn physical reasoning, because it cannot guarantee how much physical reasoning is required to solve the levels it generates. We therefore take a new approach: we use walkthrough descriptions to generate Angry Birds game levels, and train a procedural level generator based on Generative Adversarial Networks (GANs) to imitate high-quality handcrafted levels. Unlike the conventional imitation approach, ours can control the style of the generated levels and increase the diversity of the level dataset by manipulating the input walkthrough descriptions. Both qualitative and quantitative evaluations show that, like the handcrafted levels, the levels generated with this method demand a high level of physical reasoning to solve. In addition, we developed a new Angry Birds walkthrough dataset called AbVat, which can support a variety of research tasks in spatial-temporal understanding and reasoning.