[TOC]
- Title: Generalized Planning in PDDL Domains with Pretrained Large Language Models
- Author: Tom Silver et al.
- Publish Year: 18 May 2023
- Review Date: Tue, May 23, 2023
- url: https://arxiv.org/pdf/2305.11014.pdf
Summary of paper
Motivation
- in particular, we consider PDDL domains and use GPT-4 to synthesize Python programs
- we also consider Chain of Thought (CoT) summarisation, where the LLM is prompted to summarize the domain and propose a strategy in words before synthesizing the program
- we consider automated debugging, where the program is validated with respect to the training tasks, and in case of errors, the LLM is re-prompted with four types of feedback.
Contribution
- we find that GPT-4 is a surprisingly powerful generalised planner.
- we also conclude that automated debugging is very important, that CoT summarisation has non-uniform impact, that GPT-4 is far superior to GPT-3.5, and that just two training tasks are often sufficient for strong generalisation.
Some key terms
the problem
- It remains challenging to efficiently synthesize programs from few training tasks that generalise to a wide variety of held-out tasks.
process
- we prompt GPT-4 to write a natural language summary of the PDDL domain
- We then ask it to describe a solution strategy before finally implementing the strategy in Python
- We automatically provide feedback to GPT-4 in the case where it fails to solve training tasks
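- A rough sketch of how this three-prompt conversation could be wired up (illustration only; `chat` is an assumed wrapper around a GPT-4 chat API, and the actual prompts in the paper are much more detailed and include the training problems):

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content)


def three_stage_prompting(
    domain_pddl: str,
    train_problems_pddl: List[str],
    chat: Callable[[List[Message]], str],  # assumed wrapper around a GPT-4 chat API
) -> str:
    """Domain summary -> strategy in words -> Python implementation."""
    history: List[Message] = [
        ("user", f"Here is a PDDL domain and some training problems:\n{domain_pddl}\n"
                 + "\n".join(train_problems_pddl)),
        ("user", "Summarize the domain in natural language."),
    ]
    history.append(("assistant", chat(history)))   # stage 1: domain summary
    history.append(("user", "Propose a general solution strategy in words."))
    history.append(("assistant", chat(history)))   # stage 2: strategy proposal
    history.append(("user", "Implement the strategy as a Python program that "
                            "maps any problem in this domain to a plan."))
    return chat(history)                           # stage 3: program synthesis
```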
definition of generalised planning
background on PDDL
- A PDDL domain is characterized by a name, a set of types, a set of predicates and a set of operators.
- a PDDL problem is characterised by a domain, a set of objects, an initial state, and a goal.
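- As a simplified illustration of these definitions (real PDDL operators have typed parameters and richer precondition/effect structure), the pieces could be modelled in Python as:

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Operator:
    name: str
    parameters: Tuple[str, ...]       # typed variables in real PDDL
    preconditions: FrozenSet[str]
    effects: FrozenSet[str]


@dataclass(frozen=True)
class Domain:
    name: str
    types: FrozenSet[str]
    predicates: FrozenSet[str]
    operators: Tuple[Operator, ...]


@dataclass(frozen=True)
class Problem:
    domain: Domain
    objects: FrozenSet[str]
    init: FrozenSet[str]              # ground atoms true in the initial state
    goal: FrozenSet[str]              # ground atoms that must hold at the end
```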
The objective of the work
- at evaluation time, a set of held-out evaluation tasks (typically involving many more objects) is used to measure performance. The objective is to use the training tasks to synthesise a program that will produce valid plans for all the evaluation tasks.
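- Roughly, the evaluation protocol amounts to something like the following sketch (names here are placeholders, not the paper's code):

```python
from typing import Callable, List


def generalisation_success_rate(
    train_tasks: List[str],
    eval_tasks: List[str],
    synthesize_program: Callable[[List[str]], str],        # e.g. the LLM pipeline above
    solve_with_program: Callable[[str, str], List[str]],   # program + task -> plan
    plan_is_valid: Callable[[List[str], str], bool],       # external plan validator
) -> float:
    """Synthesize once from the training tasks, then score on held-out tasks."""
    program = synthesize_program(train_tasks)
    solved = sum(
        plan_is_valid(solve_with_program(program, task), task) for task in eval_tasks
    )
    return solved / len(eval_tasks)
```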
prompting method
- we hypothesized that decomposing generalised planning into three stages (domain summarisation, strategy proposal, and strategy implementation) would improve performance
Experiments and results
- Can GPT-4 be used for generalised (PDDL) planning?
- Are the synthesized programs efficient?
- Does CoT summarization help?
- Does automated debugging help?
- To what extent does GPT-4 rely on the names used in the PDDL domain?
- How does GPT-4 compare to GPT-3.5?
- Do each of the four error types help? (Python exception, Timeout, Plan Syntax, Plan Semantics)
experiment environment
Ablation
Potential future work
Strange thing about the paper
- I am a bit confused: the paper asks the LLM to synthesise a Python program that generates plans, rather than asking a PDDL planner to generate them.
- Isn't it true that a PDDL planner can directly generate a valid plan from the PDDL domain and problem files? Why do we need an LLM to synthesise a Python program to generate plans?
- Does it mean that the LLM is still required to perform functional reasoning?
limitation and direction of future work
- the authors did not know why CoT summarisation sometimes hampers plan generation
- A major limitation of this work and previous work on generalized planning is that it is easy enough to hand-design generalized plans for all of the domains considered.
- In some cases, it may be considerably easier to specify PDDL domain and problem descriptions than it is to directly specify a generalized plan.