Ryan_yang PG3 Policy Guided Planning for Generalised Policy Generation 2022

[TOC]

Summary of paper

A longstanding objective in classical planning is to synthesise policies that generalise across multiple problems from the same domain.
In this work, we study generalised policy search-based methods, with a focus on the score function used to guide the search over policies.

We study a specific instantiation of policy search in which planning problems are PDDL-based and policies are lifted decision lists.

What is generalised planning and generalised policy search (GPS)

GPS is a flexible paradigm for generalised planning. In this family of methods, a search is performed through a class of generalised (goal-conditioned) policies, with the search informed by a score function that maps candidate policies to scalar values.
There has been relatively little work on the score function, even though it plays an important role: if the scores are uninformative or misleading, the search will languish in less promising regions of policy space.
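
To make the role of the score function concrete, here is a minimal sketch of a score-guided search over policies. It is a generic greedy best-first search, not the paper's exact algorithm. The names `policy_search`, `successors`, `score` and `is_solution` are hypothetical, the toy policies are just tuples of rule names, and lower scores are assumed to be better.

```python
import heapq
from itertools import count


def policy_search(initial, successors, score, is_solution, max_expansions=1000):
    """Greedy best-first search over policies; a lower score is expanded first."""
    tie = count()  # tie-breaker so the heap never has to compare policies
    frontier = [(score(initial), next(tie), initial)]
    seen = {initial}
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, _, policy = heapq.heappop(frontier)
        if is_solution(policy):
            return policy
        for child in successors(policy):
            if child not in seen:
                seen.add(child)
                heapq.heappush(frontier, (score(child), next(tie), child))
    return None


# Toy setting (an assumption for illustration): a policy is a tuple of rule
# names and the search operator appends one rule from a fixed pool.
target = ("a", "b")


def successors(policy):
    return [policy + (r,) for r in ("a", "b", "c")] if len(policy) < 3 else []


def score(policy):
    # Informative toy score: position-wise mismatches plus the length difference.
    mismatches = sum(x != y for x, y in zip(policy, target))
    return mismatches + abs(len(policy) - len(target))


found = policy_search((), successors, score, lambda p: p == target)
```

Because the search only sees policies through `score`, a score that returned the same value for every non-solution policy would reduce this loop to blind enumeration in insertion order.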

Problem setting in generalised planning

Given:
- PDDL domain
- Training problems
- Planner
Goal: Learn a goal-conditioned policy that generalises to all test problems in the domain.

How does PG3 work

PG3 searches through the space of candidate policies.
Each candidate policy is represented as a lifted decision list, which consists of an ordered list of rules.
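
A lifted decision list can be sketched as a small data structure. The notes only say that it is an ordered list of rules, so the rule layout below (parameters, state preconditions, goal preconditions, one lifted action) is an assumption made for illustration, as are the names `Rule`, `policy_action` and the toy delivery predicates. The policy returns the action of the first rule that has a grounding whose preconditions hold.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Rule:
    params: tuple      # variable names, e.g. ("?p", "?l")
    state_pre: tuple   # lifted atoms that must hold in the current state
    goal_pre: tuple    # lifted atoms that must appear in the goal
    action: tuple      # lifted action to return when the rule fires


def ground(atom, sub):
    """Replace variables in a lifted atom using the substitution `sub`."""
    return (atom[0],) + tuple(sub.get(arg, arg) for arg in atom[1:])


def policy_action(rules, state, goal, objects):
    """Return the ground action of the first applicable rule, or None."""
    for rule in rules:  # earlier rules take priority
        for binding in product(objects, repeat=len(rule.params)):
            sub = dict(zip(rule.params, binding))
            state_ok = all(ground(a, sub) in state for a in rule.state_pre)
            goal_ok = all(ground(a, sub) in goal for a in rule.goal_pre)
            if state_ok and goal_ok:
                return ground(rule.action, sub)
    return None


# Hypothetical delivery-style rules: drop a held package at its goal location,
# otherwise pick up a package at the robot's location.
rules = [
    Rule(("?p", "?l"), (("holding", "?p"), ("robot-at", "?l")),
         (("at", "?p", "?l"),), ("drop", "?p", "?l")),
    Rule(("?p", "?l"), (("at", "?p", "?l"), ("robot-at", "?l")),
         (), ("pick", "?p", "?l")),
]
objects = ["locA", "locB", "pkg1"]
goal = {("at", "pkg1", "locB")}

first = policy_action(rules, {("at", "pkg1", "locA"), ("robot-at", "locA")}, goal, objects)
second = policy_action(rules, {("holding", "pkg1"), ("robot-at", "locB")}, goal, objects)
```

Here `first` is the pick action and `second` is the drop action. The same two rules apply unchanged to any number of packages and locations, which is what makes the representation lifted.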

Evaluation process

policy evaluation

Executes the policy on the training tasks and records the number of successes.
This score is extremely sparse, effectively forcing an exhaustive search until the search reaches a region of non-zero scores.
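
The sparsity can be seen in a small sketch. The domain below is a made-up one-dimensional chain (states and goals are integers, actions are `+1` or `-1`), and `run_policy` and `policy_evaluation_score` are hypothetical helper names, not the paper's code.

```python
def run_policy(policy, init, goal, step, horizon=50):
    """Execute `policy` from `init`; return True if `goal` is reached in time."""
    state = init
    for _ in range(horizon):
        if state == goal:
            return True
        action = policy(state, goal)
        if action is None:  # no rule applies, so the policy is stuck
            return False
        state = step(state, action)
    return state == goal


def policy_evaluation_score(policy, tasks, step, horizon=50):
    """Number of training tasks that the policy solves."""
    return sum(run_policy(policy, init, goal, step, horizon) for init, goal in tasks)


def step(state, action):
    return state + action


tasks = [(0, 3), (1, 4), (2, 6)]  # (initial state, goal) pairs


def empty(state, goal):
    return None  # never acts


def almost(state, goal):
    return 1 if state < goal - 1 else None  # stops one step short of the goal


def good(state, goal):
    return 1 if state < goal else None


scores = [policy_evaluation_score(p, tasks, step) for p in (empty, almost, good)]
```

In this toy run `empty` and `almost` receive the same score of zero even though `almost` is one step away from solving every task, so the score gives the search nothing to follow until a fully working policy is found.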

plan comparison

Plans on the training tasks and records the agreement between the found plans and the candidate policy.
(In practice the score is the priority score; lower is better.)
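
A minimal sketch of plan comparison, under stated assumptions: the planner is a plain breadth-first search written here for illustration (the notes do not specify one), the domain is a made-up integer chain with actions `+1` and `-1`, and the score counts disagreements so that lower is better. The names `bfs_plan` and `plan_comparison_score` are hypothetical.

```python
from collections import deque


def bfs_plan(init, goal, actions, step, max_nodes=10_000):
    """Return a shortest list of (state, action) pairs reaching goal, or None."""
    parents = {init: None}
    queue = deque([init])
    while queue and len(parents) <= max_nodes:
        state = queue.popleft()
        if state == goal:
            plan = []
            while parents[state] is not None:  # walk back to the initial state
                prev, act = parents[state]
                plan.append((prev, act))
                state = prev
            return plan[::-1]
        for act in actions:
            nxt = step(state, act)
            if nxt not in parents:
                parents[nxt] = (state, act)
                queue.append(nxt)
    return None


def plan_comparison_score(policy, tasks, actions, step):
    """Count plan steps where the policy's action differs from the planned one."""
    disagreements = 0
    for init, goal in tasks:
        plan = bfs_plan(init, goal, actions, step)
        if plan is None:
            continue  # assumption: tasks the planner cannot solve are skipped
        disagreements += sum(policy(s, goal) != a for s, a in plan)
    return disagreements


def step(state, action):
    return state + action


tasks = [(0, 3), (1, 4), (2, 6)]
actions = (1, -1)

policies = [
    lambda s, g: None,                      # never acts
    lambda s, g: 1 if s < g - 1 else None,  # stops one step short of the goal
    lambda s, g: 1 if s < g else None,      # solves every task
]
scores = [plan_comparison_score(p, tasks, actions, step) for p in policies]
```

In this toy run the three policies score 10, 3 and 0 disagreements respectively, so a policy that is nearly right is ranked ahead of one that does nothing.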