[TOC]
- Title: PG3: Policy-Guided Planning for Generalized Policy Generation
- Author: Ryan Yang et al.
- Publish Year: 21 Apr 2022
- Review Date: Wed, May 24, 2023
- url: https://arxiv.org/pdf/2204.10420.pdf
Summary of paper
Motivation
- a longstanding objective in classical planning is to synthesise policies that generalise across multiple problems from the same domain
- in this work, the authors study generalised policy search (GPS) methods, focusing on the score function used to guide the search over policies
Contribution
- we study a specific instantiation of policy search where planning problems are PDDL-based and policies are lifted decision lists.
Some key terms
what is generalised planning and generalised policy search (GPS)
- GPS is a flexible paradigm for generalised planning. In this family of methods, a search is performed through a class of generalised (goal-conditioned) policies, with the search informed by a score function that maps candidate policies to scalar values.
- there has been relatively little work on the score function, even though it plays an important role: if the scores are uninformative or misleading, the search will languish in less promising regions of policy space.
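To make the role of the score function concrete, here is a minimal sketch of a score-guided search over policies. It uses a generic best-first search as one possible strategy; the function name `generalised_policy_search`, the toy `successors` and `mismatch_score` functions, and the tuple encoding of a policy are all assumptions made for illustration and are not taken from the paper.

```python
import heapq
import itertools
from typing import Callable, Hashable, Iterable, Optional


def generalised_policy_search(
    initial: Hashable,
    successors: Callable[[Hashable], Iterable[Hashable]],
    score: Callable[[Hashable], float],
    is_solution: Callable[[Hashable], bool],
    max_expansions: int = 1000,
) -> Optional[Hashable]:
    """Best-first search over policies; a lower score is expanded earlier."""
    tie = itertools.count()  # tie-breaker so the heap never compares policies
    frontier = [(score(initial), next(tie), initial)]
    seen = {initial}
    for _expansion in range(max_expansions):
        if not frontier:
            break
        _score, _tie, policy = heapq.heappop(frontier)
        if is_solution(policy):
            return policy
        for child in successors(policy):
            if child not in seen:
                seen.add(child)
                heapq.heappush(frontier, (score(child), next(tie), child))
    return None


# Toy stand-in (hypothetical): a "policy" is a tuple of rule ids and
# TARGET is the only tuple that counts as a working policy.
TARGET = (1, 2, 3)


def successors(policy):
    # Search operator: append one more rule id to the policy.
    if len(policy) < len(TARGET):
        for rule_id in range(4):
            yield policy + (rule_id,)


def mismatch_score(policy):
    # Informative score: number of positions that still differ from TARGET.
    return sum(
        1 for i, t in enumerate(TARGET) if i >= len(policy) or policy[i] != t
    )


found = generalised_policy_search(
    (), successors, mismatch_score, lambda p: p == TARGET
)
```

With an informative score the search walks almost directly to the target; replacing `mismatch_score` with a constant would leave the search order to the tie-breaker alone, which is the "languishing" failure mode described above.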
Problem setting in generalised planning
- Given:
- PDDL domain
- Training problems
- Planner
- Goal: Learn a goal-conditioned policy that generalises to all test problems in the domain.
How does PG3 work
- search through the space of candidate policies
- each candidate policy is represented as a lifted decision list, i.e. an ordered list of rules
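A minimal sketch of what a lifted decision list could look like in code. The notes only state that it is an ordered list of rules; the field names (`params`, `state_pre`, `goal_pre`, `action`), the brute-force grounding, and the toy pick/drop domain below are assumptions for illustration, not the paper's implementation.

```python
from dataclasses import dataclass
from itertools import product
from typing import FrozenSet, Optional, Sequence, Tuple

Atom = Tuple[str, ...]  # (predicate_or_action_name, arg1, arg2, ...)


@dataclass(frozen=True)
class Rule:
    params: Tuple[str, ...]       # lifted variables, e.g. ("?p", "?l")
    state_pre: Tuple[Atom, ...]   # atoms that must hold in the current state
    goal_pre: Tuple[Atom, ...]    # atoms that must appear in the goal
    action: Atom                  # lifted action to execute


def ground(atom: Atom, binding: dict) -> Atom:
    # Substitute variables by objects; constants are left unchanged.
    return (atom[0],) + tuple(binding.get(arg, arg) for arg in atom[1:])


def apply_policy(
    rules: Sequence[Rule],
    state: FrozenSet[Atom],
    goal: FrozenSet[Atom],
    objects: FrozenSet[str],
) -> Optional[Atom]:
    """Return the ground action of the first rule that has a valid grounding."""
    for rule in rules:  # order matters: earlier rules take priority
        for combo in product(sorted(objects), repeat=len(rule.params)):
            binding = dict(zip(rule.params, combo))
            if all(ground(a, binding) in state for a in rule.state_pre) and all(
                ground(a, binding) in goal for a in rule.goal_pre
            ):
                return ground(rule.action, binding)
    return None  # no rule applies


# Hypothetical two-rule policy for a toy delivery domain.
policy = [
    Rule(("?p", "?l"), (("holding", "?p"), ("robot-at", "?l")),
         (("at", "?p", "?l"),), ("drop", "?p", "?l")),
    Rule(("?p", "?l"), (("at", "?p", "?l"), ("robot-at", "?l")),
         (), ("pick", "?p", "?l")),
]
objects = frozenset({"pkg1", "locA", "locB"})
goal = frozenset({("at", "pkg1", "locB")})

carrying = frozenset({("holding", "pkg1"), ("robot-at", "locB")})
waiting = frozenset({("at", "pkg1", "locA"), ("robot-at", "locA")})

drop_action = apply_policy(policy, carrying, goal, objects)
pick_action = apply_policy(policy, waiting, goal, objects)
```

Because the rules are lifted (written over variables rather than concrete objects), the same two rules apply unchanged to problems with any number of packages and locations, which is what lets such a policy generalise across problems.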
Evaluation process
- calculate how many training problems a given candidate policy is able to solve
policy evaluation
- executes the policy on the training tasks and records the number of successes
- this score is extremely sparse, effectively forcing an exhaustive search until reaching a region of non-zero scores.
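A small sketch of the policy evaluation score and of why it is sparse. The number-line domain, the `step` function, the horizon, and the three example policies are all hypothetical; only the idea of counting solved training tasks comes from the notes.

```python
from typing import Callable, List, Optional, Tuple

Task = Tuple[int, int]  # (initial state, goal state) on a number line
Policy = Callable[[int, int], Optional[str]]


def step(state: int, action: str) -> int:
    # Toy transition model (assumption): "inc" adds 1, "dec" subtracts 1.
    return state + 1 if action == "inc" else state - 1


def policy_evaluation_score(
    policy: Policy, tasks: List[Task], horizon: int = 50
) -> int:
    """Run the policy on every task and count how many reach their goal."""
    solved = 0
    for init, goal in tasks:
        state = init
        for _ in range(horizon):
            if state == goal:
                break
            action = policy(state, goal)
            if action is None:  # policy has no applicable rule
                break
            state = step(state, action)
        solved += int(state == goal)
    return solved


def full_policy(state: int, goal: int) -> Optional[str]:
    if state < goal:
        return "inc"
    if state > goal:
        return "dec"
    return None


def half_right_policy(state: int, goal: int) -> Optional[str]:
    # Moves in the right direction, but only from even states.
    if state < goal and state % 2 == 0:
        return "inc"
    return None


def empty_policy(state: int, goal: int) -> Optional[str]:
    return None


tasks = [(0, 3), (5, 2), (2, 6)]
scores = {
    "full": policy_evaluation_score(full_policy, tasks),
    "half_right": policy_evaluation_score(half_right_policy, tasks),
    "empty": policy_evaluation_score(empty_policy, tasks),
}
```

Here `half_right_policy` makes some correct moves yet receives the same score as `empty_policy`, because neither solves a whole task. The score gives the search no signal that one candidate is closer to a working policy than the other.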
plan comparison
- runs a planner on the training tasks and records the agreement between the plans it finds and the candidate policy.
- (in practice the score is the priority score, lower is better)
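A minimal sketch of a plan comparison score, counting disagreements so that lower is better. The number-line domain and the hand-written `demos` (standing in for planner output) are hypothetical, and the exact agreement measure used in the paper may differ from this simple count.

```python
from typing import Callable, List, Optional, Tuple

Policy = Callable[[int, int], Optional[str]]
# One demonstration: (goal, [(state, action taken by the planner), ...])
Demo = Tuple[int, List[Tuple[int, str]]]


def plan_comparison_score(policy: Policy, demos: List[Demo]) -> int:
    """Count plan steps where the policy picks a different action (lower is better)."""
    disagreements = 0
    for goal, plan in demos:
        for state, planned_action in plan:
            if policy(state, goal) != planned_action:
                disagreements += 1
    return disagreements


def full_policy(state: int, goal: int) -> Optional[str]:
    if state < goal:
        return "inc"
    if state > goal:
        return "dec"
    return None


def inc_only_policy(state: int, goal: int) -> Optional[str]:
    return "inc" if state < goal else None


def empty_policy(state: int, goal: int) -> Optional[str]:
    return None


# Plans written by hand here, as if returned by a planner (assumption).
demos = [
    (3, [(0, "inc"), (1, "inc"), (2, "inc")]),
    (2, [(5, "dec"), (4, "dec"), (3, "dec")]),
]

comparison = {
    "full": plan_comparison_score(full_policy, demos),
    "inc_only": plan_comparison_score(inc_only_policy, demos),
    "empty": plan_comparison_score(empty_policy, demos),
}
```

Unlike the success count, this score separates partially correct candidates: `inc_only_policy` scores between `empty_policy` and `full_policy`. One caveat of any such comparison is that a policy can be penalised for choosing a different but equally valid action from the one the planner happened to pick.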