[TOC]
- Title: SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning
- Author: Krishan Rana
- Publish Year: CoRL 2023
- Review Date: Sun, Jan 28, 2024
- url: https://arxiv.org/abs/2307.06135
Summary of paper
Motivation
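- This paper introduces a pipeline, SayPlan, that grounds LLM-based task planning in large-scale environments represented as 3D scene graphs (3DSGs).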
Contribution
- Hierarchical Exploration: SayPlan leverages the hierarchical structure of 3DSGs to enable LLMs to conduct semantic searches for task-relevant subgraphs from a condensed representation of the full graph.
- Path Planning Integration: It integrates a classical path planner to reduce the planning horizon for the LLM, thus improving efficiency.
- Iterative Replanning Pipeline: An iterative replanning pipeline refines initial plans by incorporating feedback from a scene graph simulator, correcting infeasible actions and preventing planning failures.
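The iterative replanning idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the action strings, the `make_verifier` feasibility rule, and the `stub_planner` that stands in for the LLM are hypothetical and are not the paper's prompts or simulator. The loop only shows the pattern of proposing a plan, checking it, and feeding the error text back.

```python
from typing import Callable, List, Optional, Set, Tuple

Plan = List[str]


def make_verifier(closed_containers: Set[str]) -> Callable[[Plan], Optional[str]]:
    """Toy stand-in for the scene graph simulator (hypothetical rules)."""
    def verify(plan: Plan) -> Optional[str]:
        opened: Set[str] = set()
        for step in plan:
            verb, _, target = step.partition(" ")
            if verb == "open":
                opened.add(target)
            elif verb == "pick_from" and target in closed_containers and target not in opened:
                # Report the first infeasible action as textual feedback.
                return f"cannot pick_from {target}: it is closed"
        return None  # every action was feasible
    return verify


def iterative_replan(
    propose: Callable[[List[str]], Plan],
    verify: Callable[[Plan], Optional[str]],
    max_iters: int = 5,
) -> Tuple[Plan, List[str]]:
    """Ask for a plan, verify it, and re-ask with accumulated feedback."""
    feedback: List[str] = []
    for _ in range(max_iters):
        plan = propose(feedback)
        error = verify(plan)
        if error is None:
            return plan, feedback
        feedback.append(error)
    raise RuntimeError("no feasible plan within the iteration budget")


def stub_planner(feedback: List[str]) -> Plan:
    """Rule-based placeholder for the LLM planner (an assumption)."""
    plan = ["goto kitchen", "pick_from fridge"]
    if any("fridge: it is closed" in message for message in feedback):
        plan.insert(1, "open fridge")
    return plan


plan, feedback = iterative_replan(stub_planner, make_verifier({"fridge"}))
print(plan)
```

In this toy run the first proposal fails because the fridge is closed, and the second proposal succeeds after the feedback is passed back to the planner.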
Some key terms
Semantic search stage
The Semantic Search stage in SayPlan addresses the challenges of planning over 3D scene graphs (3DSGs) using Large Language Models (LLMs) by considering two key observations:
- Practical Representation Limitations: Because of LLM token limits and the potentially unbounded growth of a large-scale environment’s 3DSG, it is impractical to pass the full representation to an LLM.
- Task-Specific Subset Identification: Only a subset of the full 3DSG, denoted as G', is necessary to solve a given task, as irrelevant details can be disregarded.
To identify the task-specific subgraph G' from the full 3DSG, SayPlan leverages:
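- Semantic Hierarchy: The 3DSG is organized in layers (for example floors, rooms, assets, and objects), so it can be collapsed to expose only its top-level nodes. This collapsed form reduces the initial token representation by approximately 80%.
- Reasoning Capabilities of LLMs: Starting from the collapsed graph, the LLM is guided to reveal or hide parts of the hierarchy via expand and contract API calls.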
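A minimal sketch of a collapsible hierarchy, under stated assumptions: the `CollapsibleGraph` class, its method names, and the example node names are hypothetical and only mimic the expand and contract calls described above. Serialized string length is used as a crude proxy for token count; the toy graph is far too small to reproduce the reduction reported in the paper.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class CollapsibleGraph:
    """Toy hierarchical scene graph: floor -> room -> asset/object."""
    children: Dict[str, List[str]]           # parent node -> child nodes
    roots: List[str]                         # top-level nodes (e.g. floors)
    expanded: Set[str] = field(default_factory=set)

    def expand(self, node: str) -> None:
        """Reveal the direct children of `node`."""
        if node not in self.children:
            raise KeyError(f"{node} has no children to expand")
        self.expanded.add(node)

    def contract(self, node: str) -> None:
        """Hide everything below `node`, including expanded descendants."""
        stack = [node]
        while stack:
            current = stack.pop()
            self.expanded.discard(current)
            stack.extend(self.children.get(current, []))

    def view(self) -> dict:
        """Nested dict of visible nodes; hidden subtrees and leaves show as {}."""
        def render(node: str) -> dict:
            if node not in self.expanded:
                return {}
            return {child: render(child) for child in self.children[node]}
        return {root: render(root) for root in self.roots}

    def serialized(self) -> str:
        """Text that would be placed in the prompt (length ~ token cost)."""
        return json.dumps(self.view())


graph = CollapsibleGraph(
    children={
        "floor_1": ["kitchen", "office"],
        "kitchen": ["fridge", "coffee_machine"],
        "office": ["desk", "stapler"],
    },
    roots=["floor_1"],
)
collapsed = graph.serialized()
graph.expand("floor_1")
graph.expand("kitchen")
partially_expanded = graph.serialized()
graph.contract("floor_1")
print(collapsed, len(collapsed), len(partially_expanded))
```

Contracting a node here also clears the expansion state of its descendants, so re-expanding it later shows only its direct children again. That is a design choice of this sketch, not something the source specifies.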
The process involves:
- Manipulation of Collapsed Graph: The LLM manipulates the collapsed graph through expand and contract API calls based on the given task instruction I, utilizing in-context learning and chain-of-thought prompting.
- Scene Graph Simulator Interaction: API calls and node manipulations are executed within the scene graph simulator.
- Maintenance of Task-Specific Subgraph: If expanded nodes contain irrelevant entities, the LLM contracts them to manage token limitations and maintain the task-specific subgraph.
- Memory Input: A list of previously expanded nodes is maintained and passed as additional memory input to facilitate decision-making.
- Autonomous Planning Phase: The LLM proceeds to the planning phase once all necessary assets and objects are identified in the current subgraph G'.
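The steps above can be sketched as a small search loop. This is an illustration under assumptions, not the paper's implementation: `semantic_search`, `visible_nodes`, and the command tuples are hypothetical names, and `make_stub_policy` is a rule-based placeholder for the LLM that peeks at the graph structure, which a real LLM would replace with semantic reasoning over node names.

```python
from typing import Callable, Dict, List, Set, Tuple

Command = Tuple[str, str]  # ("expand" | "contract" | "plan", node name)
Policy = Callable[[List[str], List[str]], Command]


def visible_nodes(children: Dict[str, List[str]], roots: List[str],
                  expanded: Set[str]) -> List[str]:
    """Depth-first listing of nodes reachable through expanded parents."""
    visible, stack = [], list(reversed(roots))
    while stack:
        node = stack.pop()
        visible.append(node)
        if node in expanded:
            stack.extend(reversed(children.get(node, [])))
    return visible


def semantic_search(children: Dict[str, List[str]], roots: List[str],
                    choose: Policy, max_steps: int = 20) -> Tuple[List[str], List[str]]:
    """Run expand/contract commands until the policy decides to plan."""
    expanded: Set[str] = set()
    memory: List[str] = []  # previously expanded nodes, passed back each step
    for _ in range(max_steps):
        visible = visible_nodes(children, roots, expanded)
        action, node = choose(visible, memory)
        if action == "plan":
            return visible, memory
        if action == "expand":
            expanded.add(node)
            memory.append(node)
        elif action == "contract":
            expanded.discard(node)  # hides the subtree below `node`
        else:
            raise ValueError(f"unknown action: {action}")
    raise RuntimeError("search budget exhausted")


def make_stub_policy(target: str, children: Dict[str, List[str]]) -> Policy:
    """Rule-based placeholder for the LLM (an assumption for this sketch)."""
    def choose(visible: List[str], memory: List[str]) -> Command:
        if target in visible:
            return ("plan", target)
        for node in visible:
            kids = children.get(node, [])
            is_open = bool(kids) and kids[0] in visible
            dead_end = all(kid not in children for kid in kids)
            if is_open and dead_end:
                return ("contract", node)  # nothing relevant below this node
        for node in visible:
            if node in children and node not in memory:
                return ("expand", node)    # memory prevents re-expanding
        raise LookupError(f"{target} not found in the graph")
    return choose


children = {
    "floor_1": ["kitchen", "office"],
    "kitchen": ["fridge", "coffee_machine"],
    "office": ["desk", "stapler"],
}
subgraph, memory = semantic_search(
    children, ["floor_1"], make_stub_policy("stapler", children)
)
print(subgraph, memory)
```

In this toy run the kitchen is expanded, found irrelevant, and contracted before the office is expanded, so the final subgraph contains the stapler but not the kitchen's contents, while the memory list still records that the kitchen was visited.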
The paper includes an example of the LLM-scene graph interaction during Semantic Search, showing how SayPlan narrows a large-scale 3DSG down to the task-specific subgraph needed for planning.