  1. Title: SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning
  2. Author: Krishan Rana
  3. Publish Year: CoRL 2023
  4. Review Date: Sun, Jan 28, 2024
  5. url: https://arxiv.org/abs/2307.06135

Summary of paper

Motivation

  • This paper introduces a pipeline, SayPlan, for grounding LLM-based task planning in 3D scene graphs (3DSGs) of large environments.

Contribution

  • Hierarchical Exploration: SayPlan leverages the hierarchical structure of 3DSGs to enable LLMs to conduct semantic searches for task-relevant subgraphs from a condensed representation of the full graph.
  • Path Planning Integration: It integrates a classical path planner to reduce the planning horizon for the LLM, thus improving efficiency.
  • Iterative Replanning Pipeline: An iterative replanning pipeline refines initial plans by incorporating feedback from a scene graph simulator, correcting infeasible actions and preventing planning failures.
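The path planning and iterative replanning contributions can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the room graph `EDGES`, the toy `simulate` checker, and `stub_llm_planner`, which stands in for the LLM. The stub only emits high-level `goto` actions, a classical BFS planner resolves the route, and simulator feedback triggers a revised plan.

```python
from collections import deque

# Hypothetical room-connectivity graph (illustration only).
EDGES = {
    "office": ["corridor"],
    "corridor": ["office", "kitchen"],
    "kitchen": ["corridor"],
}

def shortest_path(start, goal):
    """Classical BFS planner: turns a high-level goto into a waypoint list."""
    queue, parent = deque([start]), {start: None}
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in EDGES[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

def simulate(plan, start="office"):
    """Toy simulator: returns None if the plan is feasible, else feedback text."""
    location, fridge_open = start, False
    for action, arg in plan:
        if action == "goto":
            if shortest_path(location, arg) is None:
                return f"cannot reach {arg} from {location}"
            location = arg
        elif action == "open":
            fridge_open = True
        elif action == "pickup" and arg == "milk" and not fridge_open:
            return "cannot pickup milk: fridge is closed"
    return None

def stub_llm_planner(feedback):
    """Stand-in for the LLM: revises its plan when given feedback."""
    plan = [("goto", "kitchen"), ("pickup", "milk")]
    if feedback and "fridge is closed" in feedback:
        plan.insert(1, ("open", "fridge"))
    return plan

def iterative_replanning(max_iters=5):
    """Plan, verify in the simulator, and replan until the plan is feasible."""
    feedback = None
    for _ in range(max_iters):
        plan = stub_llm_planner(feedback)
        feedback = simulate(plan)
        if feedback is None:
            return plan
    return None
```

In this sketch the first plan fails because the fridge is closed, and the second attempt inserts the missing `open` action. The planner never has to enumerate intermediate waypoints itself, which is the sense in which a path planner shortens the planning horizon.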

Some key terms

Semantic search stage

The Semantic Search stage in SayPlan addresses the challenges of planning over 3D scene graphs (3DSGs) using Large Language Models (LLMs) by considering two key observations:

  1. Practical Representation Limitations: Because of LLM token limits and the potentially unbounded growth of a large-scale environment's 3DSG, it is impractical to pass the full representation to an LLM.
  2. Task-Specific Subset Identification: Only a subset of the full 3DSG, denoted G', is needed to solve a given task, so irrelevant details can be disregarded.

To identify the task-specific subgraph G' from the full 3DSG, SayPlan leverages:

  • Semantic Hierarchy: SayPlan collapses the 3DSG to expose only its top level (e.g., floor nodes), which reduces the initial token representation by approximately 80%.
  • Reasoning Capabilities of LLMs: The LLM manipulates this collapsed graph via expand and contract API calls to reveal only the nodes relevant to the task.
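A minimal sketch of a collapsed graph with expand and contract operations follows. The `FULL_GRAPH` contents, the `CollapsedGraph` class, and its method names are hypothetical and not the paper's actual API. Serialized character count is used only as a crude stand-in for token count.

```python
import json

# Hypothetical miniature 3DSG: floor -> rooms -> assets/objects.
FULL_GRAPH = {
    "floor_1": {
        "kitchen": ["fridge", "milk", "coffee_machine"],
        "office": ["desk", "stapler"],
    },
    "floor_2": {
        "lab": ["microscope", "bench"],
        "storage": ["box", "ladder"],
    },
}

class CollapsedGraph:
    """Exposes only the top level of the hierarchy until nodes are expanded."""

    def __init__(self, full):
        self.full = full
        self.expanded = set()  # names of nodes currently expanded

    def expand(self, node):
        self.expanded.add(node)

    def contract(self, node):
        self.expanded.discard(node)

    def view(self):
        """Return only the part of the graph currently visible to the LLM."""
        out = {}
        for floor, rooms in self.full.items():
            if floor not in self.expanded:
                out[floor] = "collapsed"
                continue
            out[floor] = {
                room: (items if room in self.expanded else "collapsed")
                for room, items in rooms.items()
            }
        return out

graph = CollapsedGraph(FULL_GRAPH)
collapsed_size = len(json.dumps(graph.view()))  # top level only
full_size = len(json.dumps(FULL_GRAPH))         # entire graph

graph.expand("floor_1")
graph.expand("kitchen")
kitchen_view = graph.view()["floor_1"]["kitchen"]
```

Here `collapsed_size` is smaller than `full_size` because only floor names are serialized. The saving on a real 3DSG depends on how much detail sits below the top level.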

The process involves:

  • Manipulation of Collapsed Graph: The LLM manipulates the collapsed graph through expand and contract API calls based on the given task instruction I, utilizing in-context learning and chain-of-thought prompting.
  • Scene Graph Simulator Interaction: API calls and node manipulations are executed within the scene graph simulator.
  • Maintenance of Task-Specific Subgraph: If expanded nodes contain irrelevant entities, the LLM contracts them to manage token limitations and maintain the task-specific subgraph.
  • Memory Input: A list of previously expanded nodes is maintained and passed as additional memory input to facilitate decision-making.
  • Autonomous Planning Phase: The LLM proceeds to the planning phase once all necessary assets and objects are identified in the current subgraph G'.
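The steps above can be sketched as a simple loop. This is an illustrative assumption, not the paper's implementation: `CHILDREN` is a made-up scene graph, and `stub_llm_policy` is a hand-written rule that stands in for the LLM's in-context reasoning. It contracts nodes that contain nothing relevant and uses the memory list to avoid expanding the same node twice.

```python
# Hypothetical miniature scene graph: each node maps to its children.
CHILDREN = {
    "building": ["floor_1", "floor_2"],
    "floor_1": ["office", "kitchen"],
    "floor_2": ["lab"],
    "office": ["desk", "stapler"],
    "kitchen": ["fridge", "milk"],
    "lab": ["microscope"],
}

def visible_nodes(expanded):
    """Nodes shown to the LLM: the root plus children of expanded nodes."""
    seen, stack = [], ["building"]
    while stack:
        node = stack.pop()
        seen.append(node)
        if node in expanded:
            stack.extend(reversed(CHILDREN.get(node, [])))
    return seen

def stub_llm_policy(visible, expanded, memory, required):
    """Stand-in for the LLM: picks the next expand/contract call or stops."""
    if required <= set(visible):
        return ("done", None)
    # Contract an expanded node whose children are all irrelevant leaves.
    for node in expanded:
        kids = CHILDREN[node]
        if all(k not in CHILDREN and k not in required for k in kids):
            return ("contract", node)
    # Otherwise expand the first visible node that is not yet in memory.
    for node in visible:
        if node in CHILDREN and node not in memory:
            return ("expand", node)
    return ("done", None)

def semantic_search(required, max_steps=20):
    """Run expand/contract calls until all required entities are visible."""
    expanded, memory, log = ["building"], ["building"], []
    for _ in range(max_steps):
        visible = visible_nodes(expanded)
        action, node = stub_llm_policy(visible, expanded, memory, required)
        if action == "done":
            break
        if action == "expand":
            expanded.append(node)
            memory.append(node)  # remember every node expanded so far
        else:
            expanded.remove(node)
        log.append((action, node))
    return visible_nodes(expanded), memory, log

subgraph, memory, log = semantic_search({"milk", "fridge"})
```

For the task of finding the milk and the fridge, this stub expands `floor_1`, then `office`, contracts `office` because it holds nothing relevant, and finally expands `kitchen`. The memory list is what stops it from reopening `office` after the contraction.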

The paper provides an example of the LLM-scene graph interaction during Semantic Search, showing how SayPlan systematically narrows a large-scale 3DSG down to the task-specific subgraph needed for planning.