[TOC]

  1. Title: SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning
  2. Author: Krishan Rana
  3. Publish Year: CoRL 2023
  4. Review Date: Sun, Jan 28, 2024
  5. url: https://arxiv.org/abs/2307.06135

Summary of paper

Motivation

Contribution

Some key terms


Semantic search stage

The Semantic Search stage in SayPlan makes planning over 3D scene graphs (3DSGs) with Large Language Models (LLMs) tractable. It rests on two observations:

  1. Practical Representation Limitations: An LLM has a fixed token limit, while the 3DSG of a large-scale environment can grow without bound, so passing the full graph to the LLM is impractical.
  2. Task-Specific Subset Identification: Only a subset of the full 3DSG, denoted G’, is needed to solve a given task; the rest of the graph is irrelevant and can be left out.

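To identify the task-specific subgraph G’ from the full 3DSG, SayPlan leverages the hierarchical structure of the 3DSG together with the reasoning capabilities of the LLM.

The process starts from a collapsed graph that exposes only its top-level nodes. The LLM then expands the nodes it judges relevant to the task and contracts those that turn out to be irrelevant, repeating until the subgraph contains what the task requires.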

The paper provides an example of the LLM-scene graph interaction during Semantic Search, showing how SayPlan narrows a large-scale 3DSG down to the task-specific subset needed for planning.
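
To make the idea concrete, here is a minimal toy sketch of such a search, assuming it proceeds by expanding and contracting nodes of a collapsed hierarchical graph. It is not the authors' implementation: the `Node` class, the helper functions, and the `choose` callback (a stand-in for the LLM's decision about which node to explore next) are all hypothetical names introduced for illustration, and the contraction rule is a simple heuristic rather than LLM reasoning.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a toy hierarchical scene graph (hypothetical structure)."""
    name: str
    children: list["Node"] = field(default_factory=list)
    expanded: bool = False


def visible(node):
    """Names currently exposed: children appear only under expanded nodes."""
    names = [node.name]
    if node.expanded:
        for child in node.children:
            names.extend(visible(child))
    return names


def frontier(node):
    """Exposed nodes that are still collapsed and have something to reveal."""
    if not node.expanded:
        return [node] if node.children else []
    found = []
    for child in node.children:
        found.extend(frontier(child))
    return found


def semantic_search(root, targets, choose):
    """Expand/contract nodes until every target name is visible.

    `choose(candidates, targets)` stands in for the LLM (assumption):
    it picks which collapsed node to expand next.
    """
    root.expanded = True  # start from the collapsed graph: top level only
    tried = set()
    while not targets <= set(visible(root)):
        candidates = [n for n in frontier(root) if n.name not in tried]
        if not candidates:
            break  # nothing left to explore
        node = choose(candidates, targets)
        tried.add(node.name)
        node.expanded = True
        # Toy contraction rule: if the node revealed only leaves and none
        # of them is a target, collapse it again to keep the graph small.
        only_leaves = all(not c.children for c in node.children)
        if only_leaves and not any(c.name in targets for c in node.children):
            node.expanded = False
    return visible(root)


# Toy environment: one building with two rooms.
office = Node("office", [Node("desk"), Node("stapler")])
kitchen = Node("kitchen", [Node("fridge"), Node("mug")])
building = Node("building", [office, kitchen])

# Stand-in policy: always try the first unexplored candidate.
subgraph = semantic_search(building, {"mug"}, lambda cands, _t: cands[0])
print(subgraph)  # ['building', 'office', 'kitchen', 'fridge', 'mug']
```

In this run the office is expanded first, found to hold nothing relevant, and contracted again, so its contents never reach the final subgraph; only the kitchen stays expanded. The subgraph handed to the planner therefore stays much smaller than the full graph, which is the point of the two observations above.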