
  1. Title: Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
  2. Author: Wenlong Huang et al.
  3. Publish Year: Mar 2022
  4. Review Date: Mon, Sep 19, 2022

Summary of paper

Motivation

  • Large language models learn general commonsense world knowledge.
  • In this paper, the authors investigate whether high-level tasks expressed in natural language (e.g., “make breakfast”) can be grounded to a chosen set of actionable steps (e.g., “open fridge”).

Contribution

  • They found that pre-trained LMs, if large enough and prompted appropriately, can effectively decompose high-level tasks into mid-level plans without any further training.
  • They proposed several tools that improve the executability of the model's generations without invasive probing or modification of the model.

Some key terms

What is prompt learning


Methodology

The method has two steps, repeated in a loop:

  1. via prompting, convert the high-level task into a mid-level plan
  2. convert the mid-level plan into admissible actions
  3. loop back to step 1 for the next part of the plan
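
The steps above can be sketched as a small loop. This is only an illustration of the control flow, not the authors' code: `plan`, `propose_step`, `translate`, the prompt layout and the scripted generator are all hypothetical names, and feeding the translated action back into the prompt is my assumption about what "loop" means here.

```python
from typing import Callable, List


def plan(task: str,
         example: str,
         propose_step: Callable[[str], str],
         translate: Callable[[str], str],
         max_steps: int = 10) -> List[str]:
    """Alternate free-form generation and translation until the generator stops."""
    # The prompt starts with one worked example followed by the new task (assumed layout).
    prompt = f"{example}\n\nTask: {task}\n"
    steps: List[str] = []
    for i in range(1, max_steps + 1):
        free_form = propose_step(prompt)   # step 1: free-form mid-level step
        if not free_form:                  # empty string: the generator signals it is done
            break
        action = translate(free_form)      # step 2: map to an admissible action
        steps.append(action)
        prompt += f"Step {i}: {action}\n"  # step 3: feed the action back and loop
    return steps


def make_scripted_generator(script: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a language model: replays a fixed script."""
    remaining = iter(script)
    return lambda prompt: next(remaining, "")


def toy_translate(text: str) -> str:
    """Hypothetical stand-in for the translation step: a lookup table."""
    table = {
        "i first walk to the kitchen": "walk to kitchen",
        "then open the fridge": "open fridge",
    }
    return table.get(text.lower(), text.lower())


example = "Task: watch TV\nStep 1: walk to living room\nStep 2: switch on TV"
generator = make_scripted_generator(["I first walk to the kitchen", "Then open the fridge"])
print(plan("make breakfast", example, generator, toy_translate))
```

In a real setup, `propose_step` would call a large pre-trained LM and `translate` would be the semantic translation described in the next section.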


Admissible action planning by semantic translation

  • Mapping free-form language to unambiguous actionable steps can fail for several reasons:
    • the output does not follow the pre-defined format of any atomic action
      • e.g., “I first walk to the bedroom” is not of the format “walk to <PLACE>”
    • the output may refer to atomic actions and objects using words the environment does not recognise
      • e.g., “microwave the chocolate milk”, where “microwave” and “chocolate milk” cannot be mapped to a precise action and object
    • the output contains lexically ambiguous words
      • e.g., “open TV” vs. “switch on TV”
  • SOLUTION: cosine similarity of the language embeddings of the action phrases …
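
A minimal sketch of the cosine-similarity idea, to make the matching step concrete. Everything here is an assumption for illustration: `embed` is a toy bag-of-words embedding standing in for a real language embedding model, and `translate`, `cosine` and the `ADMISSIBLE` list are hypothetical names, not from the paper.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(phrase: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a learned sentence embedding."""
    return Counter(phrase.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u)  # Counter returns 0 for words missing from v
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def translate(phrase: str, admissible: List[str]) -> Tuple[str, float]:
    """Return the admissible action most similar to the free-form phrase, with its score."""
    query = embed(phrase)
    best_score, best_action = max((cosine(query, embed(a)), a) for a in admissible)
    return best_action, best_score


# Hypothetical set of actions the environment accepts.
ADMISSIBLE = ["walk to bedroom", "walk to kitchen", "open fridge", "switch on tv"]

action, score = translate("I first walk to the bedroom", ADMISSIBLE)
print(action, round(score, 3))
```

This handles the format mismatch case ("I first walk to the bedroom"), but a bag-of-words embedding only rewards shared words, so it would not resolve the lexically ambiguous case ("open TV" vs. "switch on TV"). That case needs a learned sentence embedding that places paraphrases close together.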