[TOC]
- Title: Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
- Author: Wenlong Huang et al.
- Publish Year: Mar 2022
- Review Date: Mon, Sep 19, 2022
Summary of paper
Motivation
- Large language models encode general commonsense knowledge about the world.
- This paper therefore investigates the possibility of grounding high-level tasks expressed in natural language (e.g., “make breakfast”) into a chosen set of executable action steps (e.g., “open fridge”).
Contribution
- They find that if pre-trained LMs are large enough and prompted appropriately, they can effectively decompose high-level tasks into mid-level plans without any further training.
- They propose several tools to improve the executability of the model's generations without invasive probing or modifications to the model.
Some key terms
What is prompt learning
- conditioning a frozen LM on a few in-context demonstrations (the prompt) so that it performs a new task without any weight updates; a worked example is sketched below
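A minimal sketch of such a few-shot planning prompt in Python; the demonstration text loosely paraphrases the paper's example format and is illustrative, not the exact dataset text:

```python
# A minimal sketch of prompt learning for plan generation: a frozen LM is
# conditioned on one worked task/plan example plus a new query; no weights
# are updated. The demonstration below is illustrative.
prompt = (
    "Task: Throw away paper\n"
    "Step 1: Walk to home office\n"
    "Step 2: Find desk\n"
    "Step 3: Grab paper\n"
    "Step 4: Walk to trash can\n"
    "Step 5: Put paper on trash can\n"
    "\n"
    "Task: Make breakfast\n"
    "Step 1:"
)
# Feeding `prompt` to a large LM yields a completion after "Step 1:" that is
# the model's proposed first mid-level step for the new task.
```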
Methodology
the method iterates over two steps (see the sketch after this list)
- plan generation: via prompt learning, the LM proposes the next mid-level step of the plan for the high-level task
- action translation: the free-form step is converted into an admissible action the environment can execute
- the translated action is appended to the prompt, and the loop repeats until the plan terminates
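A minimal sketch of this loop, with assumed helper names: `lm_generate` is a hypothetical call to a frozen LM, and `translate` is sketched under the semantic translation section below.

```python
# Generate-translate-append loop: the LM proposes a free-form step, a
# translation model snaps it to the closest admissible action, and the
# result is appended to the prompt to condition the next step.
def plan(task: str, prompt_example: str, max_steps: int = 10) -> list[str]:
    prompt = f"{prompt_example}\n\nTask: {task}\nStep 1:"
    actions: list[str] = []
    for i in range(1, max_steps + 1):
        free_form = lm_generate(prompt)   # hypothetical LM call: next mid-level step
        action = translate(free_form)     # map to an admissible action (see below)
        if action is None:                # nothing admissible is close enough: stop
            break
        actions.append(action)
        prompt += f" {action}\nStep {i + 1}:"  # condition the next step on it
    return actions
```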
Admissible action planning by semantic translation
- Mapping from free-form language to unambiguous actionable steps can fail for several reasons:
- the output does not follow the pre-defined format of any atomic action
- e.g., “I first walk to the bedroom” is not of the format “walk to <PLACE>”
- the output may refer to atomic actions and objects using words unrecognisable by the environment
- e.g., “microwave the chocolate milk”, where “microwave” and “chocolate milk” cannot be mapped to a precise action and object
- the output contains lexically ambiguous words
- e.g., “open TV” vs “switch on TV”
- SOLUTION: embed both the generated phrase and every admissible action with a translation LM (a Sentence-BERT style encoder), then map the phrase to the admissible action with the highest cosine similarity between the language embeddings
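A minimal sketch of this translation step using the `sentence-transformers` library; the encoder checkpoint, admissible-action list, and threshold below are illustrative assumptions, not the paper's exact setup.

```python
from sentence_transformers import SentenceTransformer, util

# Embed all admissible actions once, then match each free-form LM output
# to the action with the highest cosine similarity.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed checkpoint
ADMISSIBLE = ["walk to kitchen", "open fridge", "grab milk", "switch on TV"]
ADMISSIBLE_EMB = encoder.encode(ADMISSIBLE, convert_to_tensor=True)

def translate(free_form_step: str, threshold: float = 0.5) -> str | None:
    """Map a free-form LM output to the most similar admissible action."""
    step_emb = encoder.encode(free_form_step, convert_to_tensor=True)
    scores = util.cos_sim(step_emb, ADMISSIBLE_EMB)[0]  # similarity to each action
    best = int(scores.argmax())
    # Reject the mapping when even the best match is too dissimilar.
    return ADMISSIBLE[best] if float(scores[best]) >= threshold else None

print(translate("switch on the television"))  # likely -> "switch on TV"
```

In the paper, this matching score is additionally combined with the LM's log probability when ranking candidate steps.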