Jiannan_xiang Language Models Meet World Models 2023

[TOC]

Title: Language Models Meet World Models: Embodied Experiences Enhance Language Models
Author: Jiannan Xiang et al.
Publish Year: 22 May 2023
Review Date: Fri, May 26, 2023
url: https://arxiv.org/pdf/2305.10626v2.pdf

Summary of paper

LLMs often struggle with simple reasoning and planning in physical environments.
This limitation arises because LMs are trained only on written text and miss essential embodied knowledge and skills.

the authors propose a new paradigm of enhancing LMs by finetuning them with world models, so that they gain diverse embodied knowledge while retaining their general language capabilities.
the experiences gathered in a virtual physical-world simulation environment are used to finetune LMs, teaching diverse abilities of reasoning and acting in the physical world, e.g., planning and completing goals, object permanence and tracking, etc.
to preserve the generalisation ability of the LMs, the authors use elastic weight consolidation (EWC) for selective weight updates, combined with low-rank adapters (LoRA) for training efficiency.

limitation of current LMs such as ChatGPT
- they fail to track the world state. Consequently, they lack the robust and comprehensive embodied knowledge necessary for reasoning and planning in physical environments

definition of world model

world models are embodied simulators that emulate physical interactions in real-world environments.

two ways to collect embodied experience

goal-oriented planning and random exploration
Specifically, goal-oriented planning aims to gather experiences associated with planning and goal-oriented agent behaviors, while random exploration focuses on accumulating experiences that involve object and world state tracking.
In goal-oriented planning, the planning process is stored as an embodied experience.
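
As a rough illustration of the random-exploration side, the sketch below logs every action together with the resulting world state, which is the kind of record that object and state tracking needs. The toy world, the `LOCATIONS` list, and the `random_exploration` function are all hypothetical and are not taken from the paper's simulator.

```python
import random

# Hypothetical toy world: every object sits in exactly one location.
LOCATIONS = ["kitchen", "bedroom", "fridge", "table"]


def random_exploration(state, steps, seed=0):
    """Apply random 'move' actions and log each action with the resulting state."""
    rng = random.Random(seed)
    state = dict(state)  # copy so the caller's state is left untouched
    trace = []
    for _ in range(steps):
        obj = rng.choice(sorted(state))
        # Always move the object somewhere other than its current location.
        dest = rng.choice([loc for loc in LOCATIONS if loc != state[obj]])
        state[obj] = dest
        trace.append({"action": f"move {obj} to {dest}", "state": dict(state)})
    return trace


experience = random_exploration({"apple": "fridge", "book": "table"}, steps=3)
for step in experience:
    print(step["action"], "->", step["state"])
```

Storing the full state after each step, rather than only the actions, is a design choice made here so that tracking questions can be answered directly from the trace.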

after gathering the embodied experiences

we will use them to construct a set of fine-tuning tasks (e.g., plan generation, activity recognition, and tracking)
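
A minimal sketch of how one stored experience might be turned into prompt/target pairs for the three example tasks. The record layout (`goal`, `actions`, `final_state`) and the prompt wording are assumptions made for illustration, not the paper's actual data format.

```python
def make_plan_generation_example(exp):
    """Plan generation: given the goal, the target is the recorded action sequence."""
    target = "\n".join(f"{i}. {a}" for i, a in enumerate(exp["actions"], 1))
    return {"prompt": f"Goal: {exp['goal']}\nPlan:", "target": target}


def make_activity_recognition_example(exp):
    """Activity recognition: given the actions, the target is the goal they achieve."""
    prompt = "Actions: " + "; ".join(exp["actions"]) + "\nWhat is the activity?"
    return {"prompt": prompt, "target": exp["goal"]}


def make_tracking_example(exp, obj):
    """Tracking: given the actions, the target is the object's final location."""
    prompt = "Actions: " + "; ".join(exp["actions"]) + f"\nWhere is the {obj}?"
    return {"prompt": prompt, "target": exp["final_state"][obj]}


# Hypothetical experience record, invented for this example.
experience = {
    "goal": "put the apple in the fridge",
    "actions": ["walk to kitchen", "grab apple", "open fridge", "put apple in fridge"],
    "final_state": {"apple": "fridge"},
}

for example in (
    make_plan_generation_example(experience),
    make_activity_recognition_example(experience),
    make_tracking_example(experience, "apple"),
):
    print(example)
```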

definition of EWC

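EWC regularises finetuning by penalising changes to the weights that matter most for previously learned tasks, with importance estimated from the Fisher information; see https://arxiv.org/pdf/1612.00796.pdf
the authors show that EWC is substantially more effective than the popular KL regularization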
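
The EWC objective from the linked paper can be sketched as the new-task loss plus a quadratic penalty, `lam / 2 * sum(F_i * (theta_i - theta_star_i) ** 2)`, where `F_i` is a diagonal Fisher estimate and `theta_star` holds the pretrained weights. The snippet below works on plain Python lists; the function names and the toy numbers are illustrative assumptions, and how the reviewed paper combines the penalty with LoRA is not shown here.

```python
def ewc_penalty(params, anchor_params, fisher, lam):
    """Quadratic EWC penalty: weights with high Fisher importance are held near their anchor."""
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )


def total_loss(task_loss, params, anchor_params, fisher, lam):
    """Loss on the new (embodied) task plus the EWC regulariser."""
    return task_loss + ewc_penalty(params, anchor_params, fisher, lam)


# Toy numbers, invented for this example.
anchor = [0.0, 2.0]   # pretrained weights
current = [1.0, 2.0]  # weights during finetuning
fisher = [2.0, 5.0]   # diagonal Fisher estimate (importance per weight)

print(ewc_penalty(current, anchor, fisher, lam=1.0))
print(total_loss(0.3, current, anchor, fisher, lam=1.0))
```

Only the first weight has moved, so only its Fisher term contributes; the second weight, although more important, adds nothing because it has not changed.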

Low-Rank Adaptation (LoRA)
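
LoRA keeps the pretrained weight matrix `W` frozen and trains only a low-rank update, so the effective weight is `W + (alpha / r) * B @ A`, with `B` of shape `d_out x r` and `A` of shape `r x d_in`. The pure-Python sketch below is a minimal illustration under that convention; the helper names and toy matrices are assumptions, not code from the paper.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(W, A, B, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(A)  # A has r rows
    scale = alpha / r
    delta = matmul(B, A)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


# Toy 2x2 layer with a rank-1 adapter (values invented for this example).
W = [[1.0, 0.0], [0.0, 1.0]]  # frozen pretrained weight
B = [[1.0], [2.0]]            # d_out x r, trainable
A = [[3.0, 4.0]]              # r x d_in, trainable

print(lora_effective_weight(W, A, B, alpha=1.0))
```

In the usual setup `B` starts at zero so that finetuning begins from the pretrained model, and the trainable parameter count is `r * (d_out + d_in)` instead of `d_out * d_in`.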

limitation and future work

the present work is limited to a household environment as a single world model. In the future, the authors intend to study how to integrate embodied experiences from different world models and generalise knowledge learned from one world model to different domains.