[TOC]
- Title: Language Models Meet World Models: Embodied Experiences Enhance Language Models
- Author: Jiannan Xiang et al.
- Publish Year: 22 May 2023
- Review Date: Fri, May 26, 2023
- url: https://arxiv.org/pdf/2305.10626v2.pdf
Summary of paper
Motivation
- LLMs often struggle with simple reasoning and planning in physical environments
- the limitation arises from the fact that LMs are trained only on written text and miss essential embodied knowledge and skills.
Contribution
- we propose a new paradigm of enhancing LMs by finetuning them with world models, to gain diverse embodied knowledge while retaining their general language capabilities.
- the embodied experiences collected in a virtual physical-world simulation environment are used to finetune LMs, teaching diverse abilities of reasoning and acting in the physical world, e.g., planning and completing goals, object permanence and tracking, etc.
- to preserve the generalisation ability of LMs, we use elastic weight consolidation (EWC) for selective weight updates, combined with low-rank adapters (LoRA) for training efficiency.
Some key terms
- limitation of current ChatGPT
- fail to track the world state. Consequently, they lack robust and comprehensive embodied knowledge necessary for reasoning and planning associated with physical environments
definition of world model
- world models are embodied simulators that emulate physical interactions in real-world environments.
two ways to collect embodied experience
- goal-oriented planning and random exploration
- Specifically, goal-oriented planning aims to gather experiences associated with planning and goal-oriented agent behaviors, while random exploration focuses on accumulating experiences that involve object and world state tracking.
- In goal-oriented planning, the planning process is stored as an embodied experience.
after gathering the embodied experiences
- we will use them to construct a set of fine-tuning tasks (e.g., plan generation, activity recognition, and tracking)
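
As a rough illustration of how a stored experience could be turned into fine-tuning examples, here is a minimal sketch. The record layout (`goal`, `plan`), the prompt templates, and the function names are all hypothetical and are not taken from the paper. Only plan generation and activity recognition are shown, since tracking would need world-state information that this toy record does not carry.

```python
# Hypothetical experience record; the field names are assumptions, not the paper's format.
experience = {
    "goal": "put the apple in the fridge",
    "plan": ["walk to kitchen", "grab apple", "open fridge", "put apple in fridge"],
}

def plan_generation_example(exp):
    """Goal in the prompt, numbered plan as the target."""
    steps = "\n".join(f"{i}. {action}" for i, action in enumerate(exp["plan"], start=1))
    return {"prompt": f"Goal: {exp['goal']}\nPlan:", "target": steps}

def activity_recognition_example(exp):
    """Observed actions in the prompt, the underlying goal as the target."""
    actions = "; ".join(exp["plan"])
    return {"prompt": f"Actions: {actions}\nActivity:", "target": exp["goal"]}

examples = [plan_generation_example(experience), activity_recognition_example(experience)]
```

Each resulting prompt/target pair could then be fed to an ordinary supervised fine-tuning loop.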
definition of EWC
- check https://arxiv.org/pdf/1612.00796.pdf
- We show that EWC is substantially more effective than the popular KL regularization
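
For reference, the EWC penalty from the linked paper adds a quadratic term, `(lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2`, to the new-task loss, where `F` is the diagonal of the Fisher information estimated on the original task. The sketch below is a minimal pure-Python illustration of that formula on flat parameter lists. The function names are my own, and how the reviewed paper estimates `F` or combines the penalty with LoRA may differ.

```python
def diagonal_fisher(per_example_grads):
    """Estimate the diagonal Fisher as the mean of squared per-example gradients."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    return [sum(g[i] ** 2 for g in per_example_grads) / n for i in range(dim)]

def ewc_penalty(theta, theta_star, fisher, lam):
    """(lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2"""
    return 0.5 * lam * sum(
        f * (t - ts) ** 2 for f, t, ts in zip(fisher, theta, theta_star)
    )

def total_loss(task_loss, theta, theta_star, fisher, lam):
    """New-task loss plus the EWC penalty that anchors important weights."""
    return task_loss + ewc_penalty(theta, theta_star, fisher, lam)
```

Weights with a large `F_i` are expensive to move away from their pretrained values, while weights with a small `F_i` stay free to adapt, which is what makes the update selective.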
Low-Rank Adaptation (LoRA)
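
LoRA keeps the pretrained weight matrix `W` frozen and learns a low-rank update `B @ A`, scaled by `alpha / r`, so the layer output becomes `W x + (alpha / r) * B (A x)`. The toy layer below is a minimal pure-Python sketch of that idea. The class name, the initialisation scale, and the seed are assumptions for illustration and are not the configuration used in the reviewed paper.

```python
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

class LoRALinear:
    def __init__(self, W, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen pretrained weights, never updated
        # A starts as small Gaussian noise, B as zeros, so B @ A is zero at the start.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def num_trainable(self):
        """Only A and B would be trained."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

# An 8x8 identity layer with rank 2: 32 trainable values instead of 64.
W = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
layer = LoRALinear(W, rank=2, alpha=4)
```

Because `B` is zero at initialisation, the adapted layer reproduces the frozen layer exactly before any training step.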
Conclusion & Limitations
- the present work is limited to a household environment as a single world model. In the future, we intend to study how to integrate embodied experiences from different world models and generalise knowledge learned from one world model to different domains.
Potential future work
- this paper presents a good way of finetuning LLMs for planning problems.