[TOC]

  1. Title: Language Models, Agent Models, and World Models: The LAW for Machine Reasoning and Planning
  2. Author: Zhiting Hu, Tianmin Shu
  3. Publish Year: 2023
  4. Review Date: Mon, Jan 22, 2024
  5. url: https://arxiv.org/abs/2312.05230v1

Summary of paper


Motivation

Some key terms

Limitations of language

Failure case


System-II reasoning – constructing a mental model of the world

Two levels of agent model

There are two levels of agent models:

  1. Level-0 Agent Models: These models represent how an embodied agent optimizes its actions to maximize accumulated rewards, given its beliefs and the physical constraints defined in its world model. They are used in embodied tasks, such as a robot searching for a cup (a minimal sketch follows this list).
  2. Level-1 Agent Models: These models are used in social reasoning tasks and involve reasoning about the behaviors of other agents. They encompass Theory of Mind: forming mental models of other agents and reasoning causally about their behaviors from mental states such as goals and beliefs.
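
As a concrete reading of the level-0 description, here is a minimal Python sketch of an agent that plans against its world model to maximize accumulated reward. The `ToyWorldModel` class, the `predict` signature, and the random reward stub are illustrative assumptions, not from the paper.

```python
import random

class ToyWorldModel:
    """Hypothetical world model: predicts the next state and reward of an action."""
    def predict(self, state, action):
        # A real world model would encode physical constraints;
        # here the dynamics and reward are random stubs.
        return state + [action], random.random()

def level0_act(world_model, belief_state, actions, horizon=3):
    """Choose the action whose simulated rollout maximizes accumulated reward."""
    def value(state, depth):
        # Best accumulated reward reachable from `state` within `depth` steps,
        # estimated by exhaustively simulating each action in the world model.
        if depth == 0:
            return 0.0
        return max(r + value(s, depth - 1)
                   for s, r in (world_model.predict(state, a) for a in actions))

    best_value, best_action = float("-inf"), None
    for a in actions:
        next_state, r = world_model.predict(belief_state, a)
        v = r + value(next_state, horizon - 1)
        if v > best_value:
            best_value, best_action = v, a
    return best_action

# Hypothetical usage: a robot choosing its next move while searching for a cup.
action = level0_act(ToyWorldModel(), belief_state=[], actions=["left", "right", "grasp"])
```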

LAW framework structure

The paper reviews recent works relevant to the LAW framework, highlighting several approaches:

  1. LMs as Both World and Agent Models (Reasoning-via-Planning, or RAP): LMs are repurposed to serve as world models by predicting future states and as agent models by generating actions. This yields reasoning traces of interleaved states and reasoning steps, improving inference coherence. RAP incorporates Monte Carlo Tree Search (MCTS) for strategic exploration in reasoning (a simplified sketch follows this list).
  2. Probabilistic Programs: Probabilistic programs are used to construct world and agent models for physical and social reasoning. LMs translate natural language descriptions into probabilistic programs, serving as an interface between language and thought (a toy example follows this list).
  3. LMs as the Planner in Agent Models: LMs are used to generate plans based on prompts specifying the state, task, and memory. Interactive planning paradigms provide feedback and reflection on past actions to adjust future plans. LMs can also simulate social behaviors in abstract environments, enhancing social reasoning.
  4. LMs as the Goal/Reward in Agent Models: LMs are considered for generating goals or rewards in agent models. They can translate language descriptions of intended tasks into goal and reward specifications, simplifying reward design (a sketch follows this list).
  5. LMs as the Belief in Agent Models: Although less explored, there is potential for using LMs to explicitly model belief representations in agent models, similar to their role as planners, goals, or rewards.
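
A simplified sketch of the RAP idea from item 1, with a single LM playing both the world-model and agent-model roles. The `llm` placeholder, the prompt formats, and the greedy search (standing in for RAP's actual MCTS) are all assumptions for illustration.

```python
def llm(prompt: str) -> str:
    """Placeholder for a language-model call; wire in any completion API here."""
    raise NotImplementedError

def propose_actions(state: str, n: int = 3) -> list[str]:
    # Agent-model role: the LM proposes candidate next reasoning steps.
    out = llm(f"State: {state}\nPropose {n} possible next steps, one per line.")
    return out.splitlines()[:n]

def predict_next_state(state: str, action: str) -> str:
    # World-model role: the LM predicts the state the action leads to.
    return llm(f"State: {state}\nAction: {action}\nDescribe the resulting state.")

def score_state(state: str) -> float:
    # Placeholder reward: ask the LM to rate progress toward the goal.
    return float(llm(f"On a scale from 0 to 1, rate progress toward the goal:\n{state}"))

def reason_via_planning(initial_state: str, steps: int = 4) -> list[tuple[str, str]]:
    """Greedy stand-in for RAP's MCTS: interleave actions and predicted states."""
    state, trace = initial_state, []
    for _ in range(steps):
        candidates = propose_actions(state)
        # Simulate each candidate with the world model; keep the best-scoring one.
        best_action, best_state = max(
            ((a, predict_next_state(state, a)) for a in candidates),
            key=lambda pair: score_state(pair[1]),
        )
        trace.append((best_action, best_state))
        state = best_state
    return trace
```

Full RAP expands a search tree with MCTS over these same propose/predict/score primitives; the greedy loop only keeps the sketch short.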
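
A toy illustration of item 2: the kind of probabilistic program an LM might produce from a natural-language description, here hand-written with invented probabilities and solved by rejection sampling.

```python
import random

# "The agent usually finds the cup in the kitchen; it searched the kitchen and failed."
# A hand-written probabilistic program an LM might emit from that description.
def model():
    cup_in_kitchen = random.random() < 0.8            # prior belief
    found = cup_in_kitchen and random.random() < 0.9  # noisy search
    return cup_in_kitchen, found

def posterior_cup_in_kitchen(observed_found=False, samples=100_000):
    # Rejection sampling: keep only samples consistent with the observation.
    kept = [c for c, f in (model() for _ in range(samples)) if f == observed_found]
    return sum(kept) / len(kept)

print(posterior_cup_in_kitchen())
```

Conditioning on the failed search pulls the posterior below the 0.8 prior (to roughly 0.29 here).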
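
And a sketch of item 4, reusing the `llm` placeholder above: turning a natural-language task description into a reward function. The yes/no prompt and the binary 0/1 reward are illustrative assumptions.

```python
def reward_from_description(task_description: str):
    """Wrap an LM success judgment as a binary reward function."""
    def reward(state: str) -> float:
        answer = llm(
            f"Task: {task_description}\nState: {state}\n"
            "Does this state satisfy the task? Answer yes or no."
        )
        return 1.0 if answer.strip().lower().startswith("yes") else 0.0
    return reward

# Hypothetical usage: this function could serve as the reward inside a level-0 agent.
reward_fn = reward_from_description("the cup is on the kitchen table")
```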

Results

The authors acknowledge certain limitations of the LAW framework:

  1. Symbolic Representations: The language model backend relies on symbolic representations in a discrete space. While this space could be augmented with continuous latent spaces from other modalities, it remains unclear whether a single continuous latent space can match the capacity of symbolic representations.
  2. Incomplete Modeling: The current world and agent modeling may not capture all knowledge about the world and agents. For example, it assumes that agent behaviors are primarily driven by goals or rewards, overlooking other potential factors like social norms.
  3. Transformer Architecture Limits: The paper does not delve into the inherent limits of Transformer architectures, which are foundational to many language models. Further research into understanding the learning mechanisms of Transformers may complement the development of machine reasoning.

Overall, while the LAW framework presents a promising direction for advancing machine reasoning, it is essential to address these limitations and continue exploring ways to enhance its capabilities.

Summary

This is a discussion/position paper: it proposes the LAW perspective and surveys related work rather than reporting new experimental results.