[TOC]

  1. Title: Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
  2. Author: Thomas Carta et al.
  3. Publish Date: 6 Sep 2023
  4. Review Date: Tue, Apr 23, 2024
  5. url: arXiv:2302.02662v3

Summary of paper


Summary

The authors consider an agent that uses an LLM as its policy, progressively updated through online reinforcement learning as the agent interacts with the environment (formalized as an MDP), improving its ability to solve language-specified goals.
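Concretely, the paper derives an action distribution from the LLM itself: the log-probability of each admissible action is the sum of the log-probabilities of its tokens, renormalized over the finite action set. The sketch below illustrates this idea in a self-contained way; `token_logprob` is a toy stand-in for a real pretrained LM head (the paper uses an encoder-decoder LM), and its score table is purely illustrative.

```python
import math

# Toy stand-in for an LLM's conditional token log-probability.
# In the paper's setup this would be log P(token | prefix) from a
# pretrained LM; here a fixed table keeps the sketch runnable.
def token_logprob(prefix: str, token: str) -> float:
    bias = {"go": -0.5, "forward": -0.3, "turn": -1.0,
            "left": -0.8, "right": -0.9}
    return bias.get(token, -2.0)

def action_logprob(prompt: str, action: str) -> float:
    """Log-prob of an action = sum of token log-probs of its words,
    each conditioned on the prompt plus previously decoded tokens."""
    logp, prefix = 0.0, prompt
    for tok in action.split():
        logp += token_logprob(prefix, tok)
        prefix += " " + tok
    return logp

def action_distribution(prompt: str, actions: list[str]) -> dict[str, float]:
    """Renormalize over the finite action set: a softmax over the
    summed token log-probs of each candidate action."""
    logps = [action_logprob(prompt, a) for a in actions]
    m = max(logps)  # subtract max for numerical stability
    exps = [math.exp(lp - m) for lp in logps]
    z = sum(exps)
    return {a: e / z for a, e in zip(actions, exps)}
```

Sampling an action from this distribution (rather than free-form generation) keeps the policy within the environment's valid action space.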

The authors study several questions:

  1. Sample efficiency: How fast can an LLM adapt and learn to solve various spatial and navigation problems specified in natural language? How does the LLM's pre-trained knowledge boost sample efficiency?
  2. Generalization to new objects: Once functionally grounded, how well does the LLM generalize to various kinds of changes to objects, while staying within the trained tasks?
  3. Generalization to new tasks: How well does such an interactively trained LLM perform zero-shot generalization to new tasks? How does generalization depend on the kind of new task?
  4. Impact of online interventions: What is the empirical impact of grounding via online RL with incremental interactions, compared with offline Behavioral Cloning from a dataset of expert trajectories?
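On question 4, the online-RL side fine-tunes the LLM policy with PPO, whereas Behavioral Cloning simply maximizes the likelihood of expert actions. A minimal sketch of the clipped PPO surrogate for a single (state, action) pair, assuming `logp_new` and `logp_old` are the action log-probs under the current and behavior policies (the clip coefficient `eps=0.2` is the common default, not necessarily the paper's value):

```python
import math

def ppo_clip_loss(logp_new: float, logp_old: float,
                  advantage: float, eps: float = 0.2) -> float:
    """Clipped PPO surrogate loss for one (state, action) pair.

    ratio = pi_new(a|s) / pi_old(a|s); the clip keeps the policy
    update close to the behavior policy that collected the data.
    Returns the negated surrogate, i.e. a quantity to minimize.
    """
    ratio = math.exp(logp_new - logp_old)
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    return -min(unclipped, clipped)
```

BC, by contrast, would just minimize `-logp_new` on expert actions, with no interaction and no advantage signal.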

env: https://minigrid.farama.org/environments/minigrid/CrossingEnv/
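For the LLM to act in such grid environments, the symbolic observation is verbalized into a textual prompt (goal plus described observation) that the policy conditions on. A hypothetical sketch of such a prompt builder; the function name, fields, and phrasing are illustrative, not the paper's exact format:

```python
def build_prompt(goal: str,
                 observed_objects: list[tuple[str, int, int]]) -> str:
    """Render a goal and a list of (object name, steps ahead,
    steps to the right) into a textual prompt for the LLM policy."""
    lines = [f"Goal of the agent: {goal}", "Observation:"]
    for name, ahead, right in observed_objects:
        lines.append(
            f"- you see a {name} {ahead} step(s) ahead "
            f"and {right} step(s) to the right"
        )
    lines.append("Action:")
    return "\n".join(lines)
```

The prompt ends at "Action:" so that the action distribution (token-level scoring over the admissible actions) can be computed as a continuation.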

Sample efficiency

*(figure from the paper; not reproduced here)*

Generalization

*(figures from the paper; not reproduced here)*