[TOC]

  1. Title: Guiding Pretraining in Reinforcement Learning With Large Language Models
  2. Author: Yuqing Du, Jacob Andreas, et al.
  3. Publish Date: 13 Feb 2023
  4. Review Date: Wed, Apr 5, 2023
  5. url: https://arxiv.org/pdf/2302.06692.pdf

Summary of paper

(figure: overview of the ELLM method from the paper)

Motivation

  • intrinsically motivated exploration methods address the sparse-reward problem by rewarding agents for visiting novel states or transitions.

Contribution

  • we describe a method that uses background knowledge from text corpora to shape exploration.
  • This method, called Exploring with LLMs (ELLM), rewards an agent for achieving goals suggested by a language model prompted with a description of the agent's current state.

Some key terms

How does ELLM work

  • ELLM uses a pretrained large language model to suggest plausibly useful goals in a task-agnostic way.
    • task-agnostic → the suggested goals shape exploration in general, rather than targeting one particular downstream task
  • ELLM prompts an LLM to suggest possible goals given the agent's current context and rewards the agent for accomplishing those suggestions.
  • as a result, exploration is biased toward completion of goals that are diverse, context-sensitive and human-meaningful.
  • ELLM-trained agents exhibit better coverage of useful behaviours during pretraining, and outperform or match baselines when fine-tuned on downstream tasks (a minimal sketch of the loop follows).
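
Putting those pieces together, here is a minimal runnable sketch of one ELLM step as I read the paper. Every helper in it (`caption_state`, `suggest_goals`, `similarity`) is a stand-in I made up, not the authors' code:

```python
# Minimal sketch of one ELLM pretraining step; all helpers are stand-ins.

def caption_state(obs: dict) -> str:
    # C_state(o_t): turn a structured observation into text (see Method below).
    return f"You see {', '.join(obs['visible'])}."

def suggest_goals(state_caption: str, k: int = 3) -> list[str]:
    # The paper prompts a real LLM with the captioned state; hard-coded here.
    return ["chop the tree", "drink water", "attack the zombie"][:k]

def similarity(a: str, b: str) -> float:
    # Stand-in for the paper's embedding-based similarity: crude word overlap,
    # just enough to make the sketch runnable end to end.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def ellm_step(obs: dict, transition_caption: str) -> float:
    goals = suggest_goals(caption_state(obs))
    # Reward achieving any of the k suggestions: max of goal-specific rewards.
    return max(similarity(transition_caption, g) for g in goals)

print(ellm_step({"visible": ["tree", "water"]}, "you chop the tree"))  # 0.75
```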

why intrinsically motivated exploration rewards are not good

  • not everything novel or unpredictable is useful: a noisy TV or the movement of leaves on a tree can provide an endless stream of novelty, yet neither leads to meaningful behaviour.
  • more recent approaches compute novelty in a higher-level representation space such as language, but can still drive the agent to explore behaviours that are highly unlikely to ever be useful (toy illustration below).
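
To make this failure mode concrete, here is a toy count-based novelty bonus (a generic stand-in, not any specific method from these papers): a "noisy TV" that emits a fresh random observation every step keeps earning the maximum bonus forever, while a useful but repetitive behaviour sees its bonus decay to nothing.

```python
# Toy count-based novelty bonus r(s) = 1 / sqrt(N(s)).
import random
from collections import Counter

counts: Counter = Counter()

def novelty_bonus(state: str) -> float:
    counts[state] += 1
    return counts[state] ** -0.5

# Noisy TV: every observation is new, so each step pays the full bonus of 1.
tv_total = sum(novelty_bonus(f"static-{random.random()}") for _ in range(1000))
# Useful but repetitive behaviour: the bonus decays as 1/sqrt(n).
chop_total = sum(novelty_bonus("chop tree") for _ in range(1000))

print(tv_total, chop_total)  # ~1000 vs ~62: novelty strongly favours the TV
```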

Method

(figure: the ELLM pipeline: the captioned state prompts the LLM for goal suggestions, which define the intrinsic reward)

  • $C_{state}(o_t)$, where $C_{state}$ is the state captioner mapping the raw observation $o_t$ into the text used to prompt the LLM (a hypothetical example follows)
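
A hypothetical template-based captioner in the spirit of $C_{state}$ (the field names are invented; the paper's own captioners are described in its appendix):

```python
# Hypothetical template-based state captioner: structured observation -> text.
def caption_state(obs: dict) -> str:
    parts = []
    if obs.get("visible"):
        parts.append("You see " + ", ".join(obs["visible"]) + ".")
    if obs.get("inventory"):
        parts.append("You have " + ", ".join(obs["inventory"]) + ".")
    if obs.get("status"):
        parts.append("You feel " + ", ".join(obs["status"]) + ".")
    return " ".join(parts)

print(caption_state({
    "visible": ["a tree", "water"],
    "inventory": ["1 wood"],
    "status": ["hungry"],
}))  # You see a tree, water. You have 1 wood. You feel hungry.
```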

(figure: the intrinsic reward equation: $\Delta$ between the caption of what the agent achieved and each suggested goal, maxed over the $k$ goals)

  • $\Delta$ is the cosine similarity function, computed between embeddings of the achieved caption and the suggested goal.
  • we reward the agent for achieving any of the $k$ suggested goals by taking the maximum of the goal-specific rewards (sketched below).
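
A sketch of that reward computation; `embed` is a toy stand-in for the pretrained sentence encoder used to compare captions and goals, and the low-similarity cutoff is my assumption about how near-misses are filtered out:

```python
# Goal-matching reward: Delta = cosine similarity, maxed over the k goals.
from zlib import crc32

import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy bag-of-words "encoder": sum of per-word hash-seeded random vectors.
    # A real implementation would use a pretrained sentence encoder.
    vecs = [np.random.default_rng(crc32(w.encode())).normal(size=64)
            for w in text.lower().split()]
    return np.sum(vecs, axis=0)

def delta(a: str, b: str) -> float:
    # Cosine similarity between the two embeddings.
    u, v = embed(a), embed(b)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def ellm_reward(transition_caption: str, goals: list[str],
                threshold: float = 0.5) -> float:
    # Max over the k goal-specific rewards; weak matches are zeroed out
    # (the exact filtering rule and threshold are my assumption).
    best = max(delta(transition_caption, g) for g in goals)
    return best if best >= threshold else 0.0

print(ellm_reward("you chop a tree", ["chop a tree", "drink water"]))  # ~0.87
```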

Potential future work

  • we may want to reuse the state captioner in our future work
  • check Appendix H of the paper