[TOC]
- Title: Guiding Pretraining in Reinforcement Learning With Large Language Models
- Author: Yuqing Du, Jacob Andreas, et al.
- Publish Year: 13 Feb 2023
- Review Date: Wed, Apr 5, 2023
- url: https://arxiv.org/pdf/2302.06692.pdf
Summary of paper
Motivation
- intrinsically motivated exploration methods address the sparse-reward problem by rewarding agents for visiting novel states or transitions.
Contribution
- we describe a method that uses background knowledge from text corpora to shape exploration.
- This method, called Exploring with LLMs (ELLM), rewards an agent for achieving goals suggested by a language model prompted with a description of the agent's current state.
Some key terms
How does ELLM work
- ELLM uses a pretrained large language model to suggest plausibly useful goals in a task-agnostic way.
- task-agnostic -> the suggested goals serve as a general exploration signal rather than targeting a specific downstream task
- ELLM prompts an LLM to suggest possible goals given the agent’s current context and rewards the agent for accomplishing those suggestions (a minimal sketch follows after this list).
- as a result, exploration is biased toward completion of goals that are diverse, context-sensitive and human-meaningful.
- ELLM-trained agents exhibit better coverage of useful behaviours during pretraining, and outperform or match baselines when fine-tuned on downstream tasks
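Below is a minimal Python sketch of the goal-suggestion step as I understand it from the paper. The prompt wording, the `query_llm` placeholder, and the canned completion are hypothetical stand-ins, not the authors' implementation.

```python
# Hypothetical sketch of how ELLM elicits goals from an LLM; not the authors' code.
from typing import List


def query_llm(prompt: str) -> str:
    """Placeholder for a call to a pretrained LLM; returns a canned
    completion so the sketch runs standalone."""
    return "1. Chop the tree\n2. Pick up the stone\n3. Drink the water"


def suggest_goals(state_caption: str, k: int = 3) -> List[str]:
    """Prompt the LLM with the agent's captioned state and parse k goal strings."""
    prompt = (
        f"{state_caption}\n"
        f"Suggest {k} things that would be useful to do next, one per line."
    )
    completion = query_llm(prompt)
    goals = [line.split(".", 1)[-1].strip() for line in completion.splitlines()]
    return [g.lower() for g in goals if g][:k]


if __name__ == "__main__":
    caption = "You see a tree and a stone. Your inventory is empty."
    print(suggest_goals(caption))
    # -> ['chop the tree', 'pick up the stone', 'drink the water']
```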
Why intrinsically motivated exploration rewards are not enough
- not everything novel or unpredictable is useful: noisy TVs and the movements of leaves on a tree may provide an infinite amount of novelty, but do not lead to meaningful behaviours.
- more recent approaches compute novelty in a higher-level representation space such as language, but they can still drive the agent to explore behaviours that are highly unlikely to ever be useful.
Method
- $C_{state}(o_t)$ is the text caption of observation $o_t$, where $C_{state}$ is the state captioner.
- $\Delta$ is the cosine-similarity function used to compare embeddings of captions and suggested goals.
- the agent is rewarded for achieving any of the $k$ suggested goals by taking the maximum of the goal-specific rewards (see the sketch below).
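As a concrete illustration of the reward above, here is a small Python sketch: embed the caption of what the agent achieved and each of the $k$ suggested goals, compute cosine similarity ($\Delta$), and take the maximum over goals. The `embed` helper and its toy 4-dimensional vectors are hypothetical placeholders for a pretrained sentence encoder; this is a sketch of the idea, not the paper's implementation.

```python
# Hypothetical sketch of the ELLM intrinsic reward: max over suggested goals
# of the cosine similarity (Delta) between embedded captions and goals.
from typing import Dict, List
import numpy as np

# Toy embedding table standing in for a pretrained sentence encoder.
_TOY_EMBEDDINGS: Dict[str, np.ndarray] = {
    "chop the tree":      np.array([0.9, 0.1, 0.0, 0.0]),
    "pick up the stone":  np.array([0.1, 0.9, 0.1, 0.0]),
    "drink the water":    np.array([0.0, 0.1, 0.9, 0.1]),
    "you chopped a tree": np.array([0.8, 0.2, 0.1, 0.0]),
}


def embed(text: str) -> np.ndarray:
    """Placeholder embedding lookup; a real system would encode arbitrary text."""
    return _TOY_EMBEDDINGS[text]


def cosine_similarity(x: np.ndarray, y: np.ndarray) -> float:
    """Delta(x, y): cosine similarity between two embedding vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


def intrinsic_reward(achievement_caption: str, goals: List[str]) -> float:
    """Reward = max over the k suggested goals of Delta(E(caption), E(goal))."""
    achievement = embed(achievement_caption)
    return max(cosine_similarity(achievement, embed(g)) for g in goals)


if __name__ == "__main__":
    goals = ["chop the tree", "pick up the stone", "drink the water"]
    print(intrinsic_reward("you chopped a tree", goals))  # closest goal: "chop the tree"
```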
Potential future work
- we may want to reuse the state captioner for our future work
- check Appendix H of the paper