[TOC]

  1. Title: Language Reward Modulation for Pretraining Reinforcement Learning
  2. Author: Ademi Adeniji et al.
  3. Publish Year: ICLR 2023 reject
  4. Review Date: Thu, May 9, 2024
  5. url: https://openreview.net/forum?id=SWRFC2EupO

Summary of paper

*(figure omitted)*

Motivation

Learned reward functions (LRFs) are notorious for noise and reward misspecification errors.

The generalization ability of multi-modal vision-language models (VLMs), when used as reward models, is also in question.
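
For context on what this reward signal is: the learned reward discussed here is typically an image-text alignment score from a frozen VLM (CLIP-style), queried at every step in place of a hand-designed task reward. A minimal sketch of such a reward, assuming hypothetical `encode_image` / `encode_text` callables rather than the paper's actual implementation:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def vlm_reward(frame: np.ndarray, instruction: str, encode_image, encode_text) -> float:
    """Learned reward from a frozen VLM: alignment between the current
    observation and a language description of the task.

    `encode_image` / `encode_text` are stand-ins for a CLIP-style encoder
    (a hypothetical interface, not the paper's code).
    """
    img_emb = encode_image(frame)       # e.g., image embedding
    txt_emb = encode_text(instruction)  # e.g., text embedding of the task prompt
    return cosine_similarity(img_emb, txt_emb)
```

Because the score only measures visual-text similarity, frames that merely look related to the instruction can receive high reward even when the task is not solved, which is one way noise and misspecification enter.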

Contribution

Results

*(results figure omitted)*

Summary

Regrettably, the authors did not provide compelling evidence to support the claim that the VLM reward signal is ineffective for training RL agents. Reviewers requested further explanation of why VLM reward signals may fall short on complex RL tasks.

Critics often counter with remarks such as, “If this is indeed a problem, why do so many studies successfully employ vision-text alignment-based reward models?” It is therefore very challenging to convince people that a widely accepted method may be flawed or inadequate. It’s a tough journey.

Additionally, the pretraining evaluation section of the paper does not substantiate the authors’ assertion that the VLM reward signal is noisy and therefore unsuitable for training RL tasks. From the presented pretraining performance, it is not apparent that the VLM pretraining results are suboptimal.

Potential future work

The approach is essentially equivalent to using language-based reward signals during pretraining and then abandoning them afterwards.
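
As a concrete reading of this point, the scheme amounts to two phases: pretrain on the VLM alignment reward (varied across language prompts), then discard it and finetune on the environment's task reward alone. The sketch below is my own illustration, assuming a gym-style `env`, a generic `agent` with `act`/`update` methods, and the `vlm_reward` helper from the earlier sketch; none of these names come from the paper:

```python
import random

def pretrain_then_finetune(agent, env, instructions, encode_image, encode_text,
                           pretrain_steps=100_000, finetune_steps=100_000):
    """Two-phase training loosely matching the scheme discussed above.

    Phase 1: explore using the noisy VLM alignment score as the only reward.
    Phase 2: drop the language-based signal and train on the task reward.
    All interfaces here are assumed placeholders, not the paper's API.
    """
    # Phase 1: language-reward pretraining (the signal that is later abandoned).
    obs = env.reset()
    for _ in range(pretrain_steps):
        action = agent.act(obs)
        next_obs, _, done, _ = env.step(action)     # environment reward is ignored
        instruction = random.choice(instructions)   # vary the language prompt
        r = vlm_reward(next_obs, instruction, encode_image, encode_text)
        agent.update(obs, action, r, next_obs, done)
        obs = env.reset() if done else next_obs

    # Phase 2: finetuning on the task reward only; the VLM signal is discarded.
    obs = env.reset()
    for _ in range(finetune_steps):
        action = agent.act(obs)
        next_obs, task_reward, done, _ = env.step(action)
        agent.update(obs, action, task_reward, next_obs, done)
        obs = env.reset() if done else next_obs
```

The critique is that the language signal only shapes exploration in the first loop and plays no role at all in the second.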