[TOC]

  1. Title: Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control
  2. Author: Nate Rahn et al.
  3. Publish Year: NeurIPS 2023
  4. Review Date: Fri, May 10, 2024
  5. url: https://arxiv.org/abs/2309.14597

Summary of paper


Contribution

Some key terms

Evidence that a noisy reward signal leads to substantial variance in performance

It is well-documented that agents trained with deep reinforcement learning can exhibit substantial variations in performance – as measured by their episodic return. The problem is particularly acute in continuous control, where these variations make it difficult to compare the end product of different algorithms or implementations of the same algorithm [11, 20] or even reliably measure an agent’s progress from episode to episode [9].

[9] Stephanie C. Y. Chan, Samuel Fishman, Anoop Korattikara, John Canny, and Sergio Guadarrama. Measuring the Reliability of Reinforcement Learning Algorithms. In International Conference on Learning Representations, 2019.

[11] Cédric Colas, Olivier Sigaud, and Pierre-Yves Oudeyer. A hitchhiker’s guide to statistical comparisons of reinforcement learning algorithms. In Reproducibility in Machine Learning, ICLR 2019 Workshop, New Orleans, Louisiana, United States, May 6, 2019. OpenReview.net, 2019.

[12] Felix Draxler, Kambis Veschgini, Manfred Salmhofer, and Fred Hamprecht. Essentially no barriers in neural network energy landscape. In International Conference on Machine Learning, pages 1309–1318. PMLR, 2018.
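
To make the quantity concrete: the variation in question is the spread of episodic returns obtained by the *same* fixed policy across evaluation episodes. Below is a minimal sketch of how one might estimate it, assuming a Gymnasium-style `env` and a hypothetical `policy_action` callable (both are stand-ins, not code from the paper):

```python
import numpy as np

def episodic_return(env, policy_action, seed=None, max_steps=1000):
    """Roll out one episode and return the undiscounted sum of rewards.

    `env` is assumed to follow the Gymnasium reset/step API; `policy_action`
    maps an observation to an action. Both names are hypothetical.
    """
    obs, _ = env.reset(seed=seed)
    total = 0.0
    for _ in range(max_steps):
        obs, reward, terminated, truncated, _ = env.step(policy_action(obs))
        total += reward
        if terminated or truncated:
            break
    return total

def return_statistics(env, policy_action, n_episodes=100):
    """Mean and standard deviation of episodic return for one fixed policy.

    A large standard deviation for a fixed policy is exactly the
    episode-to-episode variation the excerpt above refers to.
    """
    returns = [episodic_return(env, policy_action, seed=i) for i in range(n_episodes)]
    return float(np.mean(returns)), float(np.std(returns))
```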

Results

Observations on performance

Instability

Interpolating between policies within the same run (see the sketch below)
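
The idea here is to probe the return landscape by linearly interpolating, in parameter space, between policies encountered within a single training run and evaluating the return along the path. A minimal sketch of such an interpolation, assuming two PyTorch checkpoints `policy_a` and `policy_b` with identical architectures (the names and setup are assumptions for illustration, not the authors' code):

```python
import copy
import torch

def interpolate_policies(policy_a, policy_b, alphas):
    """Linearly interpolate the parameters of two same-architecture policies.

    For each alpha, yields a fresh policy whose parameters are
    (1 - alpha) * theta_a + alpha * theta_b. `policy_a` and `policy_b` are
    hypothetical torch.nn.Module checkpoints from the same training run.
    """
    params_a = dict(policy_a.named_parameters())
    params_b = dict(policy_b.named_parameters())
    for alpha in alphas:
        blended = copy.deepcopy(policy_a)
        with torch.no_grad():
            for name, param in blended.named_parameters():
                param.copy_((1 - alpha) * params_a[name] + alpha * params_b[name])
        yield alpha, blended

# Usage sketch: evaluate the return along the interpolation path, e.g. with
# the return_statistics helper sketched earlier (hypothetical env/checkpoints).
# for alpha, policy in interpolate_policies(ckpt_early, ckpt_late, torch.linspace(0, 1, 11)):
#     mean_ret, std_ret = return_statistics(
#         env, lambda obs: policy(torch.as_tensor(obs, dtype=torch.float32)).detach().numpy())
#     print(f"alpha={float(alpha):.1f}  return={mean_ret:.1f} +/- {std_ret:.1f}")
```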

Summary

This work focuses on how to stabilize policy training.

Sadly, it did not consider the case where the reward signal itself is noisy.