Tom_everitt Reinforcement Learning With a Corrupted Reward Channel 2017
[TOC] Title: Reinforcement Learning With a Corrupted Reward Channel Author: Tom Everitt Publish Year: August 22, 2017 Review Date: Mon, Dec 26, 2022 Summary of paper Motivation we formalise this problem as a generalised Markov Decision Problem called Corrupt Reward MDP Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards Contribution two ways around the problem are investigated....