Rodrigo Reward Machines Exploiting Reward Function Structure in RL 2022

Title: Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning. Author: Rodrigo Toro Icarte et al. Publish Year: 2022 (AI Access Foundation). Review Date: Thu, Aug 17, 2023. url: https://arxiv.org/abs/2010.03950 Summary of paper. Motivation: In most RL applications, users have to program the reward function, so there is an opportunity to make the reward function visible to the RL agent, which can then exploit the function’s internal structure to learn optimal policies in a more sample-efficient manner....

<span title='2023-08-17 16:32:09 +1000 AEST'>August 17, 2023</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;321 words&nbsp;·&nbsp;Sukai Huang

Rodrigo Using Reward Machines for High Level Task Specification and Decomposition in RL 2018

Title: Reward Machines for High Level Task Specification and Decomposition in Reinforcement Learning. Author: Rodrigo Toro Icarte et al. Publish Year: PMLR 2018. Review Date: Thu, Aug 17, 2023. url: http://proceedings.mlr.press/v80/icarte18a/icarte18a.pdf Summary of paper. Motivation: The paper proposes reward machines, which expose reward function structure to the learner and support task decomposition. Contribution: In contrast to hierarchical RL methods, which might converge to suboptimal policies, the authors prove that QRM is guaranteed to converge to an optimal policy in the tabular case....

<span title='2023-08-17 11:13:24 +1000 AEST'>August 17, 2023</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;360 words&nbsp;·&nbsp;Sukai Huang