Wenshuai_zhao Towards Closing the Sim to Real Gap in Collaborative Multi Robot Deep Reinforcement Learning 2020

[TOC]

Title: Towards Closing the Sim to Real Gap in Collaborative Multi Robot Deep Reinforcement Learning
Author: Wenshuai Zhao et al.
Publish Year: 2020
Review Date: Sun, Dec 25, 2022

Summary of paper

We introduce the effect of sensing, calibration, and accuracy mismatches in distributed reinforcement learning.
We discuss how both the different types of perturbations and the number of agents experiencing those perturbations affect the collaborative learning effort.
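
As a rough illustration of what such mismatches could look like in code, the sketch below applies a sensing error (bias plus Gaussian noise) and an actuation error (a gain factor) to a chosen number of agents. The `Perturbation` class, its fields, and the helper functions are hypothetical names introduced here for illustration; they are not taken from the paper, and the paper's actual perturbation models may differ.

```python
import random
from dataclasses import dataclass


@dataclass
class Perturbation:
    """Hypothetical per-agent mismatch (an assumption, not the paper's model)."""
    sensor_bias: float = 0.0    # constant calibration offset added to each observation value
    sensor_noise: float = 0.0   # standard deviation of Gaussian sensing noise
    actuator_gain: float = 1.0  # multiplicative actuation error (1.0 means ideal)


def perturb_observation(obs, p, rng):
    """Return the observation as seen through a miscalibrated, noisy sensor."""
    return [x + p.sensor_bias + rng.gauss(0.0, p.sensor_noise) for x in obs]


def perturb_action(action, p):
    """Return the action actually executed by an imperfect actuator."""
    return [a * p.actuator_gain for a in action]


def assign_perturbations(num_agents, num_perturbed, perturbed):
    """Give the first `num_perturbed` agents the mismatch; the rest stay ideal."""
    return [perturbed if i < num_perturbed else Perturbation()
            for i in range(num_agents)]
```

Varying `num_perturbed` and the fields of `Perturbation` separately corresponds to the two axes discussed above: how many agents are affected, and by which type of perturbation.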

This is, to the best of our knowledge, the first work exploring the limitations of PPO in multi-robot systems when different robots might be exposed to different environments in which their sensors or actuators have induced errors.
With the conclusions of this work, we set the starting point for future work on designing and developing methods to achieve robust reinforcement learning in the presence of real-world perturbations that might differ within a multi-robot system.

Deep RL

DRL algorithms work on a trial-and-error basis: an agent interacts with its environment and receives a reward based on its performance.
There are DRL approaches that rely on multiple agents to parallelise the learning process or to explore a wider variety of experiences.
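
The interaction loop and the multi-agent collection idea can be sketched with a toy example. Everything here is an assumption made for illustration: `ReachEnv` is a hypothetical 1-D task, not the paper's simulation, and the agents run sequentially rather than in parallel to keep the sketch short.

```python
class ReachEnv:
    """Hypothetical 1-D task: move a point toward a target at position 0."""

    def __init__(self, start):
        self.pos = start

    def step(self, action):
        self.pos += action
        reward = -abs(self.pos)  # the closer to the target, the higher the reward
        return self.pos, reward


def collect(env, policy, steps):
    """Let one agent interact for `steps` steps and return its transitions."""
    transitions = []
    state = env.pos
    for _ in range(steps):
        action = policy(state)
        next_state, reward = env.step(action)
        transitions.append((state, action, reward, next_state))
        state = next_state
    return transitions


def collect_from_all(envs, policy, steps):
    """Gather experience from several agents into one shared batch."""
    shared = []
    for env in envs:
        shared.extend(collect(env, policy, steps))
    return shared
```

A learner would then update the shared policy from the combined batch, so agents that start in different states contribute a wider variety of experience to the same update.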

PPO
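
For reference, the clipped surrogate objective that PPO maximises can be written as below. This is the standard form from the original PPO paper (Schulman et al., 2017), not something stated in these notes. Here $r_t(\theta)$ is the probability ratio between the new and old policies, $\hat{A}_t$ is an advantage estimate, and $\epsilon$ is the clipping range.

$$
L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}
$$

The clipping removes the incentive to move $r_t(\theta)$ outside $[1-\epsilon,\, 1+\epsilon]$, which limits the size of each policy update.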

Conclusion

For a fixed, small perturbation magnitude, the agents still converge on a policy that works for both subsets (the original and the perturbed).
Among disturbances in the model’s input (sensing, reward) and output (actuation, action), disturbances in the robots’ ability to actuate properly had a comparatively worse effect than those in their ability to sense the position of the object accurately.

Limitation

Although the empirical analysis showed that the RL algorithm can still converge under reward disturbance, the experiments were not conducted in a sparse-reward environment.

Citation

PPO has been identified as one of the most robust approaches against reward perturbation.
- ref: Zhao, Wenshuai, et al. “Towards closing the sim-to-real gap in collaborative multi-robot deep reinforcement learning.” 2020 5th International Conference on Robotics and Automation Engineering (ICRAE). IEEE, 2020.
- ref: Wang, Jingkang, Yang Liu, and Bo Li. “Reinforcement learning with perturbed rewards.” Proceedings of the AAAI conference on artificial intelligence. Vol. 34. No. 04. 2020.