[TOC]
- Title: Sample Factory: Asynchronous RL at Very High FPS
- Author: Alex Petrenko
- Publish Year: Oct, 2020
- Review Date: Sun, Sep 25, 2022
Summary of paper
Motivation
Identifying performance bottlenecks
RL involves three workloads:
- environment simulation
- inference
- backpropagation
- overall performance is bounded by the slowest of the three workloads
- In existing methods (A2C/PPO/IMPALA) these workloads depend on each other and run in lockstep -> under-utilisation of system resources.
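A tiny back-of-the-envelope sketch of this bottleneck argument (the stage timings are made-up numbers, purely illustrative): when the three workloads run in lockstep, one iteration costs the *sum* of the stage times, whereas an asynchronous pipeline that overlaps stages is limited only by the *slowest* one.

```python
# Hypothetical per-batch stage timings in seconds (illustrative only).
sim, infer, backprop = 0.030, 0.010, 0.020

# Synchronous (A2C/PPO-style): stages run one after another,
# so each iteration costs the SUM of the stage times.
sync_iter = sim + infer + backprop

# Asynchronous pipeline: stages overlap, so throughput is bounded
# only by the SLOWEST stage.
async_iter = max(sim, infer, backprop)

print(f"sync:  {1 / sync_iter:.0f} iterations/s")
print(f"async: {1 / async_iter:.0f} iterations/s")
```

With these toy numbers the asynchronous pipeline doubles the iteration rate, which is the intuition behind decoupling the three workloads.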
Existing high-throughput methods focus on distributed training, and therefore introduce significant overhead from networking, serialisation, etc.
- e.g., (Ray/RLlib <==> Redis/Plasma, SEED RL <==> gRPC, Catalyst <==> MongoDB)
Contribution
Some key terms
Double-buffered sampling
- with the double-buffered approach, environment simulators never wait: environments are split into two groups, and while the policy computes actions for one group, the other group is being stepped
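A minimal single-process sketch of the double-buffering idea (the `step_env`/`infer` functions are toy stand-ins for a real simulator and a GPU policy batch; in Sample Factory the two halves actually run in parallel, here the overlap is only indicated by comments):

```python
def step_env(env_state, action):
    # Toy environment step; stands in for a real simulator.
    return env_state + action

def infer(observations):
    # Toy deterministic policy; stands in for a batched GPU forward pass.
    return [1 for _ in observations]

# Split the environments into two groups (the two "buffers").
groups = {0: [0] * 4, 1: [0] * 4}
pending_actions = {0: infer(groups[0]), 1: infer(groups[1])}

for step in range(10):
    active = step % 2  # group being stepped this iteration
    # Step the active group with actions that are already available ...
    groups[active] = [step_env(s, a)
                      for s, a in zip(groups[active], pending_actions[active])]
    # ... while, conceptually IN PARALLEL, the policy worker computes
    # actions for this group's next step -- so while one group waits for
    # inference, the other group's simulators keep running.
    pending_actions[active] = infer(groups[active])
```

Each group is stepped five times here, and at no point does a simulator sit idle waiting for inference on the *other* group.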
Resolving bottleneck #2 (communication)
- the RL training process is built on static data structures: allocate all memory at the beginning of training, then only pass pointers around (shared memory works well on a single server)
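A sketch of this pre-allocation trick using Python's `multiprocessing.shared_memory` and NumPy (slot count, observation shape, and the single-process "worker/learner" roles are all made up for illustration): the big tensors are allocated once up front, and processes exchange only tiny slot indices instead of serialising arrays.

```python
import numpy as np
from multiprocessing import shared_memory

# Pre-allocate one big observation buffer at startup (hypothetical sizes).
NUM_SLOTS, OBS_SHAPE = 64, (84, 84, 3)
shm = shared_memory.SharedMemory(
    create=True, size=NUM_SLOTS * int(np.prod(OBS_SHAPE)))
obs_buffer = np.ndarray((NUM_SLOTS, *OBS_SHAPE), dtype=np.uint8, buffer=shm.buf)

# A rollout worker writes an observation into its slot ...
slot = 7
obs_buffer[slot] = 255  # pretend this is a rendered frame

# ... and sends ONLY the slot index (a tiny message) to the learner,
# instead of serialising the whole array.
message = slot

# The learner reads the data in place: zero copies, no serialisation.
received = int(obs_buffer[message].max())

del obs_buffer  # drop the view before releasing the segment
shm.close()
shm.unlink()
```

On a single server this removes serialisation and networking from the hot path entirely, which is exactly why the authors target one machine rather than a distributed cluster.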