[TOC]
- Title: Sample Factory: Asynchronous RL at Very High FPS
- Author: Alex Petrenko
- Publish Year: Oct, 2020
- Review Date: Sun, Sep 25, 2022
 
Summary of paper
Motivation
Identifying performance bottlenecks
- RL involves three workloads:
  - environment simulation
  - inference
  - backpropagation
 
- Overall performance is bounded by the slowest workload.
- In existing methods (A2C/PPO/IMPALA) the computational workloads depend on one another, so each stage spends time waiting for the others and system resources are under-utilised.
 
- Existing high-throughput methods focus on distributed training, which introduces considerable overhead such as networking and serialisation.
  - e.g., Ray & RLlib <==> Redis/Plasma, SEED RL <==> gRPC, Catalyst <==> MongoDB
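
The bottleneck argument can be made concrete with a small back-of-the-envelope model. The sketch below is an idealised illustration, not a measurement: the stage timings are hypothetical, and it ignores synchronisation overhead and resource contention. It contrasts a coupled pipeline, where each stage waits for the previous one, with a fully decoupled pipeline, where throughput is limited only by the slowest stage.

```python
def coupled_fps(stage_seconds, frames_per_iteration):
    """Stages run one after another, so an iteration costs the sum of all stages."""
    return frames_per_iteration / sum(stage_seconds.values())


def decoupled_fps(stage_seconds, frames_per_iteration):
    """Stages overlap perfectly, so the slowest stage alone sets the pace."""
    return frames_per_iteration / max(stage_seconds.values())


# Hypothetical time (in seconds) each workload needs to process 1000 frames.
stage_seconds = {
    "environment simulation": 0.5,
    "inference": 0.25,
    "backpropagation": 0.25,
}

coupled = coupled_fps(stage_seconds, 1000)
decoupled = decoupled_fps(stage_seconds, 1000)
print(f"coupled: {coupled:.0f} FPS, decoupled: {decoupled:.0f} FPS")
```

Under these assumed timings the decoupled pipeline is twice as fast, and any further speed-up has to come from the slowest workload (here, environment simulation).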
 
 
Contribution

Some key terms
Double-buffered sampling
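- With the double-buffered approach, environment simulators never wait for inference: each rollout worker splits its environments into two groups, and while one group is being stepped, the actions for the other group are being computed.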
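
A minimal sketch of the alternating pattern, under stated assumptions: `ToyEnv`, `policy_worker`, and `rollout_worker` are hypothetical names, the "policy" always returns action 1, and threads with queues stand in for the separate processes a real system would use. It is meant only to show how the two groups take turns, not how Sample Factory implements it.

```python
import queue
import threading


class ToyEnv:
    """Hypothetical stand-in for a simulator: the state is the sum of actions taken."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        self.state += action
        return self.state


def policy_worker(requests, responses):
    """Toy inference loop: answers every observation with the action 1."""
    while True:
        item = requests.get()
        if item is None:  # shutdown signal
            break
        group_id, observations = item
        responses[group_id].put([1 for _ in observations])


def rollout_worker(num_steps, envs_per_group=2):
    groups = [[ToyEnv() for _ in range(envs_per_group)] for _ in range(2)]
    observations = [[env.state for env in group] for group in groups]
    requests = queue.Queue()
    responses = [queue.Queue(), queue.Queue()]
    thread = threading.Thread(target=policy_worker, args=(requests, responses))
    thread.start()

    # Prime the pipeline: request actions for both groups up front.
    for group_id in (0, 1):
        requests.put((group_id, observations[group_id]))

    for _ in range(num_steps):
        for group_id in (0, 1):
            # While this group is stepped, the other group's request
            # is already in the hands of the policy worker.
            actions = responses[group_id].get()
            observations[group_id] = [
                env.step(action) for env, action in zip(groups[group_id], actions)
            ]
            requests.put((group_id, observations[group_id]))

    requests.put(None)
    thread.join()
    return observations


final_observations = rollout_worker(num_steps=3)
print(final_observations)
```

Each environment takes three steps with action 1, so every final observation equals 3.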
 
Resolving bottleneck #2 (communication)
- The RL training process is built on static data structures: all memory is allocated at the beginning of training, and afterwards only pointers are passed around (shared memory is well suited to a single server).
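
The idea of allocating once and passing only references can be sketched with Python's standard `multiprocessing.shared_memory` module. This is an illustrative assumption rather than the paper's implementation: the slot layout (`SLOT_FORMAT`, `NUM_SLOTS`) and the helper names are hypothetical, and both handles are opened in one process for simplicity, whereas in practice the producer and consumer would be separate processes attaching to the same block by name.

```python
import struct
from multiprocessing import shared_memory

SLOT_FORMAT = "4f"  # hypothetical layout: four float32 values per slot
SLOT_SIZE = struct.calcsize(SLOT_FORMAT)
NUM_SLOTS = 8


def write_slot(buf, slot, values):
    """Write one observation directly into its preallocated slot."""
    struct.pack_into(SLOT_FORMAT, buf, slot * SLOT_SIZE, *values)


def read_slot(buf, slot):
    """Read one observation back from its slot without copying the whole buffer."""
    return struct.unpack_from(SLOT_FORMAT, buf, slot * SLOT_SIZE)


def demo():
    # Allocate all memory once, up front.
    producer = shared_memory.SharedMemory(create=True, size=SLOT_SIZE * NUM_SLOTS)
    try:
        # A consumer attaches to the same block by name.
        consumer = shared_memory.SharedMemory(name=producer.name)
        try:
            write_slot(producer.buf, 3, (1.0, 2.0, 3.0, 4.0))
            message = 3  # only the slot index travels between workers
            return read_slot(consumer.buf, message)
        finally:
            consumer.close()
    finally:
        producer.close()
        producer.unlink()


result = demo()
print(result)
```

The message exchanged between workers is a single integer, so its cost does not grow with the size of the observation.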
 
