[TOC]

  1. Title: Sample Factory: Asynchronous RL at Very High FPS
  2. Author: Alex Petrenko
  3. Publish Year: Oct, 2020
  4. Review Date: Sun, Sep 25, 2022

Summary of paper

Motivation

Identifying performance bottlenecks

  1. RL involves three workloads:

    1. environment simulation
    2. inference
    3. backpropagation
    • overall throughput is limited by the slowest of these workloads
    • In existing methods (A2C/PPO/IMPALA) these workloads depend on each other and run sequentially, leading to under-utilisation of system resources (see the sketch after this list).
  2. Existing high-throughput methods focus on distributed training and therefore introduce significant overhead, e.g. networking, serialisation, etc.

    • e.g., Ray & RLlib <==> Redis/Plasma, SEED RL <==> gRPC, Catalyst <==> MongoDB
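
A minimal sketch (hypothetical, not the paper's code) of why a synchronous A2C/PPO-style loop under-utilises the system: simulation, inference, and backpropagation each block while the others run, so the hardware for the idle stages sits unused.

```python
# Hypothetical synchronous A2C/PPO-style rollout (not the paper's code).
# Stages run one after another, so CPU env simulation, GPU inference and
# GPU backpropagation never overlap: throughput is set by the slowest stage.
import torch

def synchronous_rollout(envs, policy, rollout_len=32):
    obs = [env.reset() for env in envs]
    rollout = []
    for _ in range(rollout_len):
        with torch.no_grad():
            # 1) inference: all env workers sit idle while the GPU runs
            #    (here `policy` is assumed to return discrete actions directly)
            actions = policy(torch.as_tensor(obs, dtype=torch.float32))
        # 2) simulation: the GPU sits idle while the CPUs step the envs
        steps = [env.step(a) for env, a in zip(envs, actions.tolist())]
        obs = [s[0] for s in steps]
        rollout.append((obs, actions, [s[1] for s in steps]))
    # 3) backpropagation: simulation and inference both stop while the
    #    learner updates the policy on the collected rollout (omitted here)
    return rollout
```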

Contribution

(figure placeholder: image-20220925165704937)

Some key terms

Double-buffered sampling

  • with the double-buffered approach, environment simulators never wait: each worker splits its environments into two groups and simulates one group while the policy worker computes actions for the other (see the sketch below)
  • (figure placeholder: image-20220925205313982)
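
A minimal sketch of the double-buffered idea, using hypothetical helper names (`request_actions` / `get_actions` are stand-ins for the queues Sample Factory actually uses): while the policy worker computes actions for one group of environments, the rollout worker steps the other group, so neither side idles.

```python
# Hypothetical sketch of double-buffered sampling on one rollout worker.
# env_groups: two lists of environments.
# request_actions(group_id, obs): non-blocking send of observations to the policy worker.
# get_actions(group_id): blocks until the actions for that group arrive.
def rollout_worker_loop(env_groups, request_actions, get_actions, num_steps=10_000):
    obs = [[env.reset() for env in group] for group in env_groups]
    request_actions(0, obs[0])                     # prime the pipeline with group 0
    active = 1
    for _ in range(num_steps):
        request_actions(active, obs[active])       # ask for actions for one group...
        other = 1 - active
        actions = get_actions(other)               # ...then receive the other group's
        obs[other] = [env.step(a)[0]               # actions and step its envs; the policy
                      for env, a in zip(env_groups[other], actions)]  # worker works in parallel
        active = other
```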

Resolving bottleneck #2 (communication)

  • the RL training process relies on static data structures: allocate all memory at the beginning of training, then only pass pointers/indices around (shared memory is well suited to the single-server setting; see the sketch after this list)
  • (figure placeholder: image-20220925212111232)
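
A minimal sketch of the preallocated shared-memory idea, with assumed shapes and names (not the paper's actual data structures): trajectory tensors are allocated once in shared memory, and processes afterwards exchange only small integer slot indices instead of serialised observations.

```python
# Hypothetical sketch: preallocate shared-memory buffers once, then pass
# only slot indices between processes (assumed sizes, not the paper's code).
import torch
import torch.multiprocessing as mp

NUM_SLOTS, ROLLOUT_LEN, OBS_DIM = 64, 32, 84 * 84

def make_shared_buffers():
    obs_buf = torch.zeros(NUM_SLOTS, ROLLOUT_LEN, OBS_DIM).share_memory_()
    act_buf = torch.zeros(NUM_SLOTS, ROLLOUT_LEN, dtype=torch.long).share_memory_()
    return obs_buf, act_buf

def rollout_worker(slot, obs_buf, act_buf, queue):
    obs_buf[slot].uniform_()     # stand-in for writing real env observations
    queue.put(slot)              # send only the slot index, never the data

def learner(obs_buf, act_buf, queue):
    slot = queue.get()           # receive an index, not the observations
    batch = obs_buf[slot]        # zero-copy view into the shared buffer
    # ... compute the loss and backpropagate on `batch` here

if __name__ == "__main__":
    obs_buf, act_buf = make_shared_buffers()
    q = mp.Queue()
    w = mp.Process(target=rollout_worker, args=(0, obs_buf, act_buf, q))
    w.start()
    learner(obs_buf, act_buf, q)
    w.join()
```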

(figure placeholder: image-20220925213918914)

Good things about the paper (one paragraph)

Major comments

Minor comments

Incomprehension

Potential future work