[TOC]
- Title: Masked World Models for Visual Control 2022
- Author: Younggyo Seo et al.
- Publish Year: 2022
- Review Date: Fri, Jul 1, 2022
https://arxiv.org/abs/2206.14244?context=cs.AI
https://sites.google.com/view/mwm-rl
Summary of paper
Motivation
TL;DR: Masked autoencoders (MAE) have emerged as a scalable and effective self-supervised learning technique. Can MAE also be effective for visual model-based RL? Yes, with a recipe of convolutional feature masking and reward prediction to capture fine-grained and task-relevant information.
Some key terms
Decouple visual representation learning and dynamics learning
compared to DreamerV2, the new model decouples visual representation learning and dynamics learning in model-based RL.
- specifically, for visual learning, we train an autoencoder with convolutional layers and a vision transformer (ViT) to reconstruct pixels given masked convolutional features.
- we also introduce an auxiliary reward prediction objective for the observation autoencoder, so the representation captures task-relevant information.
- for the model-based RL part, we learn a latent dynamics model that operates on the representations from the autoencoder (see the sketch after this list).
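To make the decoupling concrete, below is a minimal PyTorch sketch (not the authors' implementation): a convolutional stem turns the image into feature tokens, a ViT-style encoder/decoder reconstructs pixel patches from a randomly kept subset of those tokens, and an auxiliary head predicts the reward. All module names, layer sizes, and the 64x64 input resolution are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MaskedConvAutoencoder(nn.Module):
    """Reconstruct pixels from a random subset of convolutional feature tokens."""
    def __init__(self, embed_dim=128, depth=4, n_heads=4, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        # Early conv layers: 64x64x3 image -> 8x8 grid of feature tokens.
        self.conv_stem = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, embed_dim, 4, stride=2, padding=1),
        )
        self.pos_embed = nn.Parameter(torch.zeros(1, 64, embed_dim))
        self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(embed_dim, n_heads, batch_first=True), depth)
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(embed_dim, n_heads, batch_first=True), depth)
        # Each token reconstructs the 8x8x3 pixel patch it corresponds to.
        self.pixel_head = nn.Linear(embed_dim, 8 * 8 * 3)
        # Auxiliary reward head: makes the representation task-relevant.
        self.reward_head = nn.Linear(embed_dim, 1)

    def forward(self, obs):                       # obs: (B, 3, 64, 64)
        B = obs.shape[0]
        tokens = self.conv_stem(obs).flatten(2).transpose(1, 2) + self.pos_embed  # (B, 64, D)
        # Randomly keep a subset of conv-feature tokens (feature masking, not pixel masking).
        n_keep = int(tokens.shape[1] * (1 - self.mask_ratio))
        keep = torch.rand(B, tokens.shape[1], device=obs.device).argsort(dim=1)[:, :n_keep]
        idx = keep.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
        visible = torch.gather(tokens, 1, idx)    # (B, n_keep, D)
        latent = self.encoder(visible)
        # Decoder sees encoded visible tokens plus mask tokens at the dropped positions.
        full = self.mask_token.expand(B, tokens.shape[1], -1).clone()
        full.scatter_(1, idx, latent)
        decoded = self.decoder(full + self.pos_embed)
        recon = self.pixel_head(decoded)          # (B, 64, 192) predicted pixel patches
        reward = self.reward_head(latent.mean(1)) # (B, 1) auxiliary reward prediction
        return recon, reward, latent
```

The latent dynamics model for the RL part would then be trained on top of the representations produced by this autoencoder (rather than on raw pixels), keeping visual learning and dynamics learning decoupled.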
Early convolutional layers and masking out convolutional features instead of pixel patches
- this approach enables the world model to capture fine-grained visual details from complex visual observations.
- we compare convolutional feature masking with pixel masking (i.e., MAE-style patch masking), and convolutional feature masking significantly outperforms pixel masking (a masking sketch follows this list).
- some possible reasons:
- raw pixels are noisy and less focused on task-relevant details
- convolutional features are more robust
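A minimal sketch of the difference between the two masking schemes, again in PyTorch; `mask_tokens`, `patch_embed`, and `conv_stem` are illustrative names and shapes, not from the paper. The only change is where the random masking is applied: directly on raw pixel patches (MAE) versus on features produced by early convolutional layers (this paper's recipe).

```python
import torch

def mask_tokens(tokens, mask_ratio=0.75):
    """Randomly keep (1 - mask_ratio) of the tokens along dim 1."""
    B, N, D = tokens.shape
    n_keep = int(N * (1 - mask_ratio))
    keep = torch.rand(B, N, device=tokens.device).argsort(dim=1)[:, :n_keep]
    return torch.gather(tokens, 1, keep.unsqueeze(-1).expand(-1, -1, D)), keep

def pixel_masked_tokens(img, patch_embed, mask_ratio=0.75):
    """MAE-style: tokenize raw pixel patches, then mask.
    patch_embed is a linear patch projection, e.g. nn.Conv2d(3, D, kernel_size=8, stride=8)."""
    tokens = patch_embed(img).flatten(2).transpose(1, 2)   # one token per raw 8x8 patch
    return mask_tokens(tokens, mask_ratio)

def conv_feature_masked_tokens(img, conv_stem, mask_ratio=0.75):
    """MWM recipe: run early convolutional layers first, so each token already
    summarizes overlapping receptive fields, then mask the feature tokens."""
    tokens = conv_stem(img).flatten(2).transpose(1, 2)     # tokens from conv features
    return mask_tokens(tokens, mask_ratio)
```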
Potential future work
We could use this model in our project
- Conv feature masking
- Dynamics learning