Mikayel_samvelyan Minihack the Planet a Sandbox for Open Ended Rl Research 2021

[TOC]

Title: MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research
Author: Mikayel Samvelyan et al.
Publish Year: Nov 2021
Review Date: Mar 2022

Summary of paper

The authors present MiniHack, an easy-to-use framework for creating rich and varied RL environments, together with a suite of tasks built with it. MiniHack builds on NLE (the NetHack Learning Environment) and the des-file format, so it can draw on the rich entities and dynamics of NetHack to create a wide variety of RL environments for targeted experimentation, and it makes scaling up the difficulty of existing environments painless. MiniHack’s environments are procedurally generated by default, which allows the systematic generalization of RL agents to be evaluated.

Limitation of existing RL benchmark

Benchmarks that are widely adopted by the community are not explicitly designed to evaluate specific capabilities of RL methods.
They lack the ability to test specific components or open problems of RL methods in well-controlled, proof-of-concept test cases.
Systematically extending such environments and gradually dropping simplifying assumptions can require arduous engineering and excessive time commitment, while opting for more challenging benchmarks often deprives researchers of a controllable path for assessing subsequent hypotheses.

des-file

The des-file format is a domain-specific language created by the developers of NetHack for describing the levels of the game. des-files are human-readable level specifications: distributions of grid layouts together with monsters, objects on the floor, environment features (e.g. walls, water, lava), and so on.
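
To make the format concrete, here is a minimal sketch that holds a small des-file as a Python string and reads its grid back out. The des-file text is only illustrative: it follows the general shape of des-files (a header, a MAP ... ENDMAP block, then entity lines) but has not been checked against NetHack's level compiler, and the level name, coordinates, and the helpers `extract_map` and `count_floor` are hypothetical names introduced for this example.

```python
# Illustrative des-file text; not validated against NetHack's level compiler.
DES_FILE = """\
MAZE: "mylevel", ' '
FLAGS: premapped
GEOMETRY: center, center
MAP
-------
|.....|
|.....|
|.....|
-------
ENDMAP
OBJECT: ('%', "apple"), (3, 2)
MONSTER: ('o', "goblin"), random
"""


def extract_map(des_text):
    """Return the grid rows found between the MAP and ENDMAP lines."""
    rows = []
    inside = False
    for line in des_text.splitlines():
        if line == "ENDMAP":
            break
        if inside:
            rows.append(line)
        elif line == "MAP":
            inside = True
    return rows


def count_floor(rows):
    """Count walkable floor cells, which this map draws as '.'."""
    return sum(row.count(".") for row in rows)
```

The fixed coordinate on the apple and the `random` placement on the goblin show the two ends of the format: a des-file can pin an entity to a cell or leave its position to be sampled, which is what makes one file describe a distribution of levels.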

Python operation

Environments can also be constructed from Python code rather than by writing a des-file by hand.
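
A minimal sketch of the Python route, assuming the `minihack` package and `gym` are installed and that the `LevelGenerator` class and the `MiniHack-Skill-Custom-v0` task id behave as in the MiniHack release that accompanied the paper (newer releases may differ). The function names `build_apple_room_des` and `make_custom_env`, the room size, and the chosen entities are hypothetical choices for illustration.

```python
def build_apple_room_des():
    """Describe a 5x5 room holding an apple and a goblin; return des-file text."""
    # Imported lazily so this module still loads where MiniHack is absent.
    from minihack import LevelGenerator

    lvl_gen = LevelGenerator(w=5, h=5)
    # No position given, so the apple's cell is left to the generator.
    lvl_gen.add_object("apple", "%")
    # Pin the goblin to the bottom-right corner of the room.
    lvl_gen.add_monster(name="goblin", symbol="o", place=(4, 4))
    return lvl_gen.get_des()


def make_custom_env(des_text):
    """Create an environment from des-file text via a custom MiniHack task id."""
    import gym
    import minihack  # noqa: F401  (importing registers the MiniHack-* ids)

    return gym.make("MiniHack-Skill-Custom-v0", des_file=des_text)
```

Calling `make_custom_env(build_apple_room_des())` would then yield an environment with the usual `reset` and `step` interface. The Python builder produces des-file text under the hood, so the two routes describe the same kind of level.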

Types of tasks

Navigation tasks

MiniHack’s navigation tasks challenge the agent to reach the goal position while overcoming various difficulties on its way, such as fighting monsters in corridors, crossing a river by pushing boulders into it, or navigating through complex or procedurally generated mazes.

Skill Acquisition Tasks

Many NetHack commands take effect only after a sequence of actions: the initial action, which is meant to interact with an object, must be followed by further actions before it has any effect. The exact sequence of subsequent actions can be inferred from the prompts in the in-game message bar.

For example, when the agent stands on the same grid cell as an apple lying on the floor, choosing the EAT action alone is not enough for the agent to eat it. Instead, the message bar asks: “There is an apple here; eat it? [y n q] (n)”. Choosing the Y action at the next time step causes the initial EAT action to take effect, and the agent eats the apple. Choosing the N action (or the MORE action, since N is the default choice) declines the EAT prompt. Any other action does not advance the in-game timer, and the agent stays in the same state. The authors refer to this skill as Confirmation.
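
The prompt logic described above can be sketched as a tiny state machine. This is a toy model written for these notes, not NLE's implementation: the class `ConfirmationToy`, the string action names, and the placeholder return messages are all assumptions, and declining is assumed not to cost a turn because the notes do not say otherwise.

```python
PROMPT = "There is an apple here; eat it? [y n q] (n)"


class ConfirmationToy:
    """Toy model of the EAT confirmation prompt (not NLE's actual code)."""

    def __init__(self):
        self.apple_on_floor = True
        self.awaiting_answer = False
        self.turns = 0  # stands in for the in-game timer

    def step(self, action):
        """Apply one action name and return the message bar text."""
        if not self.awaiting_answer:
            if action == "EAT" and self.apple_on_floor:
                # EAT alone only opens the prompt; nothing is eaten yet.
                self.awaiting_answer = True
                return PROMPT
            return ""  # other actions are outside the scope of this toy
        if action == "Y":
            # Confirming lets the earlier EAT take effect.
            self.awaiting_answer = False
            self.apple_on_floor = False
            self.turns += 1
            return "apple eaten"  # placeholder, not a real NetHack message
        if action in ("N", "MORE"):
            # N is the default answer, so MORE declines as well.
            self.awaiting_answer = False
            return ""
        # Any other action: the prompt stays open and no time passes.
        return PROMPT
```

An agent that has learned Confirmation issues EAT, reads the prompt, and follows with Y. An agent that keeps issuing unrelated actions stays stuck at the prompt with the timer frozen.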
