[TOC]

  1. Title: Interactive Grounded Language Understanding in a Collaborative Environment
  2. Author: Julia Kiseleva et al.
  3. Publish Year: 2021
  4. Review Date: Dec 2021

Summary of paper

The primary goal of the competition is to approach the problem of how to build interactive agents that learn to solve a task while provided with grounded natural language instructions in a collaborative environment.

They split the problem into the following concrete research questions. Each question corresponds to a separate task, so every component can be studied individually before all of them are joined into one system.

image-20211222154602736

Silent Builder

image-20211222154934477

Environment info

image-20211222155152963

Action space

The agent’s action space could in principle consist of all possible actions in Minecraft. For the current Builder baseline, the authors offer a discretised version with 18 actions:

  • noop, step forward, step backward, step right, step left, turn up, turn down, turn left, turn right, jump, attack, place block, choose block type 1-6.
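
To make the count concrete, the 18 actions listed above can be enumerated as a flat discrete space. This is only a minimal sketch: the identifier names and the integer ids are assumptions made for illustration, not the encoding used by the competition baseline.

```python
# Hypothetical encoding of the discretised Builder action space.
# Names mirror the list in the text; the integer ids are an assumption.
BASE_ACTIONS = [
    "noop",
    "step_forward", "step_backward", "step_right", "step_left",
    "turn_up", "turn_down", "turn_left", "turn_right",
    "jump", "attack", "place_block",
]

# "choose block type 1-6" expands into six separate actions.
BLOCK_ACTIONS = [f"choose_block_type_{i}" for i in range(1, 7)]

ACTIONS = BASE_ACTIONS + BLOCK_ACTIONS
ACTION_TO_ID = {name: idx for idx, name in enumerate(ACTIONS)}

# 12 movement/interaction actions plus 6 block choices give 18 in total.
assert len(ACTIONS) == 18
```

A policy head over this space would then output one of 18 logits per step.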

Sequence of annotation of sub-goals

They annotated each dialog’s sub-goals and stored them in a queue, where each sub-goal corresponds to one specific step instruction from the Architect. At each step, they pop a sub-goal from the queue (e.g., "in about the middle, build a column five tall") and wait until the agent completes it. Once the agent completes this sub-goal, they pop the next one. To decide whether the current sub-goal has been finished, they trained a matching model.
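
The queue-driven loop described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: `step_agent` and `is_finished` are hypothetical callables standing in for the Builder policy and the trained matching model, and the per-sub-goal step budget `max_steps` is an added safeguard that the summary does not mention.

```python
from collections import deque
from typing import Any, Callable, Iterable, List


def run_subgoals(
    subgoals: Iterable[str],
    step_agent: Callable[[str], Any],         # hypothetical: one agent step, returns the new state
    is_finished: Callable[[str, Any], bool],  # hypothetical stand-in for the matching model
    max_steps: int = 50,                      # assumption: budget per sub-goal
) -> List[str]:
    """Feed sub-goals to the agent one at a time, in annotation order."""
    queue = deque(subgoals)
    completed: List[str] = []
    while queue:
        goal = queue.popleft()  # pop the next step instruction
        for _ in range(max_steps):
            state = step_agent(goal)
            if is_finished(goal, state):
                completed.append(goal)
                break
        else:
            # Budget exhausted without a match: stop instead of waiting forever.
            break
    return completed
```

For example, calling `run_subgoals(["build a column five tall", "add a red block on top"], step_agent, is_finished)` returns the list of sub-goals that the matching function judged complete, in order.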