
  1. Title: Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation
  2. Author: Xin Wang et al.
  3. Publish Year: 2019
  4. Review Date: Wed, Jan 18, 2023

Summary of paper


Motivation

Vision-Language Navigation (VLN) presents some unique challenges:

  • First, reasoning over images and natural language instructions can be difficult.
  • Second, apart from strictly following expert demonstrations, the feedback is rather coarse: the “Success” signal is provided only when the agent reaches the target position (sparse reward).
  • A trajectory that follows the instruction well may stop just short of the goal state and therefore receive zero reward.
  • Existing work suffers from a generalisation problem: the agent needs to be retrained in each new environment.

Implementation

  1. The agent can infer which sub-instruction to focus on and where to look (automatically splitting a long instruction).
  2. A matching critic evaluates an executed path by the probability of reconstructing the original instruction from that path: P(original instruction | past trajectory).
    1. Cycle reconstruction: we have P(target trajectory | instruction) = 1, and we want to measure P(original instruction | past trajectory).
    2. This improves interpretability, since you can now see what the robot was "thinking" while it navigated.
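
To make the matching-critic idea concrete, here is a minimal sketch of how P(original instruction | past trajectory) could be turned into a score. It assumes some instruction-reconstruction model (not shown) has already produced one log-probability per instruction token given the executed trajectory. The function names `matching_score` and `combined_reward`, the length normalisation, and the weight `delta` are illustrative assumptions, not details taken from these notes.

```python
import math
from typing import Sequence


def matching_score(token_log_probs: Sequence[float],
                   length_normalize: bool = True) -> float:
    """Score a trajectory by how well the instruction can be reconstructed.

    token_log_probs[i] is assumed to be log P(word_i | earlier words, trajectory),
    produced by a hypothetical reconstruction model.
    """
    if not token_log_probs:
        raise ValueError("token_log_probs must not be empty")
    total = sum(token_log_probs)  # log P(instruction | trajectory)
    if length_normalize:
        # Assumption: average per token so long instructions are not penalised.
        total /= len(token_log_probs)
    return math.exp(total)


def combined_reward(extrinsic: float, intrinsic: float,
                    delta: float = 0.5) -> float:
    """Hypothetical weighting of the sparse success reward and the critic score."""
    return extrinsic + delta * intrinsic


# A trajectory that stops just short of the goal gets zero extrinsic reward,
# but the critic score still provides a non-zero learning signal.
log_probs = [math.log(0.5), math.log(0.5)]
score = matching_score(log_probs)
reward = combined_reward(extrinsic=0.0, intrinsic=score)
```

With this kind of shaping, a path that follows the instruction closely but misses the goal is still distinguishable from a path that ignores the instruction, which is exactly the case the sparse “Success” signal cannot separate.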