Language Reward Shaping

[TOC] Title: Reinforced Cross Modal Matching and Self Supervised Imitation Learning for Vision Language Navigation 2019 Author: Xin Wang et. al. Publish Year: Review Date: Wed, Jan 18, 2023 Summary of paper Motivation Visual Language Navigation (VLN) presents some unique challenges first, reasoning over images and natural language instructions can be difficult. secondly, except for strictly following expert demonstrations, the feedback is rather coarse, since the “Success” feedback is provided only when the agent reaches a target position (sparse reward) A good “instruction following” trajectory may ended up just stop before you reaching the goal state and then receive zero rewards. existing work suffer from generalisation problem. (need to retrain the agent in new environment) Implementation agent can infer which sub-instruction to focus on and where to look at. (automatic splitting long instruction) with a matching critic that evaluates an executed path by the probability of reconstructing the original instruction from the executed path. P(original instruction | past trajectory) cycle reconstruction: we have P(target trajectory | the instruction) = 1, and we want to measure P(original instruction | past trajectory) this will enhance the interpretability as now you understand how the robot was thinking about

Language Reward Shaping

Parsa Mahmoudieh Zero Shot Reward Specification via Grounded Natural Language 2022

Xin_wang Reinforced Cross Modal Matching and Self Supervised Imitation Learning for Vision Language Navigation 2019