[TOC]

  1. Title: Reasoning About Actions over Visual and Linguistic Modalities: A Survey
  2. Author: Shailaja Keyur Sampat et al.
  3. Publish Year: 2022
  4. Review Date: Fri, Jan 20, 2023

Summary of paper

Motivation

Contribution

Some key terms

Six most frequent types of commonsense knowledge

Tasks that involve language-based reasoning about actions

Role of multi-modality

Instruction following

  1. Language-guided image manipulation is an emerging research direction in vision+language. While a majority of datasets involve object- and attribute-level scene manipulations, ….
  2. Another relevant task under this category is vision-and-language navigation (VLN), where an agent navigates a visual environment to find a goal location by following linguistic instructions. All the above datasets include visuals, natural language instructions, and a set of actions that can be performed to achieve the desired goals. Further, ALFRED increased the complexity of the VLN task by adding long, compositional tasks. These involve longer action sequences, a more complex action space, and language that is closely related to real-world situations.
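
To make the structure of an instruction-following episode concrete, here is a minimal toy sketch in Python. It is not taken from the paper or from any benchmark: the grid world, the `Episode` class, the `run` function, and the four-action set are all hypothetical, and the visual observations that real VLN datasets provide are left out entirely. It only illustrates the shared ingredients noted above: an instruction, a set of actions, and a goal to reach.

```python
from dataclasses import dataclass

# Hypothetical discrete action set; real VLN benchmarks define their own.
ACTIONS = ("forward", "turn_left", "turn_right", "stop")

# Headings as (dx, dy) unit vectors in clockwise order: north, east, south, west.
HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]


@dataclass
class Episode:
    instruction: str  # natural language instruction given to the agent
    start: tuple      # (x, y) start cell
    goal: tuple       # (x, y) goal cell


def run(episode, actions, heading=0):
    """Execute actions from the start cell; return (final position, success)."""
    x, y = episode.start
    for action in actions:
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        if action == "stop":
            break
        if action == "turn_left":
            heading = (heading - 1) % 4
        elif action == "turn_right":
            heading = (heading + 1) % 4
        else:  # forward: move one cell along the current heading
            dx, dy = HEADINGS[heading]
            x, y = x + dx, y + dy
    return (x, y), (x, y) == episode.goal


episode = Episode(
    instruction="Walk forward two steps, turn right, and move one step.",
    start=(0, 0),
    goal=(1, 2),
)
final, success = run(episode, ["forward", "forward", "turn_right", "forward", "stop"])
print(final, success)
```

In this toy setting, success is simply whether the agent stops on the goal cell. The longer, compositional tasks described for ALFRED would correspond to much longer action sequences and a richer action set than the four moves assumed here.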

Good things about the paper (one paragraph)

Major comments

Minor comments

Citation

Incomprehension

Potential future work