Danijar_hafner Mastering Diverse Domains Through World Models 2023

[TOC] Title: Mastering Diverse Domains Through World Models Author: Danijar Hafner et al. Publish Year: 10 Jan 2023 Review Date: Tue, Feb 7, 2023 url: https://www.youtube.com/watch?v=vfpZu0R1s1Y Summary of paper Motivation general intelligence requires solving tasks across many domains. Current reinforcement learning algorithms carry this potential but are held back by the resources and knowledge required to tune them for new tasks. Contribution we present DreamerV3, a general and scalable algorithm based on world models that outperforms previous approaches across a wide range of domains with fixed hyperparameters. we observe favourable scaling properties of DreamerV3, with larger models directly translating to higher data-efficiency and final performance. Some key terms World Model learning ...
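
As a rough illustration of the world-model idea (this is not DreamerV3 itself: no RSSM, no symlog predictions, no discrete latents, and all dimensions are invented), a minimal sketch of learning a latent dynamics model from replayed transitions and then training an actor-critic on imagined rollouts might look like this:

```python
# Minimal sketch of the "learn a world model, train the policy in imagination"
# idea behind Dreamer-style agents. This is NOT DreamerV3 (no RSSM, no symlog,
# no discrete latents); dimensions and update rules are invented for illustration.
import torch
import torch.nn as nn

obs_dim, act_dim, latent_dim = 64, 4, 32

encoder = nn.Linear(obs_dim, latent_dim)                 # o_t -> z_t
dynamics = nn.Linear(latent_dim + act_dim, latent_dim)   # (z_t, a_t) -> z_{t+1}
reward_head = nn.Linear(latent_dim, 1)                   # z_t -> r_t
actor = nn.Sequential(nn.Linear(latent_dim, act_dim), nn.Softmax(dim=-1))
critic = nn.Linear(latent_dim, 1)

wm_opt = torch.optim.Adam(
    [*encoder.parameters(), *dynamics.parameters(), *reward_head.parameters()], lr=3e-4)
ac_opt = torch.optim.Adam([*actor.parameters(), *critic.parameters()], lr=3e-4)

def train_world_model(obs, act, next_obs, rew):
    """One gradient step on a batch of replayed real transitions."""
    z, z_next = encoder(obs), encoder(next_obs)
    pred_next = dynamics(torch.cat([z, act], dim=-1))
    loss = ((pred_next - z_next.detach()) ** 2).mean() \
         + ((reward_head(z).squeeze(-1) - rew) ** 2).mean()
    wm_opt.zero_grad(); loss.backward(); wm_opt.step()

def train_actor_critic(start_obs, horizon=15, gamma=0.99):
    """Train actor and critic purely on imagined latent rollouts (REINFORCE-style)."""
    z = encoder(start_obs).detach()
    start_z = z
    ret = torch.zeros(z.shape[0])
    logp_sum = torch.zeros(z.shape[0])
    for t in range(horizon):
        dist = torch.distributions.Categorical(actor(z))
        a = dist.sample()
        logp_sum = logp_sum + dist.log_prob(a)
        a_onehot = nn.functional.one_hot(a, act_dim).float()
        z = dynamics(torch.cat([z, a_onehot], dim=-1)).detach()   # imagine next latent
        ret = ret + (gamma ** t) * reward_head(z).squeeze(-1).detach()
    actor_loss = -(logp_sum * ret).mean()                          # maximise imagined return
    critic_loss = ((critic(start_z).squeeze(-1) - ret) ** 2).mean()
    ac_opt.zero_grad(); (actor_loss + critic_loss).backward(); ac_opt.step()

# usage with random dummy data
obs = torch.randn(16, obs_dim)
act = nn.functional.one_hot(torch.randint(0, act_dim, (16,)), act_dim).float()
train_world_model(obs, act, torch.randn(16, obs_dim), torch.randn(16))
train_actor_critic(obs)
```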

February 7, 2023 · 2 min · 291 words · Sukai Huang

Yuanhan_zhang What Makes Good Examples for Visual in Context Learning 2023

[TOC] Title: What Makes Good Examples for Visual in Context Learning Author: Yuanhan Zhang et al. Publish Year: 1 Feb 2023 Review Date: Mon, Feb 6, 2023 url: https://arxiv.org/pdf/2301.13670.pdf Summary of paper Motivation in this paper, the main focus is on an emergent ability in large vision models known as in-context learning; this concept has been well known in natural language processing but has only been studied very recently for large vision models. Contribution we for the first time provide a comprehensive investigation into the impact of in-context examples in computer vision, and find that performance is highly sensitive to the choice of in-context examples, exposing a critical issue: different in-context examples can lead to drastically different results. Our methods obtain significant improvements over random selection under various problem settings, showing the potential of using prompt retrieval in vision applications with a Model-as-a-Service (MaaS) business structure. we show that a good in-context example should be semantically similar to the query and closer in spatial context. A model that can better balance spatial and semantic closeness in feature space would be more ideal for visual in-context learning. this is likely because the model is not smart enough to infer the semantics directly, regardless of what the spatial structure looks like. Some key terms existing issue of using LLM ...
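
As a minimal sketch of the prompt-retrieval idea (pick the support example whose features are closest to the query), the following uses a placeholder random-projection encoder instead of a learned vision model, and a single feature space stands in for the paper's balance of spatial and semantic closeness:

```python
# Toy sketch of retrieval-based selection of an in-context example: pick the
# support (image, label) pair whose features are closest to the query. The
# "encoder" is a fixed random projection standing in for a real vision model.
import numpy as np

rng = np.random.default_rng(0)
IMG_SHAPE = (32, 32)
_PROJ = rng.standard_normal((IMG_SHAPE[0] * IMG_SHAPE[1], 128))   # placeholder encoder

def extract_features(image: np.ndarray) -> np.ndarray:
    feat = image.reshape(-1) @ _PROJ
    return feat / (np.linalg.norm(feat) + 1e-8)          # unit-normalise for cosine similarity

def select_in_context_example(query, candidates):
    """Return the candidate with the highest cosine similarity to the query."""
    q = extract_features(query)
    sims = np.array([q @ extract_features(img) for img, _ in candidates])
    return candidates[int(sims.argmax())], sims

# usage: 5 dummy support pairs, choose the one to prepend to the visual prompt
support = [(rng.standard_normal(IMG_SHAPE), f"mask_{i}") for i in range(5)]
query_img = rng.standard_normal(IMG_SHAPE)
(best_img, best_label), sims = select_in_context_example(query_img, support)
print(best_label, np.round(sims, 3))
```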

February 6, 2023 · 3 min · 427 words · Sukai Huang

Jing_yu_koh Grounding Language Models to Images for Multimodal Generation 2023

[TOC] Title: Grounding Language Models to Images for Multimodal Generation Author: Jing Yu Koh et al. Publish Year: 31 Jan 2023 Review Date: Mon, Feb 6, 2023 url: https://arxiv.org/pdf/2301.13823.pdf Summary of paper Motivation we propose an efficient method to ground pre-trained text-only language models to the visual domain How we keep the language model frozen, and finetune input and output linear layers to enable cross-modality interactions. This allows our model to process arbitrarily interleaved image-and-text inputs. Contribution our approach works with any off-the-shelf language model and paves the way towards an effective, general solution for leveraging pre-trained language models in visually grounded settings. Related work LLMs for vision-and-language ...
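
A minimal sketch of the frozen-LM-plus-trainable-linear-layers recipe, with a tiny stand-in language model rather than an actual pretrained LM; only the input projection is shown, whereas the paper also finetunes an output projection:

```python
# Minimal sketch of the "frozen LM + trainable linear mappings" idea: visual
# features are linearly projected into the language model's input embedding
# space; only the projection trains. The tiny LM below is a stand-in, not the
# model used in the paper.
import torch
import torch.nn as nn

d_img, d_lm, vocab = 512, 256, 1000

class TinyLM(nn.Module):            # stand-in for a pretrained decoder-only LM
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab, d_lm)
        self.backbone = nn.GRU(d_lm, d_lm, batch_first=True)
        self.lm_head = nn.Linear(d_lm, vocab)
    def forward(self, inp_embs):                   # [B, T, d_lm] -> next-token logits
        h, _ = self.backbone(inp_embs)
        return self.lm_head(h)

lm = TinyLM()
for p in lm.parameters():
    p.requires_grad_(False)                        # language model stays frozen

img_to_lm = nn.Linear(d_img, d_lm)                 # trainable input projection
opt = torch.optim.Adam(img_to_lm.parameters(), lr=1e-4)

def caption_loss(img_feats, token_ids):
    """Prepend the projected image embedding, predict the caption tokens."""
    vis = img_to_lm(img_feats).unsqueeze(1)        # [B, 1, d_lm] visual "token"
    txt = lm.tok_emb(token_ids)                    # [B, T, d_lm]
    logits = lm(torch.cat([vis, txt], dim=1))      # [B, 1+T, vocab]
    # logits at position t predict token t+1; drop the final position
    return nn.functional.cross_entropy(
        logits[:, :-1].reshape(-1, vocab), token_ids.reshape(-1))

img_feats = torch.randn(2, d_img)                  # e.g. features from a frozen visual encoder
tokens = torch.randint(0, vocab, (2, 8))
loss = caption_loss(img_feats, tokens)
opt.zero_grad(); loss.backward(); opt.step()
```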

February 6, 2023 · 2 min · 239 words · Sukai Huang

Zhenfang_chen See Think Confirm Interactive Prompting Between Vision and Language Models for Knowledge Based Visual Reasoning 2023

[TOC] Title: See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge Based Visual Reasoning Author: Zhenfang Chen et al. Publish Year: 12 Jan 2023 Review Date: Mon, Feb 6, 2023 url: https://arxiv.org/pdf/2301.05226.pdf Summary of paper Motivation Solving knowledge-based visual reasoning tasks remains challenging: it requires a model to comprehensively understand image content, connect it to external world knowledge, and perform step-by-step reasoning to answer the questions correctly. Contribution We propose a novel framework named Interactive Prompting Visual Reasoner (IPVR) for few-shot knowledge-based visual reasoning. IPVR contains three stages: see, think, and confirm. The see stage scans the image and grounds the visual concept candidates with a visual perception model. The think stage adopts a pre-trained large language model (LLM) to attend to the key concepts from the candidates adaptively. It then transforms them into text context for prompting with a visual captioning model and adopts the LLM to generate the answer. The confirm stage further uses the LLM to generate the supporting rationale for the answer, verify the generated rationale with a cross-modality classifier, and ensure that the rationale can infer the predicted output consistently. Some key terms human process to handle knowledge-based visual reasoning ...
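
A sketch of the see-think-confirm control flow with every component stubbed out (the detector, the LLM call and the cross-modal verifier below are placeholders, not the paper's actual models):

```python
# Sketch of the three-stage IPVR-style loop described above. All models are
# placeholder stubs; the point is only the see -> think -> confirm control
# flow, including retrying when the generated rationale fails verification.
from typing import List

def detect_concepts(image) -> List[str]:              # "see": visual perception model
    return ["person", "umbrella", "wet street"]

def llm(prompt: str) -> str:                          # stand-in for a real LLM call
    if prompt.startswith("Explain"):
        return "people use umbrellas when it rains"
    return "it is raining"

def verify_rationale(image, rationale: str) -> bool:  # cross-modality classifier stub
    return "rain" in rationale

def answer_question(image, question: str, max_tries: int = 3) -> str:
    concepts = detect_concepts(image)                         # --- see ---
    for _ in range(max_tries):
        context = ", ".join(concepts)                         # --- think ---
        answer = llm(f"Concepts: {context}. Question: {question}. Answer:")
        rationale = llm(f"Explain why '{answer}' answers '{question}'.")
        if verify_rationale(image, rationale):                # --- confirm ---
            return answer
    return answer                                             # fall back to the last attempt

print(answer_question(image=None, question="Why are people holding umbrellas?"))
```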

February 6, 2023 · 2 min · 405 words · Sukai Huang

Xiaotian_liu a Planning Based Neural Symbolic Approach for Embodied Instruction Following 2022

[TOC] Title: A Planning Based Neural Symbolic Approach for Embodied Instruction Following Author: Xiaotian Liu et al. Publish Year: 2022 Review Date: Thu, Feb 2, 2023 url: https://embodied-ai.org/papers/2022/15.pdf Summary of paper Motivation end-to-end deep learning methods struggle at these tasks due to long horizons and sparse rewards. Contribution Our main innovation relies on combining DL models for perception and NLP with a new egocentric planner based on successive planning problems formulated using PDDL syntax, both for exploration and task accomplishment. our planning framework can naturally recover from action failures at any stage of the planned trajectory. Some key terms Embodied Instruction Following ...

February 2, 2023 · 2 min · 226 words · Sukai Huang

So_yeon_min Film Following Instructions in Language With Modular Methods 2022

[TOC] Title: FILM: Following Instructions in Language With Modular Methods Author: So Yeon Min et al. Publish Year: 16 Mar 2022 Review Date: Wed, Feb 1, 2023 url: https://arxiv.org/pdf/2110.07342.pdf Summary of paper Motivation current approaches assume that neural states will integrate multimodal semantics to perform state tracking, build spatial memory, explore, and plan long-term. in contrast, we propose a modular method with structured representations that builds a semantic map of the scene and performs exploration with a semantic search policy, to achieve a natural language goal. Contribution FILM consists of several modular components: a Language Processing module that converts language instructions into structured forms, a Semantic Mapping module that converts egocentric visual input into a semantic metric map, a Semantic Search Policy that predicts a search goal location (the subgoal is plotted as a dot on the semantic top-down map), and a Deterministic Policy that outputs the subsequent navigation/interaction actions. A rough sketch of this decomposition is given below. Some key terms embodied instruction following ...
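
The sketch below stubs every module; real FILM uses learned models for each component, and the map size, goal coordinates and actions here are invented purely for illustration:

```python
# Rough sketch of FILM's modular decomposition as a per-timestep loop.
# Every module below is a stub standing in for the learned components.
import numpy as np

def language_processing(instruction: str):
    # instruction -> structured subtask list, e.g. [(object, interaction), ...]
    return [("mug", "PickUp"), ("sink", "Put")]

def update_semantic_map(sem_map: np.ndarray, egocentric_obs) -> np.ndarray:
    return sem_map                                   # project obs into the top-down map

def semantic_search_policy(sem_map: np.ndarray, target: str):
    return (12, 7)                                   # predicted (x, y) search goal

def deterministic_policy(sem_map: np.ndarray, goal) -> str:
    return "MoveAhead"                               # path-plan toward the goal dot

sem_map = np.zeros((240, 240), dtype=np.int8)
subtasks = language_processing("put the mug in the sink")
for target, interaction in subtasks:
    for _ in range(5):                               # navigation steps (truncated)
        sem_map = update_semantic_map(sem_map, egocentric_obs=None)
        goal = semantic_search_policy(sem_map, target)
        action = deterministic_policy(sem_map, goal)
    print(f"{interaction} {target} after navigating via {goal}")
```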

February 1, 2023 · 3 min · 430 words · Sukai Huang

Yuki_inoue Prompter Utilizing Large Language Model Prompting for a Data Efficient Embodied Instruction Following 2022

[TOC] Title: Prompter: Utilizing Large Language Model Prompting for a Data Efficient Embodied Instruction Following Author: Yuki Inoue et al. Publish Year: 7 Nov 2022 Review Date: Wed, Feb 1, 2023 url: https://arxiv.org/pdf/2211.03267.pdf Summary of paper Motivation we propose FILM++, which extends the existing work FILM with modifications that do not require extra data. furthermore, we propose Prompter, which replaces FILM++’s semantic search module with language model prompting. no training is needed for our prompting-based implementation, while achieving better or at least comparable performance. Contribution FILM++ fills the role of the data-efficient baseline. we propose Prompter, which replaces the semantic search module of FILM++ with language prompting, making it even more data efficient. Some key terms Difficulty in converting language into robot controls ...

February 1, 2023 · 3 min · 526 words · Sukai Huang

Kyle_mahowald Dissociating Language and Thought in Large Language Models a Cognitive Perspective 2023

[TOC] Title: Dissociating Language and Thought in Large Language Models: A Cognitive Perspective Author: Kyle Mahowald et al. Publish Year: 16 Jan 2023 Review Date: Tue, Jan 31, 2023 url: https://arxiv.org/pdf/2301.06627.pdf Summary of paper Motivation the authors challenge the “good at language $\implies$ good at thought” fallacy; the second fallacy is “bad at thought $\implies$ bad at language”. Contribution the authors argue that LLMs have promise as scientific models of one piece of the human cognitive toolbox – formal language processing – but fall short of modelling human thought. in section 4, we consider several domains required for functional linguistic competence – formal reasoning, world knowledge, situation modelling and social cognitive abilities. Some key terms deep learning models in linguistics ...

January 31, 2023 · 4 min · 776 words · Sukai Huang

Michael_janner Planning With Diffusion for Flexible Behaviour Synthesis 2022

[TOC] Title: Planning With Diffusion for Flexible Behaviour Synthesis Author: Michael Janner et al. Publish Year: 21 Dec 2022 Review Date: Mon, Jan 30, 2023 Summary of paper Motivation use a diffusion model to learn the dynamics, tightly coupling the modelling and the planning. our goal is to break this abstraction barrier by designing a model and planning algorithm that are trained alongside one another, resulting in a non-autoregressive trajectory-level model for which sampling and planning are nearly identical. Some key terms ideal model-based RL ...
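
A heavily simplified sketch of planning-as-sampling: iteratively denoise a whole (state, action) trajectory, clamp the first state to the current observation, and execute the first planned action. The denoiser and the update rule below are crude stand-ins for Diffuser's trained temporal U-Net and proper diffusion sampling:

```python
# Sketch of trajectory-level planning by denoising. The denoiser is an
# untrained stand-in and the update rule is a crude simplification of
# diffusion sampling; shapes are illustrative only.
import torch
import torch.nn as nn

horizon, state_dim, act_dim, n_steps = 16, 4, 2, 50
denoiser = nn.Linear(horizon * (state_dim + act_dim), horizon * (state_dim + act_dim))

@torch.no_grad()
def plan(current_state: torch.Tensor) -> torch.Tensor:
    traj = torch.randn(horizon, state_dim + act_dim)          # start from pure noise
    for _ in range(n_steps):                                  # iterative denoising
        traj[0, :state_dim] = current_state                   # condition on s_0
        noise_pred = denoiser(traj.reshape(-1)).reshape(horizon, -1)
        traj = traj - 0.1 * noise_pred                        # crude update rule
    return traj[0, state_dim:]                                # first planned action

action = plan(torch.zeros(state_dim))                         # execute, then replan next step
print(action)
```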

January 30, 2023 · 2 min · 317 words · Sukai Huang

Shailaja_keyur_sampat Reasoning About Actions Over Visual and Linguistic Modalities a Survey 2022

[TOC] Title: Reasoning About Actions Over Visual and Linguistic Modalities: A Survey Author: Shailaja Keyur Sampat et al. Publish Year: 2022 Review Date: Fri, Jan 20, 2023 Summary of paper Motivation reasoning about actions & changes has been widely studied in the knowledge representation community, and it has recently piqued the interest of NLP and computer vision researchers. Contribution Some key terms Six most frequent types of commonsense knowledge tasks that involve language-based reasoning about actions ...

January 20, 2023 · 3 min · 524 words · Sukai Huang

Xin_wang Reinforced Cross Modal Matching and Self Supervised Imitation Learning for Vision Language Navigation 2019

[TOC] Title: Reinforced Cross Modal Matching and Self Supervised Imitation Learning for Vision Language Navigation 2019 Author: Xin Wang et al. Publish Year: 2019 Review Date: Wed, Jan 18, 2023 Summary of paper Motivation Vision-Language Navigation (VLN) presents some unique challenges. first, reasoning over images and natural language instructions can be difficult. secondly, apart from strictly following expert demonstrations, the feedback is rather coarse, since the “success” feedback is provided only when the agent reaches the target position (sparse reward). A good “instruction following” trajectory may end up stopping just before reaching the goal state and thus receive zero reward. existing work suffers from a generalisation problem (the agent needs to be retrained in new environments). Implementation the agent can infer which sub-instruction to focus on and where to look (automatically splitting a long instruction), with a matching critic that evaluates an executed path by the probability of reconstructing the original instruction from it: P(original instruction | past trajectory). cycle reconstruction: we have P(target trajectory | the instruction) = 1, and we want to measure P(original instruction | past trajectory); this enhances interpretability, as we can now understand how the robot interpreted the instruction.
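
A minimal sketch of the matching-critic reward (score an executed path by the log-probability of reconstructing the instruction from it), using an untrained stand-in speaker model:

```python
# Sketch of the matching-critic idea: use log P(instruction | trajectory) as an
# intrinsic reward. The encoder/decoder here are untrained stubs; RCM learns
# the matching critic jointly with the navigator.
import torch
import torch.nn as nn

vocab, d = 100, 32
traj_encoder = nn.GRU(d, d, batch_first=True)      # encodes the executed path
decoder = nn.GRU(d, d, batch_first=True)           # "speaker" that reconstructs the instruction
word_emb = nn.Embedding(vocab, d)
word_head = nn.Linear(d, vocab)

def matching_reward(traj_feats: torch.Tensor, instruction_ids: torch.Tensor) -> float:
    """Intrinsic reward = mean log P(instruction token | trajectory, previous tokens)."""
    _, h = traj_encoder(traj_feats)                                 # [1, 1, d] path summary
    bos = torch.zeros(1, 1, d)                                      # start-of-sequence embedding
    inputs = torch.cat([bos, word_emb(instruction_ids)[None, :-1]], dim=1)   # teacher forcing
    out, _ = decoder(inputs, h)                                     # [1, T, d]
    logp = torch.log_softmax(word_head(out[0]), dim=-1)             # [T, vocab]
    token_logp = logp.gather(1, instruction_ids.unsqueeze(1)).squeeze(1)
    return token_logp.mean().item()

traj_feats = torch.randn(1, 10, d)                  # 10 steps of path features
instruction = torch.randint(0, vocab, (6,))         # tokenised instruction
print(matching_reward(traj_feats, instruction))
```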

January 18, 2023 · 1 min · 195 words · Sukai Huang

Alekh_agarwal PC-PG Policy Cover Directed Exploration for Provable Policy Gradient Learning 2020

[TOC] Title: PC-PG Policy Cover Directed Exploration for Provable Policy Gradient Learning Author: Alekh Agarwal et al. Publish Year: 2020 Review Date: Wed, Dec 28, 2022 Summary of paper Motivation The primary drawback of direct policy gradient methods is that, by being local in nature, they fail to adequately explore the environment. In contrast, model-based approaches and Q-learning directly handle exploration through the use of optimism. Contribution Policy Cover-Policy Gradient algorithm (PC-PG), a direct, model-free, policy optimisation approach which addresses exploration through the use of a learned ensemble of policies, the latter providing a policy cover over the state space. the use of a learned policy cover addresses exploration, and also addresses the catastrophic forgetting problem in policy gradient approaches which use reward bonuses, as well as the issue in on-policy algorithms where approximation errors due to model misspecification amplify (see [Lu et al., 2018] for discussion). Some key terms suffering from sparse reward ...

December 28, 2022 · 2 min · 271 words · Sukai Huang

Alekh_agarwal on the Theory of Policy Gradient Methods Optimality Approximation and Distribution Shift 2020

[TOC] Title: On the Theory of Policy Gradient Methods Optimality Approximation and Distribution Shift 2020 Author: Alekh Agarwal et al. Publish Year: 14 Oct 2020 Review Date: Wed, Dec 28, 2022 Summary of paper Motivation little is known about even their most basic theoretical convergence properties, including: if and how fast they converge to a globally optimal solution, and how they cope with approximation error due to using a restricted class of parametric policies. Contribution One central contribution of this work is in providing approximation guarantees that are average-case – which avoid explicit worst-case dependencies on the size of the state space – by making a formal connection to supervised learning under distribution shift. This characterisation shows an important interplay between estimation error, approximation error and exploration (as characterised through a precisely defined condition number). Some key terms basic theoretical convergence questions ...

December 28, 2022 · 3 min · 557 words · Sukai Huang

Chloe_ching_yun_hsu Revisiting Design Choices in Proximal Policy Optimisation 2020

[TOC] Title: Revisiting Design Choices in Proximal Policy Optimisation Author: Chloe Ching-Yun Hsu et al. Publish Year: 23 Sep 2020 Review Date: Wed, Dec 28, 2022 Summary of paper Motivation Contribution on discrete action spaces with sparse high rewards, standard PPO often gets stuck at suboptimal actions. Why we analyse the reasons for these failure modes and explain why they are not exposed by standard benchmarks. In summary, our study suggests that Beta policy parameterisation and KL-regularised objectives should be reconsidered for PPO, especially when these alternatives improve PPO in all settings. The authors prove a convergence guarantee for the KL-penalty version of PPO, as it inherits the convergence guarantees of mirror descent for policy families that are closed under mixture. Some key terms design choices ...
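
For reference, a sketch of the KL-penalty form of the PPO objective discussed here (as opposed to the clipping form), with made-up batch data:

```python
# Sketch of the KL-penalty PPO loss for a discrete policy. In a real
# implementation old_probs come from the behaviour policy (detached) and
# new_probs from the current policy; everything here is dummy data.
import torch

def ppo_kl_loss(new_probs, old_probs, actions, advantages, beta=1.0):
    """-E[ratio * A] + beta * KL(pi_old || pi_new), the penalty form of PPO."""
    new_p = new_probs.gather(1, actions[:, None]).squeeze(1)
    old_p = old_probs.gather(1, actions[:, None]).squeeze(1)
    ratio = new_p / old_p                                  # pi_new(a|s) / pi_old(a|s)
    surrogate = -(ratio * advantages).mean()
    kl = (old_probs * (old_probs.clamp_min(1e-8).log()
                       - new_probs.clamp_min(1e-8).log())).sum(-1).mean()
    return surrogate + beta * kl

# dummy batch: 5 states, 3 actions
old_probs = torch.softmax(torch.randn(5, 3), dim=-1)
new_probs = torch.softmax(torch.randn(5, 3), dim=-1)
actions = torch.randint(0, 3, (5,))
advantages = torch.randn(5)
print(ppo_kl_loss(new_probs, old_probs, actions, advantages))
```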

December 28, 2022 · 3 min · 467 words · Sukai Huang

James_queeney Generalized Proximal Policy Optimisation With Sample Reuse 2021

[TOC] Title: Generalized Proximal Policy Optimisation With Sample Reuse 2021 Author: James Queeney et al. Publish Year: 29 Oct 2021 Review Date: Wed, Dec 28, 2022 Summary of paper Motivation it is critical for data-driven reinforcement learning methods to be both stable and sample efficient. On-policy methods typically generate reliable policy improvement throughout training, while off-policy methods make more efficient use of data through sample reuse. Contribution in this work, we combine the theoretically supported stability benefits of on-policy algorithms with the sample efficiency of off-policy algorithms. We develop policy improvement guarantees that are suitable for the off-policy setting, and connect these bounds to the clipping mechanism used in PPO; this motivates an off-policy version of the popular algorithm, which we call GePPO. we demonstrate both theoretically and empirically that our algorithm delivers improved performance by effectively balancing the competing goals of stability and sample efficiency. Some key terms sample complexity ...

December 28, 2022 · 5 min · 1033 words · Sukai Huang

Lun_wang Backdoorl Backdoor Attack Against Competitive Reinforcement Learning 2021

[TOC] Title: BackdooRL: Backdoor Attack Against Competitive Reinforcement Learning 2021 Author: Lun Wang et al. Publish Year: 12 Dec 2021 Review Date: Wed, Dec 28, 2022 Summary of paper Motivation in this paper, we propose BACKDOORL, a backdoor attack targeted at two-player competitive reinforcement learning systems. first, the adversarial agent has to lead the victim to take a series of wrong actions, instead of only one, to prevent it from winning. Additionally, the adversary wants to exhibit the trigger action in as few steps as possible to avoid detection. Contribution we propose BACKDOORL, the first backdoor attack targeted at competitive reinforcement learning systems. The trigger is the action of another agent in the environment. We propose a unified method to design fast-failing agents for different environments. We prototype BACKDOORL and evaluate it in four environments. The results validate the feasibility of backdoor attacks in competitive environments. We study possible defenses against BACKDOORL. The results show that fine-tuning cannot completely remove the backdoor. Some key terms backdoorl workflow ...

December 28, 2022 · 1 min · 202 words · Sukai Huang

Sandy_huang Adversarial Attacks on Neural Network Policies 2017

[TOC] Title: Adversarial Attacks on Neural Network Policies Author: Sandy Huang et al. Publish Year: 8 Feb 2017 Review Date: Wed, Dec 28, 2022 Summary of paper Motivation in this work, we show adversarial attacks are also effective when targeting neural network policies in reinforcement learning. Specifically, we show existing adversarial example crafting techniques can be used to significantly degrade the test-time performance of trained policies. Contribution we characterise the degree of vulnerability across tasks and training algorithms, for a subclass of adversarial-example attacks in white-box and black-box settings. ...
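
A minimal sketch of the kind of attack studied: an FGSM-style perturbation of the observation that degrades the action the policy would otherwise pick; the policy below is an untrained stub:

```python
# FGSM-style attack on a neural network policy: perturb the observation in the
# direction that increases the loss on the action the policy originally
# preferred, which tends to flip the agent's decision. Policy is a random stub.
import torch
import torch.nn as nn

obs_dim, n_actions, eps = 16, 4, 0.01
policy = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, n_actions))

def fgsm_attack(obs: torch.Tensor) -> torch.Tensor:
    obs = obs.clone().requires_grad_(True)
    logits = policy(obs)
    target = logits.argmax(dim=-1)                        # the action the agent would take
    loss = nn.functional.cross_entropy(logits, target)    # "confidence" in that action
    loss.backward()
    # step in the direction that increases the loss, degrading the chosen action
    return (obs + eps * obs.grad.sign()).detach()

clean = torch.randn(1, obs_dim)
adv = fgsm_attack(clean)
print(policy(clean).argmax(-1).item(), policy(adv).argmax(-1).item())
```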

December 28, 2022 · 2 min · 346 words · Sukai Huang

Yinglun_xu Efficient Reward Poisoning Attacks on Online Deep Reinforcement Learning 2022

[TOC] Title: Efficient Reward Poisoning Attacks on Online Deep Reinforcement Learning Author: Yinglun Xu et al. Publish Year: 30 May 2022 Review Date: Tue, Dec 27, 2022 Summary of paper Motivation we study data poisoning attacks on online deep reinforcement learning (DRL) where the attacker is oblivious to the learning algorithm used by the agent and does not necessarily have full knowledge of the environment. we instantiate our framework to construct several attacks which only corrupt the rewards for a small fraction of the total training timesteps and make the agent learn a low-performing policy. Contribution results show that the reward attack efficiently poisons agent learning with a variety of SOTA DRL algorithms such as DQN and PPO. our attack can work on model-free DRL algorithms for all popular learning paradigms, and only assumes the learning algorithm to be efficient. a large enough reward-poisoning attack in the right direction is able to disrupt the DRL algorithm. limitation ...

December 27, 2022 · 2 min · 302 words · Sukai Huang

Young_wu Reward Poisoning Attacks on Offline Multi Agent Reinforcement Learning 2022

[TOC] Title: Reward Poisoning Attacks on Offline Multi Agent Reinforcement Learning Author: Young Wu et al. Publish Year: 1 Dec 2022 Review Date: Tue, Dec 27, 2022 Summary of paper Motivation Contribution unlike attacks on single-agent RL, we show that the attacker can install the target policy as a Markov Perfect Dominant Strategy Equilibrium (MPDSE), which rational agents are guaranteed to follow. This attack can be significantly cheaper than separate single-agent attacks. Limitation ...

December 27, 2022 · 1 min · 146 words · Sukai Huang

Xuezhou_zhang Robust Policy Gradient Against Strong Data Corruption 2021

[TOC] Title: Robust Policy Gradient Against Strong Data Corruption Author: Xuezhou Zhang et al. Publish Year: 2021 Review Date: Tue, Dec 27, 2022 Summary of paper Abstract Contribution the authors utilise an SVD-denoising technique to identify and remove possible reward perturbations; this approach gives a robust RL algorithm. Limitation This approach only handles attack perturbations that are not consistent (i.e., not stealthy). Some key terms Policy gradient methods ...
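
A toy sketch of the SVD-denoising idea: reconstruct a low-rank version of the observed rewards and flag large residuals as likely perturbations. The matrix arrangement and rank are illustrative, not the paper's exact construction:

```python
# Toy illustration of SVD denoising against sparse reward corruption: if the
# clean reward structure is (close to) low-rank, a rank-k reconstruction
# smooths out sparse perturbations and large residuals flag corrupted entries.
import numpy as np

rng = np.random.default_rng(0)
true_R = np.outer(rng.standard_normal(50), rng.standard_normal(20))   # rank-1 "clean" rewards
corrupt = true_R.copy()
idx = rng.choice(corrupt.size, size=30, replace=False)
corrupt.flat[idx] += rng.standard_normal(30) * 5.0                    # sparse poisoning

U, s, Vt = np.linalg.svd(corrupt, full_matrices=False)
k = 1
denoised = (U[:, :k] * s[:k]) @ Vt[:k]                                # rank-k reconstruction

residual = np.abs(corrupt - denoised)
flagged = residual > 3 * residual.std()                               # suspected corrupted entries
print("recon error vs truth:", float(np.abs(denoised - true_R).mean()))
print("flagged entries:", int(flagged.sum()))
```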

December 27, 2022 · 2 min · 317 words · Sukai Huang

Kiarash_banihashem Defense Against Reward Poisoning Attacks in Reinforcement Learning 2021

[TOC] Title: Defense Against Reward Poisoning Attacks in Reinforcement Learning Author: Kiarash Banihashem et al. Publish Year: 20 Jun 2021 Review Date: Tue, Dec 27, 2022 Summary of paper Motivation our goal is to design agents that are robust against such attacks in terms of the worst-case utility w.r.t. the true, unpoisoned rewards while computing their policies under the poisoned rewards. Contribution we formalise this reasoning and characterise the utility of our novel framework for designing defense policies. In summary, the key contributions include ...

December 27, 2022 · 2 min · 303 words · Sukai Huang

Amin_rakhsha Reward Poisoning in Reinforcement Learning Attacks Against Unknown Learners in Unknown Environments 2021

[TOC] Title: Reward Poisoning in Reinforcement Learning Attacks Against Unknown Learners in Unknown Environments Author: Amin Rakhsha et al. Publish Year: 16 Feb 2021 Review Date: Tue, Dec 27, 2022 Summary of paper Motivation Our attack makes minimal assumptions on the prior knowledge of the environment or the learner’s learning algorithm. most of the prior work makes strong assumptions on the knowledge of the adversary – it is often assumed that the adversary has full knowledge of the environment, the agent’s learning algorithm, or both. under such assumptions, attack strategies have been proposed that can mislead the agent into learning a nefarious policy with minimal perturbation to the rewards. Contribution We design a novel black-box attack, U2, that can provably achieve a near-matching performance to the SOTA white-box attack, demonstrating the feasibility of reward poisoning even in the most challenging black-box setting. limitation ...

December 27, 2022 · 2 min · 233 words · Sukai Huang

Xuezhou_zhang Adaptive Reward Poisoning Attacks Against Reinforcement Learning 2020

[TOC] Title: Adaptive Reward Poisoning Attacks Against Reinforcement Learning Author: Xuezhou Zhang et al. Publish Year: 22 Jun 2020 Review Date: Tue, Dec 27, 2022 Summary of paper Motivation Non-adaptive attacks have been the focus of prior works. However, we show that under mild conditions, adaptive attacks can achieve the nefarious policy in a number of steps polynomial in the state-space size $|S|$, whereas non-adaptive attacks require exponentially many steps. Contribution we provide a lower threshold below which reward-poisoning attacks are infeasible and RL is certified to be safe. similar to this paper, it shows that reward attacks have their limits. we provide a corresponding upper threshold above which the attack is feasible. we characterise conditions under which such attacks are guaranteed to fail (thus RL is safe), and vice versa. in the case where the attack is feasible, we provide upper bounds on the attack cost in the process of achieving the bad policy. we show that effective attacks can be found empirically using deep RL techniques. Some key terms feasible attack category ...

December 27, 2022 · 2 min · 283 words · Sukai Huang

Anindya_sarkar Reward Delay Attacks on Deep Reinforcement Learning 2022

[TOC] Title: Reward Delay Attacks on Deep Reinforcement Learning Author: Anindya Sarkar et al. Publish Year: 8 Sep 2022 Review Date: Mon, Dec 26, 2022 Summary of paper Motivation we present novel attacks targeting Q-learning that exploit a vulnerability entailed by this assumption by delaying the reward signal for a limited time period. We evaluate the efficacy of the proposed attacks through a series of experiments. Contribution our first observation is that reward-delay attacks are extremely effective when the goal of the adversary is simply to minimise reward. we find that some mitigation methods remain insufficient to ensure robustness against attacks that delay, but preserve the order of, rewards. Conclusion ...
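
A sketch of an order-preserving reward-delay attack as an environment wrapper (a Gym-style step interface is assumed; episode-boundary handling is omitted):

```python
# Sketch of an order-preserving reward-delay attack: each reward is withheld
# and released a fixed number of steps later, so the learner credits it to the
# wrong transition. Gym-style 4-tuple interface assumed for illustration.
from collections import deque

class RewardDelayWrapper:
    def __init__(self, env, delay: int = 3):
        self.env, self.delay = env, delay
        self.buffer = deque()

    def reset(self):
        self.buffer.clear()
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.buffer.append(reward)
        # release the reward from `delay` steps ago; emit 0 until the buffer fills
        delayed = self.buffer.popleft() if len(self.buffer) > self.delay else 0.0
        return obs, delayed, done, info
```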

December 26, 2022 · 2 min · 374 words · Sukai Huang

Tom_everitt Reinforcement Learning With a Corrupted Reward Channel 2017

[TOC] Title: Reinforcement Learning With a Corrupted Reward Channel Author: Tom Everitt Publish Year: August 22, 2017 Review Date: Mon, Dec 26, 2022 Summary of paper Motivation we formalise this problem as a generalised Markov Decision Problem called the Corrupt Reward MDP. Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards. Contribution two ways around the problem are investigated. First, by giving the agent richer data, such as in inverse reinforcement learning and semi-supervised reinforcement learning, reward corruption stemming from systematic sensory errors may sometimes be completely managed. Second, by using randomisation to blunt the agent’s optimisation, reward corruption can be partially managed under some assumptions. Limitation ...

December 26, 2022 · 4 min · 757 words · Sukai Huang