Yunhan_huang Manipulating Reinforcement Learning Stealthy Attacks on Cost Signals 2020

[TOC] Title: Manipulating Reinforcement Learning: Stealthy Attacks on Cost Signals / Deceptive Reinforcement Learning Under Adversarial Manipulations on Cost Signals Author: Yunhan Huang et al. Publish Year: 2020 Review Date: Sun, Dec 25, 2022 Summary of paper Motivation understand the impact of the falsification of cost signals on the convergence of the Q-learning algorithm Contribution we show that Q-learning algorithms still converge under stealthy attacks and bounded falsifications on cost signals, and that there is a robust region within which the adversarial attacks cannot achieve their objective. The robust region of the cost can be utilised by both the offensive and the defensive side: an RL agent can leverage the robust region to evaluate its robustness to malicious falsification. We also provide conditions on the falsified cost under which the agent is misled into learning the adversary's favoured policy. Some key terms Stealthy Attacks ...
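
To make the attack setting concrete, here is a minimal sketch (mine, not the paper's code) of tabular Q-learning on a cost-minimisation MDP where the learner only ever observes a falsified cost. The callables `step`, `cost` and `falsify` are hypothetical stand-ins for the environment and the attacker; the paper's robust-region result concerns how large the bounded `falsify` term can be before the learned greedy policy changes.

```python
import numpy as np

def q_learning_with_falsified_cost(
    n_states, n_actions, step, cost, falsify,
    episodes=500, horizon=50, alpha=0.1, gamma=0.95, eps_greedy=0.1, seed=0,
):
    """Tabular Q-learning on a cost-minimisation MDP where the learner only
    observes the falsified cost `cost(s, a) + falsify(s, a)`."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = int(rng.integers(n_states))
        for _ in range(horizon):
            # epsilon-greedy over costs, so the greedy action is the argmin
            if rng.random() < eps_greedy:
                a = int(rng.integers(n_actions))
            else:
                a = int(Q[s].argmin())
            s_next = step(s, a)
            observed_cost = cost(s, a) + falsify(s, a)  # attacker corrupts the signal
            td_target = observed_cost + gamma * Q[s_next].min()
            Q[s, a] += alpha * (td_target - Q[s, a])
            s = s_next
    return Q
```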

<span title='2022-12-25 19:12:17 +1100 AEDT'>December 25, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;336 words&nbsp;·&nbsp;Sukai Huang

Vincent_zhuang No Regret Reinforcement Learning With Heavy Tailed Rewards 2021

[TOC] Title: No-Regret Reinforcement Learning With Heavy-Tailed Rewards Author: Vincent Zhuang et al. Publish Year: 2021 Review Date: Sun, Dec 25, 2022 Summary of paper Motivation To the best of our knowledge, no prior work has considered heavy-tailed rewards in the MDP setting. Contribution We demonstrate that robust mean estimation techniques can be broadly applied to reinforcement learning algorithms (specifically confidence-based methods) in order to provably handle the heavy-tailed reward setting Some key terms Robust UCB algorithm ...
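
The Robust UCB idea rests on replacing the empirical mean with a heavy-tail-robust estimator. Below is a minimal sketch of two standard such estimators (median-of-means and a truncated mean); the block is illustrative, and the exact thresholds and parameter names used in the paper may differ.

```python
import numpy as np

def median_of_means(x, n_blocks=8):
    """Median-of-means: split the samples into blocks, average each block,
    and take the median of the block means. Robust to heavy tails."""
    x = np.asarray(x, dtype=float)
    n_blocks = min(n_blocks, len(x))
    blocks = np.array_split(x, n_blocks)
    return float(np.median([b.mean() for b in blocks]))

def truncated_mean(x, u, delta, epsilon=1.0):
    """Truncated empirical mean for rewards with a bounded (1+epsilon)-th raw
    moment u: sample i is kept only if |x_i| <= (u*i / log(1/delta))^(1/(1+epsilon))."""
    x = np.asarray(x, dtype=float)
    i = np.arange(1, len(x) + 1)
    thresh = (u * i / np.log(1.0 / delta)) ** (1.0 / (1.0 + epsilon))
    return float(np.mean(np.where(np.abs(x) <= thresh, x, 0.0)))
```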

<span title='2022-12-25 18:15:53 +1100 AEDT'>December 25, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;225 words&nbsp;·&nbsp;Sukai Huang

Wenshuai_zhao Towards Closing the Sim to Real Gap in Collaborative Multi Robot Deep Reinforcement Learning 2020

[TOC] Title: Towards Closing the Sim to Real Gap in Collaborative Multi Robot Deep Reinforcement Learning Author: Wenshuai Zhao et al. Publish Year: 2020 Review Date: Sun, Dec 25, 2022 Summary of paper Motivation we introduce the effect of sensing, calibration, and accuracy mismatches in distributed reinforcement learning; we discuss how both the different types of perturbations and the number of agents experiencing those perturbations affect the collaborative learning effort Contribution This is, to the best of our knowledge, the first work exploring the limitations of PPO in multi-robot systems when considering that different robots might be exposed to different environments where their sensors or actuators have induced errors ...
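
A toy illustration of the kind of perturbation being studied: inject sensing or calibration errors into the observations of only a subset of agents, then measure how collaborative training degrades. The function below is a hypothetical helper, not the paper's code; `obs` is assumed to be a dict mapping agent ids to observation arrays.

```python
import numpy as np

def perturb_observations(obs, agent_ids, kind="gaussian", scale=0.1, rng=None):
    """Apply a sensing/calibration perturbation to the observations of the
    selected agents only, leaving every other agent untouched."""
    rng = rng or np.random.default_rng()
    obs = {k: np.asarray(v, dtype=float).copy() for k, v in obs.items()}
    for i in agent_ids:
        if kind == "gaussian":      # sensor noise
            obs[i] += rng.normal(0.0, scale, size=obs[i].shape)
        elif kind == "bias":        # calibration offset
            obs[i] += scale
    return obs
```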

<span title='2022-12-25 16:54:11 +1100 AEDT'>December 25, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;365 words&nbsp;·&nbsp;Sukai Huang

Jan_corazza Reinforcement Learning With Stochastic Reward Machines 2022

[TOC] Title: Reinforcement Learning With Stochastic Reward Machines Author: Jan Corazza et al. Publish Year: AAAI 2022 Review Date: Sat, Dec 24, 2022 Summary of paper Motivation reward machines are an established tool for dealing with reinforcement learning problems in which rewards are sparse and depend on complex sequences of actions. However, existing algorithms for learning reward machines assume an overly idealized setting where rewards have to be free of noise. To overcome this practical limitation, we introduce a novel type of reward machine called stochastic reward machines, and an algorithm for learning them. Contribution Discusses the handling of noisy rewards for non-Markovian reward functions. Limitation: the solution introduces multiple sub-value-function models, which differs from standard RL algorithms, and the work does not emphasise the sample efficiency of the algorithm. Some key terms Reward machine ...
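
As a rough mental model (not the paper's exact formalisation, which learns the machine from noisy traces), a stochastic reward machine can be pictured as a finite automaton over high-level propositions whose transitions emit a reward drawn from a distribution rather than a fixed constant. A minimal sketch with an illustrative key-then-door task:

```python
import random

class StochasticRewardMachine:
    """Minimal reward-machine sketch: finite machine states, transitions keyed
    by the set of high-level propositions observed at a step, and a *distribution*
    over rewards on each transition instead of a single constant."""

    def __init__(self, n_states, transitions, initial=0):
        # transitions: {(u, frozenset(props)): (u_next, [possible_rewards])}
        self.n_states = n_states
        self.transitions = transitions
        self.u = initial

    def step(self, props):
        key = (self.u, frozenset(props))
        if key not in self.transitions:
            return 0.0                          # no labelled event: stay, zero reward
        u_next, reward_support = self.transitions[key]
        self.u = u_next
        return random.choice(reward_support)    # noisy emission from a finite support

# e.g. "reach the key, then the door"; the final reward is noisy around 1.0
srm = StochasticRewardMachine(
    n_states=3,
    transitions={
        (0, frozenset({"key"})): (1, [0.0]),
        (1, frozenset({"door"})): (2, [0.9, 1.0, 1.1]),
    },
)
```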

<span title='2022-12-24 22:36:07 +1100 AEDT'>December 24, 2022</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;465 words&nbsp;·&nbsp;Sukai Huang

Oguzhan_dogru Reinforcement Learning With Constrained Uncertain Reward Function Through Particle Filtering 2022

[TOC] Title: Reinforcement Learning With Constrained Uncertain Reward Function Through Particle Filtering Author: Oguzhan Dogru et al. Publish Year: July 2022 Review Date: Sat, Dec 24, 2022 Summary of paper Motivation this study considers a type of uncertainty caused by the sensors that are used to measure the reward, in the case where the noise is Gaussian and the system is linear Contribution this work uses a “particle filtering” technique to estimate the true reward function from the perturbed discrete reward sampling points. Some key terms Good things about the paper (one paragraph) Major comments Citation ...
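
For intuition about the approach, here is a generic bootstrap particle filter that tracks a slowly drifting latent "true" reward from noisy observations. It is a sketch under my own assumptions (Gaussian observation noise, random-walk process model), not the paper's implementation.

```python
import numpy as np

def particle_filter_reward(noisy_rewards, n_particles=500,
                           process_std=0.05, obs_std=0.5, seed=0):
    """Bootstrap particle filter: track a slowly-drifting latent 'true' reward
    and return its filtered estimate from a stream of noisy reward observations."""
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 1.0, n_particles)
    estimates = []
    for z in noisy_rewards:
        # predict: random-walk process model
        particles += rng.normal(0.0, process_std, n_particles)
        # update: weight particles by the Gaussian observation likelihood
        w = np.exp(-0.5 * ((z - particles) / obs_std) ** 2) + 1e-12
        w /= w.sum()
        # resample (multinomial resampling keeps the sketch short)
        particles = rng.choice(particles, size=n_particles, p=w)
        estimates.append(particles.mean())
    return np.array(estimates)
```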

<span title='2022-12-24 19:32:25 +1100 AEDT'>December 24, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;297 words&nbsp;·&nbsp;Sukai Huang

Inaam_ilahi Challenges and Countermeasures for Adversarial Attacks on Reinforcement Learning 2022

[TOC] Title: Challenges and Countermeasures for Adversarial Attacks on Reinforcement Learning Author: Inaam Ilahi et al. Publish Year: 13 Sep 2021 Review Date: Sat, Dec 24, 2022 Summary of paper Motivation DRL is susceptible to adversarial attacks, which precludes its use in real-life critical systems and applications. Therefore, we provide a comprehensive survey that discusses emerging attacks on DRL-based systems and the potential countermeasures to defend against these attacks. Contribution we provide the DRL fundamentals along with a non-exhaustive taxonomy of advanced DRL algorithms; we present a comprehensive survey of adversarial attacks on DRL and their potential countermeasures; we discuss the available benchmarks and metrics for the robustness of DRL; finally, we highlight the open issues and research challenges in the robustness of DRL and introduce some potential research directions. Some key terms organisation of this article ...

<span title='2022-12-24 17:06:12 +1100 AEDT'>December 24, 2022</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;517 words&nbsp;·&nbsp;Sukai Huang

Zuxin_liu on the Robustness of Safe Reinforcement Learning Under Observational Perturbations 2022

[TOC] Title: On the Robustness of Safe Reinforcement Learning Under Observational Perturbations Author: Zuxin Liu et al. Publish Year: 3 Oct 2022 Review Date: Thu, Dec 22, 2022 Summary of paper Motivation While many recent safe RL methods with deep policies can achieve outstanding constraint satisfaction in noise-free simulation environments, their vulnerability under adversarial perturbation has not been studied in the safe RL setting. Contribution we are the first to formally analyze the unique vulnerability of the optimal policy in safe RL under observational corruptions. We define the state-adversarial safe RL problem and investigate its fundamental properties, showing that optimal solutions of safe RL problems are theoretically vulnerable under observational adversarial attacks. We show that existing adversarial attack algorithms focusing on minimizing agent rewards do not always work, and propose two effective attack algorithms with theoretical justifications: one directly maximises the constraint-violation cost, and one maximises the task reward to induce a tempting but risky policy. Surprisingly, the maximum-reward attack is very strong in inducing unsafe behaviors, both in theory and in practice. We propose an adversarial training algorithm with the proposed attackers and show contraction properties of their Bellman operators. Extensive experiments in continuous control tasks show that our method is more robust against adversarial perturbations in terms of constraint satisfaction. Some key terms Safe reinforcement learning definition ...
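
A hedged sketch of the first attacker idea (maximise the constraint-violation cost): search an epsilon-ball around the true observation for the perturbation whose induced action the cost critic rates as most unsafe. The paper's attackers are gradient-based; this zeroth-order random-search version only illustrates the objective, and `policy` / `cost_critic` are assumed callables.

```python
import numpy as np

def max_cost_observation_attack(obs, policy, cost_critic, epsilon=0.05,
                                n_candidates=64, rng=None):
    """Search the L_inf epsilon-ball around the true observation for the
    perturbed observation whose induced action has the highest predicted
    constraint-violation cost (cost is evaluated at the TRUE state, since
    only the agent's observation is corrupted)."""
    rng = rng or np.random.default_rng()
    obs = np.asarray(obs, dtype=float)
    best_obs, best_cost = obs, cost_critic(obs, policy(obs))
    for _ in range(n_candidates):
        cand = obs + rng.uniform(-epsilon, epsilon, size=obs.shape)
        c = cost_critic(obs, policy(cand))
        if c > best_cost:
            best_obs, best_cost = cand, c
    return best_obs
```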

<span title='2022-12-22 22:38:13 +1100 AEDT'>December 22, 2022</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;532 words&nbsp;·&nbsp;Sukai Huang

Ruben_majadas Disturbing Reinforcement Learning Agents With Corrupted Rewards 2021

[TOC] Title: Disturbing Reinforcement Learning Agents With Corrupted Rewards Author: Ruben Majadas et al. Publish Year: Feb 2021 Review Date: Sat, Dec 17, 2022 Summary of paper Motivation recent works have shown how the performance of RL algorithms decreases under the influence of soft changes in the reward function. However, little work has been done on how sensitive the learner is to these disturbances depending on the aggressiveness of the attack and the exploration strategy. The paper chooses a subclass of MDPs: episodic, stochastic goal-only-reward MDPs Contribution it demonstrated that smoothly crafted adversarial rewards are able to mislead the learner; the policy learned using low exploration probability values is more robust to corrupt rewards (though this conclusion seems valid only for the proposed experimental setting); the agent is completely lost with attack probabilities higher than p=0.4 Some key terms deterministic goal-only-reward MDP ...
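
The attack model is essentially "with probability p, corrupt the reward the learner sees". A minimal sketch, with a sign-flip mode in the spirit of the goal-only-reward setting (illustrative, not the paper's exact corruption scheme):

```python
import numpy as np

def corrupt_reward(reward, attack_prob=0.2, mode="flip", rng=None):
    """With probability `attack_prob`, return a corrupted version of the reward;
    otherwise pass it through unchanged. In a goal-only-reward MDP, flipping the
    sparse goal reward is already enough to mislead the learner."""
    rng = rng or np.random.default_rng()
    if rng.random() >= attack_prob:
        return reward
    if mode == "flip":
        return -reward              # sign flip
    return reward + rng.normal()    # additive-noise variant
```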

<span title='2022-12-17 00:38:35 +1100 AEDT'>December 17, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;383 words&nbsp;·&nbsp;Sukai Huang

Jingkang_wang Reinforcement Learning With Perturbed Rewards 2020

[TOC] Title: Reinforcement Learning With Perturbed Rewards Author: Jingkang Wang et al. Publish Year: 1 Feb 2020 Review Date: Fri, Dec 16, 2022 Summary of paper Motivation this paper studies RL with perturbed rewards, where a technical challenge is to revert the perturbation process so that the right policy is learned. Some experiments are used to support the algorithm (i.e., estimate the confusion matrix and revert the perturbation) using existing techniques from the supervised learning (and crowdsourcing) literature. Limitation reviewers had concerns over the scope / significance of this work, mostly about how the confusion matrix is learned. If this matrix is known, correcting the reward perturbation is easy, and standard RL can be applied to the corrected rewards. Specifically, the work seems to be limited in two substantial ways, both related to how the confusion matrix is learned: the reward function needs to be deterministic, and majority voting requires the number of states to be finite. The significance of this work is therefore limited to finite-state problems with deterministic rewards, which is quite restrictive. Overall, the setting studied here, together with a thorough treatment of an (even restricted) case, could make an interesting paper that inspires future work. However, the exact problem setting is not completely clear in the paper, and the limitation of the technical contribution is somewhat unclear. Contribution The SOTA PPO algorithm is able to obtain 84.6% and 80.8% improvements on average score for five Atari games, with error rates of 10% and 30% respectively Some key terms reward function is often perturbed ...
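
The core correction idea is easy to state: if the confusion matrix C of the reward perturbation is known (or estimated), replace each observed reward level with a surrogate value so that the expectation matches the true reward. A small sketch, assuming a finite set of reward levels and an invertible C:

```python
import numpy as np

def surrogate_rewards(reward_levels, confusion):
    """Given the finite set of true reward levels and the confusion matrix
    C[i, j] = P(observe level j | true level i), return surrogate rewards:
    using surrogate[j] whenever level j is observed makes the estimate
    unbiased w.r.t. the true reward (assumes C is invertible)."""
    C = np.asarray(confusion, dtype=float)
    r = np.asarray(reward_levels, dtype=float)
    return np.linalg.solve(C, r)   # i.e. C^{-1} r

# binary example: true levels {-1, +1}, 30% symmetric flip probability
levels = [-1.0, 1.0]
C = [[0.7, 0.3],
     [0.3, 0.7]]
print(surrogate_rewards(levels, C))   # -> [-2.5, 2.5]
```

For the symmetric 30%-flip binary example the surrogates come out to ±2.5, which is exactly the inflation needed to keep the expected corrected reward equal to the true ±1.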

<span title='2022-12-16 20:48:51 +1100 AEDT'>December 16, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;402 words&nbsp;·&nbsp;Sukai Huang
[Figure: the belief-desire-intention model]

Jacob_andreas Language Models as Agent Models 2022

[TOC] Title: Language Models as Agent Models Author: Jacob Andreas Publish Year: 3 Dec 2022 Review Date: Sat, Dec 10, 2022 https://arxiv.org/pdf/2212.01681.pdf Summary of paper Motivation during training, LMs have access only to the text of the documents, with no direct evidence of the internal states of the human agents that produced them (a kind of hidden-MDP situation). This fact is often used to argue that LMs are incapable of modelling goal-directed aspects of human language production and comprehension. The author argues that even today's non-robust and error-prone models infer and use representations of fine-grained communicative intentions and more abstract beliefs and goals. Despite the limited nature of their training data, they can thus serve as building blocks for systems that communicate and act intentionally. In other words, the author says that a language model can be used to model the intentions of a human agent, and hence it can be treated as an agent model. Contribution the author claims that in the course of performing next-word prediction in context, current LMs sometimes infer approximate, partial representations of the beliefs, desires and intentions possessed by the agent that produced the context, and of other agents mentioned within it. Once these representations are inferred, they are causally linked to LM prediction, and thus bear the same relation to generated text that an intentional agent's state bears to its communicative actions. The high-level goals of this paper are twofold: first, to outline a specific sense in which idealised language models can function as models of agent beliefs, desires and intentions; second, to highlight a few cases in which existing models appear to approach this idealization (and describe the ways in which they still fall short). Training on text alone produces ready-made models of the map from agent states to text; these models offer a starting point for language processing systems that communicate intentionally. Some key terms Current language model is bad ...

<span title='2022-12-10 00:47:33 +1100 AEDT'>December 10, 2022</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;639 words&nbsp;·&nbsp;Sukai Huang

Charlie_snell Context Aware Language Modeling for Goal Oriented Dialogue Systems 2022

[TOC] Title: Context Aware Language Modeling for Goal Oriented Dialogue Systems Author: Charlie Snell et al. Publish Year: 22 Apr 2022 Review Date: Sun, Nov 20, 2022 Summary of paper Motivation while supervised learning with large language models is capable of producing realistic text, how to steer such responses towards completing a specific task without sacrificing language quality remains an open question. How can we scalably and effectively introduce the mechanisms of goal-directed decision making into end-to-end language models, to steer language generation toward completing specific dialogue tasks rather than simply generating probable responses? They aim to directly finetune language models in a task-aware manner such that they can maximise a given utility function. Contribution it seems that the manipulation of the training dataset and the auxiliary objective are the two main “innovations” of the model. Some key terms Dialogue ...

<span title='2022-11-20 16:29:59 +1100 AEDT'>November 20, 2022</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;489 words&nbsp;·&nbsp;Sukai Huang

Sanchit_agarwal Building Goal Oriented Dialogue Systems With Situated Visual Context 2021

[TOC] Title: Building Goal Oriented Dialogue Systems With Situated Visual Context 2021 Author: Sanchit Agarwal et al. Publish Year: 22 Nov 2021 Review Date: Sun, Nov 20, 2022 Summary of paper Motivation with the surge of virtual assistants with screens, the next generation of agents is required to also understand screen context in order to provide a proper interactive experience and better understand users' goals. So in this paper, they propose a novel multimodal conversational framework where the agent's next action and its arguments are derived jointly, conditioned on the conversational and the visual context. The model can recognise visual features such as color and shape as well as metadata-based features such as price or star rating associated with a visual entity. Contribution propose a novel multimodal conversational system that considers screen context, in addition to dialogue context, while deciding the agent's next action; the proposed visual grounding model takes both metadata and images as input, allowing it to reason over metadata and visual information; the solution encodes the user query and each visual entity and then computes the similarity between them; to improve the visual entity encoding, they introduce query-guided attention and entity self-attention layers; they collect an MTurk survey and also create a multimodal dialogue simulator Architecture ...

<span title='2022-11-20 16:29:14 +1100 AEDT'>November 20, 2022</span>&nbsp;·&nbsp;1 min&nbsp;·&nbsp;211 words&nbsp;·&nbsp;Sukai Huang

Yichi_zhang Danli Deliberative Agent for Following Natural Language Instructions 2022

[TOC] Title: DANLI: Deliberative Agent for Following Natural Language Instructions Author: Yichi Zhang Publish Year: 22 Oct, 2022 Review Date: Sun, Nov 20, 2022 Summary of paper Motivation reactive agents simply learn and imitate behaviours encountered in the training data; such reactive agents are insufficient for long-horizon complex tasks. To address this limitation, we propose a neuro-symbolic deliberative agent that, while following language instructions, proactively applies reasoning and planning based on its neural and symbolic representations acquired from past experience. Contribution We show that our deliberative agent achieves greater than 70% improvement over reactive baselines on the challenging TEACh benchmark Some key terms Natural language instruction following with embodied AI agents ...

<span title='2022-11-20 16:28:23 +1100 AEDT'>November 20, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;343 words&nbsp;·&nbsp;Sukai Huang

Xiang_li Diffusion-LM Improves Controllable Text Generation 2022

[TOC] Title: Diffusion-LM Improves Controllable Text Generation Author: Xiang Lisa Li Publish Year: May 2022 Review Date: Mon, Nov 14, 2022 https://arxiv.org/pdf/2205.14217.pdf Summary of paper Motivation can language tokens be represented as floating-point vectors? They develop a new non-autoregressive language model based on continuous diffusion. Diffusion-LM iteratively denoises a sequence of Gaussian vectors into word vectors, yielding a sequence of intermediate latent variables. To convert the continuous embeddings back to words, they use rounding and many other tricks to stabilise the training process. Contribution they tried a diffusion model for language modelling Incomprehension Not sure if the model is good at text generation. ...
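
The rounding step can be pictured as a nearest-neighbour projection from the denoised continuous vectors back onto the word-embedding table (the paper actually uses a learned softmax plus additional tricks); a minimal sketch:

```python
import numpy as np

def round_to_words(latents, embedding_table):
    """Diffusion-LM style 'rounding': map each denoised continuous vector to the
    vocabulary word whose embedding is closest (here: nearest neighbour in
    Euclidean distance, purely for illustration)."""
    latents = np.asarray(latents, dtype=float)        # (seq_len, dim)
    emb = np.asarray(embedding_table, dtype=float)    # (vocab, dim)
    # squared distances between every position and every vocabulary embedding
    d2 = ((latents[:, None, :] - emb[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)                          # one token id per position
```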

<span title='2022-11-14 16:32:31 +1100 AEDT'>November 14, 2022</span>&nbsp;·&nbsp;1 min&nbsp;·&nbsp;104 words&nbsp;·&nbsp;Sukai Huang
[Figure: relatedness and naturalness]

Jie_huang Can Language Models Be Specific How 2022

[TOC] Title: Can Language Models Be Specific? How? Author: Jie Huang et al. Publish Year: 11 Oct 2022 Review Date: Tue, Nov 8, 2022 Summary of paper Motivation they propose to measure how specific the language of pre-trained language models (PLMs) is. To achieve this, they introduce a novel approach to build a benchmark for specificity testing by forming masked-token prediction tasks with prompts. For instance, given “J.K. Rowling was born in [MASK]”, we want to test whether a more specific answer is preferred by PLMs, e.g., Yate instead of England. It is known that if the prediction is more specific, we can retrieve more fine-grained information from language models, and further acquire more information. Reviewer's opinion: we are not saying that summarisation is easy or carries less useful information; there are cases where abstract info is more useful. Contribution although there are works on measuring how much knowledge is stored in PLMs or improving the correctness of the predictions, none attempted to measure or improve the specificity of predictions made by PLMs. Understanding how specific the language of PLMs is can help us better understand the behaviour of language models and facilitate downstream applications such as question answering, etc. They set up a benchmark dataset for specificity; the quality of the benchmark is high, where the judgment on which answer is more specific is ∼97% consistent with humans. Discovery in general, PLMs prefer less specific answers without subjects given, and they only have a weak ability to differentiate coarse-grained/fine-grained objects by measuring their (cosine) similarities to subjects. The results indicate that specificity was neglected by existing research on language models Improving specificity of the prediction few-shot prompting ...
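
A tiny probe in the spirit of the benchmark: given a masked prompt and a coarse vs. fine candidate answer, check which one the PLM scores higher. The scoring function is deliberately left abstract (`score_fn` is an assumed callable wrapping whatever masked LM you use), so this is a sketch of the measurement, not the authors' pipeline.

```python
def specificity_preference(prompt, coarse, fine, score_fn):
    """Compare how strongly a masked LM prefers the fine-grained answer over the
    coarse-grained one for a prompt like 'J.K. Rowling was born in [MASK].'.
    `score_fn(prompt, candidate)` is assumed to return the log-probability of
    the candidate at the mask position."""
    lp_fine = score_fn(prompt, fine)
    lp_coarse = score_fn(prompt, coarse)
    return {
        "prompt": prompt,
        "fine": fine,
        "coarse": coarse,
        "log_ratio": lp_fine - lp_coarse,   # > 0 means the PLM is being specific
        "prefers_fine": lp_fine > lp_coarse,
    }
```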

<span title='2022-11-08 20:41:04 +1100 AEDT'>November 8, 2022</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;429 words&nbsp;·&nbsp;Sukai Huang

Yizhou_zhao Semantic Aligned Fusion Transformer for One Shot Object Detection 2022

[TOC] Title: Semantic-Aligned Fusion Transformer for One Shot Object Detection Author: Yizhou Zhao et al. Publish Year: 2022 Review Date: Mon, Oct 24, 2022 https://arxiv.org/pdf/2203.09093v2.pdf Summary of paper Motivation with extreme data scarcity, current approaches explore various feature fusions to obtain directly transferable meta-knowledge; in this paper, they attribute the previous limitation to inappropriate correlation methods that misalign query-support semantics by overlooking spatial structure and scale variances. ...

<span title='2022-10-24 19:14:34 +1100 AEDT'>October 24, 2022</span>&nbsp;·&nbsp;1 min&nbsp;·&nbsp;67 words&nbsp;·&nbsp;Sukai Huang

Ting_i_hsieh One Shot Object Detection With Co Attention and Co Excitation 2019

[TOC] Title: One-Shot Object Detection With Co-Attention and Co-Excitation Author: Ting-I Hsieh et al. Publish Year: Nov 2019 Review Date: Mon, Oct 24, 2022 https://arxiv.org/pdf/1911.12529.pdf Summary of paper Motivation this paper aims to tackle the challenging problem of one-shot object detection: given a query image patch whose class label is not included in the training data, detect all instances of the same class in a target image. To this end, they develop a novel co-attention and co-excitation (CoAE) framework that makes contributions in three key technical aspects: first, use the non-local operation to explore the co-attention embodied in each query-target pair and yield region proposals accounting for the one-shot situation; second, formulate a squeeze-and-co-excitation scheme that can adaptively emphasise correlated feature channels to help uncover relevant object proposals and eventually the target objects; third, design a margin-based ranking loss for implicitly learning a metric to predict the similarity of a region proposal to the underlying query, no matter whether its class label was seen or unseen in training. ...
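
As an illustration of the third component, a generic margin-based ranking hinge over proposal-query similarity scores is sketched below; the paper's actual loss is formulated differently (it also ranks predicted foreground scores), so treat this only as the basic idea.

```python
import numpy as np

def margin_ranking_loss(sim_pos, sim_neg, margin=0.3):
    """Margin-based ranking sketch: proposals matching the query (positives)
    should score at least `margin` higher than non-matching proposals
    (negatives). `sim_pos` / `sim_neg` are similarity scores between region
    proposals and the query patch."""
    sim_pos = np.asarray(sim_pos, dtype=float)
    sim_neg = np.asarray(sim_neg, dtype=float)
    # hinge over every positive/negative pair
    losses = np.maximum(0.0, margin - (sim_pos[:, None] - sim_neg[None, :]))
    return losses.mean()
```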

<span title='2022-10-24 19:13:10 +1100 AEDT'>October 24, 2022</span>&nbsp;·&nbsp;1 min&nbsp;·&nbsp;158 words&nbsp;·&nbsp;Sukai Huang

Ayan_kumar_bhunia a Deep One Shot Network for Query Based Logo Retrieval 2019

[TOC] Title: A Deep One-Shot Network for Query-Based Logo Retrieval Author: Ayan Kumar Bhunia et al. Publish Year: Jul 2019 Review Date: Mon, Oct 24, 2022 https://arxiv.org/pdf/1811.01395.pdf Summary of paper Motivation existing general-purpose methods just cannot handle unseen new logos (logos not labelled in the training data); in this work, they develop an easy-to-implement query-based logo detection and localisation system by employing a one-shot learning technique using off-the-shelf neural network components. ...

<span title='2022-10-24 19:12:22 +1100 AEDT'>October 24, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;258 words&nbsp;·&nbsp;Sukai Huang

Yuetian_weng an Efficient Spatio Temporal Pyramid Transformer for Action Detection 2022

[TOC] Title: An Efficient Spatio-Temporal Pyramid Transformer for Action Detection Author: Yuetian Weng et al. Publish Year: Jul 2022 Review Date: Thu, Oct 20, 2022 Summary of paper Motivation the task of action detection aims at deducing both the action category and the localisation of the start and end moment for each action instance in a long, untrimmed video. It is non-trivial to design an efficient architecture for action detection due to the prohibitively expensive self-attention over a long sequence of video clips. To this end, they present an efficient hierarchical spatio-temporal pyramid transformer for action detection, building upon the fact that the early self-attention layers in Transformers still focus on local patterns. Background to date, the majority of action detection methods are driven by 3D convolutional neural networks (CNNs), e.g., C3D, I3D, to encode video segment features from video RGB frames and optical flows. However, the limited receptive field hinders CNN-based models from capturing long-term spatio-temporal dependencies. Alternatively, vision transformers have shown the advantage of capturing global dependencies via the self-attention mechanism. Hierarchical ViTs divide Transformer blocks into several stages and progressively reduce the spatial size of feature maps as the network goes deeper, but having self-attention over a sequence of images is expensive. They also found that the global attention in the early layers actually only encodes local visual patterns (i.e., it only attends to nearby tokens in adjacent frames while rarely interacting with tokens in distant frames) Efficient Spatio-temporal Pyramid Transformer ...

<span title='2022-10-20 19:06:41 +1100 AEDT'>October 20, 2022</span>&nbsp;·&nbsp;4 min&nbsp;·&nbsp;649 words&nbsp;·&nbsp;Sukai Huang
[Figure: MEME agent network architecture]

Steven_kapturowski Human Level Atari 200x Faster 2022

[TOC] Title: Human Level Atari 200x Faster Author: Steven Kapturowski et al., DeepMind Publish Year: September 2022 Review Date: Wed, Oct 5, 2022 Summary of paper https://arxiv.org/pdf/2209.07550.pdf Motivation Agent 57 came at the cost of poor data-efficiency, requiring nearly 80,000 million frames of experience; this one can achieve the same performance in 390 million frames Contribution Some key terms NFNet - Normalisation Free Network https://towardsdatascience.com/nfnets-explained-deepminds-new-state-of-the-art-image-classifier-10430c8599ee Batch normalisation, the bad: it is expensive, and it breaks the assumption of data independence. NFNet applies 3 different techniques: modified residual branches and convolutions with scaled weight standardisation; adaptive gradient clipping; architecture optimisation for improved accuracy and training speed. https://github.com/vballoli/nfnets-pytorch Previous Non-Image features ...
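
Of the three NFNet ingredients, adaptive gradient clipping is the easiest to show in isolation: clip a parameter's gradient whenever its norm grows too large relative to the parameter's own norm. A sketch using whole-tensor norms (the NFNet paper clips unit-wise, per output row):

```python
import numpy as np

def adaptive_gradient_clip(grad, weight, clipping=0.01, eps=1e-3):
    """Adaptive Gradient Clipping sketch: rescale the gradient whenever the
    ratio of gradient norm to parameter norm exceeds `clipping`."""
    g = np.asarray(grad, dtype=float)
    w = np.asarray(weight, dtype=float)
    w_norm = max(np.linalg.norm(w), eps)   # eps stops zero-init params from never updating
    g_norm = np.linalg.norm(g)
    if g_norm > clipping * w_norm:
        g = g * (clipping * w_norm / g_norm)
    return g
```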

<span title='2022-10-05 23:22:01 +1100 AEDT'>October 5, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;357 words&nbsp;·&nbsp;Sukai Huang
[Figure: CoBERL architecture]

Andrea_banino Coberl Contrastive Bert for Reinforcement Learning 2022

[TOC] Title: CoBERL: Contrastive BERT for Reinforcement Learning Author: Andrea Banino et al., DeepMind Publish Year: Feb 2022 Review Date: Wed, Oct 5, 2022 Summary of paper https://arxiv.org/pdf/2107.05431.pdf Motivation Contribution Some key terms Representation learning in reinforcement learning motivation: if state information could be effectively extracted from raw observations, it may then be possible to learn from observations as fast as from states. However, given the often sparse reward signal coming from the environment, learning representations in RL has to be achieved with little to no supervision. Approach types: class 1, auxiliary self-supervised losses to accelerate the learning speed in model-free RL algorithms; class 2, learn a world model and use it to collect imagined rollouts, which then act as extra data to train the RL algorithm, reducing the samples required from the environment. CoBERL is in class 1: it uses both masked language modelling and contrastive learning RL using BERT architecture – RELIC ...
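
The contrastive half of CoBERL's auxiliary objective can be pictured as an InfoNCE-style loss between two views of the per-timestep representations; the sketch below is generic InfoNCE, not the exact RELIC-based objective the paper uses (which adds a KL regulariser).

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Minimal contrastive (InfoNCE-style) auxiliary loss: each anchor
    representation should be most similar to its own positive (the matching
    timestep from the other view/branch) and dissimilar to all other
    positives in the batch."""
    a = np.asarray(anchors, dtype=float)
    p = np.asarray(positives, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    logits = a @ p.T / temperature                  # (batch, batch) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))             # diagonal entries are the matches
```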

<span title='2022-10-05 23:04:49 +1100 AEDT'>October 5, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;258 words&nbsp;·&nbsp;Sukai Huang
[Figure: 3D U-Net]

Jonathan_ho Video Diffusion Models 2022

[TOC] Title: Google Video Diffusion Models Author: Jonathan Ho et al. Publish Year: 22 Jun 2022 Review Date: Thu, Sep 22, 2022 Summary of paper Motivation proposing a diffusion model for video generation that shows very promising initial results Contribution this is an extension of the image diffusion model; they introduce a new conditional sampling technique for spatial and temporal video extension that performs better. Some key terms Diffusion model A diffusion model specified in continuous time is a generative model with latents Training diffusion model ...

<span title='2022-09-22 20:40:21 +1000 AEST'>September 22, 2022</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;471 words&nbsp;·&nbsp;Sukai Huang

Dongwon Fire Burns Sword Cuts Commonsense Inductive Bias for Exploration in Text Based Games 2022

[TOC] Title: Fire Burns, Sword Cuts: Commonsense Inductive Bias for Exploration in Text Based Games Author: Dongwon Kelvin Ryu et al. Publish Year: ACL 2022 Review Date: Thu, Sep 22, 2022 Summary of paper Motivation Text-based games (TGs) are exciting testbeds for developing deep reinforcement learning techniques due to their partially observed environments and large action spaces. A fundamental challenge in TGs is the efficient exploration of the large action space when the agent has not yet acquired enough knowledge about the environment. So, we want to inject external commonsense knowledge into the agent during training when the agent is most uncertain about its next action. Contribution In addition to the performance increase, the produced trajectories of actions exhibit lower perplexity when tested with a pre-trained LM, indicating better closeness to human language. Some key terms Exploration efficiency ...

<span title='2022-09-22 19:38:56 +1000 AEST'>September 22, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;276 words&nbsp;·&nbsp;Sukai Huang

Wenlong_huang Language Models as Zero Shot Planners Extracting Actionable Knowledge for Embodied Agents 2022

[TOC] Title: Language Models as Zero Shot Planners: Extracting Actionable Knowledge for Embodied Agents Author: Wenlong Huang et al. Publish Year: Mar 2022 Review Date: Mon, Sep 19, 2022 Summary of paper Motivation Large language models are learning general commonsense world knowledge. So in this paper, the authors investigate the possibility of grounding high-level tasks, expressed as natural language (e.g., “make breakfast”), to a chosen set of action steps (e.g., “open fridge”). Contribution they found that if pre-trained LMs are large enough and prompted appropriately, they can effectively decompose high-level tasks into mid-level plans without any further training; they propose several tools to improve the executability of the model's generations without invasive probing or modifications to the model. Some key terms What is prompt learning ...
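
One of those executability tools is, in spirit, "translate each free-form generated step to the closest admissible action by embedding similarity". A hedged sketch, where `embed` is an assumed sentence-encoder callable and the admissible action set is environment-specific:

```python
import numpy as np

def to_admissible_action(generated_step, admissible_actions, embed):
    """Map a free-form step generated by the LM (e.g. 'grab a mug from the
    cupboard') to the closest admissible environment action by cosine
    similarity of sentence embeddings."""
    g = embed(generated_step)
    g = g / np.linalg.norm(g)
    best, best_sim = None, -np.inf
    for action in admissible_actions:
        v = embed(action)
        sim = float(g @ (v / np.linalg.norm(v)))
        if sim > best_sim:
            best, best_sim = action, sim
    return best, best_sim
```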

<span title='2022-09-19 21:55:13 +1000 AEST'>September 19, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;253 words&nbsp;·&nbsp;Sukai Huang
[Figure: object detection pretraining model]

Pengchuan_zhang Vinvl Revisiting Visual Representations in Vision Language Models 2021

[TOC] Title: VinVL: Revisiting Visual Representations in Vision Language Models Author: Pengchuan Zhang et al. Publish Year: 10 Mar 2021 Review Date: Sat, Sep 3, 2022 Summary of paper Motivation In our experiments we feed the visual features generated by the new object detection model into a Transformer-based VL fusion model, Oscar, and utilise an improved approach, OSCAR+, to pretrain the VL model Contribution it has a bigger object detection model trained with a larger amount of data, called “ResNeXt-152 C4” Some key terms Vision Language Pretraining ...

<span title='2022-09-03 17:17:47 +1000 AEST'>September 3, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;332 words&nbsp;·&nbsp;Sukai Huang