architecture

Alex_petrenko Sample Factory Asynchronous Rl at Very High Fps 2020

Title: Sample Factory: Asynchronous RL at Very High FPS Author: Alex Petrenko Publish Year: Oct, 2020 Review Date: Sun, Sep 25, 2022 Summary of paper Motivation Identifying performance bottlenecks. RL involves three workloads: environment simulation, inference, and backpropagation. Overall performance depends on the slowest workload. In existing methods (A2C/PPO/IMPALA) the computational workloads depend on each other, which leads to under-utilisation of system resources. Existing high-throughput methods focus on distributed training, which introduces a lot of overhead such as networking, serialisation, etc. ...

September 25, 2022 · 1 min · 154 words · Sukai Huang
3D U-Net

Jonathan_ho Video Diffusion Models 2022

Title: Google Video Diffusion Models Author: Jonathan Ho et al. Publish Year: 22 Jun 2022 Review Date: Thu, Sep 22, 2022 Summary of paper Motivation Proposing a diffusion model for video generation that shows very promising initial results. Contribution This is an extension of the image diffusion model to video. They introduce a new conditional sampling technique for spatial and temporal video extension that performs better than previously proposed methods. Some key terms Diffusion model A diffusion model specified in continuous time is a generative model with latents Training diffusion model ...

September 22, 2022 · 3 min · 471 words · Sukai Huang
architecture diagram

Dongwon Fire Burns Sword Cuts Commonsense Inductive Bias for Exploration in Text Based Games 2022

Title: Fire Burns, Sword Cuts: Commonsense Inductive Bias for Exploration in Text Based Games Author: Dongwon Kelvin Ryu et al. Publish Year: ACL 2022 Review Date: Thu, Sep 22, 2022 Summary of paper Motivation Text-based games (TGs) are exciting testbeds for developing deep reinforcement learning techniques due to their partially observed environments and large action spaces. A fundamental challenge in TGs is the efficient exploration of the large action space when the agent has not yet acquired enough knowledge about the environment. So, we want to inject external commonsense knowledge into the agent during training when the agent is most uncertain about its next action. Contribution In addition to the performance increase, the produced trajectories of actions exhibit lower perplexity when tested with a pre-trained LM, indicating better closeness to human language. Some key terms Exploration efficiency ...

September 22, 2022 · 2 min · 276 words · Sukai Huang
model structure

Wenlong_huang Language Models as Zero Shot Planners Extracting Actionable Knowledge for Embodied Agents 2022

Title: Language Models as Zero Shot Planners: Extracting Actionable Knowledge for Embodied Agents Author: Wenlong Huang et al. Publish Year: Mar 2022 Review Date: Mon, Sep 19, 2022 Summary of paper Motivation Large language models are learning general commonsense world knowledge. In this paper, the authors investigate the possibility of grounding high-level tasks, expressed in natural language (e.g., “make breakfast”), to a chosen set of action steps (“open fridge”). Contribution They found that if pre-trained LMs are large enough and prompted appropriately, they can effectively decompose high-level tasks into mid-level plans without any further training. They proposed several tools to improve the executability of the model generation without invasive probing or modifications to the model. Some key terms What is prompt learning ...

September 19, 2022 · 2 min · 253 words · Sukai Huang
add object detection pretrain model

Pengchuan_zhang Vinvl Revisiting Visual Representations in Vision Language Models 2021

Title: VinVL: Revisiting Visual Representations in Vision Language Models Author: Pengchuan Zhang et al. Publish Year: 10 Mar 2021 Review Date: Sat, Sep 3, 2022 Summary of paper Motivation In our experiments we feed the visual features generated by the new object detection model into a Transformer-based VL fusion model, Oscar, and utilise an improved approach, OSCAR+, to pre-train the VL model. Contribution A bigger object detection model trained with a larger amount of data, called “ResNeXt-152 C4”. Some key terms Vision Language Pretraining ...

September 3, 2022 · 2 min · 332 words · Sukai Huang
illustration of Oscar model

Xiujun_li Oscar Object Semantic Aligned Pre Training for Vision Language Tasks 2020

Title: Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks Author: Xiujun Li et al. Publish Year: 26 Jul 2020 Review Date: Sat, Sep 3, 2022 Summary of paper Motivation Existing methods simply concatenate image region features (patch features) and text features as input to the model to be pre-trained and use self-attention to learn image-text semantic alignments in a brute-force manner. The lack of explicit alignment information between the image regions and the text makes alignment modelling a weakly-supervised learning task. ...

September 3, 2022 · 3 min · 462 words · Sukai Huang
Illustration of DiffCSE

Yung_sung_chuang Diffcse Difference Based Contrastive Learning for Sentence Embeddings 2022

Title: DiffCSE: Difference Based Contrastive Learning for Sentence Embeddings Author: Yung-Sung Chuang et al. Publish Year: 21 Apr 2022 Review Date: Sat, Aug 27, 2022 Summary of paper Motivation DiffCSE learns sentence embeddings that are sensitive to the difference between the original sentence and an edited sentence. Contribution We propose DiffCSE, an unsupervised contrastive learning framework for learning sentence embeddings. Some key terms DiffCSE This is an unsupervised contrastive learning framework rather than a model architecture. Contrastive learning in single modality data ...

August 27, 2022 · 2 min · 351 words · Sukai Huang
Different architectures for image and text retrieval

Gregor_geigle Retrieve Fast Rerank Smart Cooperative and Joint Approaches for Improved Cross Modal Retrieval 2022

Title: Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval Author: Gregor Geigle et al. Publish Year: 19 Feb, 2022 Review Date: Sat, Aug 27, 2022 Summary of paper Motivation They want to combine the advantages of the cross-encoder (CE) and the bi-encoder (BE) for more efficient cross-modal search and retrieval: the efficiency and simplicity of the BE approach, based on twin networks, and the expressiveness and cutting-edge performance of CE methods. Contribution We propose a novel joint Cross Encoding and Bi-Encoding model (Joint-Coop), which is trained to simultaneously cross-encode and embed multi-modal input; it achieves the highest scores overall while maintaining retrieval efficiency ...

August 27, 2022 · 3 min · 453 words · Sukai Huang
MP-Net structure

Kaitao_song Mpnet Masked and Permuted Pretrain for Language Understanding 2020

Title: MPNet: Masked and Permuted Pre-training for Language Understanding Author: Kaitao Song et al. Publish Year: 2020 Review Date: Thu, Aug 25, 2022 Summary of paper Motivation BERT adopts masked language modelling (MLM) for pre-training and is one of the most successful pre-training models. Since BERT consists entirely of attention blocks and the positional embedding is the only information that captures ordering, BERT neglects the dependency among predicted tokens ...

August 25, 2022 · 2 min · 378 words · Sukai Huang
multimodal framework

Sergios_karagiannakos Vision Language Models Towards Multimodal Dl 2022

Title: Vision Language Models Towards Multimodal Deep Learning Author: Sergios Karagiannakos Publish Year: 03 Mar 2022 Review Date: Tue, Aug 9, 2022 https://theaisummer.com/vision-language-models/

August 9, 2022 · 1 min · 24 words · Sukai Huang
learnable codebook

Jiali_duan Multimodal Alignment Using Representation Codebook 2022

Title: Multi-modal Alignment Using Representation Codebook Author: Jiali Duan, Liqun Chen et al. Publish Year: 2022 CVPR Review Date: Tue, Aug 9, 2022 Summary of paper Motivation Aligning signals from different modalities is an important step as it affects the performance of later stages such as cross-modality fusion. Since image and text often reside in different regions of the feature space, directly aligning them at the instance level is challenging, especially when features are still evolving during training. Contribution In this paper, we treat image and text as two “views” of the same entity, and encode them into a joint vision-language coding space spanned by a dictionary of cluster centres (codebook). To further smooth out the learning process, we adopt a teacher-student distillation paradigm, where the momentum teacher of one view guides the student learning of the other. Some key terms Types of Vision language pre-training tasks ...

August 9, 2022 · 3 min · 513 words · Sukai Huang

A preliminary idea about using instruction following as an intermediate training step towards a general learning-based agent

This page is not completed yet. You need a password to access the content; go to Slack *#phdsukai to find out more. ...

August 7, 2022 · 5 min · Sukai Huang

Supplementary explanations for proposed methods and PhD thesis structure

You need a password to access the content; go to Slack *#phdsukai to find out more. ...

August 4, 2022 · 11 min · Sukai Huang

Younggyo_seo Masked World Models for Visual Control 2022

Title: Masked World Models for Visual Control 2022 Author: Younggyo Seo et al. Publish Year: 2022 Review Date: Fri, Jul 1, 2022 https://arxiv.org/abs/2206.14244?context=cs.AI https://sites.google.com/view/mwm-rl Summary of paper Motivation TL;DR: Masked autoencoders (MAE) have emerged as a scalable and effective self-supervised learning technique. Can MAE also be effective for visual model-based RL? Yes! With the recipe of convolutional feature masking and reward prediction to capture fine-grained and task-relevant information. Some key terms Decouple visual representation learning and dynamics learning ...

July 1, 2022 · 2 min · 227 words · Sukai Huang

A Brief Overview of Rank Based Prioritized Experience Replay 2016

Title: Prioritised Experience Replay Author: Neuralnet.ai Publish Year: 25 Feb, 2016 Review Date: Thu, Jun 2, 2022 https://www.neuralnet.ai/a-brief-overview-of-rank-based-prioritized-experience-replay/ Replay memory is essential in RL Replay memory has been deployed in both value-based and policy-gradient-based reinforcement learning algorithms, to great success. The reasons for this success cut right to the heart of reinforcement learning. In particular, replay memory simultaneously solves two outstanding problems in the field. ...

June 2, 2022 · 2 min · 365 words · Sukai Huang

Deepmind Flamingo a Visual Language Model for Few Shot Learning 2022

Title: Flamingo: a Visual Language Model for Few-Shot Learning Author: Jean-Baptiste Alayrac et al. Publish Year: Apr 2022 Review Date: May 2022 Summary of paper Flamingo architecture Pretrained vision encoder: from pixels to features. The model’s vision encoder is a pretrained Normalizer-Free ResNet (NFNet). They pretrain the vision encoder using a contrastive objective on their datasets of image and text pairs, using the two-term contrastive loss from the paper “Learning Transferable Visual Models From Natural Language Supervision” ...

May 11, 2022 · 3 min · Sukai Huang

Angela_fan Augmenting Transformer With Knn Composite Memory for Dialog 2021

Title: Augmenting Transformers with KNN-based composite memory for dialog Author: Angela Fan et al. Publish Year: 2021 Review Date: Apr 2022 Summary of paper Motivation The authors proposed augmenting generative Transformer neural networks with KNN-based Information Fetching (KIF) modules. Each KIF module learns a read operation to access fixed external knowledge (e.g., Wikipedia). The authors demonstrated the effectiveness of this approach by identifying relevant knowledge required for knowledgeable but engaging dialog from Wikipedia, images and human-written dialog utterances. ...

April 21, 2022 · 3 min · Sukai Huang

Hao_hu Generalisable Episodic Memory for Drl 2021

Title: Generalisable episodic memory for Deep Reinforcement Learning Author: Hao Hu et al. Publish Year: Jun 2021 Review Date: April 2022 Summary of paper Motivation The authors proposed Generalisable Episodic Memory (GEM), which effectively organises the state-action values of episodic memory in a generalisable manner and supports implicit planning on memorised trajectories. Compared to a traditional memory table, GEM learns a virtual memory table memorised by deep neural networks to aggregate similar state-action pairs that essentially have the same nature. ...

April 7, 2022 · 2 min · Sukai Huang

Ilya_kostrikov Offline Rl With Implicit Q Learning 2021

Title: Offline Reinforcement Learning with Implicit Q-learning Author: Ilya Kostrikov et al. Publish Year: 2021 Review Date: Mar 2022 Summary of paper Motivation Conflict in offline reinforcement learning: offline RL requires reconciling two conflicting aims, namely learning a policy that improves over the behaviour policy (old policy) that collected the dataset, while at the same time minimizing the deviation from the behaviour policy so as to avoid errors due to distributional shift (e.g., obtaining out-of-distribution actions). The challenge is how to constrain those unseen actions to be in-distribution (meaning there is no explicit Q-function for actions, and thus the issue of unseen actions is gone). All the previous solutions, such as 1. limiting how far the new policy deviates from the behaviour policy and 2. assigning low values to out-of-distribution actions, impose a trade-off between how much the policy improves and how vulnerable it is to misestimation due to distributional shift. ...

March 22, 2022 · 4 min · Sukai Huang

Qinqing_zheng Online Decision Transformer 2022

Title: Online Decision Transformer Author: Qinqing Zheng Publish Year: Feb 2022 Review Date: Mar 2022 Summary of paper Motivation The authors proposed Online Decision Transformer (ODT), an RL algorithm based on sequence modelling that blends offline pretraining with online fine-tuning in a unified framework. ODT builds on the Decision Transformer (DT) architecture previously introduced for offline RL. Quantify exploration: compared to DT, they shifted from deterministic to stochastic policies for defining exploration objectives during the online phase. They quantify exploration via the entropy of the policy, similar to max-ent RL frameworks. ...

March 21, 2022 · 4 min · Sukai Huang

Sebastian_borgeaud Improving Language Models by Retrieving From Trillions of Tokens 2022

Title: Improving language models by retrieving from trillions of tokens Author: Sebastian Borgeaud et al. Publish Year: Feb 2022 Review Date: Mar 2022 Summary of paper Motivation In order to decrease the size of language models, this work suggests retrieval from a large text database as a complementary path to scaling language models. They equip models with the ability to directly access a large dataset to perform prediction – a semi-parametric approach. ...

March 21, 2022 · 2 min · Sukai Huang

Machel_reid Can Wikipedia Help Offline Rl 2022

Title: Can Wikipedia Help Offline Reinforcement Learning Author: Machel Reid et al. Publish Year: Mar 2022 Review Date: Mar 2022 Summary of paper Motivation Fine-tuning reinforcement learning (RL) models has been challenging because of a lack of large-scale off-the-shelf datasets as well as high variance in transferability among different environments. Moreover, when the model is trained from scratch, it suffers from slow convergence. In this paper, they look to take advantage of the formulation of reinforcement learning as sequence modelling and investigate the transferability of sequence models pre-trained on other domains (vision, language) when fine-tuned on offline RL tasks (control, games). ...

March 16, 2022 · 2 min · Sukai Huang

Stephen_cresswell Generalised Domain Model Acquisition From Action Traces 2013

Title: Generalised Domain Model Acquisition from Action Traces (LOCM2) Author: Stephen Cresswell et al. Publish Year: 2013 Review Date: Mar 2022 Summary of paper Motivation One approach to the problem of formulating domain models for planning is to learn the models from example action sequences. This work extended LOCM by allowing multiple parameterised state machines to represent a single object. In other words, it is possible to automatically infer the underlying transition system from sample action sequences of the domain. Using such an approach removes the necessity for the domain expert to also be an expert at modelling transition systems. ...

March 15, 2022 · 2 min · Sukai Huang

Wenfeng_feng Extracting Action Sequences From Texts by Rl

Title: Extracting Action Sequences from Texts Based on Deep Reinforcement Learning Author: Wenfeng Feng et al. Publish Year: Mar 2018 Review Date: Mar 2022 Summary of paper Motivation The authors want to build a model that learns to directly extract action sequences without external tools like POS tagging and dependency parsing results… Annotation dataset structure example Model They exploit the framework to learn two models to predict action names and arguments respectively. ...

March 15, 2022 · 1 min · Sukai Huang

Shivam_miglani Nltopddl Learning From Nlp Manuals 2020

Title: NLtoPDDL: One-Shot Learning of PDDL Models from Natural Language Process Manuals Author: Shivam Miglani et al. Publish Year: 2020 Review Date: Mar 2022 Summary of paper Motivation pipeline Pipeline architecture Phase 1 We have a DQN that learns to extract words that represent action names, action arguments, and the sequence of actions present in annotated NL process manuals. (Why only action names? Do we need to extract other information???) Again, why is this called DQN RL? Is it just normal supervised learning… (Check the EASDRL paper to understand Phase 1) ...

March 14, 2022 · 2 min · Sukai Huang