Allen Z Ren Robots That Ask for Help Uncertainty Alignment 2023

[TOC] Title: Robots That Ask for Help: Uncertainty Alignment for Large Language Model Planners Author: Allen Z. Ren et al. Publish Year: 4 Sep 2023 Review Date: Fri, Jan 26, 2024 url: arXiv:2307.01928v2 Summary of paper Motivation LLMs have various capabilities but often make overly confident yet incorrect predictions. KNOWNO aims to measure and align this uncertainty, enabling LLM-based planners to recognize their limitations and request assistance when necessary. Contribution built on the theory of conformal prediction Some key terms Ambiguity in NL...
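The conformal-prediction idea behind KNOWNO can be sketched with the standard split-conformal recipe (a minimal toy, not the paper's implementation; the option names and probabilities below are made up):

```python
import math

def conformal_threshold(calib_scores, alpha):
    """Split conformal prediction: choose the finite-sample quantile of
    calibration nonconformity scores so prediction sets cover the true
    label with probability >= 1 - alpha."""
    n = len(calib_scores)
    scores = sorted(calib_scores)
    k = math.ceil((n + 1) * (1 - alpha)) - 1  # conservative quantile index
    return scores[min(k, n - 1)]

def prediction_set(option_probs, qhat):
    """Keep every option whose nonconformity (1 - probability) is within
    the calibrated threshold; a set with more than one surviving option
    signals uncertainty, i.e. the planner should ask for help."""
    return [opt for opt, p in option_probs.items() if 1 - p <= qhat]

qhat = conformal_threshold([0.05, 0.1, 0.2, 0.4], alpha=0.2)
options = prediction_set({"pick the cup": 0.7, "pick the bowl": 0.65, "wait": 0.1}, qhat)
# more than one option left -> the instruction is ambiguous, request help
```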

<span title='2024-01-26 17:29:29 +1100 AEDT'>January 26, 2024</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;510 words&nbsp;·&nbsp;Sukai Huang

Harsh_jhamtani Natural Language Decomposition and Interpretation of Complex Utterances 2023

[TOC] Title: Natural Language Decomposition and Interpretation of Complex Utterances Author: Jacob Andreas Publish Year: 15 May 2023 Review Date: Mon, May 22, 2023 url: https://arxiv.org/pdf/2305.08677.pdf Summary of paper Motivation natural language interfaces often require supervised data to translate user requests into structured intent representations however, during data collection, it can be difficult to anticipate and formalise the full range of user needs we introduce an approach for equipping a simple language-to-code model to handle complex utterances via a process of hierarchical natural language decomposition....

<span title='2023-05-22 09:54:04 +1000 AEST'>May 22, 2023</span>&nbsp;·&nbsp;10 min&nbsp;·&nbsp;2088 words&nbsp;·&nbsp;Sukai Huang

Siddharth_karamcheti Language Driven Representation Learning for Robotics 2023

[TOC] Title: Language-Driven Representation Learning for Robotics Author: Siddharth Karamcheti et al. Publish Year: 24 Feb 2023 Review Date: Fri, Mar 3, 2023 url: https://arxiv.org/pdf/2302.12766.pdf Summary of paper Motivation recent work in visual representation learning for robotics demonstrates the viability of learning from large video datasets of humans performing everyday tasks. leveraging methods such as masked autoencoding and contrastive learning, these representations exhibit strong transfer to policy learning for visuomotor control but robot learning encompasses a diverse set of problems beyond control, including grasp affordance prediction, language-conditioned imitation learning, and intent scoring for human-robot collaboration, amongst others....

<span title='2023-03-03 16:16:19 +1100 AEDT'>March 3, 2023</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;463 words&nbsp;·&nbsp;Sukai Huang

Jie_huang Can Language Models Be Specific How 2022

[TOC] Title: Can Language Models Be Specific? How? Author: Jie Huang et al. Publish Year: 11 Oct 2022 Review Date: Tue, Nov 8, 2022 Summary of paper Motivation they propose to measure how specific the language of pre-trained language models (PLMs) is. To achieve this, they introduced a novel approach to build a benchmark for specificity testing by forming masked token prediction tasks with prompts. For instance, given “J.K. Rowling was born in [MASK]”, we want to test whether a more specific answer will be better filled by PLMs....
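The masked-prompt setup described above can be sketched as follows (the scorer here is a stand-in dictionary of hypothetical probabilities, not an actual PLM; the benchmark item is the one quoted in the summary):

```python
def make_masked_prompt(subject, template):
    """Form a masked-token prediction prompt for the specificity test."""
    return template.format(subject=subject, mask="[MASK]")

def prefers_specific(score_fn, prompt, specific_answer, general_answer):
    """The model counts as 'specific' on an item when it scores the
    fine-grained answer at least as highly as the coarse one."""
    return score_fn(prompt, specific_answer) >= score_fn(prompt, general_answer)

prompt = make_masked_prompt("J.K. Rowling", "{subject} was born in {mask}.")

# Stand-in scorer (hypothetical fill probabilities), not a real PLM:
toy_probs = {"Yate": 0.6, "England": 0.3}
score = lambda p, answer: toy_probs[answer]
```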

<span title='2022-11-08 20:41:04 +1100 AEDT'>November 8, 2022</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;429 words&nbsp;·&nbsp;Sukai Huang

Yuetian_weng an Efficient Spatio Temporal Pyramid Transformer for Action Detection 2022

[TOC] Title: An Efficient Spatio-Temporal Pyramid Transformer for Action Detection Author: Yuetian Weng et al. Publish Year: Jul 2022 Review Date: Thu, Oct 20, 2022 Summary of paper Motivation the task of action detection aims at deducing both the action category and localisation of the start and end moment for each action instance in a long, untrimmed video. it is non-trivial to design an efficient architecture for action detection due to the prohibitively expensive self-attentions over a long sequence of video clips To this end, they present an efficient hierarchical spatio-temporal transformer for action detection Building upon the fact that early self-attention layers in Transformers still focus on local patterns....

<span title='2022-10-20 19:06:41 +1100 AEDT'>October 20, 2022</span>&nbsp;·&nbsp;4 min&nbsp;·&nbsp;649 words&nbsp;·&nbsp;Sukai Huang

Pengchuan_zhang Vinvl Revisiting Visual Representations in Vision Language Models 2021

[TOC] Title: VinVL: Revisiting Visual Representations in Vision Language Models Author: Pengchuan Zhang et al. Publish Year: 10 Mar 2021 Review Date: Sat, Sep 3, 2022 Summary of paper Motivation In our experiments we feed the visual features generated by the new object detection model into a Transformer-based VL fusion model Oscar. And utilise an improved approach OSCAR+ to pretrain the VL model Contribution has a bigger object detection model trained with a larger amount of training data, called “ResNeXt-152 C4” Some key terms Vision Language Pretraining...

<span title='2022-09-03 17:17:47 +1000 AEST'>September 3, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;332 words&nbsp;·&nbsp;Sukai Huang

Yung_sung_chuang Diffcse Difference Based Contrastive Learning for Sentence Embeddings 2022

[TOC] Title: DiffCSE: Difference Based Contrastive Learning for Sentence Embeddings Author: Yung-Sung Chuang et al. Publish Year: 21 Apr 2022 Review Date: Sat, Aug 27, 2022 Summary of paper Motivation DiffCSE learns sentence embeddings that are sensitive to the difference between the original sentence and an edited sentence. Contribution we propose DiffCSE, an unsupervised contrastive learning framework for learning sentence embeddings Some key terms DiffCSE this is an unsupervised contrastive learning framework rather than a model architecture Contrastive learning in single modality data...
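The contrastive half of such frameworks is typically an InfoNCE-style objective; a minimal pure-Python sketch with toy vectors (not the paper's code, and the temperature value is just a common default):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(anchor, positive, negatives, temperature=0.05):
    """Contrastive objective: pull the positive (augmented) sentence
    embedding toward the anchor while pushing in-batch negatives away.
    Returns -log softmax of the positive over all candidates."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    m = max(logits)  # stabilise the log-sum-exp
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]
```

A well-aligned positive gives a loss near zero; a positive that is farther from the anchor than a negative gives a large loss.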

<span title='2022-08-27 16:03:42 +1000 AEST'>August 27, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;351 words&nbsp;·&nbsp;Sukai Huang

Gregor_geigle Retrieve Fast Rerank Smart Cooperative and Joint Approaches for Improved Cross Modal Retrieval 2022

[TOC] Title: Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval Author: Gregor Geigle et al. Publish Year: 19 Feb 2022 Review Date: Sat, Aug 27, 2022 Summary of paper Motivation they want to combine the cross-encoder (CE) and bi-encoder (BE) advantages and have more efficient cross-modal search and retrieval: the efficiency and simplicity of the BE approach based on twin networks; the expressiveness and cutting-edge performance of CE methods....
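The retrieve-then-rerank pattern named in the title can be sketched as a two-stage pipeline (the word-overlap scorers below are toy stand-ins for the trained BE and CE models):

```python
def retrieve_then_rerank(query, corpus, be_score, ce_score, k=10):
    """Two-stage retrieval: a cheap bi-encoder (BE) score prunes the
    corpus to the top-k candidates, then the expensive cross-encoder
    (CE) reranks only those k items instead of the whole corpus."""
    candidates = sorted(corpus, key=lambda doc: be_score(query, doc), reverse=True)[:k]
    return sorted(candidates, key=lambda doc: ce_score(query, doc), reverse=True)

# Toy word-overlap scorers standing in for the trained encoders:
be = lambda q, d: len(set(q.split()) & set(d.split()))
ce = lambda q, d: 2 * be(q, d)  # pretend the CE is a sharper scorer
```

The CE cost is O(k) model calls per query rather than O(|corpus|), which is the efficiency argument.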

<span title='2022-08-27 00:31:38 +1000 AEST'>August 27, 2022</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;453 words&nbsp;·&nbsp;Sukai Huang

Kaitao_song Mpnet Masked and Permuted Pre-training for Language Understanding 2020

[TOC] Title: MPNet: Masked and Permuted Pre-training for Language Understanding Author: Kaitao Song et al. Publish Year: 2020 Review Date: Thu, Aug 25, 2022 Summary of paper Motivation BERT adopts masked language modelling (MLM) for pre-training and is one of the most successful pre-training models. Since BERT is all attention blocks and positional embeddings are the only information that encodes ordering, BERT neglects dependency among predicted tokens...
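The dependency point can be illustrated with a permuted-prediction sketch (a simplified setup in the spirit of permuted language modelling, not MPNet's actual training code):

```python
import random

def permuted_prediction_steps(tokens, n_predict, seed=0):
    """Shuffle positions, keep the first part as visible context, and
    predict the remaining positions one by one so each predicted token
    conditions on the tokens predicted before it -- unlike MLM, which
    predicts every masked token independently of the others."""
    rng = random.Random(seed)
    order = list(range(len(tokens)))
    rng.shuffle(order)
    split = len(tokens) - n_predict
    context, to_predict = order[:split], order[split:]
    steps = []
    for i, pos in enumerate(to_predict):
        visible = sorted(context + to_predict[:i])
        steps.append((pos, visible))  # predict tokens[pos] given these positions
    return steps
```

Each step's visible set grows by the previously predicted position, which is exactly the inter-token dependency MLM discards.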

<span title='2022-08-25 12:24:55 +1000 AEST'>August 25, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;378 words&nbsp;·&nbsp;Sukai Huang

Jiali_duan Multimodal Alignment Using Representation Codebook 2022

[TOC] Title: Multi-modal Alignment Using Representation Codebook Author: Jiali Duan, Liqun Chen et al. Publish Year: 2022 CVPR Review Date: Tue, Aug 9, 2022 Summary of paper Motivation aligning signals from different modalities is an important step as it affects the performance of later stages such as cross-modality fusion. since image and text often reside in different regions of the feature space, directly aligning them at instance level is challenging, especially when features are still evolving during training....
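The codebook intuition can be sketched with a nearest-codeword assignment (a minimal vector-quantization toy, assuming a shared codebook for both modalities; not the paper's training procedure):

```python
def quantize(feature, codebook):
    """Assign a feature vector to its nearest codeword by squared
    Euclidean distance. Image and text features quantized against the
    same codebook land in a shared discrete space, which is the
    intuition behind codebook-based alignment."""
    sq_dist = lambda u, v: sum((a - b) ** 2 for a, b in zip(u, v))
    return min(range(len(codebook)), key=lambda i: sq_dist(feature, codebook[i]))
```

An image feature and a caption feature mapping to the same codeword index are aligned at the cluster level rather than the instance level.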

<span title='2022-08-09 07:26:46 +1000 AEST'>August 9, 2022</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;513 words&nbsp;·&nbsp;Sukai Huang

Younggyo_seo Masked World Models for Visual Control 2022

[TOC] Title: Masked World Models for Visual Control 2022 Author: Younggyo Seo et al. Publish Year: 2022 Review Date: Fri, Jul 1, 2022 https://arxiv.org/abs/2206.14244?context=cs.AI https://sites.google.com/view/mwm-rl Summary of paper Motivation TL;DR: Masked autoencoders (MAE) have emerged as a scalable and effective self-supervised learning technique. Can MAE also be effective for visual model-based RL? Yes! with the recipe of convolutional feature masking and reward prediction to capture fine-grained and task-relevant information. Some key terms Decouple visual representation learning and dynamics learning...
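The convolutional feature masking in the recipe amounts to dropping random spatial cells of a feature map; a toy sketch (the ratio and grid size are illustrative, not the paper's settings):

```python
import random

def mask_feature_map(fmap, mask_ratio, seed=0):
    """Zero out a random subset of spatial cells in an H x W feature
    map -- the convolutional-feature analogue of MAE patch masking.
    Returns the masked map and the set of masked (row, col) cells."""
    rng = random.Random(seed)
    h, w = len(fmap), len(fmap[0])
    cells = [(i, j) for i in range(h) for j in range(w)]
    masked = set(rng.sample(cells, int(mask_ratio * len(cells))))
    out = [[0.0 if (i, j) in masked else fmap[i][j] for j in range(w)]
           for i in range(h)]
    return out, masked
```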

<span title='2022-07-01 12:03:57 +1000 AEST'>July 1, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;227 words&nbsp;·&nbsp;Sukai Huang

Deepmind Flamingo a Visual Language Model for Few Shot Learning 2022

[TOC] Title: Flamingo: a Visual Language Model for Few-Shot Learning Author: Jean-Baptiste Alayrac et al. Publish Year: Apr 2022 Review Date: May 2022 Summary of paper Flamingo architecture Pretrained vision encoder: from pixels to features the model’s vision encoder is a pretrained Normalizer-Free ResNet (NFNet) they pretrain the vision encoder using a contrastive objective on their datasets of image and text pairs, using the two-term contrastive loss from the paper “Learning Transferable Visual Models From Natural Language Supervision”...

<span title='2022-05-11 16:35:03 +1000 AEST'>May 11, 2022</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;Sukai Huang

Shivam_miglani Nltopddl Learning From Nlp Manuals 2020

[TOC] Title: NLtoPDDL: One-Shot Learning of PDDL Models from Natural Language Process Manuals Author: Shivam Miglani et al. Publish Year: 2020 Review Date: Mar 2022 Summary of paper Motivation pipeline Pipeline architecture Phase 1 we have a DQN that learns to extract words that represent the action name, action arguments, and the sequence of actions present in annotated NL process manuals. (why only action name, do we need to extract other information?...

<span title='2022-03-14 15:08:45 +1100 AEDT'>March 14, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;Sukai Huang