Xiujun_li Oscar Object-Semantics Aligned Pre-training for Vision-Language Tasks 2020
[TOC]

Title: Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Author: Xiujun Li et al.
Publish Year: 26 Jul 2020
Review Date: Sat, Sep 3, 2022

Summary of paper

Motivation

Existing methods simply concatenate image region features (patch features) and text features as input to the model being pre-trained, and rely on self-attention to learn image-text semantic alignments in a brute-force manner. The lack of explicit alignment information between the image regions and the text makes alignment modelling a weakly supervised learning task. ...