VLM/LLM for Embodied Agents, LLMs working as part of the policy

Research in this field is, frankly, messy: researchers come from very different backgrounds, and most of them publish their own embodied environments and baseline models. There is a lack of systematic study, and most importantly, their models are really difficult to reproduce. In fact, there is no standard name for the research area: some people call it instruction following with LMs, some call it language grounding in embodied environments, and some call it instruction following with RL. Papers in this area rarely even try to reproduce each other's work or compare against one another. So, I would say, be careful before entering this area. ...

March 1, 2025 · 3 min · 444 words · Sukai Huang

Neuro Symbolic Works From A.Prof. Hamid @ Monash

There are four papers from A.Prof. Hamid Rezatofighi: https://vl4ai.erc.monash.edu/positions.html

NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning
NEUSIS: A Compositional Neuro-Symbolic Framework for Autonomous Perception, Reasoning, and Planning in Complex UAV Search Missions
HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
Multi-Objective Multi-Agent Planning for Discovering and Tracking Multiple Mobile Objects

# input bibtex here

1. @misc{cai2025naverneurosymboliccompositionalautomaton,
      title={NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning},
      author={Zhixi Cai and Fucai Ke and Simindokht Jahangard and Maria Garcia de la Banda and Reza Haffari and Peter J. Stuckey and Hamid Rezatofighi},
      year={2025},
      eprint={2502.00372},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.00372},
}

2. @article{DBLP:journals/corr/abs-2409-10196,
      author={Zhixi Cai and Cristian Rojas Cardenas and Kevin Leo and Chenyuan Zhang and Kal Backman and Hanbing Li and Boying Li and Mahsa Ghorbanali and Stavya Datta and Lizhen Qu and Julian Gutierrez Santiago and Alexey Ignatiev and Yuan{-}Fang Li and Mor Vered and Peter J. Stuckey and Maria Garcia de la Banda and Hamid Rezatofighi},
      title={{NEUSIS:} {A} Compositional Neuro-Symbolic Framework for Autonomous Perception, Reasoning, and Planning in Complex {UAV} Search Missions},
      journal={CoRR},
      volume={abs/2409.10196},
      year={2024}
}

3. @inproceedings{DBLP:conf/eccv/KeCJWHR24,
      author={Fucai Ke and Zhixi Cai and Simindokht Jahangard and Weiqing Wang and Pari Delir Haghighi and Hamid Rezatofighi},
      title={{HYDRA:} {A} Hyper Agent for Dynamic Compositional Visual Reasoning},
      booktitle={{ECCV} {(20)}},
      series={Lecture Notes in Computer Science},
      volume={15078},
      pages={132--149},
      publisher={Springer},
      year={2024}
}

4. @article{DBLP:journals/tsp/NguyenVVRR24,
      author={Hoa Van Nguyen and Ba{-}Ngu Vo and Ba{-}Tuong Vo and Hamid Rezatofighi and Damith C. Ranasinghe},
      title={Multi-Objective Multi-Agent Planning for Discovering and Tracking Multiple Mobile Objects},
      journal={{IEEE} Trans. Signal Process.},
      volume={72},
      pages={3669--3685},
      year={2024}
}

NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding (https://arxiv.org/abs/2502.00372) ...

February 26, 2025 · 4 min · 678 words · Sukai Huang

Awesome LLMs Reasoning Abilities Papers

Demystifying Long Chain-of-Thought Reasoning in LLMs
This study systematically investigates the mechanics of long CoT reasoning, identifying the key factors that enable models to generate long CoT trajectories and providing practical guidance for optimizing training strategies to enhance long CoT reasoning in LLMs. https://arxiv.org/pdf/2502.03373.pdf

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
This work introduces the first-generation reasoning models DeepSeek-R1-Zero and DeepSeek-R1; the latter incorporates multi-stage training and cold-start data before RL and achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. ...

February 9, 2025 · 2 min · 350 words · Sukai Huang