There are four papers from A/Prof. Hamid Rezatofighi: https://vl4ai.erc.monash.edu/positions.html
- NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning
- NEUSIS: A Compositional Neuro-Symbolic Framework for Autonomous Perception, Reasoning, and Planning in Complex UAV Search Missions
- HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
- Multi-Objective Multi-Agent Planning for Discovering and Tracking Multiple Mobile Objects
NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding
(https://arxiv.org/abs/2502.00372)
Motivation
From the abstract: This paper explores visual grounding (VG) beyond basic perception, highlighting the challenges faced by methods that require human-like reasoning. Recent advances in large language models (LLMs) and vision-language models (VLMs) have improved visual comprehension, contextual understanding, and reasoning. These methods fall mainly into end-to-end and compositional approaches, with the latter offering more flexibility.
Limitation of existing work: Compositional approaches still struggle with complex reasoning over language-based logical representations.
Contribution
Proposes a system based on a deterministic finite-state automaton that dynamically transitions between states based on intermediate results, incorporating a self-correction mechanism at each step to improve robustness.
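To make the idea of state transitions driven by intermediate results concrete, here is a minimal toy sketch in Python. It is not NAVER's implementation: the state names (`perceive`, `check`, `answer`), the `Automaton` class, the retry limit, and the fake detector are all assumptions made up for illustration. It only shows the general pattern of a deterministic automaton whose failed checks send control back to an earlier state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A step reads/writes the shared context and returns (ok, next_state_if_ok).
Step = Callable[[dict], Tuple[bool, str]]


@dataclass
class Automaton:
    steps: Dict[str, Step]      # hypothetical state name -> step function
    fallback: Dict[str, str]    # state to return to when a step reports failure
    max_retries: int = 2        # assumed per-state failure budget
    trace: List[str] = field(default_factory=list)

    def run(self, start: str, ctx: dict) -> dict:
        failures = {name: 0 for name in self.steps}
        state = start
        while state != "done":
            self.trace.append(state)
            ok, next_state = self.steps[state](ctx)
            if ok:
                state = next_state
                continue
            # Self-correction: count the failure and go back to the fallback state.
            failures[state] += 1
            if failures[state] > self.max_retries:
                raise RuntimeError(f"gave up in state {state!r}")
            state = self.fallback[state]
        return ctx


def perceive(ctx: dict) -> Tuple[bool, str]:
    # Stand-in for a detector: the first attempt finds nothing, the second finds a box.
    ctx["attempts"] = ctx.get("attempts", 0) + 1
    ctx["boxes"] = [] if ctx["attempts"] == 1 else [(10, 20, 50, 60)]
    return True, "check"


def check(ctx: dict) -> Tuple[bool, str]:
    # Reject an empty detection so the automaton returns to perception.
    return bool(ctx["boxes"]), "answer"


def answer(ctx: dict) -> Tuple[bool, str]:
    ctx["result"] = ctx["boxes"][0]
    return True, "done"


automaton = Automaton(
    steps={"perceive": perceive, "check": check, "answer": answer},
    fallback={"perceive": "perceive", "check": "perceive", "answer": "perceive"},
)
out = automaton.run("perceive", {})
print(automaton.trace)   # ['perceive', 'check', 'perceive', 'check', 'answer']
print(out["result"])     # (10, 20, 50, 60)
```

The transition taken from each state depends only on the current state and the step's outcome, which is what makes the control flow deterministic and easy to inspect through the recorded trace.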
NAVER Pipeline
Concerns
ProbLog dates back to 2007. It seems that the LLMs in the Logic Generation Module are used only for translation (semantic parsing) rather than for performing the actual reasoning.
NEUSIS: A Compositional Neuro-Symbolic Framework for Autonomous Perception, Reasoning, and Planning in Complex UAV Search Missions
(https://arxiv.org/abs/2409.10196)
Motivation
The pipeline of the proposed neuro-symbolic system, NEUSIS, designed for interpretable UAV search and navigation in realistic scenarios.
HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
(https://arxiv.org/abs/2403.12884)
Motivation
Potential future direction
- How can we extend from single-step decision-making problems to long-term sequential decision-making problems?
- How can we better elicit the reasoning capabilities of LLMs? Right now, they are primarily used as code generators or semantic parsers for natural language queries. This approach misses the opportunity to leverage their rich embedded commonsense knowledge for decision-making.
- That said, LLMs are prone to so-called “hallucination”, so we need to find ways to improve the reliability of LLM reasoning before pursuing this direction.
Multi-Objective Multi-Agent Planning for Discovering and Tracking Multiple Mobile Objects
Should relate this to "Adaptive Temporal Planning for Multi-Robot Systems in Operations and Maintenance of Offshore Wind Farms".
Check this survey paper: https://arxiv.org/abs/2411.15296