Abstract: Multimodal large language models (MLLMs) have demonstrated strong language understanding and generation capabilities, excelling in visual tasks like referring and grounding. However, due to ...
Abstract: Monocular 3D object detection has gained considerable attention because of its cost-effectiveness and practical applicability, particularly in autonomous driving and robotics. Most of ...