Proceedings of the 3rd Cognitive Mobility Conference (COGMOB 2024) / 25 February 2025
Dynamic Prompt-Based Approach for Open Vocabulary Multi-Object Tracking
Multi-Object Tracking (MOT) remains a significant challenge in computer vision. Although deep learning has improved tracking accuracy, several limitations persist. Conventional methods depend on large annotated datasets and can detect only the classes seen during training, which limits their ability to track new object types. Annotating tracking data is also laborious, because each object must be assigned a consistent identity across frames. This research explores integrating open-set object detection with MOT. Using natural-language prompts, the system can recognize and track objects beyond a predefined set of classes, adapting to new environments and to objects described in text. The approach also lets users generate their own datasets for any objects of interest. Unlike previous methods, which require significant user interaction, this research offers a user-friendly solution with high accuracy.