Proceedings of the 3rd Cognitive Mobility Conference (COGMOB 2024) / 25 February 2025
Real-Time Media Synthesis from Speech: A New Era in Passenger Entertainment
This research explores the design and implementation of a software application that generates dynamic visual media from live human speech, aiming to advance smart passenger entertainment. The approach bridges the gap between spoken and visual content creation with a novel method of media production: advanced voice recognition and synthesis algorithms convert spoken words into visually engaging, animated representations. The key components and algorithms required for accurate analysis and interpretation are examined. Deep learning models extract linguistic information from the speech input, enabling real-time processing and interpretation; the extracted text is then transformed into moving media content by a modified image-synthesis model, yielding visually dynamic output from spoken input.
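The abstract describes a two-stage pipeline: a deep-learning speech-recognition stage extracts text from live audio, and an image-synthesis stage turns that text into a sequence of frames. A minimal structural sketch of that flow in Python is shown below; the `transcribe` and `synthesize_frames` functions are hypothetical stubs standing in for the paper's ASR and modified image-synthesis models, which are not specified here.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the paper's models: a real system would invoke
# a deep-learning ASR model and a modified image-synthesis model instead.
def transcribe(audio_chunk: bytes) -> str:
    """Stub ASR stage: extract linguistic content from a chunk of speech audio."""
    return audio_chunk.decode("utf-8", errors="ignore")  # placeholder only

def synthesize_frames(prompt: str, n_frames: int = 4) -> list[str]:
    """Stub synthesis stage: turn the extracted text into a short frame sequence."""
    return [f"frame {i}: {prompt}" for i in range(n_frames)]

@dataclass
class SpeechToMediaPipeline:
    """Chains the two stages so each incoming audio chunk yields moving media."""
    frames_per_prompt: int = 4

    def process(self, audio_chunk: bytes) -> list[str]:
        # Stage 1: speech input -> text (real-time linguistic extraction).
        text = transcribe(audio_chunk)
        # Stage 2: text -> multi-frame (animated) media output.
        return synthesize_frames(text, self.frames_per_prompt)

pipeline = SpeechToMediaPipeline(frames_per_prompt=2)
frames = pipeline.process(b"a sunset over the ocean")
```

In a deployed system the stub stages would be replaced by streaming model inference, and `process` would be called per audio chunk to keep latency low enough for real-time use.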