
Proceedings of the 3rd Cognitive Mobility Conference (COGMOB 2024) / 25 February 2025

Real-Time Media Synthesis from Speech: A New Era in Passenger Entertainment

This research explores the development and implementation of a software application that utilizes live human speech to generate dynamic visual media, aiming to revolutionize smart passenger entertainment. In order to bridge the gap between spoken and visual content creation, this innovative approach introduces a novel method for media production. The software employs advanced voice recognition and synthesis algorithms to convert spoken words into visually engaging and animated representations. Key aspects and algorithms necessary for accurate analysis and interpretation are examined. Deep learning models are used to extract linguistic information from speech input, enabling real-time processing and interpretation. The extracted text is then transformed into moving media content using a modified image synthesis model, resulting in visually dynamic outputs from spoken input.
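The two stages named in the abstract, speech recognition followed by image synthesis, can be pictured as a small streaming pipeline. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the class name `SpeechToMediaPipeline`, the `window_words` parameter, and both placeholder functions are hypothetical, and the placeholders stand in for the deep learning recognizer and the modified image synthesis model, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List

# Hypothetical types for illustration; none of these come from the paper.
AudioChunk = bytes
Frame = List[List[int]]  # a tiny grayscale image as rows of pixel values


@dataclass
class SpeechToMediaPipeline:
    """Chains a speech recognizer and an image synthesizer, chunk by chunk."""

    transcribe: Callable[[AudioChunk], str]  # stand-in for a speech-to-text model
    synthesize: Callable[[str], Frame]       # stand-in for a text-to-image model
    window_words: int = 5                    # assumed size of the rolling text prompt

    def run(self, chunks: Iterable[AudioChunk]) -> Iterator[Frame]:
        context: List[str] = []
        for chunk in chunks:
            words = self.transcribe(chunk).split()
            if not words:
                continue  # silence or unrecognized audio: emit no frame
            # Keep only the most recent words so the prompt follows the speaker.
            context = (context + words)[-self.window_words:]
            yield self.synthesize(" ".join(context))


def fake_transcribe(chunk: AudioChunk) -> str:
    # Placeholder: treats the bytes as already-recognized UTF-8 text.
    return chunk.decode("utf-8")


def fake_synthesize(prompt: str) -> Frame:
    # Placeholder: a 1x1 "image" whose single pixel encodes the prompt length.
    return [[len(prompt) % 256]]


def demo() -> List[Frame]:
    pipeline = SpeechToMediaPipeline(fake_transcribe, fake_synthesize, window_words=3)
    return list(pipeline.run([b"a red", b"", b"train passes"]))
```

Because `run` is a generator, each frame becomes available as soon as its audio chunk has been processed, which is the property a real-time system would need. Real recognition and synthesis models would replace the two placeholder callables without changing the loop.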

URL
https://doi.org/10.1007/978-3-031-81799-1_24
Authors
Csippán, Gy.
Kővári, B.
Bécsi, T.
Leginusz, L.

Contact

Prof. Dr. Péter Gáspár

H-1111 Budapest, Kende u. 13-17.

+36 1 279 6000

autonom@nemzetilabor.hu

© 2020-2023 National Laboratory for Autonomous Systems, Budapest