Captions now adorn more than a billion YouTube videos, and the video site is making sure to keep its text add-ons up to date. A blog post reveals that YouTube’s automatic captioning system can now recognize three common sound effects and include them in the captions it generates for hard-of-hearing viewers.
The sound effects in question are applause, laughter, and music, which YouTube chose to encode within its automatic caption system because they are common and because their respective meanings are unambiguous. “While the sound space is obviously far richer and provides even more contextually relevant information than these three classes,” reads YouTube’s introductory blog post, “the semantic information conveyed by these sound effects in the caption track is relatively unambiguous, as opposed to sounds like [RING] which raises the question of ‘what was it that rang – a bell, an alarm, a phone?’”
YouTube was able to add sound effects to its automatic captions thanks to a Deep Neural Network that powers the machine-learning process. If you’re the kind of person who can understand the technical details of that branch of programming (I, unfortunately, am not), you can glean more information about YouTube’s process by reading its blog post. The rest of us will have to be content to watch the automatic sound effect captions in action. They show up in the video below, offered by YouTube: