Captions now adorn more than a billion YouTube videos, and the video site is making sure to keep its text add-ons up to date. A blog post reveals that YouTube’s automatic captions are now able to recognize three common sound effects and include them in the caption track for hard-of-hearing viewers.
The sound effects in question are applause, laughter, and music, which YouTube chose to encode within its automatic caption system because they are common and because their respective meanings are unambiguous. “While the sound space is obviously far richer and provides even more contextually relevant information than these three classes,” reads YouTube’s introductory blog post, “the semantic information conveyed by these sound effects in the caption track is relatively unambiguous, as opposed to sounds like [RING] which raises the question of ‘what was it that rang – a bell, an alarm, a phone?’”
YouTube was able to add sound effects to its automatic captions thanks to a deep neural network that powers its machine-learning process. If you’re the kind of person who can understand the technical details of that branch of programming (I, unfortunately, am not), you can glean more information about YouTube’s process by reading its blog post. The rest of us will have to be content to watch the automatic sound effect captions in action in a demo video offered by YouTube.