Captions now adorn more than a billion YouTube videos, and the video site is making sure to keep its text add-ons up to date. A blog post reveals that YouTube’s automatic captioning system can now recognize three common sound effects and include them in the captions it generates for hard-of-hearing viewers.
The sound effects in question are applause, laughter, and music, which YouTube chose to encode within its automatic caption system because they are common and because their respective meanings are unambiguous. “While the sound space is obviously far richer and provides even more contextually relevant information than these three classes,” reads YouTube’s introductory blog post, “the semantic information conveyed by these sound effects in the caption track is relatively unambiguous, as opposed to sounds like [RING] which raises the question of ‘what was it that rang – a bell, an alarm, a phone?’”
YouTube was able to add sound effects to its automatic captions by using a deep neural network to drive the machine-learning process. If you’re the kind of person who can understand the technical details of that branch of programming (I, unfortunately, am not), you can glean more information about YouTube’s process by reading its blog post. The rest of us will have to be content to watch the automatic sound effect captions in action in a demonstration video offered by YouTube.