Apple Devices Feature Impressive Speech-to-Text Transcription in Developer Betas

Many of the latest apps for converting audio or video into text use OpenAI’s Whisper model. If you use an app like MacWhisper to transcribe lectures or meetings, or to create subtitles for YouTube videos, you are probably relying on this model.

However, iOS 26 and Apple’s other developer betas introduce Apple’s own transcription framework, and early tests indicate that it matches Whisper’s accuracy while running more than twice as fast.

The built-in dictation feature on Apple devices is handled by Apple’s own Speech framework. The latest betas add beta versions of SpeechAnalyzer and SpeechTranscriber, which developers can incorporate into their applications.

Apple’s documentation describes the Speech framework this way: use it to recognize spoken words in recorded or live audio. The keyboard’s dictation support relies on speech recognition to convert audio into text. The framework behaves similarly, but it works without the keyboard being present.

For instance, you could use speech recognition to recognize verbal commands or to handle text dictation in other parts of your app. The framework offers a class, SpeechAnalyzer, along with several modules that can be added to the analyzer to perform specific types of analysis and transcription. Many applications need only the SpeechTranscriber module, which handles speech-to-text conversion.

MacStories’ John Voorhees asked his son, Finn, to build a command-line tool to explore the new functionality, and was highly impressed by the results.

Voorhees asked how much effort it would take to build a command-line tool that transcribes audio and video files using SpeechAnalyzer and SpeechTranscriber. Finn estimated about 10 minutes, and he was quite close. By Voorhees’ account, it took him longer to install macOS Tahoe after WWDC than it took Finn to create Yap, a simple command-line utility that takes audio and video files as input and produces SRT- and TXT-formatted transcripts.
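
To illustrate the two output formats mentioned above, here is a minimal Python sketch of how timed transcript segments could be rendered as SRT and TXT. It is not Yap’s actual code and does not call Apple’s framework. The `Segment` structure and the function names are hypothetical, and the sketch assumes the transcriber has already produced text with start and end times in seconds.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timed piece of transcript text (hypothetical structure for this sketch)."""
    start: float  # seconds from the start of the media
    end: float    # seconds from the start of the media
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)


def to_txt(segments: list[Segment]) -> str:
    """Render the same segments as plain text, one segment per line."""
    return "\n".join(seg.text.strip() for seg in segments) + "\n"


if __name__ == "__main__":
    demo = [Segment(0.0, 2.5, "Hello"), Segment(2.5, 4.0, "world")]
    print(to_srt(demo))
    print(to_txt(demo))
```

The SRT output keeps the timing needed for subtitles, while the TXT output drops it, which is why a tool might reasonably offer both.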

Using a 34-minute video as a test, Voorhees compared Yap against MacWhisper and VidCap, two leading transcription applications. He found that Apple’s modules matched the accuracy of these alternatives while running more than twice as fast as the fastest existing option, MacWhisper with the Large V3 Turbo model:

Application                            Transcription time (min:sec)
Yap (utilizing Apple’s framework)      0:45
MacWhisper (Large V3 Turbo)            1:41
VidCap                                 1:55
MacWhisper (Large V2)                  3:55
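
The "more than twice as fast" claim can be checked with a little arithmetic on the figures above. The Python sketch below assumes the durations are minutes and seconds and that the test video is exactly 34 minutes long (the article gives only the rounded length). The function names are illustrative.

```python
# Durations copied from the table above, read as min:sec (an assumption).
DURATIONS = {
    "Yap": "0:45",
    "MacWhisper (Large V3 Turbo)": "1:41",
    "VidCap": "1:55",
    "MacWhisper (Large V2)": "3:55",
}

# Assumed exact length of the 34-minute test video, in seconds.
VIDEO_SECONDS = 34 * 60


def to_seconds(mmss: str) -> int:
    """Convert a 'M:SS' string to a whole number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup_over(app: str, baseline: str = "Yap") -> float:
    """How many times longer `app` took than `baseline`."""
    return to_seconds(DURATIONS[app]) / to_seconds(DURATIONS[baseline])


def realtime_factor(app: str) -> float:
    """Seconds of video processed per second of transcription time."""
    return VIDEO_SECONDS / to_seconds(DURATIONS[app])


if __name__ == "__main__":
    for app in DURATIONS:
        print(
            f"{app}: {to_seconds(DURATIONS[app])} s, "
            f"{speedup_over(app):.2f}x Yap's time, "
            f"{realtime_factor(app):.1f}x real time"
        )
```

Under these assumptions, MacWhisper with Large V3 Turbo takes 101 seconds against Yap’s 45, a ratio of roughly 2.2, which is consistent with the "more than twice as fast" description.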

He notes that although this may look like a minor gain for a single file, the time savings add up during batch transcription or for anyone who transcribes files regularly, such as students working through recorded lectures.

If you are using the macOS Tahoe developer beta, you can download Yap from GitHub to experience it for yourself.

Image: DMN capture of a YouTube video subtitle file