Decent day today. I spent most of the afternoon experimenting with a new dictation method. As you all know, I have been working with the Whisper speech-to-text model by OpenAI, and I have had a working application since September. I have figured out a way to use Whisper to dictate nonfiction about twice as fast. It involves using Whisper for transcription, Microsoft Word macros for editing, and a little ingenuity.
I walked my dog this evening and did a 30-minute dictation session. Normally, if I were using Dragon, I would expect around 1000 words, give or take, depending on the content. Using Whisper, I clocked 2000 words, all clean. Then I cleaned it up further with my Grammar Assistant app. Honestly, it's pretty amazing.
Whisper is great because you don't have to speak punctuation. It is trained on natural human speech. It recognizes where periods should go 100% of the time, and that's not an exaggeration. Since I began using Whisper in September, I have not had to fix a single period or question mark. It gets those right every single time. By not having to speak punctuation, you can speak much, much faster.
Whisper doesn't handle commas well, though. It's not bad, but it's not amazing. It doesn't generate any commas that are grammatically incorrect, but there are some sentence structures where adding a comma would improve clarity. Whisper gets the transcription right but frequently misses the comma. Grammarly does a good job of cleaning up these sentences, so it's not a huge deal, but it is a glaring gap for now.
There are some other issues with Whisper, namely that it doesn't handle spoken punctuation well at all. If you accidentally speak a comma or a period, or heaven forbid, a quotation mark sequence like “hello, how are you?”, Whisper makes a mess. If I dictated the previous sentence, Whisper would transcribe it like this: “If you accidentally speak a comma, or a period. Or heaven forbid a quotation mark like open quote, hello, comma, how are you question mark close quote, comma, then Whisper makes a mess, period.” Yikes.
Whisper also doesn't understand the command “new line”, which is a non-starter if you understand anything about dictation. Like I said, an absolute mess. To be more precise, Whisper's handling of spoken punctuation is a grease fire inside a dumpster fire.
However, I can fix a lot of that with Microsoft Word macros. I worked with a developer to identify most of these weird issues so that we could fix them and get the output very close to what I would have gotten from Dragon. While it's not perfect, I'd give it around a B in most cases, sometimes a B+.
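To give a feel for what this kind of cleanup pass looks like, here is a rough sketch in Python. To be clear, this is not my actual Word macro code. It is a simplified stand-in, and the function name, the list of spoken commands, and the replacement rules are all made up for illustration. The idea is to find spoken commands like “comma” or “new line” in the transcript, swallow any stray commas and periods Whisper wrapped around them, and swap in the real symbol.

```python
import re

# Hypothetical table of spoken commands and the text that should replace them.
# A real macro set would likely cover many more commands than these six.
SPOKEN_COMMANDS = [
    ("open quote", " \u201c"),
    ("close quote", "\u201d "),
    ("question mark", "? "),
    ("new line", "\n"),
    ("comma", ", "),
    ("period", ". "),
]


def clean_spoken_punctuation(text):
    """Replace spoken punctuation commands in a transcript with symbols."""
    for phrase, symbol in SPOKEN_COMMANDS:
        # Match the command plus any spaces, commas, or periods that the
        # transcription placed directly around it.
        pattern = r"[ ,.]*\b" + re.escape(phrase) + r"\b[ ,.]*"
        text = re.sub(pattern, lambda m, s=symbol: s, text, flags=re.IGNORECASE)

    # Tidy the spacing left behind by the replacements.
    text = re.sub(r"[ \t]+([,.?\u201d])", r"\1", text)  # no space before closers
    text = re.sub(r"\u201c[ \t]+", "\u201c", text)      # no space after open quote
    text = re.sub(r"[ \t]*\n[ \t]*", "\n", text)        # no spaces around new lines
    text = re.sub(r"[ \t]{2,}", " ", text)              # collapse repeated spaces
    return text.strip()


if __name__ == "__main__":
    raw = "open quote, hello, comma, how are you question mark close quote"
    print(clean_spoken_punctuation(raw))
```

Run on that sample, the sketch turns the mess back into “hello, how are you?”. The obvious weakness is that it can't tell a spoken command from the ordinary word, so a sentence about “a period of time” would get mangled too. That is part of why a pass like this only gets you to a B.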
Anyhoo, this is why I'm only testing nonfiction right now with Whisper. Hopefully with future updates, Whisper will handle spoken punctuation marks better. When that happens, it will be a game changer.
But with nonfiction, I can start recording and speak naturally without having to worry about punctuation. Whisper will put the periods and question marks in the right places and get most of the commas correct, and then I can use Word macros to clean up the remaining punctuation issues. The result is that I can dictate approximately 2x faster. A 30-minute walk got me to 65% of my daily quota. A 1-hour walk would have resulted in 3600 words, or 130% of my quota.
While this method is not perfect, it's worth exploring.
This isn't something I plan on sharing. At some point, I'd expect Microsoft to integrate Whisper into Microsoft Word. Out of all the announcements that OpenAI made last week, I think the creation of the Whisper API was way more important than the ChatGPT API. Whisper is going to change things in a big way by the end of the year. Don't sleep on it.
Have I mentioned that I love AI?