Crazy day today but I kept things moving.

I am going to let the cat out of the bagel on one (just one) of three secret projects I’m working on.

TECHNICAL WARNING ALERT…YOU’VE BEEN WARNED

As many of you know, I use dictation and transcription as my main writing method. I speak my stories into a voice recorder, use Dragon to transcribe the audio, and then use a home-brew of Microsoft Word macros to clean up the text.

A few days ago, my friend Nat Connors (founder of KindleTrends) sent me a note telling me to check out OpenAI’s new Whisper speech transcription model. I hadn’t heard of it. Check it out here: https://huggingface.co/spaces/openai/whisper

Essentially, Whisper takes audio and uses AI to create very accurate transcriptions. It can even transcribe YouTube videos.
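
For the technically curious, here is a minimal sketch of what calling Whisper from Python can look like. It is not the application described in this post, just an illustration using the open-source `openai-whisper` package, which also needs ffmpeg installed. The helper names and the audio file name are assumptions made up for the example.

```python
def load_whisper_model(name: str = "base"):
    """Load a Whisper model by size name (needs `pip install openai-whisper` and ffmpeg)."""
    import whisper  # imported lazily so the rest of this sketch loads without the package
    return whisper.load_model(name)


def transcribe_to_text(model, audio_path: str) -> str:
    """Run the model on one audio file and return the transcript as a single string."""
    result = model.transcribe(audio_path)  # returns a dict that includes a "text" entry
    return result["text"].strip()


# Example usage (hypothetical file name):
# model = load_whisper_model("base")
# print(transcribe_to_text(model, "chapter_01.mp3"))
```

Larger model sizes tend to be more accurate but slower, so the size you pick is a tradeoff between quality and processing time.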

I played with a few demos and I was blown away. It rivaled Dragon and, in some cases, was better. And it is free, if you’re techy and know how to use it. (Most people would have no clue where to start.)

Everyone I’ve seen on the net so far has given Whisper a thumbs-up, saying it is quite good and can only get better.

There are a few things that I immediately noticed about Whisper:

#1: Whisper transcribes punctuation. It puts the commas and periods in all the right places most of the time, even though the speaker doesn’t speak the punctuation. This is huge. Most programmers I’ve talked to about NLP have lamented how difficult this is. Whisper does it easily.

#2: If one could use Whisper for transcription, they effectively would not need to speak punctuation as often, therefore increasing their speed. It would also remove the need to speak “Dragonese.” You still have to speak quotes and parentheses, but not periods or commas.

#3: This is precisely the viable alternative that Mac users have been waiting for. Dragon has always been a Windows app (the Mac version was discontinued). But now Mac users may have a better option, especially those people who don’t want to buy a Windows computer or use Parallels.

So what exactly am I doing…?

I am working with a data scientist to develop a personal application (for me only) that I can run Whisper on. I want to see how accurate it is for my dictation style. I think I have worked out a way to use it (without speaking punctuation) along with my dictation macro in Microsoft Word.

This is quite a technical project and while the application won’t be difficult to develop (it’s shockingly easy), there are all sorts of little problems to solve that make this one of the more advanced projects I’ve taken on lately. One of those problems is how to handle other types of punctuation, especially dialogue, because Whisper does not do them well. I think I have a solution to that—but it lies in the realm of VBA and Microsoft Word macros, and/or Python automation.
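
To give a flavor of the Python automation side, here is a rough sketch of one way spoken punctuation could be converted after transcription. It is not the solution mentioned above. The spoken phrases ("open quote", "close paren", and so on) and the function name are assumptions for illustration only.

```python
import re

# Hypothetical spoken commands and the symbols they stand for.
# These phrases are assumptions, not an actual dictation vocabulary.
OPENERS = {"open quote": "\u201c", "open paren": "("}
CLOSERS = {"close quote": "\u201d", "close paren": ")"}


def replace_spoken_punctuation(text: str) -> str:
    """Swap spoken commands for symbols and tidy the spacing around them."""
    for phrase, symbol in OPENERS.items():
        # An opening mark hugs the word after it, so drop the trailing whitespace.
        pattern = re.compile(rf"\b{re.escape(phrase)}\b\s*", re.IGNORECASE)
        text = pattern.sub(symbol, text)
    for phrase, symbol in CLOSERS.items():
        # A closing mark hugs the word before it, so drop the leading whitespace.
        pattern = re.compile(rf"\s*\b{re.escape(phrase)}\b", re.IGNORECASE)
        text = pattern.sub(symbol, text)
    return text


print(replace_spoken_punctuation(
    "She said, open quote I am fine. close quote Then she left."
))
```

A real workflow would need to handle more cases, such as a transcriber adding its own comma or period right after a spoken command.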

My initial impression of Whisper is that it’s not consumer-ready, but it is there for the taking if you know how to navigate Python and Microsoft Word macros. I might even be able to surpass Dragon’s quality in many instances.

There are a lot of good reasons to use Whisper, but Whisper alone is not a suitable replacement for Dragon. You have to take the additional steps of applying Word macros and/or Python automation. That’s where the trouble is, but fortunately, I’ve spent the last year developing my expertise in Word macros, so I’m up to the job…

Assuming I can get this to work and assuming it produces workable results:

#1: I can share with the community how I did it so that anyone who wanted to could replicate it. Or, a smart programmer could develop a solution. (I’m not about to get into the business of software development. I’d rather be writing and I’ve learned that it is way too much for me.)

#2: I will improve the quality of my transcription, which will lead to fewer errors.

#3: I will have an application that may be better than Dragon. If Dragon ever becomes deprecated (which it might), I can continue dictating and transcribing with no impact on my writing workflow. In other words, this is an insurance policy. Remember that Microsoft acquired Nuance software last year, and the writing is on the wall for Dragon. It may be rolled into Office (Word has a dictation button now and it’s quite good), but there’s still a big question mark on whether Microsoft will also roll in the TRANSCRIPTION model. I’m getting the sinking feeling that Microsoft may not, as most everyday users will not have a need for it.

We’ll see what happens. There are always tradeoffs. For instance, it may take extra time for Whisper to run. Or, I may need to use a cloud GPU to decrease processing time, which may cost money. But the point is that I am giving myself options, and I’m exploring what’s possible with AI.

In any case, this could go up in flames too. Honestly, if that happens, I will learn a LOT, so either way, I win. I can count on one hand the people delving into this right now, and that’s exciting.

Anyway, that’s one of the secret projects I’m working on. Not so secret now…

YTD Word Count: 245,800

Plan: 1,252,000

Words Left to Write: 1,006,200

Words Over/-Under Plan: 12,050

Days Ahead/-Behind: 4.38

Projected Annual Word Count: 1,055,494

Projected Decade Word Count: 10,554,941

Deadline: 12/31/2023

Days to Go Until 12/31/23: 406

Word Count Average: 2,892
