

Mar 19 2012   11:45AM GMT

Training, training and hardware: Keys to success for medical voice recognition

Posted by: Jenny Laurello
CLP, NLP, Voice recognition

Guest post by: Dr. Andres Jimenez, CEO, ImplementHIT

Voice recognition is a powerful tool. I have seen it transform the working lives of physicians who had a month’s worth of unsigned notes, all after a single training session.  However, as good as voice recognition software is these days, a lot can go awry if the following considerations aren’t addressed from an IT implementation perspective.


I have studied clinical adoption of health IT very carefully over the last six years as part of my Ph.D. work in the area, in my former role as clinical director of content and training for Allscripts, and today in my current role as CEO of ImplementHIT.  What we have found is that training quality is inversely related to productivity loss.  Training typically makes up 20-25% of total implementation costs, and productivity loss often accounts for another 20% or more.  As training quality goes up, productivity loss goes down.  It seems logical, but notice I said training quality, not training time, which does not have the same relationship.  In fact, I have commonly seen too much training be the main reason for poor implementations of voice recognition technology.  Why?  Because the key barrier to voice recognition adoption is getting past the accuracy hurdle, and users rarely retain how to leap that hurdle when they are given too much training prior to implementation.

Several large academic institutions that have successfully rolled out voice recognition to 1000 or more users have shown that 99% accuracy is possible, and rather consistently so across large user bases.  Although the latest versions of voice recognition software can reach 95% accuracy with about 60 minutes of training, a 5% error rate will disable a clinical practice.  Furthermore, checking for that 5% of errors while dictating leads physicians to say a couple of words, check the text output, dictate another few words, and check the output again, and this produces poorer overall recognition accuracy.  This is because the latest versions of voice recognition software rely on the context of spoken sentences to select each word correctly.  Speaking fluently in full sentences actually improves accuracy, whereas constant error checking, even for a 5% error rate, worsens it.
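To put the 95% and 99% figures in perspective, here is a minimal sketch of the arithmetic. The 400-word note length is a hypothetical number chosen for illustration, and the function name is made up; neither comes from any vendor's software.

```python
def expected_errors(words: int, accuracy: float) -> int:
    """Expected number of misrecognized words, rounded to the nearest word."""
    return round(words * (1 - accuracy))


note_words = 400  # hypothetical note length, for illustration only

for accuracy in (0.95, 0.99):
    errors = expected_errors(note_words, accuracy)
    print(f"{accuracy:.0%} accuracy: about {errors} errors in a {note_words}-word note")
```

Under that assumption, moving from 95% to 99% accuracy cuts the corrections per note from about 20 to about 4.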

So what should you do? It’s simple: stick to the basics!  In an ambulatory setting, voice recognition can easily be the tool that helps your organization achieve the dream of surpassing pre-implementation productivity levels.  This occurs when physicians use advanced features such as macros.  When they are first getting started, however, only show them how to correct the voice recognition software when it makes a mistake, so that it learns to interpret their speech correctly.  Have the clinicians practice correcting commonly misinterpreted words 20 or 30 times in a row until the basic sequence of steps (usually 5-6) becomes automatic.  Then, once they go live, if they notice a mistake they don’t have to search for the manual to remember how to correct Dragon.

Correcting your voice recognition software consistently for about 2-3 weeks will get your users to 99% accuracy.  After those 2-3 weeks not only have they passed the accuracy hurdle, they can easily learn more intermediate and advanced features of their software and you are home sweet home!


I am not an electrical engineer, nor a recording studio professional, but I know the basics of how the sound of your voice gets into the processing engine in your computer.  The two main components are the microphone and the connection of the microphone to your computer, which is either digital (USB or Bluetooth) or analog via a sound card.  Voice recognition software must process a lot of information from your voice to achieve high accuracy levels, and wired USB is certainly preferred.

A wired connection works fine at a fixed workstation, but is probably a JCAHO violation when using a mobile device and roaming from exam room to exam room.  New laptops with the latest Bluetooth technology can support enough throughput from a high quality microphone to the computer to allow high accuracy levels.  However, when hospitals purchase laptops or tablets in bulk without voice recognition requirements in mind, the latest Bluetooth is one of the first things to get cut to minimize costs.  The sound card is usually another area of savings, although for wired solutions analog connections are being phased out in favor of USB connections to the computer.

Another important hardware consideration when it comes to voice recognition software is PC RAM and processor speed.  Obviously, the more the better, as it can decrease the latency between your spoken words and the text produced by the software.  Although I have not seen many studies related to this, I do believe high latency disrupts physicians’ dictation flow.

Speaking from experience in my own clinical practice, it is easier to keep your dictation flowing when you see text appear a sentence at a time, as opposed to a paragraph at a time because of a slow computer.  Remember, as discussed above, disrupted dictation and speaking in short phrases rather than sentences negatively impact overall accuracy.  If you want the minimum technical requirements, you should have an Intel Pentium 4 or later, or an AMD Athlon 64 1 GHz or later processor.  You need at minimum 1 GB of RAM for Windows Vista, or 2 GB of RAM for Windows 7 (32-bit and 64-bit).  Lastly, at least 512 KB of L2 cache is needed, which is essentially fast memory separate from main memory that holds commonly used data so it can be accessed more quickly.  I can’t emphasize enough that these are minimum requirements!
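For IT teams screening machines in bulk, the RAM and cache minimums above could be checked with something like the following sketch. The function and its inputs are hypothetical, not part of any vendor tool; it covers only the RAM and L2 cache figures quoted above and leaves the processor model check to you.

```python
# Minimums quoted in this post; treat them as a floor, not a target.
MIN_RAM_GB = {"Windows Vista": 1, "Windows 7": 2}
MIN_L2_CACHE_KB = 512


def shortfalls(os_name: str, ram_gb: float, l2_cache_kb: int) -> list[str]:
    """Return a list of ways a machine falls below the quoted minimums."""
    problems = []
    required_ram = MIN_RAM_GB.get(os_name)
    if required_ram is None:
        problems.append(f"no minimum listed here for {os_name}")
    elif ram_gb < required_ram:
        problems.append(f"RAM {ram_gb} GB is below the {required_ram} GB minimum")
    if l2_cache_kb < MIN_L2_CACHE_KB:
        problems.append(f"L2 cache {l2_cache_kb} KB is below the {MIN_L2_CACHE_KB} KB minimum")
    return problems


# Hypothetical laptop: Windows 7 with 1 GB of RAM and 256 KB of L2 cache.
for problem in shortfalls("Windows 7", 1, 256):
    print(problem)
```

A machine that returns an empty list only meets the floor; as noted above, more RAM and a faster processor will reduce latency.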

There are several other important considerations from an IT implementation perspective when looking into voice recognition in health care, such as clinical workflow and EHR integration.  Overlooking any of these can result in low utilization and ultimately failure.  Hopefully, the tips and advice above can help you prevent such an outcome.  Stay tuned for tips and advice on clinical workflow and EHR integration in a future blog post.

Dr. Andres Jimenez is CEO of ImplementHIT, a leading Health IT training firm and creators of the OptimizeHIT training platform, which is rapidly becoming the “new standard in health IT training”.  Dr. Jimenez is a Nuance physician advocate, and is still clinically active using voice recognition for all his clinical documentation.
