Amy Monagan, EHRtv correspondent, interviews Richard Wolniewicz, Ph.D., of 3M.
Amy Monagan: Hi, I am Amy Monagan with EHRtv. We are at HIMSS 2012 in Las Vegas and today I am with Dr. Richard Wolniewicz from 3M. He is the Divisional Scientist for Natural Language Processing. Thank you for being with us today.
Richard Wolniewicz: My pleasure. Thank you for having me.
Amy Monagan: Can you spend a little bit of time and tell us what natural language processing is?
Richard Wolniewicz: So natural language processing is the processing of human-written text or language by computers. We are all very familiar with natural language processing in, say, email spam detection. I am old enough to remember the days when I had to go into the email system and say, oh, this email is spam, so block this sender, and teach my email system who is valid and who is not. Nowadays, using Gmail, we don’t even think about it. It’s all hidden, and it does a great job, but it processes the text and understands which content is legitimate and which is not. We also see natural language processing in Google when we have a link that says, translate this page. That’s a different kind of natural language processing. It is doing machine translation, but it’s again using the same technologies. IBM Watson was a question answering system; we all saw IBM Watson perform so well on Jeopardy. And in the healthcare space, Auto-Coding is another application of natural language processing, as is speech recognition. We all see that Dragon NaturallySpeaking from Nuance is a really good speech recognition algorithm, and it is widely used in healthcare.
Amy Monagan: So given the different kinds of NLP, where does Auto-Coding fit?
Richard Wolniewicz: So Auto-Coding is a lot like email spam detection. We are trying to classify a document: an email as spam or not spam, and in Auto-Coding, a document as having or not having this code, having or not having that code. Of course, it is much more complicated, with ICD-9 having tens of thousands of codes, not just one flag, spam or not spam.
Amy Monagan: Correct, yes.
Richard Wolniewicz: ICD-10 has hundreds of thousands of codes, so it is much, much more complicated, but it’s that kind of classification system, and in fact a lot of our lessons from email spam detection apply quite well in healthcare to what it’s going to look like with natural language processing. It also sort of illustrates that healthcare is 10 to 15 years behind other industries in adopting natural language processing. A lot of the questions we get these days are questions that other industries have already answered about NLP, but we are addressing them for the first time in healthcare. Much like email, though, I expect that we will see natural language processing disappear into the background as it becomes more and more effective.
Amy Monagan: Can you give me an example of how NLP has evolved?
Richard Wolniewicz: So NLP has evolved over time. We can see that in email spam detection. It began as something very much in your face that you had to configure all up front, very rules-based. You created a rule that said, oh, if the sender is from this domain, then reject it; if it’s from that domain, accept it. Over time, it became more and more subtle and more and more effective. Modern email spam detection systems are very hidden. You can see the email, and of course it has been put in the spam folder or not, but you usually don’t even think about where it goes, other than flagging the occasional mistake, and that’s because it’s statistically driven. A company like Google can look at the millions and millions of users of Gmail, see which emails they are flagging as spam, and build statistical models that learn very, very quickly which email is like spam and which is not. We can see the same thing happening with the evolution of Auto-Coding in healthcare, where initial Auto-Coding systems were very manual, very rules-driven, and required a lot of attention from the coders. Over time, we are seeing more and more automation and more and more background work. We still have a long way to go.
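The contrast described above, between a hand-configured sender rule and a statistical model that learns from messages users have flagged, can be sketched in a few lines of Python. This is only an illustration and not how Gmail or any 3M product works: the domain `spam.example`, the four training messages, and the names `rule_based_is_spam` and `NaiveBayesSpamFilter` are all assumptions invented for the example, and naive Bayes is simply one common statistical technique chosen to stand in for "statistically driven".

```python
import math
from collections import Counter

# Hypothetical hand-maintained rule of the early kind: block by sender domain.
BLOCKED_DOMAINS = {"spam.example"}


def rule_based_is_spam(sender: str) -> bool:
    """Return True when the sender's domain is on the block list."""
    return sender.rsplit("@", 1)[-1].lower() in BLOCKED_DOMAINS


class NaiveBayesSpamFilter:
    """Tiny multinomial naive Bayes classifier trained on flagged messages."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Record one message that a user flagged as 'spam' or 'ham'."""
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def _log_score(self, words: list, label: str, vocab_size: int) -> float:
        # Smoothed prior: how common this label is among flagged messages.
        total_messages = sum(self.message_counts.values())
        score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
        # Add-one smoothed likelihood of each word under this label.
        total_words = sum(self.word_counts[label].values())
        for word in words:
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def is_spam(self, text: str) -> bool:
        """Classify by comparing the log scores of the two labels."""
        words = text.lower().split()
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        spam_score = self._log_score(words, "spam", len(vocab))
        ham_score = self._log_score(words, "ham", len(vocab))
        return spam_score > ham_score


# Made-up training data standing in for messages users have flagged.
spam_filter = NaiveBayesSpamFilter()
spam_filter.train("win free money now", "spam")
spam_filter.train("free prize claim now", "spam")
spam_filter.train("meeting notes attached", "ham")
spam_filter.train("lunch tomorrow at noon", "ham")

print(rule_based_is_spam("offers@spam.example"))
print(spam_filter.is_spam("claim your free money"))
print(spam_filter.is_spam("notes from the meeting tomorrow"))
```

The rule only ever knows what someone typed into the block list, while the statistical filter changes its behavior every time another flagged message is passed to `train`, which is the shift from up-front configuration to background learning described in the answer.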
Amy Monagan: Okay. Can you tell me a little bit about what this means for healthcare?
Richard Wolniewicz: So what it means for healthcare is that as this technology gets better, we can return more and more to clinicians using language to speak with other clinicians. We have gone on this exercise of trying to get clinicians to talk to computers and turning physicians into coders, with all the other issues that of course generate a lot of concern and conflict, because people want to be talking to each other. The fundamental purpose of a clinical note is for a human to talk to a human, and when we turn it into a human talking to a computer, we degrade that communication. Natural language processing holds the promise of humans being able to talk to humans, with the computers following the human language, and then getting the benefit of all the analytics and reporting and everything else you can do with a powerful EHR and other reporting technologies, without having to force humans to talk to computers in a coding language.
Amy Monagan: What are the important components of an NLP system?
Richard Wolniewicz: So a natural language processing system is made up of lots of pieces. You have got, of course, a portion of the system that’s thinking about language, and a lot of it is the same stuff we learned back in grade school: diagramming sentences, telling when a word is negated, telling whether a word is an adjective or an adverb. All of these are separate components inside of an NLP engine. A second important piece of an NLP engine is a dictionary. If you don’t have a dictionary, you don’t know what the text means. You have to have something to refer to. You have to know that a dog is a canine to be able to do the kind of reasoning you are going to do. So you have to have some definitional logic there. You also need a lot of expertise in the actual final product. If you are using NLP for Auto-Coding, you need to have a lot of coding knowledge and expertise to apply the coding rules from the clinical concepts to the final output. So it is really three major components. We’ve got the language processing piece, the dictionary, and then the mapping and semantic processing that goes from clinical concepts to codes, documentation improvement queries, or whatever your target is.
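The three components named above (language processing, a dictionary, and a mapping from clinical concepts to codes) can be sketched as a toy pipeline. This is a minimal illustration under stated assumptions, not 3M's engine: the dictionary entries, the negation cue list, and the function names are all invented for the example, the two ICD-10-CM codes are included only to show the shape of the output, and real coding rules are far more involved than a lookup table.

```python
import re

# Component 2 (assumed toy dictionary): surface terms -> clinical concepts.
TERM_TO_CONCEPT = {
    "pneumonia": "pneumonia",
    "high blood pressure": "hypertension",
    "hypertension": "hypertension",
}

# Component 3 (assumed toy mapping): clinical concepts -> output codes.
CONCEPT_TO_CODE = {
    "pneumonia": "J18.9",
    "hypertension": "I10",
}

# Cue words for a deliberately crude negation check.
NEGATION_PATTERN = re.compile(r"\b(no|denies|without|negative for)\b")


def split_sentences(note: str) -> list:
    """Component 1a: break a note into lowercase sentences."""
    parts = re.split(r"[.;\n]+", note)
    return [part.strip().lower() for part in parts if part.strip()]


def is_negated(sentence: str, term: str) -> bool:
    """Component 1b: treat a term as negated if a cue precedes it."""
    prefix = sentence[: sentence.find(term)]
    return NEGATION_PATTERN.search(prefix) is not None


def suggest_codes(note: str) -> list:
    """Run the three components in order and return sorted code suggestions."""
    codes = set()
    for sentence in split_sentences(note):
        for term, concept in TERM_TO_CONCEPT.items():
            if term in sentence and not is_negated(sentence, term):
                codes.add(CONCEPT_TO_CODE[concept])
    return sorted(codes)


note = "Patient has high blood pressure. No pneumonia on chest x-ray."
print(suggest_codes(note))
```

Keeping the dictionary and the concept-to-code mapping as separate tables mirrors the separation described in the answer: the same language processing and dictionary could in principle feed a different target, such as documentation improvement queries, by swapping only the last table.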
Amy Monagan: What are the important things to consider when evaluating an NLP system?
Richard Wolniewicz: So there is a great quote I like from one of the top textbooks in natural language processing, and this is from Jurafsky and Martin, which goes, “To evaluate a language model, you need to embed it within an application and evaluate the total application performance.” Just like with email spam detection, we don’t pick whether we are going to use Yahoo Mail or Gmail based on the vendor saying, “Our spam detection is 97.3% accurate and built on top of rules.” Instead, we use it and we assess how it performs in our overall workflow. Are we getting a lot of spam, or are we not? And for an Auto-Coding system, or for a documentation improvement system that’s using NLP, the questions are, “Are my clinicians still focused on providing care instead of coding?” “Am I getting high productivity out of my coding staff and documentation improvement staff?” and “Am I maintaining my reimbursement model?” Am I getting the reimbursements I am entitled to? That’s the overall application, and that’s the purpose. There is a lot of complexity, and frankly a lot of fear, uncertainty and doubt, that can be cast around the complex technologies when evaluating a natural language processing system, but pulling it back up to the level that we all understand as users of the reimbursement system and documentation management system is the best way to evaluate whether it’s delivering value.
Amy Monagan: Okay, well I really appreciate you taking the time and letting us know about this. Thank you. This is Amy Monagan with EHRtv with Dr. Richard Wolniewicz from 3M. Thank you.