Exploring the healthcare data mine

The healthcare sector is primed to jump on the 'big data' bandwagon as vendors sharpen their focus on unstructured data analytics.

The provision of healthcare generates ever-increasing amounts of data, the vast majority of it unstructured. Until recently, analytical software has focused on structured data, which represents only an estimated 10 to 20 per cent of the medical data generated. New natural language processing (NLP) tools could be the key to analysing the masses of healthcare data that fall outside the capabilities of existing tools. IBM has launched a solution that supports the analysis of both unstructured and structured medical data.

Healthcare providers cannot afford to ignore the knowledge hidden in medical data, and we expect to see a growing number of solution providers and a plethora of use cases in the near future, especially around hospital readmissions in the US.

Looking past the easy targets

Most healthcare analytics tools have focused on the easiest targets, such as electronic health records (EHRs) or billing data, which represent only a small part of overall medical data. Examples include insurers using analytics to design health programmes that prevent the onset of certain illnesses, or to identify fraud in healthcare claims.

So-called “unstructured” (or variably structured) data has been estimated to account for 80 to 90 per cent of all medical data today. Such data is mainly found in written text such as doctors' or nurses' notes, discharge or referral letters, patient history, pathology reports, tweets, patient surveys, text messages, claims and case management data, and emails.

For medical information that is already in digital format, a wealth of data is at the disposal of healthcare practitioners, insurers, public planning authorities, and others. This data could help generate valuable insights into how to improve clinical and operational outcomes.

This type of analytics can improve both clinical and operational outcomes. In the clinical space, it can support diagnosis, help improve clinical interventions, foresee (detect and predict) the early onset of disease and deterioration in a patient's condition, and improve disease management. In the administrative space, predictive analytics can help prevent readmissions and support patient discharge, follow-up care, and claims management. So far, however, this potential has remained almost entirely untapped.

In the US, the looming penalties for hospital readmission within a 30-day period, which will take effect in 2013, make readmissions an important use case for analytics, particularly as, according to the New England Journal of Medicine, approximately 20 per cent of readmissions are preventable. The usefulness of analytics is certainly not limited to determining the causes of hospital readmission across diverse illnesses or across the US. Unstructured data can also be used to prevent and detect fraud, to determine which patients would benefit most from particular wellness programmes, to facilitate patient discharge and follow-up measures, to better target clinical care interventions, and to understand the spread of hospital-borne infections.

Until recently, the wealth of unstructured data could not be analysed due to a lack of adequate tools; this has changed with the availability of NLP.

Natural language processing

At the end of 2011, IBM launched a solution capable of analysing unstructured or variably structured data in digital format. At its heart sits the natural language processing capability of IBM's Watson supercomputer. The solution has been named "IBM Content and Predictive Analytics" (ICPA). Rather than simply answering specific questions with weighted probabilities based on data already stored in Watson, ICPA uses analytics to reveal and visualise trends, patterns, deviations, anomalies, and previously unknown relationships in structured and unstructured data. In this way, it can generate predictive insights, such as likely future negative health outcomes. ICPA also preserves the context of the original data to enable advanced analytics.

Although IBM has been a leading force in this development, it is not the only player pursuing healthcare analytics on unstructured or variably structured data. InterSystems is about to launch its own analytics tool for evaluating unstructured data, based on its health information exchange offering. As this is largely untapped territory, the potential applications for analysing unstructured data are vast.

The Seton Health case study

IBM's ICPA solution has already attracted considerable interest. Seton Health, based in Texas, is a healthcare provider with five medical centres and four smaller hospitals in the region. It is part of Ascension Health, a Catholic healthcare provider that operates across the US and is open to innovative uses of healthcare IT. Seton deployed ICPA for eight weeks to gain a better understanding of why patients suffering from congestive heart failure (CHF) were being readmitted.

After analysing 113 indicators (structured and unstructured), Seton found that in this case structured data sources were less reliable and less revealing than unstructured ones. This was due to the rigid way in which questions were asked and the limited range of answers allowed. Seton's analysis revealed that social factors, namely assisted living and drug and alcohol abuse, contributed to whether or not a patient was readmitted. This information, which exists only as unstructured data, emerged as a key indicator of readmission. Based on these insights, Seton will be able to identify patients who are likely to be readmitted and take measures to cut costs, improve quality of life, and even reduce mortality rates among its CHF patients.
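To make the structured-versus-unstructured point concrete, here is a deliberately naive sketch of how a social factor such as assisted living or alcohol abuse might be flagged in free-text clinical notes, where a rigid form field might never capture it. It is not how ICPA works: it is simple regular-expression matching rather than NLP, and the phrase lists and the function name `flag_social_factors` are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical phrase patterns for the two kinds of social factor Seton reported.
# A real NLP pipeline would handle negation ("denies alcohol abuse"), synonyms
# and context; this simple keyword match does not.
SOCIAL_FACTORS = {
    "assisted_living": [r"\bassisted living\b", r"\bnursing home\b"],
    "substance_abuse": [r"\balcohol (?:abuse|use disorder)\b", r"\bdrug abuse\b", r"\betoh\b"],
}


def flag_social_factors(note: str) -> set[str]:
    """Return the names of the social factors whose patterns appear in a free-text note."""
    found = set()
    for factor, patterns in SOCIAL_FACTORS.items():
        # Case-insensitive search, so "Assisted Living" and "ETOH" also match.
        if any(re.search(p, note, flags=re.IGNORECASE) for p in patterns):
            found.add(factor)
    return found


if __name__ == "__main__":
    note = "Pt lives alone, hx of alcohol abuse; discussed move to assisted living."
    print(sorted(flag_social_factors(note)))
```

Run on the sample note, the sketch flags both factors, even though neither might appear in a structured admission form that only offers fixed answers.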

Cornelia Wels-Maug is a senior analyst at Ovum and works in Ovum's Public Sector & Healthcare Life Sciences team.