The 4 Challenges of Working with Data and How Cloud Can Solve Them
Providers like AWS are working with industries that juggle large amounts of sensitive and complex data to make administrators' lives a little easier.
In this article, I will discuss issues that are -- and will be -- adversely affecting health care data and analytics, their role in improving patient treatment, and the way cloud services like Amazon Web Services (AWS) can help address them for health care providers. My discussion will focus on large genomics datasets, since that's what I work with, but I realize that is only a fraction of the data that will be used by health care providers in the future.
Below, I present four areas that highlight challenges related to health care data for the next five to 10 years: scalability, sharing, standardization and security. Although this column focuses on my particular vertical, which is health care, you'll find that many aspects of it can also apply to other industries.
Scalability is particularly pertinent to genomics, given the falling cost and rising speed at which genomic data can be produced. Think of both pure storage (for the raw and processed data) and compute power (which is how we turn that raw data into something interpretable). Along with that, consider the U.S. federal mandate that hospitals adopt electronic health record (EHR) systems. The amount of data stored electronically has grown and will continue to grow.
On top of that, various other types of biological data, images, audio and streaming data from handheld devices are now generating large and diverse datasets. Scale is a problem. How can I run my searches, queries and algorithms against these massive datasets? Traditional storage, compute and database systems are struggling to keep up.
Cloud technologies like AWS are a logical answer to this issue, as the core value of cloud services is the ability to scale rapidly and on demand. The cost of using cloud for storage and compute continues to go down. Cloud also alleviates the need for large upfront capital expenditures, and the pay-per-use model ties spending to actual usage, which makes cost decisions easier to get right.
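To make the pay-per-use point concrete, here is a minimal sketch of the arithmetic. Every figure in it (the upfront cost, the monthly running cost, the price per terabyte) is a made-up placeholder, not a real AWS or hardware price, and the function names are my own. The point is only the shape of the comparison: on-premises capacity is bought up front for the expected peak, while pay-per-use cost follows what you actually store each month.

```python
def on_prem_cost(months: int, upfront: float, monthly_running: float) -> float:
    """Cumulative cost of hardware bought up front, plus power/staff per month."""
    return upfront + monthly_running * months


def pay_per_use_cost(stored_tb_by_month: list[float], price_per_tb_month: float) -> float:
    """Cumulative cost when each month is billed for the terabytes actually stored."""
    return sum(tb * price_per_tb_month for tb in stored_tb_by_month)


# Hypothetical scenario: storage grows by 10 TB a month for a year.
usage = [10.0 * month for month in range(1, 13)]

# Placeholder prices for illustration only.
print("on-prem:    ", on_prem_cost(months=12, upfront=100_000.0, monthly_running=1_000.0))
print("pay-per-use:", pay_per_use_cost(usage, price_per_tb_month=25.0))
```

Plugging in your own quotes and growth estimates is the useful part; the placeholder numbers above say nothing about which option is cheaper for a given organization.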
Another U.S. federal mandate is for health systems to be interoperable, enabling the sharing of data about patients who might be seen and treated at multiple hospitals. In research, there is a constant need to share data and give access to others, including relevant sets of data that would complement, validate or enhance your own research. Moving multiple terabytes (soon petabytes) across the Internet is impractical on several levels, which means copying everything into a single central repository is not an option.
Again, cloud services seem to be a logical solution to data sharing. Sharing data securely (see more on this below) requires a large amount of infrastructure and setup. Cloud providers like AWS still require proper setup, but they remove much of the procurement of physical resources that can complicate this further. They also provide a standard way to consolidate this sharing across various secured environments.
The more data is generated, the more we see that the movement of the data is a major bottleneck, and it is harder to build and buy more bandwidth than it is to allow more people to access data where it resides.
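A quick back-of-the-envelope calculation shows why moving the data is the bottleneck. The sketch below assumes a dedicated 1 Gbps link and decimal terabytes; both are my assumptions for illustration, and the `efficiency` parameter is a hypothetical knob for protocol overhead and contention, which in practice only make things worse.

```python
def transfer_days(size_terabytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to push size_terabytes through a link of link_gbps.

    efficiency is the fraction of the nominal bandwidth actually achieved
    (1.0 means a perfect, fully dedicated link).
    """
    bits = size_terabytes * 1e12 * 8                   # decimal terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)    # bits / (bits per second)
    return seconds / 86_400                            # seconds -> days


for size_tb in (10, 100, 1000):
    print(f"{size_tb:>5} TB over 1 Gbps: {transfer_days(size_tb, 1.0):.1f} days")
```

Under these assumptions, a petabyte (1,000 TB) takes about three months of uninterrupted transfer, which is why granting access to data where it already sits scales better than shipping copies around.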
Health care data is extremely complicated. Quite a bit of effort has gone into the standardization of data and how it is categorized and shared. (For a simple example of a scenario in which this becomes important, think of two hospital doctors exchanging information about a common patient.) Many industries are required to make this effort toward standardization, although health care is one of the most challenging, if not the most challenging. The complexities of the human body and the environment we live in are not trivial.
There are multiple standards for health care data, depending on the use or topic area. For coding laboratory results, it's the Logical Observation Identifiers Names and Codes (LOINC). For diagnoses, it's the International Classification of Diseases (ICD). For mental health, it's the Diagnostic and Statistical Manual of Mental Disorders (DSM). There are dozens, if not hundreds, of these standards. Some of the data being generated is very new (genomics, proteomics, images and others) and standardization is still in progress. That data is new and layered, and dozens of publications come out monthly, alongside the work of international consortia, that help define those standards.
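As a rough illustration of what these code systems buy you, consider the two-doctor scenario above. The sketch below is not any particular standard's data model; the `CodedEntry` class, the local test names and the two hospital mapping tables are hypothetical. The LOINC and ICD-10-CM codes are included only as examples of what a shared identifier looks like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedEntry:
    system: str   # which standard the code comes from, e.g. "LOINC"
    code: str     # the identifier within that standard
    display: str  # human-readable label


# Hypothetical mapping tables: each hospital has its own local name for the
# same lab test, but both map that name to the same LOINC code.
HOSPITAL_A = {
    "GLU-SERUM": CodedEntry("LOINC", "2345-7", "Glucose [Mass/volume] in Serum or Plasma"),
}
HOSPITAL_B = {
    "glucose_blood_chem": CodedEntry("LOINC", "2345-7", "Glucose [Mass/volume] in Serum or Plasma"),
}

# A diagnosis lives in a different code system altogether.
DIAGNOSIS = CodedEntry("ICD-10-CM", "E11.9", "Type 2 diabetes mellitus without complications")


def same_concept(a: CodedEntry, b: CodedEntry) -> bool:
    """Two entries mean the same thing only if both system and code match."""
    return (a.system, a.code) == (b.system, b.code)


print(same_concept(HOSPITAL_A["GLU-SERUM"], HOSPITAL_B["glucose_blood_chem"]))
print(same_concept(HOSPITAL_A["GLU-SERUM"], DIAGNOSIS))
```

The local names never have to agree; only the mapping to a shared code does. The hard part, and the part no tooling removes, is agreeing on and maintaining those mappings.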
The complexity of definitions and implementations is quite daunting. Cloud providers will not be able to solve data standardization complexities. However, they can provide standard and reusable services and infrastructure to implement them. Once data is defined, the cloud can allow the sharing to be simple, standard and centralized per provider, eliminating another layer of complexity. We are starting to see this with packages and APIs created by those working on cloud services.
Lastly, and layered across all three of the previous S-es, is security. Health care deals with sensitive data about patients. The more data that is stored, transformed and shared, the more complicated this becomes. The federal government has established various standards for protected health information, the best known being the Health Insurance Portability and Accountability Act (HIPAA).
Data security, in particular, is seen as a deterrent to using the cloud. The perception is that since you can't touch the server where your data resides, there is more risk. This is an area in which most cloud providers like AWS are working to educate and partner with health care providers, as well as with vendors who are flocking to cloud to provide their software and platforms to health care consumers. In the past week, AWS has released a new whitepaper detailing more of its services that are covered under its Business Associate Agreement, including its Oracle and MySQL databases and its DynamoDB service.
In previous columns, I have given more detail on this subject, but due diligence must be done by each health care provider.
In summary, AWS and other cloud services are positioning themselves to become key players in addressing some major concerns about the growing infrastructure and data needs of health care providers -- or, indeed, the growing needs of many other industries that juggle large amounts of sensitive and complex data. In particular, cloud-enabled scalability and sharing are natural solutions for alleviating the pressure health care systems feel to meet U.S. government mandates for storing and sharing patient data. At the same time, cloud providers are putting a lot of effort into addressing risk concerns and easing complicated security and standardization needs.
If you're in an industry like health care that has similar needs, it's time to look into the cloud.
Aaron Black is the director of informatics for the Inova Translational Medicine Institute (ITMI), where he and his team are creating a hybrid IT architecture of cloud and on-premises technologies to support the ever-changing data types being collected in ITMI studies. Aaron is a certified Project Management Professional (PMP) and a Certified Scrum Master (CSM), and has dozens of technical certifications from Microsoft and accounting software vendors. He can be reached at @TheDataGuru or via LinkedIn.