The UPS of Data -- AWSInsider

The UPS of Data

How to choose the right data-delivery method for the right kind of data, and how to design the cloud infrastructure to make that happen.

By Aaron Black
04/14/2015

I don't know of many kids who dream of working at UPS to deliver packages. Growing up in a small town in Ohio, I would have laughed at the thought that I would be in charge of a package-delivery business. In reality, that is what I am. The package my team delivers is data.

When UPS started in 1907 (as the American Messenger Company), did its founders think they were building a billion-dollar business? Did they think they would scale and compete with the U.S. Postal Service? How many of us take for granted how cheaply and efficiently we can get packages delivered to us overnight? Packages delivered by UPS are small and large, hot and cold. There are so many obscure and sensational things that have been shipped by UPS, some so custom that it's hard to put a price tag on the service.

In our data-driven world, there are many delivery mechanisms to help move our large, small, intricate, private and sensational data from our electronic warehouses to our waiting (and often anxious) consumers. In my current role, the problem is not a lack of options to store and deliver data -- it's choosing the correct mechanism. Can we think of IT departments as the UPS of data? Can we build and deliver data and information that were never dreamed of when we first started?

What we have found in health care is that there is a diverse set of needs when it comes to data delivery. For patient care, the data needs to be delivered with security, accuracy and high reliability. For research data, the need is for high speed and performance, with massive datasets spinning on high-IOPS disks. The results need to be returned to researchers as quickly as possible. For exchanging patient data, it's high security as well as reliability that are most important.

If there were unlimited IT budgets, then all data would be delivered on the fastest components of hardware, network, databases and applications. For Inova, we first concentrate on gathering, storing and transforming secure and reliable patient data. We do this with the solid and consistent systems and architecture of our health system. This secure environment is where we gather, review and de-identify our patient information before we move it to our research environments. Our first priority of delivery is security and reliability of the data.

With aggregate data sizes starting to measure in petabytes, we need to think about tiered disk storage based on cost and performance. We think about data as a series of temperatures: cold, medium and hot. For our research data, the cold data is large in size, mostly unprocessed and rarely accessed. This data can be placed on storage that is cost-efficient but durable.
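As a rough illustration of the temperature idea, the sketch below assigns a dataset to a tier based on how recently it was accessed. The function name and the one-day and 30-day cutoffs are assumptions made for this example; the article does not state exact thresholds.

```python
from datetime import datetime, timedelta

# Hypothetical cutoffs for illustration only; the article gives no exact values.
HOT_WINDOW = timedelta(days=1)      # touched within the last day
MEDIUM_WINDOW = timedelta(days=30)  # touched within the last month


def classify_temperature(last_accessed: datetime, now: datetime) -> str:
    """Return 'hot', 'medium' or 'cold' based on time since last access."""
    age = now - last_accessed
    if age <= HOT_WINDOW:
        return "hot"
    if age <= MEDIUM_WINDOW:
        return "medium"
    return "cold"


now = datetime(2015, 4, 14)
for label, last in [
    ("refined results", now - timedelta(hours=2)),
    ("working set", now - timedelta(days=10)),
    ("raw archive", now - timedelta(days=400)),
]:
    print(label, "->", classify_temperature(last, now))
```

In practice the decision would likely also weigh dataset size and how far the data has been processed, as the tier descriptions here suggest.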

We use Amazon Web Services (AWS) for most of our long-term storage needs, and we continue to build Amazon S3 policies to automate the movement of our files to longer-term Amazon Glacier storage. This significantly reduces cost versus on-premises storage options. However, there is a cost to move this data out of Glacier storage. For us, this data is rarely accessed, and so the cost is minimal.
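One way to picture such a policy is as an S3 lifecycle configuration. The sketch below only builds the configuration as a Python dictionary, in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The prefix, the 90-day delay, the rule ID format and the bucket name in the comment are all hypothetical; the article does not describe the actual rules in use.

```python
import json


def glacier_lifecycle_config(prefix: str, days_until_glacier: int) -> dict:
    """Build a lifecycle configuration that moves objects under `prefix`
    to the GLACIER storage class after `days_until_glacier` days."""
    if days_until_glacier < 0:
        raise ValueError("days_until_glacier must be non-negative")
    # Derive a readable rule ID from the prefix (naming scheme is an assumption).
    rule_id = "archive-" + (prefix.strip("/").replace("/", "-") or "all")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_until_glacier, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


# Hypothetical prefix and delay, chosen only for the example.
config = glacier_lifecycle_config("research/raw/", 90)
print(json.dumps(config, indent=2))

# With boto3 installed and credentials configured, the dict could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-research-bucket", LifecycleConfiguration=config)
```

Keeping the rule as plain data makes it easy to review before it is applied, which matters because retrieving objects from Glacier later carries a cost.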

Our medium data is accessed more frequently, but not every day. This is quite a bit smaller than the cold data, but can still be terabytes in size. We need this data to be accessed faster and moved with little latency. We expect faster response times for accessing the data compared to the cold tier, but we are willing to wait seconds for it to be put in motion. We have solid state drives for this, which are costlier than the cold tier. We use Amazon EC2 and on-premises disks to store and deliver this data.

Lastly, the hot data is smaller than the medium-tier data. This data has been further refined by extract, transform, load (ETL) processing or larger analysis by our scientific teams, and we expect it to be moved and queried quickly. This is where we want the data and its manipulation to be done in milliseconds, so the user interaction latency is almost non-existent. We use various places -- both on-premises and cloud infrastructure -- to store and deliver this data in a unique combination of high-speed disks and databases.
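The latency expectations for the tiers can be written down as simple budgets and checked against observed response times. In the sketch below, the orders of magnitude follow the text (seconds for medium, milliseconds for hot), but the specific numbers and names are assumptions, and the cold tier is left without a budget because the article does not state one.

```python
from typing import Optional

# Latency budgets in seconds. Exact values are illustrative assumptions.
LATENCY_BUDGET_SECONDS: dict[str, Optional[float]] = {
    "hot": 0.1,      # "milliseconds" in the text
    "medium": 5.0,   # "willing to wait seconds"
    "cold": None,    # no target stated in the article
}


def within_budget(tier: str, observed_seconds: float) -> bool:
    """Return True if the observed latency meets the tier's budget.
    A tier with no budget (None) always passes."""
    budget = LATENCY_BUDGET_SECONDS[tier]
    return budget is None or observed_seconds <= budget


for tier, observed in [("hot", 0.02), ("hot", 0.5), ("medium", 2.0), ("cold", 3600.0)]:
    status = "ok" if within_budget(tier, observed) else "too slow"
    print(f"{tier:6s} {observed:>8.2f}s {status}")
```

A check like this could flag data that is being served from the wrong tier, for example a frequently queried dataset that still sits on medium-tier storage.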

The ability to tier this data into cold, medium and hot has allowed us to spend our IT budget more effectively. Our goal as a "delivery" business is to allow our data consumers to get the most efficient use of their data and improve the way we allow them to request and interact with it.

How have you been able to deliver your data to your consumers? How can you utilize storage and delivery mechanisms to enhance their experience? Is this enabling them to do their jobs better? Leave a comment below.

About the Author

Aaron Black is the director of informatics for the Inova Translational Medicine Institute (ITMI), where he and his team are creating a hybrid IT architecture of cloud and on-premises technologies to support the ever-changing data types being collected in ITMI studies. Aaron is a certified Project Management Professional (PMP) and a Certified Scrum Master (CSM), and has dozens of technical certifications from Microsoft and accounting software vendors. He can be reached at @TheDataGuru or via LinkedIn.
