AWS Launches Public Data Lake of COVID-19 Statistics
To facilitate research on the ongoing COVID-19 pandemic, Amazon Web Services (AWS) has created a data lake of publicly accessible information related to the disease.
The AWS COVID-19 data lake, which the public can access at no cost, currently includes "case tracking data" from The New York Times and Johns Hopkins University, data from Definitive Healthcare on hospital bed availability, and a vast library of coronavirus-related research articles (over 45,000 as of this writing) from the Allen Institute for AI. AWS plans to keep updating the data lake as new and reliable information surfaces.
The purpose of the data lake is to give data scientists, researchers, public health officials and other organizations a centralized trove of vetted information about COVID-19, with the aim of helping them develop policies that curb the spread of the disease.
"The AWS COVID-19 data lake allows experimenters to quickly run analyses on the data in place without wasting time extracting and wrangling data from all the available data sources," AWS said in its announcement last week.
Users can perform "trend analysis, do keyword search, perform question/answer analysis, build and run machine learning models, or run custom analyses" on the data using third-party solutions or AWS tools like Amazon Athena, Amazon QuickSight and Amazon Redshift Spectrum. There is no extra cost to access the data lake; users only pay the normal costs of whatever AWS services they use to work with the data. Users can choose to work solely within the data lake or combine it with their proprietary data. They also have the option to subscribe to the COVID-19 data sources directly via the AWS Data Exchange.
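For readers wondering what a "trend analysis" through Amazon Athena might look like in practice, the following is a minimal Python sketch rather than AWS's own sample code. It assumes the boto3 SDK is installed and AWS credentials are configured. The table name, column names (date, state, cases, deaths), database name and S3 results location are all hypothetical placeholders, not details confirmed by AWS's announcement.

```python
import time


def build_state_trend_query(table: str, state: str, limit: int = 30) -> str:
    """Build SQL for the most recent daily case counts in one state.

    The column names (date, state, cases, deaths) are assumptions made
    for illustration; check the actual table schema before use.
    """
    # Only allow simple identifiers so the table name cannot inject SQL.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    safe_state = state.replace("'", "''")  # escape single quotes for SQL
    return (
        f'SELECT "date", state, cases, deaths FROM "{table}" '
        f"WHERE state = '{safe_state}' "
        f'ORDER BY "date" DESC LIMIT {int(limit)}'
    )


def run_athena_query(sql: str, database: str, output_location: str,
                     poll_seconds: float = 1.0) -> list:
    """Submit a query to Athena, wait for it, and return the first page of rows.

    `database` and `output_location` (an S3 URI you own) are placeholders
    supplied by the caller.
    """
    import boto3  # imported lazily so the query builder works without the SDK

    athena = boto3.client("athena")
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # Returns only the first page of results; the first row holds column headers.
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```

A call such as run_athena_query(build_state_trend_query("nytimes_states", "New York"), "covid19_example_db", "s3://your-results-bucket/athena/") would then be billed at normal Athena rates, in line with AWS's note that users pay only for the services they use.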
"We imagine local health authorities could build dashboards to track infections and collaborate to efficiently deploy vital resources like hospital beds and ventilators. Or epidemiologists could complement their own models and datasets to generate better forecasts of hotspots and trends," AWS said. AWS has also published a sample dashboard, built with QuickSight, that shows active coronavirus cases by region, U.S. testing data and hospital capacity.
An early user of the AWS COVID-19 data lake is the Chan Zuckerberg Biohub, a medical research nonprofit backed by Facebook CEO Mark Zuckerberg. "Our team of researchers is now analyzing trends in disease spread, its geography, and time evolution by leveraging datasets from the AWS COVID-19 data lake, combined with our own data, in order to better predict COVID epidemiology," said Jim Karkanias, head of Data Science and Information Technology at the Biohub.