AWS How-To: Moving Large Volumes of Data to Amazon S3 with Snowball -- AWSInsider

Moving Large Volumes of Data to Amazon S3 with Snowball

By Brien Posey
11/14/2016

It has been said that one of the most compelling reasons to use the public cloud is that public cloud providers are well-suited to running really large-scale workloads.

That's all well and good, but what do you do if you have a really big workload running in your datacenter and you want to move it to the Amazon Web Services (AWS) cloud? It isn't exactly practical to transfer petabytes of data across an Internet connection.

If you need to move large amounts of data to AWS, then you don't necessarily have to transfer that data across the Internet. AWS offers a service called Snowball, which it first unveiled at the 2015 AWS re:Invent conference, that exists solely to help AWS subscribers with large-scale data migrations. As AWSInsider reported when Snowball debuted:

[Snowball] gives organizations a literal briefcase to physically transfer large amounts of data to AWS, a process that could take days or months if done over the air. Snowball appliances are ruggedized storage containers that organizations can order through their AWS Management Console. The appliances, which start at 50TB, feature "tamper-proof" closures, impact-resistant materials, built-in cabling, and digital shipping labels to reduce the likelihood of shipping mistakes.

The first thing that you need to know about Snowball is that it isn't suitable for every situation. If you need to transfer several terabytes of data, then you may be better off uploading that data in the usual way because of the costs involved in using Snowball. Snowball should be reserved for data transfer jobs that are just too big to do in any other way.

To use Snowball, you will need to log into the AWS Management Console, choose the Snowball option from the home screen, and click Get Started. This will cause AWS to launch a wizard that walks you through the process of planning a Snowball job. As you can see in Figure 1, a Snowball job can be used to move data to Amazon Simple Storage Service (S3) or, since a feature update in March this year, to export data from S3.

**[Click on image for larger view.]** *Figure 1: You can use Snowball to move data to or from Amazon S3.*

In the case of an import operation, you will need to create an import job. This process consists of providing your shipping information and some details about the data that is to be transferred. Upon doing so, AWS will ship you a physical appliance that you can connect to your network. In most cases, organizations choose to transfer data to the Snowball appliance by using the Snowball client. It is possible, however, to transfer data programmatically. Once the transfer is complete, you can ship the appliance back to AWS and its contents will be transferred to S3.

Now that I have explained the basics of using Snowball, there are a few important things that you need to know. First, like any physical appliance, Snowball does have a capacity limit. In the United States, Snowball appliances come in two different sizes -- 50TB and 80TB. In other regions, Snowball appliances can accommodate 80TB of data.

In spite of the Snowball appliance's capacity limit, it is possible to transfer really big data sets. The transfer operation must be divided into a series of jobs, with each job corresponding to one Snowball appliance. Thus, transferring large datasets involves the use of multiple appliances. Keep in mind, however, that AWS' default service limit only allows an organization to possess one Snowball appliance at a time. It you need to use multiple appliance simulations, then you will need to contact AWS to make arrangements.

Perhaps the most significant obstacle to using Snowball is the requirement that data remain static during transfer operations. If the data is modified during the copy process, then the data is invalidated and cannot be copied to S3. This is obviously a significant issue because any data set that is large enough to warrant using Snowball is probably being actively used.

As such, organizations will either have to plan to take the workload offline for a period of time, or come up with a way to ensure that the data will not be modified during the transfer. For example, an organization might create a storage snapshot prior to copying data to Snowball.

This brings up another important point: Transferring data to Snowball requires the use of a workstation. AWS recommends that you use the data's local host as a workstation (for the purposes of the file transfer). The reason for this is that the network can become a serious bottleneck. It is possible to transfer data to Snowball over a network link, but large frames are not supported, which can further diminish performance.

Regardless of which computer you choose to use as a workstation, the workstation needs to have plenty of memory, processing power and throughput if it is to transfer the data efficiently.

As you can see, there are a number of logistical considerations that should be taken into account before you create a Snowball job. You will want to be sure to work out the details ahead of time, because there is a time limit of 90 days. AWS provides a number Snowball best practices here.

About the Author

Brien Posey is a 22-time Microsoft MVP with decades of IT experience. As a freelance writer, Posey has written thousands of articles and contributed to several dozen books on a wide variety of IT topics. Prior to going freelance, Posey was a CIO for a national chain of hospitals and health care facilities. He has also served as a network administrator for some of the country's largest insurance companies and for the Department of Defense at Fort Knox. In addition to his continued work in IT, Posey has spent the last several years actively training as a commercial scientist-astronaut candidate in preparation to fly on a mission to study polar mesospheric clouds from space. You can follow his spaceflight training on his Web site.

Featured

Subscribe on YouTube

AWS Cloud Report

Email Address*Country*

Please type the letters/numbers you see above.

Upcoming Training Events

0 AM

VSLive! 3-Day Hands-On Training Seminar: Master Modern JavaScript: Unlock the Full Potential of Your Code
June 2-4, 2025

VSLive! 2-Day Hands-On Training Seminar: Asynchronous and Parallel Programming in C#
June 24-25, 2025

4-Hour Hands-on Workshop: MCP Demystified
June 30, 2025

VSLive! 4-Day Hands-On Training Seminar: Immersive .NET Full Stack Training: 4-Day Hands-On Experience
July 15-18, 2025

Securing IT in the AI Era
July 23, 2025

VSLive! 4-Hour In-Depth Workshop: Immersive .NET Full Stack Training: C# Interfaces: Effective Usage while Avoiding Pitfalls
July 29, 2025

Visual Studio Live! @ Microsoft HQ
August 4-8, 2025

TechMentor @ Microsoft HQ
August 11-15, 2025

4-Hour VSLive! Workshop: Testability in .NET
August 27, 2025

Microsoft 365 Security Masterclass
August 25-26, 2025

Visual Studio Live! San Diego
September 8-12, 2025

Live! 360 2-Day Hands-On Seminar: Swimming in the Lakes of Microsoft Fabric and AI – A Hands-on Experience
September 18-19, 2025

VSLive! 2-Day Hands-On Training Seminar: Hands-On with .NET Web Development in 2025
October 7-8, 2025

Live! 360 Orlando
November 16-21, 2025

Artificial Intelligence Live! Orlando
November 16-21, 2025

Cloud & Containers Live! Orlando
November 16-21, 2025

Cybersecurity & Ransomware Live! Orlando
November 16-21, 2025

Data Platform Live! Orlando
November 16-21, 2025

Visual Studio Live! Orlando
November 16-21, 2025

TechMentor Orlando
November 16-21, 2025

VSLive! 4-Day Hands-On Training Seminar: Immersive .NET Full Stack Training: 4-Day Hands-On Experience
December 16-19, 2025

Visual Studio Live! Las Vegas
March 16-20, 2026

Free Whitepapers

> More TechLibrary

Free Webcasts

> More Webcasts

Moving Large Volumes of Data to Amazon S3 with Snowball

Featured

Fortifying Your Cloud: Essential AWS Security Strategies from an Expert Cloud Strategist

AWS Improves Backup Restorations with Custom Volume Support

AWS Uses Advanced LLMs to Simplify Agentic AI with Open Source Strands SDK

Fortifying Your Cloud: Essential AWS Security Strategies from an Expert Cloud Strategist

AWS Improves Backup Restorations with Custom Volume Support

AWS Uses Advanced LLMs to Simplify Agentic AI with Open Source Strands SDK

AI-Focused Data Security Report Finds Thousands of Risky AWS Policies Per Account

Dynamically Configuring Windows Instances in AWS EC2, Part 1

Fortifying Your Cloud: Essential AWS Security Strategies from an Expert Cloud Strategist

AWS Improves Backup Restorations with Custom Volume Support

AWS Uses Advanced LLMs to Simplify Agentic AI with Open Source Strands SDK

AI-Focused Data Security Report Finds Thousands of Risky AWS Policies Per Account

Dynamically Configuring Windows Instances in AWS EC2, Part 1

Upcoming Training Events

Free Whitepapers

Free Webcasts