AWS Step-by-Step

Using the AWS CLI to Upload Files to S3

For the past three months, I have been spending nearly every waking moment developing a series of video-based courses related to PowerShell. The end result was nearly 70 hours of unedited video, amounting to hundreds of gigabytes. With all of the video recorded, the last step was to upload all of that video to a dedicated S3 bucket supplied by the publisher.

You would think that this would be a simple process, but it proved to be anything but. I tried several popular GUI-based tools. One tool seemed to work, but the file transfers were painfully slow and the tool ended up crashing after a few hours. Other tools experienced authentication problems or general stability issues. Ultimately, I decided to see if I could upload my project from the AWS CLI.

Backing Up the Existing Configuration
Before attempting to connect to the publisher's S3 environment, I decided to back up my existing AWS configuration. After all, uploading the video files should be a one-time task and I did not want to have to manually reconfigure my machine to put everything back to normal.

To back up your existing configuration (on a Windows machine), go to C:\Users\<your username>\.aws. There should be two files in this folder: config and credentials. The config file contains your region and output format. The credentials file contains your access key ID and secret access key. Both of these files need to be backed up before continuing.
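If you would rather script the backup than copy the files by hand, something like the following sketch could work. It assumes a POSIX shell (Git Bash or WSL on Windows, for example), where the same folder appears as ~/.aws. The helper name backup_aws_config and the backup location are made up for illustration and are not part of the AWS CLI.

```shell
# Sketch: copy the AWS CLI's config and credentials files into a backup folder.
# backup_aws_config is a hypothetical helper name, not an AWS CLI command.
backup_aws_config() {
    src_dir="$1"    # folder holding config and credentials, e.g. "$HOME/.aws"
    dest_dir="$2"   # folder that will receive the copies

    mkdir -p "$dest_dir" || return 1
    for name in config credentials; do
        if [ -f "$src_dir/$name" ]; then
            # -p preserves timestamps and permissions on the copy
            cp -p "$src_dir/$name" "$dest_dir/$name" || return 1
        else
            echo "skipped: $src_dir/$name not found" >&2
        fi
    done
}
```

Calling backup_aws_config "$HOME/.aws" "$HOME/aws-backup" makes the copies. Swapping the two arguments afterward should put the original files back.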

Configuring the CLI
With the existing configuration backed up, the next step in the process is to configure the AWS CLI. To do so, just enter:

aws configure

The aws configure command asks you to supply four pieces of information: the AWS access key ID, the AWS secret access key, the default region name, and the default output format. The default region name and the default output format are optional, so I left them blank.

Setting the S3 Parameters
Because I had a huge number of files to upload, I decided to create a new config file that would allow for more concurrent S3 requests. Doing so makes the upload process much faster since multiple files can be uploaded at once. Here is the text that I used for my new config file:

[default]
s3 =
    max_concurrent_requests = 20
    max_queue_size = 10000
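The same two values can also be written from the command line with aws configure set instead of editing the file by hand. The sketch below wraps those calls in a small helper. The name set_s3_tuning is hypothetical, and a POSIX shell is assumed.

```shell
# Sketch: write the S3 transfer settings into the [default] profile.
# set_s3_tuning is a hypothetical helper name, not an AWS CLI command.
set_s3_tuning() {
    concurrent="$1"   # value for max_concurrent_requests
    queue_size="$2"   # value for max_queue_size

    # Each call stores one value under the s3 key of the [default] profile.
    aws configure set default.s3.max_concurrent_requests "$concurrent" &&
    aws configure set default.s3.max_queue_size "$queue_size"
}
```

Running set_s3_tuning 20 10000 should leave the config file with the same settings shown above.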

Testing Access to the Bucket
Once you have configured the CLI environment, you should be able to access the bucket. To test your access, enter a command like this one:

aws s3 ls

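With no arguments, aws s3 ls tries to list every bucket that the account owns. In my case, this command failed, presumably because the bucket was shared among multiple authors and my credentials were only good for my own portion of it. The publisher had set up a series of folders for me, so rather than attempting to access the bucket as a whole, my next move was to try to access a single folder. The command looked something like this: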

aws s3 ls "s3://authors/Brien Posey/Folder1"

It is worth noting that the folder path that I listed is unique to my own situation. However, the basic technique that I described should be universally applicable.
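For a fuller look at what a folder already contains, the listing can be made recursive and given a summary line. The following is only a sketch: list_s3_folder is a hypothetical helper name, a POSIX shell is assumed, and the path in the usage note is just the example from above.

```shell
# Sketch: list everything under one S3 folder, with readable sizes and totals.
# list_s3_folder is a hypothetical helper name, not an AWS CLI command.
list_s3_folder() {
    # Strip any trailing slash, then add exactly one, so the listing covers
    # the folder's contents rather than every name that shares the prefix.
    prefix="${1%/}/"

    # --recursive descends into subfolders, --human-readable prints sizes
    # in KiB/MiB/GiB, and --summarize appends object and size totals.
    aws s3 ls "$prefix" --recursive --human-readable --summarize
}
```

For example, list_s3_folder "s3://authors/Brien Posey/Folder1" would list that folder. The quotation marks matter because the path contains a space.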

Now that I had managed to gain access to the folder, the next step was to attempt to upload the course files. The upload process involves using the aws s3 sync command. However, if you are used to working with a conventional file system, such as the Windows NTFS file system, the syntax might seem a little bit strange at first.

In a Windows environment, file copy operations normally involve specifying a source and a destination. The aws s3 sync command takes a source and a destination too, but the approach that worked for me was to navigate to the folder containing the items that I wanted to upload and then use a period (.) as the source. The period tells the command to upload the files from whatever folder you are currently in.

Incidentally, to simplify the upload process in my own environment, I created a series of folders on my hard disk and copied the video files from my network to the local machine. I did this because the network path was long and convoluted and would have been tedious to type. By making a local copy of the files, I was able to change the folder names to something short and easy to type.

So when all is said and done, the copy command is:

aws s3 sync . "s3://<path>"
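Before committing to a transfer of this size, it may be worth previewing what the command intends to do. The sketch below adds the --dryrun option, which prints the planned operations without performing them. The helper name preview_sync is hypothetical, a POSIX shell is assumed, and the destination is passed in as an argument because the real path depends on your own bucket.

```shell
# Sketch: show what aws s3 sync would upload from the current folder,
# without transferring anything.
# preview_sync is a hypothetical helper name, not an AWS CLI command.
preview_sync() {
    dest="$1"   # destination in the form s3://<path>

    # --dryrun lists each planned upload instead of carrying it out.
    aws s3 sync . "$dest" --dryrun
}
```

If the preview looks right, running the same sync command without --dryrun performs the upload. Because sync only copies files that are missing or different at the destination, rerunning it after an interruption should pick up roughly where the previous attempt stopped.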

If you want to test your ability to upload a single file before uploading large numbers of files, you can use a command like this one:

aws s3 cp test.txt "s3://<path>"

This command copies a file called test.txt from the current folder to the specified S3 path. If that process works, then a larger-scale copy operation should work as well.

About the Author

Brien Posey is a 22-time Microsoft MVP with decades of IT experience. As a freelance writer, Posey has written thousands of articles and contributed to several dozen books on a wide variety of IT topics. Prior to going freelance, Posey was a CIO for a national chain of hospitals and health care facilities. He has also served as a network administrator for some of the country's largest insurance companies and for the Department of Defense at Fort Knox. In addition to his continued work in IT, Posey has spent the last several years actively training as a commercial scientist-astronaut candidate in preparation to fly on a mission to study polar mesospheric clouds from space. You can follow his spaceflight training on his Web site.
