Data is one of the most valuable commodities in the world, and it’s not hard to see why. From marketing to genomics, analyzing large data sets produces predictive models that steer businesses toward favorable outcomes. The more data you use, the better those models tend to be, and the better the outcomes they can produce. That makes moving data from one place to another a crucial skill for any engineer, but it’s not always as easy as it sounds.
For example, if you use AWS S3 bucket storage, then copying data to another S3 bucket takes a single CLI command: aws s3 cp s3://SourceBucket/ s3://DestinationBucket/ --recursive. (The aws s3 commands don’t expand a * wildcard in the path, so the --recursive flag is what copies every object.) Moving those same files to a different cloud provider, like Microsoft Azure or Google Cloud Platform, requires an entirely different tool.
By the end of this tutorial, you’ll be able to sync files from an AWS S3 bucket to an Azure blob storage container using rclone, an open-source data synchronization tool that works with most cloud providers and local file systems.
To follow along, you’ll need the following:
- An AWS S3 bucket
- An Azure blob storage container
- AWS access keys and Azure storage account access keys
- A computer running any modern operating system
  - Screenshots are from Windows 10 with WSL
- Some files to copy
How to Set Up rclone
Installing rclone is different for each operating system, but once it’s installed, the instructions are the same: run rclone config.
Running the config command prompts you to link your cloud provider accounts to rclone. The rclone term for a linked account is a remote. When you run the config command, enter n to create a new remote. You’ll need one each for AWS and Azure, but there are several other providers to choose from as well.
After choosing Azure blob storage, you’ll need:
- A name for the remote. (In this demo, it’s “Azure.”)
- The storage account’s name
- One of the storage account access keys
You’ll be prompted for a Shared Access Signature URL; while it’s possible to set up the remote that way, this demo uses an access key instead. After accepting the defaults for the remaining values by pressing Enter at each prompt, you should be able to start using your remote.
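For repeatable setups, the same remotes can likely be created without the interactive prompts. The sketch below assumes a recent rclone release whose rclone config create command accepts key=value pairs. The function names, the AWS remote name, the region, and the placeholder credentials are illustrative assumptions, not part of the walkthrough above.

```shell
#!/bin/sh
# Sketch: create the two remotes non-interactively.
# RCLONE can be overridden (for example RCLONE=echo) to preview the
# command line instead of running it.
RCLONE="${RCLONE:-rclone}"

# Usage: create_azure_remote <remote_name> <storage_account> <access_key>
create_azure_remote() {
    "$RCLONE" config create "$1" azureblob account="$2" key="$3"
}

# Usage: create_aws_remote <remote_name> <access_key_id> <secret_key> <region>
create_aws_remote() {
    "$RCLONE" config create "$1" s3 provider=AWS \
        access_key_id="$2" secret_access_key="$3" region="$4"
}

# Example calls (placeholder values, supply your own):
# create_azure_remote Azure mystorageaccount "$AZURE_STORAGE_KEY"
# create_aws_remote AWS "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY" us-east-1
```

Setting RCLONE=echo before calling either function prints the command that would run, which is a cheap way to check the arguments before any credentials are written to the rclone configuration file.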
To list the remotes configured on your system, enter rclone listremotes. You can also list the blob storage containers in a remote by running rclone lsd <remote_name>:. Make sure to include the : at the end of the remote name when running these commands, because that is how rclone distinguishes a remote from a local path. You can run rclone --help at any time to get the list of available commands.
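Because a forgotten colon makes rclone treat the name as a local path, a small wrapper can guard against it. This is a minimal sketch: as_remote and list_containers are hypothetical helper names, and Azure is the remote name used in this demo.

```shell
#!/bin/sh
# Sketch: helpers for inspecting configured remotes.
RCLONE="${RCLONE:-rclone}"

# Append the trailing colon when the argument has none, so "Azure"
# becomes "Azure:". Anything that already contains a colon
# (for example "Azure:" or "Azure:mycontainer") is passed through unchanged.
as_remote() {
    case "$1" in
        *:*) printf '%s\n' "$1" ;;
        *)   printf '%s:\n' "$1" ;;
    esac
}

# List the containers (top-level directories) of a remote.
list_containers() {
    "$RCLONE" lsd "$(as_remote "$1")"
}

# Example calls:
# "$RCLONE" listremotes
# list_containers Azure
```

With the helper in place, list_containers Azure and list_containers Azure: run the same rclone lsd command.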