Wednesday, March 6, 2013

Amazon Web Services

Amazon Web Services (AWS) is a collection of services Amazon provides for hosting or storing data on the Internet.  Overall, it seems like a pretty nice (and affordable) system.

One of their services, Glacier, is for very long-term storage of data that doesn't need to be accessed frequently (such as backups).  Seriously, they mean infrequently - just running the command to get a directory of your files appears to have a turnaround time of a few hours.

Unfortunately, the only way to store or retrieve files requires using a programming language - there is no standard tool, and the web interface does not handle the files themselves.

Fortunately, they supply several APIs - I am using the Java version.  They supply fragments of source code, though these are scattered throughout their slightly opaque documentation.

I have gathered together all of the relevant bits and made them into a reasonable command line application.  Since I was doing this in Linux, all of these instructions are for that system, though they should adapt to any platform.

To get it to work, you will need to:

  1. Have an AWS account
  2. Have a working Java compiler
  3. Create a working folder ('aws')
  4. Download the AWS Java SDK
    1. Extract it to 'aws/sdk' - you should get an 'aws/sdk/lib' directory, among others
  5. Download my Glacier app code
    1. Extract it to the working folder ('aws')
    2. You will need to edit two lines in 'aws/src/glacier.java'
      1. Replace the text "**UNIQUE ID HERE**" with unique strings
  6. Run the "build.sh" script to rebuild the code
    1. The build pulls in several third-party libraries that ship with the AWS Java SDK
  7. Edit the 'bin/AwsCredentials.properties' file
    1. Add your personal account info from http://aws.amazon.com/security-credentials
  8. Run 'glacier' to see a list of commands
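
Before running anything, it can help to check that the credentials file from step 7 is filled in. The following is a minimal sketch, not part of the downloadable app. It assumes the app loads the file with the SDK's PropertiesCredentials class, which reads the keys `accessKey` and `secretKey`; the class name `CredentialsCheck` is made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of the Glacier app download.
class CredentialsCheck {
    // Key names assumed here: the ones the SDK's PropertiesCredentials class reads.
    static final String[] REQUIRED_KEYS = {"accessKey", "secretKey"};

    // Returns the required keys that are absent or blank in the given properties text.
    static List<String> missingKeys(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Default path taken from step 7; pass another path as the first argument.
        String path = args.length > 0 ? args[0] : "bin/AwsCredentials.properties";
        String text = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.ISO_8859_1);
        List<String> missing = missingKeys(text);
        if (missing.isEmpty()) {
            System.out.println("Credentials file looks complete.");
        } else {
            System.out.println("Missing or empty keys: " + missing);
        }
    }
}
```

This only checks that both values are present; it does not verify that the keys are valid for your account.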

The included 'glacier' script just passes your command line arguments to the Java application:
  • create_vault - create a new vault, same as you can do from the web interface
  • delete_vault - delete a vault (note that it needs to be completely empty first)
  • describe - display info about a vault
  • list - display info about all vaults
  • upload - upload a file to a vault
  • download - download a file from a vault
  • delete - delete a file from a vault
  • dir - list the files in a vault

Note that some of these operations ('download' and 'dir' in particular) can take several hours to complete.  Also note that uploads to a vault can take up to 24 hours before they are reflected in the vault info.
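
To illustrate how a Java application might route the commands listed above, here is a rough sketch. It is not the code in 'glacier.java', which may be organized quite differently; the class name `GlacierCommands`, the usage line, and the "Would run" placeholder are all invented for the example, and no AWS calls are made.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; the real glacier.java may be organized differently.
class GlacierCommands {
    // Command names and short descriptions, taken from the list in this post.
    static final Map<String, String> COMMANDS = new LinkedHashMap<>();
    static {
        COMMANDS.put("create_vault", "create a new vault");
        COMMANDS.put("delete_vault", "delete a vault (must be completely empty)");
        COMMANDS.put("describe", "display info about a vault");
        COMMANDS.put("list", "display info about all vaults");
        COMMANDS.put("upload", "upload a file to a vault");
        COMMANDS.put("download", "download a file from a vault");
        COMMANDS.put("delete", "delete a file from a vault");
        COMMANDS.put("dir", "list the files in a vault");
    }

    // Builds the help text shown when no command, or an unknown one, is given.
    static String usage() {
        StringBuilder sb = new StringBuilder("Usage: glacier <command> [arguments]\n");
        for (Map.Entry<String, String> entry : COMMANDS.entrySet()) {
            sb.append(String.format("  %-13s %s%n", entry.getKey(), entry.getValue()));
        }
        return sb.toString();
    }

    // Returns the recognized command name, or null when usage should be printed.
    static String resolve(String[] args) {
        if (args.length == 0 || !COMMANDS.containsKey(args[0])) {
            return null;
        }
        return args[0];
    }

    public static void main(String[] args) {
        String command = resolve(args);
        if (command == null) {
            System.out.print(usage());
            return;
        }
        // A real implementation would call the AWS SDK here.
        System.out.println("Would run: " + command);
    }
}
```

Keeping the commands in one ordered map means the help text and the argument check cannot drift apart.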

Referring to any file in a vault requires its ID.  This will be displayed when the upload completes - you may wish to record this ID yourself, as listing the files in a vault with 'dir' can take a very long time.
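
One way to record those IDs is a small local index file, so a later 'download' or 'delete' does not have to wait on 'dir'. The sketch below is an assumption about how you might do this yourself, not something the app does; the class name `ArchiveIndex`, the "vault/filename" key format, and the sample ID are all made up.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper: keeps a local record of archive IDs in a properties file.
class ArchiveIndex {
    private final Path indexFile;

    ArchiveIndex(Path indexFile) {
        this.indexFile = indexFile;
    }

    // Reads the existing index, or returns an empty one if the file does not exist yet.
    private Properties load() {
        Properties props = new Properties();
        if (Files.exists(indexFile)) {
            try (InputStream in = Files.newInputStream(indexFile)) {
                props.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return props;
    }

    // Stores the ID under "vault/fileName", replacing any earlier entry for that key.
    void record(String vault, String fileName, String archiveId) {
        Properties props = load();
        props.setProperty(vault + "/" + fileName, archiveId);
        try (OutputStream out = Files.newOutputStream(indexFile)) {
            props.store(out, "Local record of Glacier archive IDs");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the recorded ID, or null if nothing was recorded for this file.
    String lookup(String vault, String fileName) {
        return load().getProperty(vault + "/" + fileName);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("archive-index", ".properties");
        ArchiveIndex index = new ArchiveIndex(file);
        // Placeholder ID; a real one is the string printed when an upload completes.
        index.record("backups", "photos-2012.tar.gz", "EXAMPLE-ARCHIVE-ID");
        System.out.println(index.lookup("backups", "photos-2012.tar.gz"));
        Files.delete(file);
    }
}
```

The index is only as good as your discipline in updating it, so 'dir' remains the authoritative listing.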

This script could certainly be improved, but this is already many times easier than what is provided by Amazon.  I hope it is of some help!
