How To Get Big Data Into The Cloud, Without SneakerNet

An Amazon AWS blog post a few months ago discussed the AWS Import/Export service, which allows customers to physically ship their data to Amazon by FedExing their hard drives, an approach known tongue-in-cheek in some circles as “SneakerNet”. According to the Amazon Import/Export calculator, SneakerNet provides an approximate 50% savings over the standard S3 Data Transfer-in charges.

Additionally, Werner Vogels, Amazon’s CTO, writing on the same topic, referred to a table showing the time taken to transfer a terabyte of data into S3 over a range of network bandwidths. Because RainStor (disclosure: my current company) customers deal with massive structured data volumes on an ongoing basis, they benefit from RainStor’s patented data value and pattern deduplication, which compresses the data 40 to 1 on average. Let’s take a look at how a 40-to-1 compression ratio affects upload times:

Network Connection | Uncompressed upload time for 1TB | Compressed (40:1) upload time for 1TB
DSL | 166 days | 4 days
T1 | 82 days | 2 days
10 Mbps | 13 days | 8 hours
T3 | 3 days | 2 hours
100 Mbps | 1-2 days | 1 hour
1 Gbps | Less than a day | Less than an hour
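
The arithmetic behind a table like this can be sketched in a few lines of Python. This is only a rough illustration: the function name, the link speeds, the binary definition of a terabyte (2^40 bytes), and the 80% link utilization are all assumptions made for the example, chosen because together they land close to the T1 and 10 Mbps rows above.

```python
# Rough upload-time estimate. All constants here are illustrative assumptions.
TERABYTE_BITS = 8 * 2**40          # assumes 1TB = 2^40 bytes
SECONDS_PER_DAY = 86400

def upload_days(size_bits, link_bps, utilization=0.8, compression_ratio=1.0):
    """Days needed to push size_bits over a link, optionally pre-compressed."""
    bits_on_the_wire = size_bits / compression_ratio
    seconds = bits_on_the_wire / (link_bps * utilization)
    return seconds / SECONDS_PER_DAY

# Nominal line rates in bits per second (assumed values).
LINKS_BPS = {"T1": 1.544e6, "10 Mbps": 10e6, "100 Mbps": 100e6}

for name, bps in LINKS_BPS.items():
    raw = upload_days(TERABYTE_BITS, bps)
    packed = upload_days(TERABYTE_BITS, bps, compression_ratio=40)
    print(f"{name}: {raw:.1f} days uncompressed, {packed * 24:.1f} hours at 40:1")
```

Because the compression ratio simply divides the number of bits sent, a 40 to 1 ratio cuts every row by the same factor of 40, whatever the link speed.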


RainStor’s patented compression architecture operates in four ways. First, we exploit the duplication of individual structured data values within each column of a table. In our scheme, such duplicated values are only stored once. Second, combinations of field values across rows are recognised as patterns, and are again de-duplicated. Third, semantic compression techniques are applied to store the data in a way that is most amenable to the fourth level of compression. This final deflation step operates at the byte level, exploiting the statistical redundancy present at this scale.
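
To make the four levels more concrete, here is a toy sketch in Python. It is not RainStor’s implementation, which is patented and not described in detail here. The function names and data layout are hypothetical, the compact JSON serialization is only a stand-in for the semantic step, and zlib’s DEFLATE stands in for the final byte-level step.

```python
import json
import zlib

def compress_table(rows):
    """Toy four-step compression of a list of equal-length row tuples."""
    n_cols = len(rows[0]) if rows else 0

    # Step 1: store each distinct value once per column (value de-duplication).
    dictionaries = [[] for _ in range(n_cols)]
    lookups = [{} for _ in range(n_cols)]
    encoded_rows = []
    for row in rows:
        ids = []
        for col, value in enumerate(row):
            if value not in lookups[col]:
                lookups[col][value] = len(dictionaries[col])
                dictionaries[col].append(value)
            ids.append(lookups[col][value])
        encoded_rows.append(tuple(ids))

    # Step 2: store each distinct combination of value IDs once (pattern de-duplication).
    patterns, pattern_ids, row_refs = [], {}, []
    for ids in encoded_rows:
        if ids not in pattern_ids:
            pattern_ids[ids] = len(patterns)
            patterns.append(ids)
        row_refs.append(pattern_ids[ids])

    # Step 3 (stand-in): lay the structures out compactly for the byte-level pass.
    payload = json.dumps(
        {"d": dictionaries, "p": patterns, "r": row_refs}, separators=(",", ":")
    ).encode("utf-8")

    # Step 4: byte-level compression exploiting statistical redundancy.
    return zlib.compress(payload, 9)

def decompress_table(blob):
    """Rebuild the original rows from the compressed blob."""
    data = json.loads(zlib.decompress(blob).decode("utf-8"))
    return [
        tuple(data["d"][col][vid] for col, vid in enumerate(data["p"][ref]))
        for ref in data["r"]
    ]
```

For example, `decompress_table(compress_table(rows))` returns the original rows when they are tuples of strings and integers. The ratio achieved depends entirely on how repetitive the input is, so this toy should not be read as evidence for any particular figure.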

RainStor provides a software-only, VM-based appliance that runs in the user’s own local network environment, pre-compressing and encrypting the data before automatically transferring it into the cloud, for significant cost savings. In some cases, depending on how many duplicate data values are present, the compression can be even greater than 40 to 1!

In summary, if you are looking to upload big data into the cloud and still want the data to be queryable using standard SQL, RainStor gives you just that, with added cost savings in both transmission and the resulting storage.
