{"id":3334,"date":"2012-03-08T15:03:09","date_gmt":"2012-03-08T23:03:09","guid":{"rendered":"http:\/\/www.ramonchen.com\/?p=3334"},"modified":"2023-05-29T10:32:24","modified_gmt":"2023-05-29T18:32:24","slug":"3334","status":"publish","type":"post","link":"https:\/\/www.ramonchen.com\/?p=3334","title":{"rendered":"How Much Is That Hadoop Cluster Really Costing You?"},"content":{"rendered":"<p>Also published at\u00a0http:\/\/rainstor.com\/how-much-is-that-hadoop-cluster-really-costing-you\/<\/p>\n<p>Last month when we released our RainStor for\u00a0<a href=\"http:\/\/rainstor.com\/technology\/hadoop\/\" target=\"_blank\" rel=\"noopener\">Big Data Analytics product edition that runs natively on Hadoop<\/a>, we raised a lot of eyebrows with two of the points that we were making:<\/p>\n<ol>\n<li>Compression can dramatically reduce the number of Hadoop nodes needed, and with it the TCO<\/li>\n<li>SQL access to the compressed data in HDFS can be achieved without having to transfer the data out of Hadoop or use specialized tools<\/li>\n<\/ol>\n<p>In my post\u00a0<a href=\"http:\/\/rainstor.com\/feeding-the-elephant-peanuts-and-making-pig-fly\/\" target=\"_blank\" rel=\"noopener\">\u201cFeeding the Elephant Peanuts and Making Pig Fly\u201d<\/a>\u00a0I talked about how we could achieve massive compression, give SQL-92 access and boost the performance of MapReduce jobs. This post revisits the first point around TCO. I\u2019ll cover the second point in a future blog post.<\/p>\n<p>I decided to go over the TCO point again because I had the pleasure of chatting with David Merrill, Hitachi Data Systems Chief Economist (@StoragEcon), on this very topic. 
I have been a fan of his white papers and noted that he had started writing about\u00a0<a href=\"http:\/\/blogs.hds.com\/david\/2012\/03\/big-data-storage-economics-case-study-1.html\" target=\"_blank\" rel=\"noopener\">Big Data Storage Economics<\/a>\u00a0on his blog,\u00a0<a href=\"http:\/\/blogs.hds.com\/david\/\" target=\"_blank\" rel=\"noopener\">The Storage Economist<\/a>. For those of you who are unfamiliar with his work, a good example is his white paper\u00a0<a href=\"http:\/\/www.hds.com\/assets\/pdf\/four-principles-for-reducing-total-cost-of-ownership.pdf\" target=\"_blank\" rel=\"noopener\">Four Principles For Reducing Total Cost of Ownership<\/a>, which provides a pragmatic and quantifiable look at all of the factors that contribute to the cost of operating and running different types of storage.<\/p>\n<p>We talked about his research and analysis and how, from a purely bare-metal CPU, disk and component perspective, commodity clusters such as Hadoop can appear to provide lower TCO from\u00a0<em>a cost per usable TB perspective<\/em>. However, as\u00a0<a href=\"http:\/\/blogs.hds.com\/david\/2012\/03\/big-data-storage-economics-case-study-1.html\" target=\"_blank\" rel=\"noopener\">his research showed<\/a>, when\u00a0<em>cost per written-to TB<\/em>\u00a0is used, the equation is turned completely upside down. As David concluded,\u00a0<em>\u201cdon\u2019t confuse price and cost, and look at a longer time horizon when planning and building big data storage infrastructures.\u201d<\/em><\/p>\n<p>In our January release on Hadoop\u00a0<a href=\"http:\/\/rainstor.com\/compression-tames-big-data-on-hadoop\/\" target=\"_blank\" rel=\"noopener\">we had an example in an infographic<\/a>\u00a0illustrating how RainStor\u2019s compression can significantly drive down the physical storage and therefore the number of nodes required. 
We used a simple\u00a0<strong>operating cost metric of $3000 per node<\/strong>\u00a0(containing 12TB of raw disk) that resulted in a TCO (buying and operating the cluster) savings of over\u00a0<strong>$1M over 3 years for storing 300TB<\/strong>\u00a0of user data. If you take a look at David\u2019s numbers\u00a0<a href=\"http:\/\/rainstor.com\/compression-tames-big-data-on-hadoop\/\" target=\"_blank\" rel=\"noopener\">in his post<\/a>, they range from a low of around\u00a0<strong>$3000 per usable TB<\/strong>\u00a0for DAS to a cumulative high cost of\u00a0<strong>$45,000 per written-to TB!<\/strong>\u00a0Granted, the research was done in 2009 and acquisition costs have plummeted since, but because the price of floor space, heating, cooling, etc. just continues to grow, his figures suggest that $3k per node can be considered reasonably conservative. His post also pointed out that in general the server CPUs with DAS were occupied with a lot of \u201cmundane tasks\u201d. As part of our conversation, I detailed how RainStor\u2019s unique value and pattern de-duplication process leverages CPU cycles up front to build highly compressed partitions. These not only save physical disk space, but also use collected metadata to make data access more intelligent and efficient, and they magnify the performance of the commodity disks by retrieving more data per block when required. This means spending more CPU cycles on load in exchange for better overall performance on access. All of this reflects only the savings in baseline storage and access costs; it doesn\u2019t yet include the cost of administering the nodes and cluster (both software and personnel), or any development and integration costs (which I will cover in my next post).<\/p>\n<p>Bottom line, David\u2019s research and white papers over the years have contributed greatly to the understanding of the overall TCO of storage and the benefits of widely adopted technologies such as thin provisioning. 
Now he is pointing out that Big Data Hadoop clusters have hidden hardware operating costs and it\u2019s best to go into such endeavors with eyes (and pocketbooks) wide open. Meanwhile, here at RainStor we continue to focus on driving down the TCO of your choice of storage and configuration through our database. In the end, David and I both agreed that, services aside, the best TCO lies in the efficient selection and implementation of the hardware and software for the given use case, and that the right combination is what will make Big Data manageable and affordable.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Also published at\u00a0http:\/\/rainstor.com\/how-much-is-that-hadoop-cluster-really-costing-you\/ Last month when we released our RainStor for\u00a0Big Data Analytics product edition [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[32,449,428,210,300],"tags":[281,447,179,446,181,429,448],"class_list":["post-3334","post","type-post","status-publish","format-standard","hentry","category-all","category-bigdata-2","category-data-retention","category-databases","category-hadoopmr","tag-bigdata","tag-davidmerrill","tag-hadoop","tag-hds","tag-mapreduce","tag-storage","tag-tco"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.ramonchen.com\/index.php?rest_route=\/wp\/v2\/posts\/3334","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ramonchen.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ramonchen.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ramonc
hen.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ramonchen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3334"}],"version-history":[{"count":6,"href":"https:\/\/www.ramonchen.com\/index.php?rest_route=\/wp\/v2\/posts\/3334\/revisions"}],"predecessor-version":[{"id":5806,"href":"https:\/\/www.ramonchen.com\/index.php?rest_route=\/wp\/v2\/posts\/3334\/revisions\/5806"}],"wp:attachment":[{"href":"https:\/\/www.ramonchen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3334"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ramonchen.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3334"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ramonchen.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3334"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}