Today at RainStor we announced a new product edition that runs natively on Hadoop and HDFS. We are particularly excited, as we sincerely hope it will help support the growth and enterprise adoption of Hadoop in the marketplace. Although we are not an open source vendor, we have tremendous admiration and respect for the open source community and the incredible momentum that Hadoop has garnered. A special thanks to Cloudera, who blazed and continues to blaze the trail evangelizing the virtues of Hadoop, and to others such as Hortonworks and MapR (all RainStor partners) who are legitimizing the technology for solving Big Data problems.

By applying our unique pattern and value de-duplication to raw data that would normally be compressed via LZO or Gzip, RainStor can deliver significant savings in the number of nodes required to retain Big Data. For example, 40-to-1 compression could cut the number of nodes from 75 down to 2! That is not just a lower upfront purchase cost but also a significant ongoing reduction in total operating cost.

Why bother if your savings in deploying Hadoop are already so significant compared to “traditional” enterprise database or data warehouse hardware and software deployments? Besides the obvious fact that saving money never goes out of style, the sheer rate of data growth is outstripping advances in physical storage media, which means it is a never-ending job to feed the elephant.

Cost aside, another way to look at the challenge is to think logically about uncoupling the storage and processing requirements within each Hadoop node. If you are adding nodes purely to hold the data, you might be significantly under-utilizing the CPUs in each node. Those CPUs might also be spending effort re-inflating data that was compressed via LZO or Gzip, rather than being fully applied to supporting the query or business analytic calculations.
RainStor, on the other hand, requires no re-inflation. RainStor compressed files actually contain more records per block, which has a magnification effect on disk performance and bandwidth upon access. So you end up in an almost surreal situation where not only is the data more compressed, but Pig and MapReduce jobs actually run faster. Even though the number of nodes is reduced, they are used more efficiently, allowing you to strike the correct balance when adding nodes for processing power and storage needs.

Finally, RainStor’s ability to run natively on Hadoop is due to the fact that our architecture fits Hadoop and HDFS like a glove. As a large-block MPP database already using MapReduce capabilities internally, it was a natural fit for RainStor to run on HDFS. This enables RainStor to be part of the Hadoop deployment, rather than a database or data warehouse connecting to or transferring data out of HDFS. You get all of the security, auditing, unique compliance and data lifecycle management features, and more, that you would expect from an enterprise database that speaks perfect SQL, so your traditional BI tools can access the data without having to transform or transfer it into a separate environment. Furthermore, our data virtualization partner Composite Software allows data stored within RainStor on Hadoop to be seamlessly combined with other data sources around the enterprise without the need for large-scale copy or transfer.

In closing, I have to give credit to our CFO Jamie Andrews (who is a budding marketing intern on the side) for the title of this blog. He knows a thing or two about saving money and articulated that RainStor’s compression and node reduction will allow enterprises to feed their Hadoop cluster peanuts, all while making Pig and MapReduce jobs fly!
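The node arithmetic quoted earlier in this post (40-to-1 compression taking 75 nodes down to 2) can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the function name `nodes_needed` is made up for this example, and it assumes every node holds an equal share of the data and that storage alone drives the node count.

```python
import math


def nodes_needed(current_nodes: int, compression_ratio: float) -> int:
    """Estimate nodes needed to hold the same data after compression.

    Assumptions (simplifications, not RainStor's sizing method):
    - each node stores an equal share of the data
    - storage capacity, not CPU, determines the node count
    """
    if current_nodes < 0 or compression_ratio <= 0:
        raise ValueError("node count must be >= 0 and ratio must be > 0")
    # Round up: a fraction of a node still means buying a whole node.
    return math.ceil(current_nodes / compression_ratio)


if __name__ == "__main__":
    # 75 / 40 = 1.875, which rounds up to 2 nodes.
    print(nodes_needed(75, 40))
```

In practice a cluster would also need enough nodes to satisfy its replication settings and its processing load, so treat the result as a lower bound for storage only.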
As is now my tradition (having done it last year), as we approach Turkey day I’d like to reflect on what’s been a wonderful year so far and give thanks, especially for the following:
- My wonderful wife Kathy, who has taken her game to a whole new level as Mom to Parker and Ryan. It’s a joy learning how to parent with you, and despite the sleepless nights and zero personal free time, I am bursting with love and admiration for you.
- Our nearly 11-month-old twins Parker and Ryan. Never in my wildest dreams could I be so proud and happy every day of my life since you both arrived. Your smiles and giggles light up my day, and watching you grow and discover your new world opens my eyes to the beauty and intrigue of simple objects and activities in our world.
- An annual shout out to my family of in-laws, especially my mother-in-law Clare, who is currently rehabilitating after being rushed to hospital. Thank you to everyone who has been wishing her well and for all of the help and support in getting her better.
- To my fantastic mother, who has been regularly video Skyping with us on the weekends. Parker and Ryan will love seeing you in person next year. They and I can’t wait for their Grandma to hold them in her arms.
- Once again to all my dear friends: Wasim, Deirdre, Manish, Ruth and Henley, to name a few. It’s been wonderful to exchange advice and ideas with you in 2011. I hope I have been able to help you in small ways this year, given my new family responsibilities. I look forward to seeing much more of you in 2012. “A good friend is cheaper than therapy.” ~Author Unknown
- The continued momentum of RainStor, my current company. Even through a tough economy and environment, we are making great strides with large partners such as Dell and others (whom I cannot yet reveal). I am expecting even better things in 2012!
- The SF 49ers for their 9-1 start (hopefully 10-1 on Thanksgiving day). Maybe they can match the Giants winning the World Series! Dare we dream of a Super Bowl?
And again, Happy Thanksgiving to everyone who has been reading my blog. I appreciate all of the kind emails, comments and feedback provided. All the best to you and your family; have a safe and fun holiday.
-Ramon

When the world is coming up with ways to grapple with, process, store and retain the massive volumes of digital data, the USPS presents an innovative solution: get offline!

I don’t often pay much attention to the ads on TV. However, this weekend I happened upon an ad from the US Post Office which left me dumbfounded. The ad, which can be seen here http://uspsvideo.com//video/127/USPS-Hacked-TV-commercial, focuses on the issue of online security, saying: “A refrigerator has never been hacked. An online virus has never attacked a corkboard. Give your customers an added feeling of security of printed statement that a receipt provides. With mail.”

The reason for these ads, no doubt, is the fact that the U.S. Postal Service said it lost $5.1 billion last year as a weak economy and increased Internet use drove down mail volume. Postal officials have been quoted as saying that the financial situation is “dire.” Postmaster General Patrick Donahoe has warned of a postal shutdown next year unless there is congressional action to address the agency’s long-term money problems.

While I completely feel for the postal employees and the potential for layoffs in this poor economy, I can’t help but wonder who recommended this positioning as a possible way to address their challenges. Quite apart from the fact that the message of “security” is flawed, in that identity theft through mail stolen from unsecured mailboxes is a common occurrence, encouraging people to switch back to hardcopy paper statements goes against all environmental logic. Obviously this cannot be intended as a long-term fix, but perhaps just as a last-ditch attempt to spike post office revenues in the months ahead.
It is even more of a long shot when you consider that the ad is actually targeted mainly at businesses, as it says “It’s good for your business and good for your customers.” Since businesses have long championed the e-statement as a way for them to save money while projecting an environmentally conscious image, this message will appeal to no one.

Thoughts that ran through my mind after watching this ad: “swimming against the tide”, “evolve or die”, “Blockbuster vs. Netflix”, “Kodak close to bankruptcy”, among others. Perhaps the massive workforce at the Post Office could be redirected towards embracing online activities rather than making oars for a ship that has already sailed. Perhaps a focus on improving online email security by partnering with encryption leader Voltage Security, or evolving to a system where paid emails sent via the post office have a guaranteed “certified delivery” that is legally admissible? Surely someone at the post office can embrace the digital age?

Alternatively, their current line of positioning might net them some serious VC funding, if they just change their message to be “How to reduce Big Data without the need for Hadoop!”

Here is a quick graphic I am keeping updated which visualizes some of the funding, M&A and partnerships related to the growing universe of Big Data and of course Hadoop. Note that in certain categories there is a mix of companies and technologies. The graphic is a work in progress and does NOT include RainStor (my current employer, for confidentiality and conflict of interest reasons). The partnership lines, not surprisingly, converge on Hadoop market leader Cloudera, followed rapidly by Hortonworks, which has made significant progress since its launch.
| Accel | $26.5 | Cloudera, Couchbase |
| AGF | $9.3 | Talend |
| Antham | $1.5 | StackIQ |
| Avalon | $1.5 | StackIQ |
| Balderton | $9.3 | Talend |
| Benchmark | $10.7 | Pentaho |
| Bessemer | $4.8 | Hadapt |
| Conor | $3.5 | Neo Technology |
| CrossLink | $6.8 | DataStax |
| Docomo | $1 | Couchbase |
| Fidelity | $3.5 | Neo Technology |
| Fly-bridge | $10 | 10gen |
| Free-Style | $0.33 | BackType |
| Galileo | $9.3 | Talend |
| Giza | $4.5 | Mintigo |
| Greylock | $19 | Cloudera |
| Hummer Winblad | $2.5 | Karmasphere |
| Ignition | $26.5 | Cloudera, Couchbase |
| Index | $10.7 | Pentaho |
| Kleiner Perkins | $5.8 | Datameer |
| Lightspeed | $11.3 | DataStax, MapR |
| Lower-Case | $0.33 | BackType |
| Mayfield | $7.5 | Couchbase |
| Meritech | $26.5 | Cloudera, Tableau |
| NEA | $15.2 | MapR, Pentaho |
| Northbridge | $7.5 | Couchbase |
| Norwest | $4.8 | Hadapt |
| Redpoint | $5.8 | Datameer |
| Sequoia | $10 | 10gen |
| Sunstone | $3.5 | Neo Technology |
| True | $0.33 | BackType |
| Union Square | $10 | 10gen |
| USVP | $10 | Karmasphere, Tableau |
| Y-Comb | $0.33 | BackType |
The following conversation with your Big Data was recorded at the offices of Dr. D. Dupe M.D.

Doctor: “Please take a seat and tell me what brings you here today?”

Big Data: “Thanks Doctor. Well, lately I’ve been feeling depressed. As you can see, I’m getting on in years. Certainly I’m not the same data I used to be. When I was young, I was able to adapt and change with the world. Frequently updating and reinventing myself, which is one of the reasons my ex-girlfriend was attracted to me.”

Doctor: “Tell me more about your ex-girlfriend.”

Big Data: “She’s a famous RDBMS, very energetic, well liked, but very high maintenance. Her job is to process and manage transactional data; that’s how we met, actually. We’ve been together for years, but our relationship has been changing for a while. She rarely calls me, and when she does she only reminisces about the old times and our history together. Even though the relationship wasn’t going anywhere, I was comfortable with the arrangement, until one day she broke it off.”

Doctor: “How did she break up with you?”

Big Data: “She told me that she chatted to her boss (and best friend), Margie Application, and she said we rarely see each other, how I never change, and apparently the last straw was that I was overweight and dragging her down. Taking up valuable space with all my junk. To top it off, I live in her apartment and her landlord, a nice guy called Joe Storage, told her that the lease was just for one person and she wasn’t allowed to have me stay there anymore.”

Doctor: “How did you react?”

Big Data: “I got mad, said a few choice words, grabbed my stuff and stormed out! Then I calmed down and went to talk to Joe to see if there was space in the building for me. Unfortunately, Joe Storage’s building caters for young up-and-coming types. The rent is astronomical per sq. foot as you pay for the update facilities. It has fast elevators providing high-speed access with expensive security. I really couldn’t afford to live there.”

Doctor: “You mentioned your age earlier. Do you think this is the main reason why she broke up with you?”

Big Data: “I did at first, because we had been together for so long. But a friend of mine recently experienced a similar issue. He’s young, machine-generated and straight off the network, but was also told that he takes up too much space, and since he too never changes, it was too expensive to keep him around.”

Doctor: “Actually you are correct, this is becoming a common issue. Many of my patients are reporting the same symptoms that you and your friend are experiencing. Unfortunately, there are many people offering different approaches to this problem and it can be confusing. Just like fad diets and late night TV commercials, it’s important to be specific about your core needs, which, if I could recap, are as follows:
- You have come to terms with the fact that you no longer change
- You need help losing some weight
- You and your girlfriend would like to keep your relationship, even if you see each other infrequently
- You need somewhere nearby that you can stay, which is within your budget
- A place where you know where you stand in terms of a lease, with a set of rules that are agreed upon as to when and if you have to leave
The organization and people I will introduce you to specialize in retaining Big Data like yourself and your friend. They can load billions of records a day, and unlike me, their type of shrinking allows you to fit in a much smaller space. The result is a form of compression weight loss that forgoes the need for high-speed elevators. You can then choose your alternate Joe Storage apartment of choice; they negotiate and enforce the lease, and security is also provided. To your ex-girlfriend, or more to the point her Application boss or her superiors the end-users, you are the same Big Data. Nothing has changed about you; they can see you whenever they want. And best of all, the total cost will be less than what you were paying, often 10x less. Finally, you and your friend can move in immediately, no complicated setup required.”

Big Data: “Doc, you’re a lifesaver, I can’t thank you enough!”

Doctor: “You’re welcome. Maybe you’ll come back in a few weeks and tell me how you are doing?”

Big Data: “Definitely, you can count on it.”

First of all, apologies for the lack of posts the last month or so. I’ve been busy working on the launch of significant enhancements to the RainStor product and lots of exciting activity with our partners, including our recently announced relationship with Dell. My other focus has been my fast-growing identical twins Parker and Ryan, who are now 7 months old. So pardon my indulgence as I combine the two into this blog post. As ever, comments are welcome, but be gentle as I’m operating on a sleep deficit :)
As per Wikipedia, Shared Nothing (SN) is a distributed computing architecture in which each node is independent and self-sufficient, and there is no single point of contention across the system. Shared Nothing architectures have become prevalent in the Data Warehousing space, with products such as EMC Greenplum, HP Vertica and Teradata providing Big Data analytic solutions. Hadoop with HDFS is also an example of an SN environment.

With that as a brief backdrop, let me turn my attention to the challenge of being a parent to twins. Together with the obvious extreme lack of sleep, the other conundrum is: do we need to buy two of everything, and if not, how will we share it? If you are a parent of twins (or two siblings), the scenarios and questions below will be familiar:

1. Will they both need parallel access to it at the same time? (Items such as bottles, clothing, pacifiers and car seats meet this criterion.) This is a simple example of “shared nothing” with no contention, whereby each boy has his own item and is able to happily focus on it in a self-contained manner. With their own bottles at feeding time, we are able to feed both boys simultaneously, in half the time it would take if we proceeded sequentially. For SN systems such as Hadoop, massive parallelization by simply adding more nodes, each with its own locally attached disk, allows it to scale to handle Big Data volumes.

2. How will we provide access to an item if one or the other needs it? (For example, using a fixed or portable changing table.) With a changing table built into a dresser, we are pretty much saying that all diaper changing will be done in a fixed location (at least around the house). We therefore bring each boy to the nursery where the table resides. In Hadoop, MapReduce takes the opposite approach: the processing is moved to keep the work as close to the data as possible and so reduce network traffic. This avoids moving the data itself, which is known as “data shipping”.
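The idea of moving the work to the data, rather than shipping the data to the work, can be illustrated with a small Python sketch. It is a toy model only and not Hadoop’s actual scheduler: the function name `assign_tasks`, the block names and the node names are all hypothetical, and a block is treated as “local” simply if the chosen node appears in its list of holders.

```python
def assign_tasks(block_locations, nodes):
    """Assign one task per data block, preferring nodes that hold the block.

    block_locations: dict mapping block name -> list of nodes holding a copy
    nodes: list of nodes available to run tasks
    Returns a dict mapping block name -> (chosen node, ran_locally flag).
    """
    load = {node: 0 for node in nodes}  # tasks assigned to each node so far
    assignment = {}
    for block, holders in block_locations.items():
        # Prefer nodes that already store the block (no data shipping).
        local = [node for node in holders if node in load]
        candidates = local or list(nodes)
        # Among the candidates, pick the least loaded node.
        chosen = min(candidates, key=lambda node: load[node])
        load[chosen] += 1
        assignment[block] = (chosen, chosen in holders)
    return assignment


if __name__ == "__main__":
    blocks = {"b1": ["n1", "n2"], "b2": ["n2"], "b3": ["n9"]}
    for block, (node, is_local) in assign_tasks(blocks, ["n1", "n2"]).items():
        print(block, node, "local" if is_local else "data must be shipped")
```

Only the task for `b3` has to pull its data over the network in this example, because no available node holds a copy of that block.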
In contrast, Shared Disk and Shared Everything architectures don’t have a “data shipping” issue because each node has access to all of the data. Obviously Shared Nothing vs. Shared Disk vs. Shared Everything is a much more complex and sophisticated technical topic, which I won’t be covering today. Throw in OLTP vs. OLAP and the Cloud and you have an even spicier debate. If you are interested you can check the links below for some good discussion. And as usual excellent reference material is available through links on Wikipedia:

To tie off this post with my two main focuses: RainStor’s unique architecture physically stores data using a “Shared Nothing” paradigm while at the same time providing access from any SN node, mitigating the “data shipping” network bandwidth problem by significantly compressing the data. This is one reason why RainStor is an ideal solution to ingest and support the query of ever-growing Big Data volumes that have to be retained at petabyte scale. Meanwhile, Parker and Ryan are themselves growing and scaling at an alarming rate.

This article was originally posted at Cloudtimes.org http://cloudtimes.org/hadoop-compression-the-elephant-thats-not-in-the-room/

We are living in the age of “Big Data”, where billions of transactions, events or activities are generated through the use of smartphones, web browsing, smart meter sensors and more. Hadoop, MapReduce and a new generation of NoSQL technologies are helping us manage, transform, analyze and deal with the data overload. Together they process extreme volumes using techniques and technologies that eschew the traditional “big iron” infrastructure that normally contributes to the high cost of running IT departments and data centers. By running on low-cost commodity servers and direct attached storage, Hadoop and HDFS clusters can be used to process petabytes of raw data, producing meaningful, actionable business insights in a fraction of the time previously thought possible.
With the cost of acquiring new racks of physical storage continuing to trend lower per TB, it would seem that this should be a winning combination for many years to come. But as the story goes, whether it’s CPU-intensive computing power, RAM or physical storage, we will always find ways to consume and exceed current capacity.

Invented by Doug Cutting and named after his son’s toy elephant, Hadoop can efficiently store and retrieve large data sets for processing. However, the stored data footprint actually becomes 2 to 3 times larger than its original raw size due to replication across nodes. Since HDFS does not compress stored data on its own, this has led to the use of basic binary compression technologies such as Gzip (see Amazon AWS’ guide to Hadoop compression here) and LZO (see the blog post by Matt Massie back in 2009 about using LZO with Hadoop at Twitter) to reduce the amount of disk required. As we know, binary compression has its limits and comes with a re-inflation penalty upon access.

Meanwhile, higher compression rates and savings have been realized through other techniques, such as de-duplication of files and objects through products such as Data Domain. But that presupposes that you have many copies of the same object, or that blocks of data on disk exhibit the same characteristics. In reality, certain types of structured and semi-structured data do have similar characteristics, but at a much more granular level than a file or object. Transactions, call data records, log entries or events have repeated data values and patterns that are common across individual records and groups of records. These can be de-duplicated so that only unique entries are retained. This level of de-duplication (at the value level, similar to columnar databases) generally yields compression rates far greater than binary compression.
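To make value-level de-duplication concrete, here is a minimal Python sketch of the general idea, using simple per-column dictionary encoding. It is an illustration under stated assumptions, not RainStor’s algorithm: the function names `encode` and `decode` and the sample call records are invented for this example, and all records are assumed to be non-empty tuples of equal length.

```python
def encode(records):
    """Store each distinct value once per column; records become index lists.

    Assumes `records` is a non-empty list of equal-length tuples.
    Returns (dictionaries, encoded_columns), one entry per column.
    """
    dictionaries = []     # unique values of each column, in first-seen order
    encoded_columns = []  # per column: index into the dictionary for each row
    for column in zip(*records):
        values, index, ids = [], {}, []
        for value in column:
            if value not in index:
                index[value] = len(values)
                values.append(value)
            ids.append(index[value])
        dictionaries.append(values)
        encoded_columns.append(ids)
    return dictionaries, encoded_columns


def decode(dictionaries, encoded_columns):
    """Rebuild the original records exactly, in their original order."""
    columns = [
        [values[i] for i in ids]
        for values, ids in zip(dictionaries, encoded_columns)
    ]
    return list(zip(*columns))


if __name__ == "__main__":
    calls = [
        ("2011-11-01", "555-0100", "SMS"),
        ("2011-11-01", "555-0100", "VOICE"),
        ("2011-11-01", "555-0199", "SMS"),
    ]
    dictionaries, encoded = encode(calls)
    print(dictionaries)  # each repeated value is kept only once per column
    print(decode(dictionaries, encoded) == calls)
```

The round trip through `decode` returns the records unchanged, which is the property that matters for retention: repeated values are stored once, yet every original record can still be reproduced on demand.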
The challenge, of course, is maintaining the integrity and original immutability of the individual records loaded into the system, ensuring that data can be accessed on demand without a high performance penalty.

Compression obsession has shown that when you achieve significant compression, many things become easier and in some cases even faster! Smaller amounts of data, written as large blocks, result in less I/O, as well as less bandwidth consumed when moved between nodes or networks. This means that data can be stored in a shared nothing architecture like HDFS while also benefiting from a “logically shared everything” model, where each node can have access to data located on other nodes without major performance impact. Additionally, in a heterogeneous server environment, higher compute capacity nodes can actually compete for more tasks.

When you are dealing with petabyte-scale data, like major communication service providers (CSPs) who have to capture tens of billions of WAP logs and CDRs a day, basic binary compression isn’t significant enough. Compliance requirements for on-demand accessibility and retention periods of 3 months to years make higher compression rates a critical factor in keeping operational costs at levels that can scale with the growth in subscribers and activity.

Compliance isn’t the only reason to retain large data sets. Better historical business reference and trend analysis across years of gathered data, or test data spanning millions of critical components, all require that data be accessible to yield the next great set of business insights. Take the announcement by Yahoo (an early user of and significant contributor to Hadoop), who said that they are taking the extraordinary measure of retaining more data than the compliance requirement dictated by the European Union (EU) (see “Yahoo Jacks Data Retention Period from 90 days to 18 months”). This raises another issue around when and how to determine what data should be removed as expiry periods are reached.
But that’s another topic for another time.

As a community we continue to make great strides in leveraging Hadoop, such as Cloudera’s Distribution Including Apache Hadoop, which brings together a wealth of complementary technologies and components for enterprise-class Big Data management. However, in order to process Big Data, you will need to combine Hadoop with compression that can keep up. At petabytes today, trending to exabyte scale in the future, the topic of Hadoop compression is one elephant that is conspicuously missing from the room.

Last year Gartner published a report titled “Enterprise Information Archiving Transforms the Strategy and Approach for Archiving”, in which they forecast that Enterprise Information Archiving (EIA) will become a key infrastructure component and will hold both structured data and unstructured content by 2013. Quite a bold prediction at the time, considering that Gartner also published a “Magic Quadrant (MQ) for Enterprise Information Archiving (EIA)” in 2010 as a direct replacement for their Email Active Archiving MQ, which they had been publishing since 2002, and in which the vendors listed did not offer any products designed for structured data archiving, let alone a single unified solution.

Thank you to Mike Vizard for publishing my guest blog post at CTOEdge. Please read the rest of the article at http://www.ctoedge.com/content/dream-reality-single-enterprise-archive-solution-big-data-retention

You are seeing tweets daily; check that, there seems to be an article or post practically every hour. Big Data is an enormous topic of conversation. From Apache (Hadoop) to Zettabyte forecasts, everyone is telling you to be prepared. Big Data is being analyzed, retained and managed for greater business insights and competitive agility. With transactional data volumes reaching billions per day, there is a growing danger that the quantity of data might overshadow its quality.
In fact, concerns about data quality have been top of mind in enterprise IT since way before Big Data ever came on the scene. Data quality (DQ) cleans up reference data (such as customer names and addresses) to ensure it is factually correct. Master Data Management (MDM), meanwhile, includes DQ and reconciles reference data from multiple siloed sources (web ordering systems, social network feeds) to make sure that Big Data transactions are correctly affiliated with their reference data owners (customers, suppliers, products, even sensors).

Even before data was “Big”, accurate reference data dimensions were a key sticking point for enterprises. Without MDM, many have questioned the accuracy and validity of Enterprise Data Warehouse analytics. So big data analytics applied to extreme data volumes could mean drawing wrong conclusions very fast!

Yet another big data article in the Economist, titled “Building with big data”, stated: “….But on the second question, they are silent. Big data has the same problems as small data, but bigger. Data-heads frequently allow the beauty of their mathematical models to obscure the unreliability of the numbers they feed into them. (Garbage in, garbage out.) They can also miss the big picture in their pursuit of ever more granular data. During the 2008 presidential campaign Mark Penn provided Hillary Clinton with reams of micro-data, thus helping her to craft micro-policies aimed at tiny slices of the electorate. But Mrs Clinton was trounced by a man who grasped that people wanted to feel part of something bigger. The winning slogans were vague and broad (“hope” and “change”).”

More and more social media data is factoring into the decision-making processes of the business of customer relationships. A recent CIO article by Neil Gow titled “The power of social media” speaks to the complete customer view and says “MDM is the “secret sauce” for CRM 2.0 (Customer Relationship Management), a centrepiece of which is social media data.
This proven technology generates a trusted, authoritative customer view by consolidating and reconciling disparate customer information from enterprise sources …”

Interestingly, companies who have entered the big data analytics sweepstakes through acquisition are largely missing MDM from their arsenal. Only IBM, who acquired Netezza, owns everything soup to nuts, including not one but three MDM products from their acquisitions of DWL, Trigo and Initiate. Meanwhile others such as EMC, who bought Greenplum (see my post last year “How reliable are analytics without MDM”), partner with Informatica (which acquired Siperian) for MDM. And HP, who picked up Vertica, is completely devoid of any MDM offering altogether.

Looking from the other direction, Oracle is a leading vendor with MDM, again with not one but two products, including the customer MDM solution acquired via Siebel. Their big data strategy is currently pinned to Oracle Exadata, the appliance which combines Sun HW with a specialized Oracle DB. SAP has an MDM solution but has struggled to gain traction outside of SAP accounts. They too have been strangely silent on big data analytics, even though they did acquire Sybase, who holds the patent for columnar DBs and who sued Vertica. Vertica won round 1 of that lawsuit but Sybase counterpunched with a 2nd claim. However, it all seems to have gone quiet after HP bought Vertica and Léo Apotheker (ex-SAP) took the CEO position at HP.

Finally, IBM and Informatica seem well positioned with their leading MDM plays. Informatica’s CEO sees big opportunity in big data, and the company (Disclosure: a partner of my current company RainStor) has begun making a lot of marketing noise around big data and MDM.
Witness the most recent Informatica MDM blog post by Ravi Shankar, my former colleague at Siperian: “Why MDM and data quality is such a big deal for big data”.

In the end, big data could survive without data quality and MDM, but it all depends on how that transaction data is ultimately correlated to the reference data which points to the owning entity. For example, if you are looking for buying patterns or clickstream data trends, you could get by without MDM. However, if your goal is to home in on specific customers, suppliers or products, you would be advised to look at MDM to ensure that your quantity of data processed is given the appropriate dash of quality.

Unless you were at a media-free retreat or preparing for the Rapture that never was, you will have seen the tremendous success of the LinkedIn IPO, pricing at $45 a share and hitting an intraday price of $120 before “settling” down to today’s price of about $95, sporting a $9B+ market cap. To put this in perspective, Mashable points out that LinkedIn is now valued more than some of the following household names:
- Tiffany & Co.: $8.9 billion
- Chipotle: $8.8 billion
- Electronic Arts: $8 billion
- Hyatt Hotels: $7.9 billion
- Hertz Global Holdings: $6.7 billion

Stunning, yet not the first time in stock market history this has occurred. The dotcom boom had companies coming to market regularly with billion-dollar valuations exceeding those of their “brick and mortar” counterparts. To be fair, LinkedIn isn’t some fly-by-night website or company that IPOed after 2 years of existence. It was founded in 2003 and has steadily built up a business with real revenues and an established user base that regularly and virally adds new members by “linking in” with others.

So why the massive valuation and the market enthusiasm for LinkedIn’s stock? And what does it mean for the other, even more famous brethren taxiing on the IPO runway (Facebook, Twitter, Zynga)?
From a technology perspective, LinkedIn is the perfect storm of 3 extremely hot technology trends right now:
- Big Data - I’ve written frequently about Big Data and you can explore some of my posts here. Some LinkedIn stats related to Big Data include crunching 120 billion relationships a day and 16TB of intermediate data for calculations. With Big Data, more “traditional” private software companies that allow insights into Big Data have been snapped up at high valuations (witness high performance analytics companies Vertica, AsterData and Greenplum going for 10 to 15x revenues). With the trend being that the “I” (Information) in IT is becoming increasingly more valuable than the “T”, it is not surprising that companies like LinkedIn are being afforded extreme valuations based on their Big Data information store.
- Open Source - LinkedIn is a big user of open source technologies, notably Voldemort, LinkedIn’s NoSQL key/value storage engine. Last year at the Hadoop Summit, Jay Kreps provided some details around their use of Hadoop and LinkedIn’s infrastructure. The most successful publicly traded open source company to date is undoubtedly Red Hat (which incidentally has a market cap of just $8B on $900M+ revenues, compared to LinkedIn), but the huge interest in Hadoop and other NoSQL technologies is driving the big boys such as IBM (investing $100M) and most recently EMC to get a bite of the elephant. Meanwhile the thought leader in Hadoop is still Cloudera, who continues to gain the respect of the open source community while making strides with their commercial model. (DISCLOSURE: RainStor, my current company, announced today support for Cloudera’s Distribution Including Apache Hadoop.) Even Yahoo is contemplating spinning out their Hadoop team to tackle an estimated $1B market.
- Cloud Computing - Those who continue to doubt the adoption and the success of the Cloud must at least admit that public companies such as Salesforce.com, NetSuite, SuccessFactors and even Netflix (with their on-demand video distribution gaining massive traction) have been an overwhelming success. Whether or not you narrow it down to software-as-a-service (SaaS) rather than the more generalized term Cloud computing, one thing is for sure: people are very comfortable storing a lot of their personal and business data offsite in the Cloud.
Congrats again to the LinkedIn team for achieving their success to date by deploying new technologies that support their innovative business model.

So where do we go from here? All of the companies lining up to IPO have the similar technology characteristics highlighted above. Is this the start of another market “bubble”? Or have we peaked, with the stock market at current highs and the Bernanke QE2 cruise ship coming into dock? From my perspective (being with RainStor), valuations aside, I’m excited to see Big Data and Cloud Computing being afforded such interest, both by enterprises and the financial markets. If the Fed decides to announce QE3 (BIG IF) and Facebook and others hit the markets and skyrocket (as expected), it’s going to be a wild ride.

Cloud ‘N Clear - Established April 2009