There’s been a lot of discussion around which industries are the “hottest” in terms of Big Data. In July, Gartner published the heat map below in its report Market Trends: Big Data Opportunities in Vertical Industries.
With RED = Very Hot, it’s easy to see that the four industries with the highest volume are: 1. Banking & Securities, 2. Communications, Media and Services, 3. Government, and 4. Manufacturing and Natural Resources.
It’s probably also no surprise that Communications, Media and Services and Government are called out as having the widest variety of data, with video for entertainment and surveillance being prime drivers of unstructured data objects. Banking & Securities, by contrast, generally has well-organized datasets thanks to standards and governance.
Finally, the RED under the Software category clearly shows that Banking & Securities and Communications, Media and Services are the most interested in, and willing to invest in, software to tackle their Big Data problems.
So it may come as no surprise that my company RainStor, which offers a Big Data database, announced today that we received $12M in C-round funding from prominent industry leaders: Credit Suisse in Banking & Securities, and Rogers Venture Partners in Communications, Media and Services.
Over the last two years, RainStor has been fortunate enough to provide solutions to the largest banks and telcos in the world. We are excited that this investment reinforces our market opportunity, and we look forward to doing great things for companies in these industries and beyond.
BTW, we are hiring so please do not hesitate to reach out to me if you are interested.
I recently read the book Revenue Disruption by Phil Fernandez, the CEO of Marketo. It was a good read, and quite entertaining in places. As a marketing professional I felt very aligned with the evolution of marketing strategy and techniques described in the book.
Specifically, the notion of marketing and sales alignment caught my attention, because my friend, and an author herself, Christine Crandell has been talking about it for years. Here is a link to an interesting article she wrote in Forbes describing ways of measuring marketing and sales alignment. Christine was quoted in the book, BTW, and was interviewed by Phil on the Marketo blog back in 2010.
As you might have guessed, I wholeheartedly agree with both Phil’s and Christine’s thinking and viewpoints. Marketing and Sales have to be aligned in order for an organization to scale. Those who have worked with and for me will attest that I measure and track every dollar spent by Marketing and how it relates back to revenue generated.
I am also a great believer in “walking a mile in someone’s shoes”, not just because you’ll be a mile away from them and you’ll have their shoes. When I joined RainStor, my current company, in addition to traditional marketing activities I took on the role of developing business and partnerships with companies looking to OEM or resell our database. I was fortunate to find, develop and nurture a major alliance with Dell, but I also took on a quota to sell. I worked with Dell field reps on joint selling, delivered technical presentations, led architecture discussions, and ultimately handled pricing and contract negotiation all the way to close.
Obviously, as anyone already in sales knows, it ain’t easy! Like many, I’ve been guilty of (jealously) looking at top salespeople who pull in big $$ on large deals and thinking it really is a piece of cake. Now, having not only “carried a bag” but also faced all of the obstacles (political, financial, competitive, roadmap needs, prospect re-orgs, unreasonable terms and more), I am in a much better place to provide the content, service, leads and support to the great men and women of this country who proudly go out and SELL SELL SELL!
If you are in marketing and have never been in sales, you should at least make a point of accompanying a salesperson (if they will let you) on some calls, or truly listening to them when they have an issue they need help with. If you are doing your best, and your executive management is bought into marketing and sales alignment, their cries for help from the field should be respected, and it’s your job to help. In this brave new world that Phil and Christine have articulated so well, we are all in it together to generate revenue; that’s the only way the company can succeed.
The best things in life are free, but you can give them to the birds and bees … I want money – Lyrics to Money, by The Flying Lizards 1979
I read an article in the WSJ titled “When Freemium Fails”. Note that the article focuses very much on B2C offerings. My thoughts are very much around how this can be successfully applied to a B2B business. Highlights that stood out:
- Freemium only makes sense for businesses that eventually reach a significant number of users. Typically only 1% of users will upgrade to a paid product (as borne out by the S-1s of companies like Splunk)
- Paid users generally expect to get better or different versions of what they’ve already received free of charge
- Enterprise clients typically have budgets for buying goods and services, so they aren’t necessarily drawn to free products, although open source and the success of MySQL, NoSQL databases and the Hadoop wave would seem to contradict this paradigm
- Freemium needs time to work. Fewer than 1% of users of Evernote (a company cited) become paying customers within a month, compared with 12% after two years
So will Freemium work for you? Having put a lot of thought into this myself and analyzed several successful and failed businesses using Freemium, I say it depends on you! Like any Marketing initiative, you must put in the necessary thought, analysis and planning before embarking on such an endeavor. Here are some considerations framed in the classic P’s of marketing:
- Will your offering have the same Product Features, limited only by some throttle/volume?
- Is your product ready? How much work is it to package up? Obviously if you are Cloud/SaaS you should be okay
- How will you enforce licensing? Time-based, throttle limits, lite version?
- How will you deliver upgrades/fixes? (if you are non-SaaS)
- Eventual pricing upon conversion? Same volume-based metric?
- For enterprise (non-SaaS) Perpetual vs. Subscription pricing?
- Impact on existing/future indirect OEM Pricing and partners?
- Go the “open source” route and charge only for support (ala RedHat or Cloudera model)
- Time-bound intro offers to accelerate uptake? E.g. Limited to first xxx customers
- Conversion process follow-up (at 1 to 3% conversion rate)
- Promote successes; feature profiles of enthusiasts as a branding by-product
- Accessible only via SaaS/Public Cloud? Downloadable enterprise versions?
- North America only to start? Or Worldwide?
- Community supported only for free model? Discussion groups, staffed by your support experts
- Who is the target user and persona? What do they care about?
- What are their primary use cases?
- Are there specific industries that you can focus on?
- What is the competition doing? Are there other Freemium/community/open source offerings?
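Several of the licensing questions above (throttle limits, lite versions, conversion follow-up) can be prototyped cheaply before you commit. Here is a minimal sketch, with all names and thresholds purely hypothetical, of a volume-based throttle that warns free users as they approach their allowance and prompts an upgrade once they exceed it:

```python
# Hypothetical freemium volume throttle: free users get a fixed data
# allowance; exceeding it triggers an upgrade prompt rather than a hard
# failure, preserving goodwill while creating a conversion trigger.

FREE_TIER_LIMIT_GB = 0.5  # assumed free allowance, purely illustrative

def check_usage(plan: str, used_gb: float) -> str:
    """Return 'ok', 'warn' (80%+ of the free limit), or 'upgrade'."""
    if plan == "paid":
        return "ok"                 # paid users are never throttled
    if used_gb >= FREE_TIER_LIMIT_GB:
        return "upgrade"            # over the allowance: prompt to convert
    if used_gb >= 0.8 * FREE_TIER_LIMIT_GB:
        return "warn"               # approaching the allowance: soft nudge
    return "ok"
```

The design choice is deliberate: warning at 80% of the allowance gives your conversion follow-up process a trigger before the user hits the wall.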
Are you considering Freemium for your B2B offering? Wikipedia actually has great resources around this topic: http://en.wikipedia.org/wiki/Freemium. Drop me a line if you want to exchange ideas!
Last year Matt Aslett of the 451 Group published a blog post titled “What we talk about when we talk about NewSQL”. In it he describes NewSQL as “a loosely-affiliated group of companies … what they have in common is the development of new relational database products and services designed to bring the benefits of the relational model to distributed architectures, or to improve the performance of relational databases to the extent that horizontal scalability is no longer a necessity.”
He then followed-up a week later with a definition of SPRAIN which he defined as:
- Scalability – hardware economics
- Performance – MySQL limitations
- Relaxed consistency – CAP theorem
- Agility – polyglot persistence
- Intricacy – big data, total data
- Necessity – open source
451 also published a report, available to 451 clients from both the Information Management and Open Source practices (non-clients can apply for trial access). The database landscape diagram is perhaps the most interesting part, in that it covers a wide variety of databases, placing them neatly into functional and architectural buckets:
What made the blog post even more interesting were the comments debating the scalability of the long-time open source RDBMS favorite (now owned by Oracle via Sun), MySQL. The Q&A site Quora, for example, has a question posted on why Quora itself uses MySQL successfully and at high scale. Matt and 451 then followed up this January with an interesting MySQL, NoSQL and NewSQL survey, whose results he summarizes here, along with the full SlideShare deck.
Greenplum (acquired by EMC) and AsterData (acquired by Teradata) both started with vanilla PostgreSQL and turned it into a shared-nothing, MPP analytical DBMS. AsterData had the additional benefit of adding its own flavor of MapReduce in the form of SQL/MR. More recently, Hadapt is using PostgreSQL to bridge the gap between SQL and Hadoop, allowing both SQL and MapReduce to run against the data stored in their respective repositories.
There is also a new breed of databases that focus on Cloud and SaaS; a full list and excellent summary by GigaOM’s Derek Harris can be found here in the article titled “Cloud Databases 101”.
Finally, there are those whose fundamental core is architected from scratch. Let’s face it, building a completely new database is hard! But starting from a clean slate brings true innovation. NuoDB is one such company, currently in beta. They have a patent for a multi-user, elastic, on-demand, distributed relational database management system, which they tout as everything Oracle is not. VoltDB, by Mike Stonebraker (founder of Ingres, father of PostgreSQL, and CTO/founder at Vertica before starting VoltDB), is another leading contender in the so-called NewSQL camp (incidentally, Mike has been quite vocal against the NoSQL movement). RainStor (my current company), by the way, is also architected from the ground up to handle Big Data: it is NoSQL in its patented storage mechanisms, but it presents a completely relational SQL-92 front end for user access. It can therefore scale like a NewSQL database, but its primary use case is not transactional; it focuses mainly on static/read-only Big Data sets. It also plays nicely with Hadoop, running natively on HDFS and supporting both MapReduce and Pig access, together with ad-hoc SQL-92 queries.
With the database market seeing more action in the last two years than it ever has, and the market estimated at $100 billion and growing, we will likely see more contenders entering and new ones forming. Right now there are lots of flavors of databases to choose from, solving a variety of use cases using new and old customized technology. It’s a fun time to be in the database space!
A couple of days ago I participated in a DMRadio panel titled “How Big Is Big? Why Big Data Comes in Various Sizes”. Prior to the show I listed my thoughts about the topic in this post titled It’s (not) just size that matters.
As with all the DMRadio segments I’ve taken part in, hosts Eric Kavanagh and Jim Ericson did a great job of weaving together viewpoints from disparate vendors and technologies into a compelling show.
The other panelists (in order of appearance on the show), including myself, were Philip Russom of TDWI, Isai Shenker of Connotate, and Elif Tutuk of QlikTech. If you are interested in listening to the full show, you can access it here (registration may be required). If you just want to hear my segment and the RainStor perspective with the final roundtable, click here.
If you don’t have time, here is a summary of what was discussed:
- Phil Russom provided some excellent statistics about data volumes and growth, mainly from a DW perspective. He said the PB club is now the defining standard whereas in the 90s people were barely pushing a TB
- Isai Shenker of Connotate discussed the challenges of pulling together data from disparate data sources and formats
- Elif Tutuk of QlikTech talked about analytics and visualization being “the last mile” of Big Data, meaning that delivery to the end-user is key; the added value of Big Data lies in the ability to ask questions you could not ask before, versus the questions you generally ask
There was also a lively discussion about how Moore’s Law and lowered cost of processing and storage did not necessarily help the Big Data problem, because as Phil Russom said “A file system is not a database management system, which you need to make sense of the data through SQL.”
During the call, as I was listening to each vendor present their point of view (naturally leaning towards what they offered from a Big Data perspective), it became obvious that the eclectic mix of technologies being discussed would be very compelling for a wide variety of audiences. However, very few listeners would really care about all of the topics in relation to their own needs and roles around Big Data for their organization.
Hence the title of this post: Your Big Data Perspective. If you are reading this, you are no doubt primarily interested in keeping lots of data efficiently and cost-effectively, and providing accessibility to meet your compliance and business goals. Products from Connotate, QlikTech and hundreds of others may also have a place depending on your use cases and Big Data objectives. So it was very appropriate that we closed the discussion with the notion of an overarching Big Data Platform, and whether that is, or could become, a reality today. The conclusion: one size does not fit all, and innovation and solutions to different Big Data problems come from a variety of best-of-breed technologies. So it’s not likely the deafening volume of Big Data marketing will go away soon; in fact it may just get louder. But it all depends on YOUR Big Data perspective.
Somewhat regular readers of my blog will note that I like me a good analogy, especially one related to popular culture in which some good laughs can be had while getting a technically valid point across. See “If Seinfeld evaluated MDM vendors”, “10 Marketing lessons from @Shitmydadsays Tweets”, and “Hadoop – The most interesting technology in the world” for examples of the way I think.
So it tickled my geek side funny bone to happen across @BigDataBorat, who has a profile description of “Learnings of Big Data for Make Nation of Kazakhstan #1 Leading Data Scientist Nation”.
A few of my personal favorites so far:
- I offer new #bigdata solution having infinite scale(*) and fast access. (*) inifnite scale offer only good first 20 node
- Mrs. BigDataBorat say she arrange for dinner use OpenTable. Is hard keep track all these BigTable clones.
- Nostradamus make absurd prediction and leave before can be verify. This make him world’s first data scientist
- Optimist say glass half full. Pessimist say half empty. #bigdatascientist say need further funding for to reach firm conclusion.
- To me name Vertica sound like perfect application for build data silo
I work for RainStor, a Big Data database company, so I find this funny and frightfully distasteful at the same time. Well played, @BigDataBorat, well played.
This Thursday, July 19, I’m going to be participating in a DM Radio panel titled “How Big Is Big? Why Big Data Comes in Various Sizes.” This will be my 3rd time participating in a DM Radio segment, and if it is anything like the last two, it should be an interesting discussion about the state of the Big Data market today and projections going forward.
Given that RainStor is uniquely designed to ingest and retain data from a variety of data sources, and our patented de-duplication results in the highest compression rates in the industry, it’s no surprise that we’ll have a lot to say on Thursday on this topic. As a teaser (hopefully you’ll tune in to hear more and participate in the Q&A), the obvious double entendre response is that “it’s not (just) size that matters, it’s what you do with the data.” However, before you can do something with the data, you first have to store and retain it.
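RainStor’s patented de-duplication is of course proprietary, but the general principle is easy to illustrate. Here is a toy sketch (not RainStor’s actual algorithm) of value-level de-duplication, where each distinct value is stored once and records merely reference it by hash:

```python
import hashlib

def dedupe(records):
    """Toy value-level de-duplication: keep each distinct value once in a
    store, and re-express records as lists of references (hash digests)."""
    store = {}    # digest -> value; each distinct value is kept exactly once
    encoded = []  # records re-expressed as lists of digests into the store
    for record in records:
        refs = []
        for value in record:
            digest = hashlib.sha1(value.encode()).hexdigest()
            store.setdefault(digest, value)  # store value only if new
            refs.append(digest)
        encoded.append(refs)
    return store, encoded

# Repetitive data (typical of telco CDRs or bank records) dedupes well:
rows = [["NYC", "GOLD"], ["NYC", "SILVER"], ["NYC", "GOLD"]]
store, encoded = dedupe(rows)
# 6 raw values collapse to 3 distinct stored values
```

The more repetitive the dataset, the better this class of technique compresses, which is why highly structured, repetitive machine-generated data is such a good fit for it.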
So what are the ways of retaining and dealing with large datasets? The current Wikipedia definition states that “Big Data” is “a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage and process the data within a tolerable elapsed time.” By this definition, “Big” would imply beyond the capabilities of traditional RDBMSs (Oracle, SQL Server), data warehouses and the like. But that leaves room for ambiguity, given the growing popularity of a new breed of high-performance analytic columnar databases such as HP Vertica and EMC Greenplum, to name a couple. Furthermore, there are NoSQL and NewSQL databases (such as Couchbase and VoltDB) designed for internet-scale interactive deployments. Finally, the technology now most synonymous with Big Data, Hadoop, with HDFS and various add-on options such as HBase and Hive, is rapidly being popularized as the best way to deal with Big Data, the implication being that open source software (being free) is clearly the most economical (but is it really? That’s worthy of debate on the show).
Within the multitude of ways to tackle the Big Data problem, we like to think that RainStor offers a competitive option for Big Data of all shapes and sizes, particularly when variety of access to the data and choice of deployment (Hadoop or non-Hadoop) matter, and where enterprise-grade security, compliance and access are important. Of course size will matter, but RainStor’s compression always solves that problem upfront. I’m looking forward to the show and hope you’ll listen in. If you can’t make it, leave me a comment or tweet me at @RamonChen with your thoughts.
I presented a workshop at the MDM & Data Governance Summit in San Francisco this week on the topic of Big Data and Master Data Management (MDM). It was a particularly interesting topic for me because I have spent the last 8 years working as VP Product Marketing at Siperian (A leading MDM provider acquired by Informatica) and now as VP Product Management of RainStor, a Big Data database provider for the last 3 years.
For the workshop, I presented some base-level definitions around Big Data, types of data and new classes of database, plus a quick overview of Hadoop and MapReduce. I was then followed by a real-life financial services institution case study from Manish Sood, the CEO and founder of Reltio, a new company offering capabilities to model and visualize Big Data from an unlimited number and variety of sources. Finally, Inderpal Bhandari, VP of Knowledge Solutions and Chief Data Officer at Express Scripts, Inc., presented a whole range of additional use cases (some Big Data related, some not), ranging from retail to social media analysis.
The audience was very engaging, leading to some interesting questions that I thought I’d reiterate here for your reading pleasure:
Q1. Doesn’t MDM already touch and handle millions of records already? Isn’t that considered Big Data?
While an MDM hub can handle data from many data sources with data volumes in the millions, it doesn’t match the size or complexity of Big Data as currently defined and recognized by most players in the industry. First, MDM cleanses, matches and merges master reference data (e.g., customer name, address, etc.), which is significantly less voluminous than the transactional data (e.g., customer orders) stored in applications that MDM cross-references as part of the 360-degree view once a system of record is established. Additionally, newer types of data include interaction data (e.g., social media activity) and machine-generated data (e.g., from sensors), and those types of data quickly reach tens of terabytes to petabytes in volume.
Volume is not the only distinguishing factor in Big Data; the other “V’s” include Velocity, the rate at which data is being generated and captured (often in the billions of records per day), and Variety, multi-structured/non-relational data that cannot be captured and accessed through standard RDBMSs and data warehouses.
Q2. How does Big Data affect MDM and what my business users want?
While MDM has allowed siloed data sources within applications across an enterprise to be reconciled to gain a 360-degree view of a customer or product, the reality is that new data sources, such as Facebook, Twitter and other forms of social media, have appeared in recent years, providing external insights into the behavior, characteristics and relationships of customers. The answers that marketing and sales teams are looking for now go beyond what MDM can provide. For example, it used to be sufficient to use MDM to gain an understanding of what products a particular customer was purchasing across an enterprise. Marketers now want to know what products that customer may be buying or favoring from competitors, or influencing the purchase of within their social network. For that aim, MDM is no longer sufficient. Ironically, while MDM is used to consolidate reference data from multiple internal sources and a few external sources such as D&B, gaining insights from Big Data means combining many more feeds, with MDM itself becoming a contributing source.
Q3. So what is this Hadoop thing, and why should I look at it and other new generation products like Reltio?
Hadoop is a platform that enables Big Data management at scale on commodity hardware. It features MapReduce, which allows data to be processed independent of schema and can handle the ingestion and analysis of extremely high-velocity, large volumes of multi-structured data. This provides an operating framework to ask all manner of questions about the data without having to conform to a fixed data model. In many instances this freedom is combined with a NoSQL form of database (open source HBase or Hive, or a product such as RainStor) in order to efficiently manage and provide effective access to the data captured.
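To make the MapReduce model concrete, here is a minimal single-machine sketch of its two phases. Real Hadoop distributes the same map and reduce steps across a cluster, but the programming model is the same; the sample records are hypothetical log lines:

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit (key, 1) pairs from raw, schema-free text records."""
    for record in records:
        for word in record.split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce: aggregate all values emitted for each key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

# No fixed schema is imposed on the input; each record is just text.
logs = ["error disk full", "warn retry", "error timeout"]
counts = reduce_phase(map_phase(logs))
# counts["error"] == 2
```

Note that nothing about the input had to be modeled up front; the questions you ask live entirely in the map and reduce functions, which is exactly the schema independence described above.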
This is all well and good if you have the technical expertise (Hadoop consultants are hard to find, hence the popularity of companies like Cloudera). So applications like Reltio are set to take center stage by doing the heavy lifting of capturing, consolidating, modeling and visualizing the data to make sense of it all, without a busload of consultants.
Q4. What would be the signs that I need to look into Big Data?
Deploying an MDM initiative is a big enough project in its own right, and if you are in charge, it may not fall to you to examine Big Data in the context of your efforts. Some signs, as mentioned previously, include your end-users voicing interest in the cross-referenced hierarchies and relationships between your customers, suppliers, etc., and a hunger to gain more insight from multi-structured data sources and social media feeds. It is more than likely that someone else in your company has already been put in charge of looking at this Big Data thing, but be prepared for that call or tap on the shoulder, as you still hold the “Master” data to which everything else is related. So sooner or later the two worlds will collide.
That’s it for now; there were many more questions posed. Let me know if there is interest in exploring more of them, or if you have questions or comments of your own, please post them for discussion. Also, please take a look at my post from last year, Mastering Big Data Management.
As the Facebook IPO frenzy builds up to the pricing and Facebook starts trading this Friday (UPDATE: Facebook has priced at $38 giving it a market cap of about $104B), it got me to thinking about how much data I have uploaded/contributed to Facebook over the last 5 years. Turns out, you can get your own personal slice of the Big Data in Facebook back as a tidy zip file snapshot of everything you have done/uploaded to or had commented on. If you want to try it yourself take a look at the instructions here.
Since I joined Facebook in 2007, it appears that I have generated or uploaded about 1.5GB of data. The zip file returned (after 5 hours of combined file-preparation time and download bandwidth) contains a nice HTML index page, which provides a stripped-down version of the photos and comments in chronological order, just like your wall. The basic capability was made available in 2010 and extended with an enhanced archive option after complaints by Irish users who reported their concerns to the Irish data retention commissioner.
So how much Big Data is in Facebook, and where is it kept? The popular details making the rounds of Hadoop and Big Data conferences focus mainly on the huge clusters running Facebook’s data warehouses on Hadoop and Hive. There was an interesting article on Facebook’s corporate blog about their massive Hadoop migration (30PB worth) last year to a larger data center. On a daily basis, however, the repository and platform you interact with is still powered by MySQL databases.
Given the publicity around how traditional “relational” databases can’t handle internet scale, and that NoSQL databases are the way to go, the fact that Facebook still operates MySQL as the backend is eye-opening. This has prompted critiques from database experts such as Michael Stonebraker (of Vertica and VoltDB fame), who stated that Facebook is “trapped in a MySQL fate worse than death”. This was followed up by another GigaOM article detailing how Facebook is able to make MySQL scale.
The article details how Facebook is not relying purely on MySQL: they have a massive layer of memcached servers being used as an in-memory database, highlighting that MySQL servers on their own couldn’t possibly handle the read load of live Facebook traffic. For functionality such as the Facebook Inbox, Hadoop and HBase are used instead. Additionally, Hadoop serves as the backup for the MySQL data.
Back to my personal download of my Facebook information: I was quite impressed by the time (again, 5 hours) it took to download a bulk copy of my personal data. I doubt that many users leverage the download option today; rather, with more and more users joining and more data uploaded per account, it will be interesting to see if the MySQL architecture can continue to hold up, and how Facebook’s use of Hadoop, Hive and other Apache projects will evolve for Big Data warehousing and analytics.
First of all, congratulations to all Splunk employees, VCs and shareholders! Today is a great day for your company and those of us in Big Data (see who else is who in Big data here).
Almost 12 months ago I wrote a post titled “LinkedIn’s IPO – A Perfect Storm of Big Data, Open Source and Cloud Computing”, in which I marveled at the then $9B market cap a few days after the IPO. I noted that LinkedIn used or involved 3 core technology areas: Big Data, Open Source and Cloud Computing.
Today, I was excited to see that Splunk IPOed and immediately doubled in price, making it worth a cool $3B, mostly on the basis of the hype and reality of Big Data. For an interesting financial and company analysis, see Dave Kellogg’s post in January about Splunk’s S-1 and impending IPO. It’s a great analysis, describing Splunk’s marketing as “the Virgin America of log file analysis,” as evidenced by one of many funny tag lines that often appear on t-shirts they hand out to their users and at trade shows:
The only thing he may have missed the mark on was that he felt the predicted $1B valuation to be rather high. I wonder what he thinks of the $3B market cap today? Irrational exuberance, a high-tech bubble, the Instagram effect, or Big Data trending?
What does Splunk provide for its $40M in VC money raised, and a $3B market cap on $121M in revenues and an $11M loss? According to “About the company” on Splunk.com:
Splunk was founded to pursue a disruptive new vision: make machine data accessible, usable and valuable to everyone. Machine data is one of the fastest growing and most pervasive segments of “big data”--generated by websites, applications, servers, networks, mobile devices and the like that organizations rely on every day. By monitoring and analyzing everything from customer clickstreams and transactions to network activity and call records–and more, Splunk turns machine data into valuable insights no matter what business you’re in. It’s what we call operational intelligence.
Splunk has been dealing with what is now known as Big “machine-generated” Data since its founding in 2006, long before Hadoop was popular and when data was merely large. Splunk cracked the code on helping a very influential constituency: the internal IT groups of organizations struggling to analyze and manage the millions of logs generated by expanding infrastructure and data growth. In fact, many companies were dealing with such data at scale BH (Before Hadoop), but there is no doubt that Hadoop has raised interest in Big Data and Big Data technologies to fever pitch, and Splunk’s IPO has just launched a nuclear missile into that explosive ammunition dump. Of course, I’m particularly excited about the Splunk IPO since I work for RainStor, a Big Data database company; we too deal with the same Big Data and have done so BH. We also recently announced ourselves as the first database to run natively on Hadoop.
The Splunk IPO is good for everyone who’s in the Big Data space, both from a VC valuation perspective but also a general public understanding of the types of real-life Big Data challenges that our technologies are looking to solve.
In closing, to show that everything is happening at warp speed and extraordinary valuations, note that the CEO of Splunk, Godfrey Sullivan (who incidentally has 8% of the company, now valued at about $250M – see the table at the end of the post), was previously the CEO of Hyperion (founded originally as IMRS in 1981, making it 26 years old) when it was sold to Oracle in 2007 for $3.3B on revenues of about $1B with 2,500 employees. Today Splunk (founded in 2006, just 6 years old) is pushing on that market valuation with just $121M in revenues and about 500 employees.
It won’t be long before more Big Data related IPOs and M&A follow, thanks and congrats again to Splunk for leading the way.