Why Life Sciences Must Go Beyond MDM

Originally Published at Reltio.com/blog

Many of the team here at Reltio formed the nucleus of Siperian (acquired by Informatica in 2010), the leading on premise MDM tool widely adopted by life sciences companies. Back in 2005, master data management (MDM) was just taking shape, and companies used MDM primarily to improve Siebel CRM data quality before upgrading and migrating their on premise systems. Back then, Siperian was preferred by many to the “seamlessly integrated Siebel Universal Customer Master (UCM)” offering, proving that best-of-breed solutions can be superior to integrated offerings that are designed for a single primary purpose.

One of the biggest issues we faced with Siperian (now Informatica MDM) was defining a relational life sciences data model that could not only capture the basic attributes of healthcare professionals (HCPs) and healthcare organizations (HCOs), but also represent real-world HCP-to-HCP, HCO-to-HCO and HCP-to-HCO relationships. Also thrown in for good measure was an emerging need to master product data, product hierarchies, groups and baskets for pricing and competitive analysis, and to feed product information management (PIM) systems.

At Siperian we admittedly struggled with basic hierarchy management and performance issues with merge and especially unmerge. While we preached multi-domain and coined the term Universal MDM, we were never completely successful with standalone product master data management, let alone bringing together both customer and product data into a single consolidated Siperian Hub.

Back then, the best databases we had to model and store life sciences entities and their relationships were the likes of Oracle, DB2 and SQL Server. Cloud and big data technologies such as graphs, columnar stores, HBase (on Hadoop) and Cassandra simply weren’t available.

Fast forward to the present: the MDM landscape remains more or less unchanged despite a quantum leap in technology.

  • Informatica MDM is still an on premise solution with many of the same challenges we faced while at Siperian
  • Veeva Network is a cloud-based customer master offering by Veeva Systems, focused on improving Veeva CRM data quality, much as Siebel UCM did for Siebel CRM over 10 years ago
  • Customer and product masters are still supported through separate siloed hubs, even when built using the same tool. In fact, Gartner continues to publish separate customer and product magic quadrants as if to reinforce this fact
  • Master data must still be delivered to data warehouses or operational data stores in order for business users to get a promised “complete view”
  • MDM systems and tools built on 1990s relational database technologies continue to hinder the ability to model real-world many-to-many-to-many relationships that graph technologies are designed for
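To make the graph point concrete, here is a minimal Python sketch of how HCP-to-HCP, HCO-to-HCO and HCP-to-HCO relationships can live in one structure. Everything in it (the `AffiliationGraph` class, the relationship names, the sample doctors and hospitals) is hypothetical and invented for illustration; it is not how Reltio, Siperian or any other product models this data.

```python
from collections import defaultdict


class AffiliationGraph:
    """Tiny in-memory graph: nodes are HCPs or HCOs, edges are named relationships."""

    def __init__(self):
        self.node_type = {}            # node id -> "HCP" or "HCO"
        self.edges = defaultdict(set)  # node id -> {(relationship, other node id)}

    def add_node(self, node_id, node_type):
        self.node_type[node_id] = node_type

    def relate(self, a, relationship, b):
        # Store the edge in both directions so it can be walked from either end.
        # (Assumption: direction is ignored to keep the sketch short.)
        self.edges[a].add((relationship, b))
        self.edges[b].add((relationship, a))

    def neighbors(self, node_id, node_type=None):
        """Sorted ids directly related to node_id, optionally filtered by node type."""
        return sorted(
            other
            for _, other in self.edges[node_id]
            if node_type is None or self.node_type[other] == node_type
        )

    def colleagues(self, hcp_id):
        """HCPs that share at least one HCO with hcp_id (a two-hop traversal)."""
        found = set()
        for hco in self.neighbors(hcp_id, "HCO"):
            found.update(self.neighbors(hco, "HCP"))
        found.discard(hcp_id)
        return sorted(found)


graph = AffiliationGraph()
for hcp in ("dr_lee", "dr_patel", "dr_gomez"):
    graph.add_node(hcp, "HCP")
for hco in ("city_hospital", "valley_clinic", "health_system"):
    graph.add_node(hco, "HCO")

graph.relate("dr_lee", "practices_at", "city_hospital")       # HCP-to-HCO
graph.relate("dr_lee", "practices_at", "valley_clinic")
graph.relate("dr_patel", "practices_at", "city_hospital")
graph.relate("dr_gomez", "practices_at", "valley_clinic")
graph.relate("dr_patel", "refers_to", "dr_gomez")             # HCP-to-HCP
graph.relate("city_hospital", "member_of", "health_system")   # HCO-to-HCO

print(graph.colleagues("dr_lee"))
```

In a relational model each of these relationship types typically needs its own join table, and a question like "who shares an affiliation with this doctor" becomes a multi-way join. Here it is a short traversal over the same edge set.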

For the most part, life sciences companies are no closer to getting basic affiliation management functionality, or their dream of an all-encompassing key account management application, as they are hindered by legacy MDM tools. Even a new wave of cloud-based MDM solutions does not make things any better. The good news is that life sciences companies can avoid a new kind of MDM (“Making Da-same Mistake”).

The most popular consumer-facing applications today, such as LinkedIn and Facebook, have shown that business-facing data-driven applications can be cloud-based, handle multiple data domains, and manage structured, unstructured, master, transactional, activity and social data. Companies should expect complete end-to-end “modern data management” instead of relying on recurring “next generation master data management” promises that remain unfulfilled.

Gartner MDM Reports: Leaders are Not Always the Best Choice

It was great to read the annual Gartner MDM Magic Quadrant reports for customer data solutions and product data. As usual, Gartner Research Directors Bill O’Kane, Saul Judah and Andrew White, and their teams, have done an excellent job assessing all the players and providing insight into the options available.

As is customary, the report contains the definition of “leaders” in the quadrant which ends with this final statement:

“Leaders have the strategic vision to address evolving client requirements; however, they are not always the best choice.”

Since MDM technology advances have remained fairly modest over the last 10 years, that statement hasn’t necessarily been true, with the usual suspects such as Informatica, IBM, Oracle and Tibco dominating the customer data solutions quadrant with their size and popularity.

@Reltio we believe that advances in cloud and big data technology have fundamentally changed the game, for not only MDM, but also data management in its entirety. Reliable master data should be an expected component of a new wave of data-driven applications. It should be seamlessly blended with transactional and activity data for a complete view, without the need for federation or separate consolidation.

We were delighted to be able to brief the Gartner team earlier this year, leading to this kind mention in the customer data solutions report:

“Reltio is a newer vendor specializing in nimble cloud-based capabilities to master and integrate data from enterprise applications, third-party data feeds and social media (see “Cool Vendors in Information Governance and MDM, 2014”).”

With leaders still focused on traditional MDM, enterprises that need more agile and differentiated capabilities are looking at new and innovative options. The Gartner report provides a comprehensive list of companies for consideration. For once, the statement that leaders are “…not always the best choice” is truer than ever.

Two Million Reasons to Say Thanks!

I noticed that my blog counter now reads over 2 Million page views! (and about 67,000 unique people who have visited).

A great big thank you to those of you who have bothered to read, comment and sometimes reference my blog.

I never expected to have this much traffic and interest, though way back in 2011, my blog was listed #272 most popular in the data category.

Together with the thank you, a bit of an apology, since many of you have emailed me to ask why I haven’t been blogging as regularly as I used to. The main reason is that I now have twin boys (who are now 2 years old), and where I used to be able to stay up late nights to do my blog posts after my day job, these days I am investing more time ensuring that they are well cared for. I’m still able to put out the odd post, which hopefully you will find interesting. Nothing gets me going more, though, than good interaction and discussions with you guys, so please keep the comments coming (or, if you are not comfortable commenting, email me directly).

That’s it for this very quick post. Thanks again!

10 Things I Learned (So Far) Carrying A Bag

I read an interesting article today titled 9 Silly Mistakes that Marketers Make. #5 in particular caught my eye:

#5. Telling salespeople how to sell.

Some marketers attempt to train salespeople to sell, based upon their experience in marketing. In fact, selling is like sex; you can’t possibly explain how to do it well unless you’ve been doing it for a while.

It struck home for me particularly because I have spent the last 3 years on and off “selling”. Having spent 20+ years of my career developing strategies and executing tactically on various facets of product, corporate and partner marketing, as well as product management, I felt the need to challenge myself in a new area: one in which “carrying a bag” would put me out of my comfort zone and, as a marketer, make me face the harsh reality of sales.

Like many marketers, I’ve been guilty of sales commission check envy, believing that many sales people luck out and rely on a collective support system to close deals, and possess little or no detailed product knowledge. As I soon found out, those were minor misconceptions that actually detracted from the bigger picture of what it takes to be a great salesperson.

I entered my new challenge with confidence, believing that I knew our products as well as anyone, could stand toe-to-toe with CTOs when grilled technically, and could rely on my marketing spin and positioning to align our products to the challenges and solutions required by customers. I definitely had sales fantasies of large commission checks in my future. So how did I do? I give myself a C so far in selling, but I feel that I have added some A’s to my marketing game now that I see both sides clearly.

My 10 top learnings (so far, with many more to come):

  1. Relationships Trump Technical Knowledge – The adage “People buy from people they like” comes to mind.
  2. Building Relationships Takes Time and Energy – Dinners, smalltalk, drinks and golf games may seem like fun, but it takes a certain personality and in many cases high tolerance-level individuals to pull it off
  3. Poor Quality Or Weak Funnel of Leads Spells Doom – It’s impossible to turn sand into gold, and even harder to find your own opportunities
  4. They Love Me, They Love Me Not – I’ve lost track of the number of great 1st meetings and calls with prospects who are “definitely interested, let’s do a POC”, only to never respond to a follow-up until 2 months later, then after the second call never respond again until the 3rd call, etc.
  5. End Of Quarter Has Never Come Faster – Those increments of 3 months are pressure packed with targets that mean so much to the organization and even more so when you have family depending on your variable paycheck
  6. There Are So Many Things Out Of Your Control – Customer internal politics, organizational realignment, varying stakeholders and let’s not forget competition!
  7. It’s Not Over, Even When You Win The Deal – Getting my first commitment to buy from a customer was exciting, then came dealing with procurement, and then beyond that actually getting a PO to recognize revenue
  8. Hunters Have To Farm Too – In a small startup there is no such thing as selling and running. If you have any designs on getting add-on business OR having this key account as a reference, you have to track and manage the success of the implementation. And more relationship building
  9. Marketing Materials Everywhere. Except The One I Really Need – No shortage of corporate brochures, white papers or datasheets, except I need something that speaks directly to the customer’s unique pain and viewpoint (fortunately I was able to create my own as needed :-))
  10. That Press Release Is So Hard To Get – As Marketers, we say “just write it into the contract that they will do a press release”. The reality is that the person who approves the sale has no jurisdiction to approve a press release. Doesn’t matter what is written contractually. Like you’re going to sue your customer for that press release

Bottom line, I’m still learning and improving my skills at both ends of the sales and marketing spectrum. I’m pleased to say that I have a much higher respect for top sales talent and what it takes to earn that big commission check.



Big Data and MDM Revisited

Last year I presented at the MDM Summit in San Francisco and wrote a blog post titled Big Data and MDM – Where Quantity Meets Quality. At the time, there was a fair amount of curiosity around how the massive data volumes being generated from social media sales and marketing channels would impact more “traditional” Master Data Management deployments and processes, put in place by enterprises over the last 10 years.

Some brief background about my interest in this topic: I previously ran product management and product marketing at Siperian, the MDM leader, which through acquisition is now Informatica MDM. I then moved into Cloud, xaaS and Big Data, joining my present company RainStor 4 years ago. During this journey, I mused in previous posts about the life of an enterprise sales person vs. one who sold SaaS offerings, voiced opinions about the MDM players and ecosystem, and kept a very close eye on the state of MDM, while positioning RainStor as a leader in the exploding Big Data market.

So let’s start with who Gartner thinks is the leader these days in MDM. Firstly, Gartner still tracks Customer MDM and Product MDM separately, with two excellent analysts, Bill O’Kane and Andrew White, jointly contributing on the state of play for each category’s Magic Quadrant. John Radcliffe, who previously blazed the trail for Gartner on Customer MDM, has since retired.

A tremendous amount has been written on the topic of Big Data and MDM since my presentation at MDM Summit. Here is a sample set:

Over the last 10 years, companies have made MDM a priority and from a technology vendor’s perspective a multi-billion dollar market. Not only for the core MDM software, but all the touch points of hardware, data integration, analysis and movement tools, and the significant data governance and architectural consulting that accompany a typical multi-year MDM deployment project.

With Big Data and synonymously Hadoop continuing to be a hot topic, the intersection of the two is reaching fever pitch and the big players all realize that they must align their products, strategy and marketing to meet the interest and growing demand.

So while my current company RainStor, which provides a Big Data database, doesn’t directly impact MDM and core reference data, since their scale and volumes are still dwarfed by those of transactional and data warehouse/analytic data, I continue to keep a keen eye on developments with more than a passing personal interest.

As ever, feel free to comment or drop me a line if you’d like to have a discussion about this very intriguing topic. BTW, MDM and Big Data may be interesting, but I actually believe there is a related topic of even greater interest. Stay tuned; I may follow on soon with my thoughts in that area. :-)

World Series Winning Leadership – Business Lessons From the 2010 & 2012 San Francisco Giants

As a devout San Francisco Giants fan, I’ve been giddy over the last 3 years, reveling in the success and, more so, the incredible personalities, characters and adversity-beating drive of a team that won the WS in 2010 and now in 2012.

I’m a big proponent of finding lessons in different walks of life (e.g. my blog posts 10 Marketing Lessons from Shit My Dad Says Tweets and If Seinfeld evaluated MDM Vendors), so when it came down to delving into the nuggets that could be gleaned from the 2012 San Francisco Giants, I was licking my chops. As it turns out, the remarkable run of the Giants has already spawned a plethora of well written articles. Some of my favorites include:

So instead I’m turning my analysis to the differences and similarities between the 2010 vs. 2012 teams as I’m curious to examine the “formula” that the Giants are utilizing to such success. Here are a few observations:

1. Finding missing pieces to complement the core
   2010 Giants: Picked up Javier Lopez, Ramon Ramirez, Mike Fontenot and, importantly, Cody Ross
   2011 Giants: Picked up Carlos Beltran from the Mets
   2012 Giants: Picked up Hunter Pence and “Blockbuster” Marco Scutaro

While others “gave up” on role-type players like Cody Ross and Marco Scutaro, the Giants found gems, and both turned into NLCS MVPs. Fun fact: if Ross had not joined the Red Sox in 2012, Scutaro would not have been sent to the Rockies, where the Giants picked him up.

So what happened in 2011? Apart from missing Buster Posey, many pointed to Carlos Beltran being injured, but also that he didn’t “fit chemistry wise”.

2. Focus on Pitching
   2010 Giants: Giants pitching shut down the powerful Rangers hitters, with Lincecum, Cain and Bumgarner all home grown
   2011 Giants: Giants pitching wasn’t as good as in 2012, Zito hit rock bottom and Jonathan Sanchez flamed out
   2012 Giants: Giants pitching shut down the powerful Detroit hitters, again with Lincecum, Cain and Bumgarner, and now Zito & Vogelsong contributing

It took several years of drafting great pitching and watching young stud hitters go to other teams before it finally clicked in 2010. Management’s belief and philosophy that good pitching beats good hitting proved true in both 2010 and 2012.

3. Chemistry and Leadership
   2010 Giants: Everyone pulled for each other. The Misfits were ignited by Edgar Renteria, who made the impassioned speech that drove the turnaround
   2011 Giants: With Posey out, there wasn’t a passionate leader. Carlos Beltran didn’t contribute as a leader or blend in well
   2012 Giants: “The reverend” Hunter Pence stepped in with “the speech” that spurred historic comebacks

In 2011, the loss of Posey was undeniably the biggest blow, both physically and from a leadership standpoint. Aubrey Huff lost his desire and Pat Burrell retired. Carlos Beltran, though adequate from a batting perspective, was not a fit “chemistry wise” and did not return in 2012.

4. Knowing When To Say Goodbye
   2010 Giants: The arrival of Buster meant the departure of Bengie Molina. Bengie still had game left, and he helped the Rangers meet the Giants in the WS
   2011 Giants: Management retained Aubrey Huff at an expensive price, but did let Juan Uribe go to the Dodgers
   2012 Giants: They traded the very popular Andres Torres and Ramon Ramirez to the Mets for Angel Pagan. George Kontos came for Chris Stewart. And Nate Schierholtz went to the Phillies for Hunter Pence

It’s hard to say, but baseball is a business, and sentiment is trumped by facts and what’s best for the team. In 2010 and 2012, the Giants made all the right moves, some of them hard ones in the eyes of fans.

So what have I learned from comparing the 2010, 2011 and 2012 Giants that I can relate to and apply professionally?

1. Everyone pulling on the same rope and the belief in team cannot be overstated in any walk of life. The whole exceeds the sum of its parts.

2. Consistency and belief in a master plan. Changing directions and following the hot lead or topic could be considered “agile”, but building a long-term winner with a repeatable formula requires dedication and commitment.

3. Leadership and vision are an absolute must. Chemistry among the team includes joint belief in leadership. Leadership comes from passion, experience and sacrifice.

4. When the time comes, you need to let people go, whether they no longer perform as you expected or their skills no longer match what you are trying to achieve. As my CEO once said to me: as a Director you may still care more about your people than the company, but as an executive and VP you have to care about the company first. None of the people will survive if the company does not.

Go Giants 2013!


A Simple Marketing Messaging Framework

Last month’s post Pixar’s 22 Rules for Storytelling – Applied to Software Marketing generated some interest, mostly emails asking me how to use a simple marketing messaging framework vs. the laundry list of rules to follow from my Pixar analogy.

There are many ways to go about this, but one of my favorites is this simple messaging framework, which fosters discussion quickly and hits all of the high points:

1. FOR: Know thy audience for your message (and product) – This is the most important starting point, and the one at which many messaging initiatives fail before they even get off the ground. Many make the mistake of generically stating their audience as “CIO”, when realistically their sales guys will never get to the CIO. If you are not ready to message down to the decision maker level, start with the market category or characteristics of the company you want to target (industry, profile etc.)

2. WHO NEED TO: What is the challenge and forcing function to buy? This is not a feature list. Companies do not buy features and “cool”; they buy products that solve their business problems, justify the investment and have an ROI

3. THE: Insert the name of your product here

4. IS A: Use the most coherent single sentence describing the class and type of product

5. THAT: Express the benefit statement up front

6. UNLIKE: Differentiation. How is this different from the competition or other ways of solving the problem

7. IT PROVIDES: Describe WHAT it does, not HOW. So no mention of “features”, but the capabilities which translate to benefits and results

Here is an example of pulling it all together (with xxx for your customization; you can successively refine and use this for targeted personas as well):

1. FOR: Companies in the xxx industry that have or will have a total of xxx of data or more

2. WHO NEED TO: Retain and access the data on demand for business or regulatory purposes

3. THE: xxxx

4. IS A: xxxx architected for Big Data from the ground up

5. THAT: Allows companies to securely keep and access as much data as they want, for as long as they want at the lowest TCO

6. UNLIKE: Traditional xxx that are xxx or xxx that are xxx, that are complex to manage, maintain and expensive to deploy and scale

7. IT PROVIDES: The most cost-effective way of xxxx, enforcing xxx, providing business analysts and data scientists with xxxx


Pixar’s 22 Rules for Storytelling – Applied to Software Marketing

These rules were originally tweeted by Emma Coats, Pixar’s Story Artist. Looking at the list, it comes as no surprise that every movie at Pixar becomes such an incredible hit, and that their characters genuinely find their way into our hearts, minds and wallets.

Regular readers of my blog and those who have worked with me in the past know that I have a penchant for using emotional movie or TV theme based music and video in events, presentations and messaging. I believe that good marketing connects with its audience, much the way a feature film such as Pixar’s Toy Story or Up does. So I view developing compelling product and solution marketing messages as having much in common with the 22 rules listed by Emma below. Following Pixar’s rules is my adapted list and how it could be applied from a software marketing perspective, with related edits in bold and CAPS:

Pixar’s 22 Rules of Storytelling

  1. You admire a character for trying more than for their successes.
  2. You gotta keep in mind what’s interesting to you as an audience, not what’s fun to do as a writer. They can be very different.
  3. Trying for theme is important, but you won’t see what the story is actually about til you’re at the end of it. Now rewrite.
  4. Once upon a time there was ___. Every day, ___. One day ___. Because of that, ___. Because of that, ___. Until finally ___.
  5. Simplify. Focus. Combine characters. Hop over detours. You’ll feel like you’re losing valuable stuff but it sets you free.
  6. What is your character good at, comfortable with? Throw the polar opposite at them. Challenge them. How do they deal?
  7. Come up with your ending before you figure out your middle. Seriously. Endings are hard, get yours working up front.
  8. Finish your story, let go even if it’s not perfect. In an ideal world you have both, but move on. Do better next time.
  9. When you’re stuck, make a list of what WOULDN’T happen next. Lots of times the material to get you unstuck will show up.
  10. Pull apart the stories you like. What you like in them is a part of you; you’ve got to recognize it before you can use it.
  11. Putting it on paper lets you start fixing it. If it stays in your head, a perfect idea, you’ll never share it with anyone.
  12. Discount the 1st thing that comes to mind. And the 2nd, 3rd, 4th, 5th – get the obvious out of the way. Surprise yourself.
  13. Give your characters opinions. Passive/malleable might seem likable to you as you write, but it’s poison to the audience.
  14. Why must you tell THIS story? What’s the belief burning within you that your story feeds off of? That’s the heart of it.
  15. If you were your character, in this situation, how would you feel? Honesty lends credibility to unbelievable situations.
  16. What are the stakes? Give us reason to root for the character. What happens if they don’t succeed? Stack the odds against.
  17. No work is ever wasted. If it’s not working, let go and move on – it’ll come back around to be useful later.
  18. You have to know yourself: the difference between doing your best & fussing. Story is testing, not refining.
  19. Coincidences to get characters into trouble are great; coincidences to get them out of it are cheating.
  20. Exercise: take the building blocks of a movie you dislike. How d’you rearrange them into what you DO like?
  21. You gotta identify with your situation/characters, can’t just write ‘cool’. What would make YOU act that way?
  22. What’s the essence of your story? Most economical telling of it? If you know that, you can build out from there.

22 Rules Applied and Adapted for Software (and technology) Marketing

  2. You gotta keep in mind what’s interesting to THEM as a BUYER, not what’s fun to do as a MARKETER. They can be very different.
  3. Trying for TAG LINE is important, but you won’t see HOW YOUR PRODUCT TRULY DIFFERENTIATES til you’re at the end of YOUR VALIDATED MESSAGING PROCESS. Now rewrite.
  5. Simplify. Focus. Combine AND GROUP MESSAGES. Hop over detours. You’ll feel like you’re losing valuable stuff but it sets you free.
  6. What is your PRODUCT good at, comfortable with? Throw the polar opposite at them. Challenge them. How do they deal? IF THEY DON’T DEAL WELL ON PAPER, DON’T PUT YOUR SALES TEAMS INTO UNWINNABLE SITUATIONS BY INCLUDING THOSE USE CASES JUST TO INCREASE MARKET OPPORTUNITY
  7. Come up with your ending before you figure out your middle. Seriously. Endings are hard, get yours working up front. CAN YOU DISTILL YOUR CORE PRODUCT MESSAGE DOWN TO THREE WORDS OR LESS?
  8. Finish your MESSAGING, let go even if it’s not perfect. In an ideal world you have both, but move on. Do better next time. HOME RUN WILL OCCUR ONLY WHEN IT RESONATES WITH THE BUYER. 
  9. When you’re stuck, make a list of WHY SOMEONE WILL NOT BUY YOUR PRODUCT. Lots of times the material to get you unstuck will show up.
  10. Pull apart the MESSAGES FROM SUCCESSFUL (non-software) PRODUCTS/COMPANIES you like. What you like in them COULD BE APPLIED WITH A FEW TWEAKS (Like we are doing here) TO YOUR TARGET MARKET; you’ve got to recognize it before you can use it.
  11. Putting it on paper lets you start fixing it. If it stays in your head, a perfect idea, you’ll never share it with anyone. COLLABORATE AND MARKET TEST.
  12. Discount the 1st PRODUCT FEATURE that comes to mind. And the 2nd, 3rd, 4th, 5th – get the obvious out of the way. Surprise yourself. FIND DIFFERENTIATION BEYOND PRODUCT FEATURES.
  13. Give your PROSPECTS AGGRESSIVE OPINIONS. Passive/malleable Q&A might seem likable to you as you write, but it’s poison to the SALES PERSON WHO WILL BE HIT WITH THE TOUGH QUESTIONS.
  14. Why must you SELL THIS PRODUCT? What’s the belief burning within YOUR COMPANY that your SALES TEAMS feed off? That’s the heart of it.
  15. If you were your CUSTOMER, in this situation, how would you feel? Honesty BACKED UP BY REFERENCES AND PROOF POINTS lends credibility to unbelievable METRICS AND OUTRAGEOUS CLAIMS.
  16. What are the stakes? Give YOUR PROSPECT A REASON TO ROOT FOR YOUR PRODUCT. What happens if they don’t BUY? Stack the odds against.
  17. No work is ever wasted. If it’s not working, let go and move on – it’ll come back around to be useful later. WITH TECHNOLOGY AND MARKETING MESSAGES, WHAT’S OLD IS NEW AGAIN (Examples: Mainframes to Client-Server to Server-Based Computing to On Premise Appliances to Cloud)
  18. You have to know yourself: the difference between doing your best & fussing. MESSAGING is testing, not refining. YOUR BEST MESSAGE IS NOT WHAT YOU OR YOUR COMPANY THINKS IT IS. IT’S WHAT YOUR CUSTOMER AND BUYERS TELL YOU IT IS.
  19. COMPETITIVE JABS to get YOUR COMPETITION into trouble are great; coincidences to get them out of it are cheating.
  20. Exercise: take the building blocks of THE CURRENT MESSAGE OR YOUR COMPETITION’S MESSAGES YOU DON’T LIKE. How do you rearrange them into what you DO like?
  21. You gotta identify with your PROSPECT, can’t just CREATE ‘cool’ MESSAGES. What would make THEM BUY?
  22. What’s the essence of your MESSAGE? Most economical telling of it? DISTILL IT DOWN TO A FEW KEYWORDS. If you know that, you can build out from there.


How Ingres Is Still Changing the Database World

Back in 2005 Wired magazine published a very interesting issue depicting how George Lucas’ blockbuster movie Star Wars eventually spawned a whole legion of special effects experts, all the way from Pixar to movies by James Cameron. The graphic can be found here if you are interested.

It got me thinking about how some of the recent innovations in database technology have similar spin-off roots. For example, from Michael Stonebraker, who created the Ingres and Postgres databases, we have many other successful companies in the marketplace.


  • PostgreSQL is the foundation of Greenplum (acquired by EMC) and Aster Data (acquired by Teradata)
  • Michael also founded Streambase in 2003
  • Streambase’s CEO at one point was Barry Morris, who is now co-founder with Jim Starkey of NuoDB, a NewSQL database
  • Daniel Abadi collaborated with Michael and co-authored (together with others) the C-Store column store paper. Daniel has now founded Hadapt, a PostgreSQL/Hadoop combo
  • They also collaborated on H-Store which is now the commercial offering VoltDB

Of course this is no surprise, in an industry as large as the database market ($100 Billion in opportunity), there’s room for more than one or even a hundred better mousetraps. But I just have to say kudos to Mr. Stonebraker for 4 decades of innovation and the continued desire and successful evolution of database technology.

As you may have guessed, my title “How Ingres is still changing the database world” is a bit tongue in cheek, since Ingres has since renamed itself Actian, has Ingres and Vectorwise as two separate products, and isn’t in any way directly related to what Mr. Stonebraker is doing today. But from those humble beginnings in the Berkeley lab, his early pioneering work on relational databases in the Ingres and Postgres projects has made things very interesting and competitive in a world which also includes NoSQL databases, Hadoop and the like.

It’s definitely a fun time to be in the Big Data and database space!

Structured vs. Unstructured Big Data Q&A

I was asked recently to define structured vs unstructured data and how the different types of data were being managed within the enterprise. I thought I’d list my responses below in case you find it useful/interesting. As ever, feedback and debate welcome :-)

What are the challenges seen with the two different types of unstructured information — unstructured data, such as machine-generated data, versus unstructured content, such as human generated information in emails or social media?

To be clear, machine-generated data (MGD) does have structure. It’s just that the structure is not strictly enforced in a traditional relational database context. In many cases, the data can be considered multi-structured, since there are several ways the data can be viewed, without being fixed to a rigid permanent relational format. For example, MGD is often placed into Hadoop in raw form, and subsequently provided structure through late-binding MapReduce processing. Data is often also loaded into HBase, and then Hive is used to provide structure with a SQL-like syntax to gain traditional analytical insights.
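Late binding is easier to see in a few lines of code. The sketch below is plain Python rather than MapReduce or Hive, and the log format, field names and sample lines are all made up for illustration. The raw lines are stored untouched, and structure is only applied at the moment they are read.

```python
import re
from collections import Counter

# Raw machine-generated lines, kept exactly as they arrived.
# (Hypothetical web-server log format, invented for this example.)
RAW_LINES = [
    "2012-11-01T10:00:01 GET /products/42 200 123ms",
    "2012-11-01T10:00:02 GET /products/17 404 8ms",
    "2012-11-01T10:00:05 POST /cart 200 310ms",
    "corrupted line that matches no pattern",
]

# The "schema" lives in the reader, not in the storage layer.
LINE_PATTERN = re.compile(
    r"(?P<timestamp>\S+) (?P<method>[A-Z]+) (?P<path>\S+) "
    r"(?P<status>\d{3}) (?P<latency_ms>\d+)ms"
)


def read_with_schema(lines):
    """Apply structure at read time; skip lines that do not fit the pattern."""
    for line in lines:
        match = LINE_PATTERN.fullmatch(line)
        if match is None:
            continue
        row = match.groupdict()
        row["status"] = int(row["status"])
        row["latency_ms"] = int(row["latency_ms"])
        yield row


# View 1: how many requests ended with each status code?
status_counts = Counter(row["status"] for row in read_with_schema(RAW_LINES))

# View 2: which paths took longer than 100 ms?
slow_paths = [
    row["path"] for row in read_with_schema(RAW_LINES) if row["latency_ms"] > 100
]

print(status_counts)
print(slow_paths)
```

The same raw lines serve two different views, and a third could be added later with a different pattern without reloading anything. That is the sense in which the data is multi-structured.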

The reason that many call MGD unstructured is that it is often stored in flat files rather than in a relational database. These flat files are consistent with the containers that hold unstructured human generated content, such as emails or other forms of social media.

  • From a storage and retention perspective: The key difference between MGD and emails or other social media content is that the eventual structure of MGD allows it to be efficiently analyzed, and compressed through value and pattern de-duplication. My company RainStor can use the structured within MGD to reduce the raw physical footprint of this data without losing any of its meaning, thereby saving significant amounts of storage. With unstructured content such as emails and other forms of free format text, space savings are limited to binary compression (much like that for images and video) that can only marginally reduce the amount of storage required to keep the data.
  • From an accessibility perspective: MGD can be analyzed using both MapReduce and traditional SQL through existing BI tools. At RainStor we provide a database over Hadoop that allows both forms of access. Interpreting emails and social media content requires free-text scanning to find patterns and to build the metadata and indexes on which search and discovery of the information are keyed. Free-text search also requires context and ontologies to be effective; there are specialized products such as HP’s Autonomy and Oracle’s Endeca that provide such capabilities.
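As a rough illustration of value de-duplication, the sketch below uses generic dictionary encoding: each distinct value is stored once and every occurrence is replaced by a small integer code. This is an assumption chosen for illustration and not a description of RainStor’s actual implementation; the function names and sample values are hypothetical.

```python
def dictionary_encode(values):
    """Store each distinct value once and replace occurrences with integer codes."""
    dictionary = []   # distinct values, in order of first appearance
    index_of = {}     # value -> position in the dictionary
    codes = []        # one small integer per original value
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index_of[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original sequence; nothing is lost in the encoding."""
    return [dictionary[code] for code in codes]


# Hypothetical caller column from a batch of call detail records.
callers = ["4155550100", "4155550123", "4155550100", "4155550100"]
dictionary, codes = dictionary_encode(callers)
restored = dictionary_decode(dictionary, codes)
```

Repetitive columns such as phone numbers, status codes or ticker symbols shrink well under this kind of scheme, whereas free-format text has few exact repeated values to exploit.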

Are businesses being inundated with this data? Are they getting value from this data, or is most of it passing through unnoticed?

The two main reasons this data might be retained are:

  • Compliance: Many industries are required by regulation to retain MGD for mandated periods, for example Call Detail Records in Telco and trades and quotes in Financial Services. The massive storage requirements generated by the accumulation of this data have led companies to seek out new databases such as RainStor, which can not only reduce the data footprint through value and pattern de-duplication, but also provide the immutability required to meet regulatory demands. For unstructured content such as email communication, there are laws that require emails, internal or external, to be discoverable, and there are many products that provide archiving tools to capture and retain this content.
  • Business Competitive Edge: MGD can provide a competitive edge, since patterns can be explored over greater volumes of data and longer time periods, provided the data can be retained. Sophisticated CRM systems that include customer interaction and support through email already have technologies that can interpret the “mood” of a customer from the tone of their emails.

Will the value of this data surpass structured transactional data? If there is value now being seen, what types of applications or processes are taking advantage of unstructured data?

Both sets of data are equally valuable, although it can be argued that enterprises are only slowly awakening to the value of unstructured Big Data. New types of databases and technologies such as Hadoop are being used to take advantage of this data. Additionally, at a macro level, social media analysis of Twitter trends and comments through products such as Salesforce.com’s Radian6 already provides insight into crowd sentiment and a way of engaging with the prospect or customer through what Salesforce deems the “Social Enterprise”.

What areas of the business are now benefiting from unstructured data — both machine- and user-generated data? What are the issues that still need to be tackled with the two types of unstructured data?

As previously detailed, compliance and regulation in industries such as Telco and Financial Services are dictating company-wide retention of Big Data, ranging from MGD to email content. Additionally, across all industries, marketing departments are seeking better ways to connect with customers by analyzing their sphere of influence and who they are connected to socially, in order to drive more targeted sales. They also manage their company brand through social media, leading the way in the use of sentiment analysis and in the capture and review of web clickstream data to better understand customer behavior and patterns.

Is there progress integrating this data into core enterprise systems?

Every enterprise is investigating the use of Hadoop to handle unstructured data, and looking for ways to bridge the gap through integration with traditional structured enterprise systems to enable predictive learning on top of the combined information. RainStor provides the ability to handle the unstructured MGD while also retaining it and providing access to it through both traditional enterprise tools and new MapReduce paradigms. This allows you to ask traditional questions of the data, but also to explore questions that have yet to be thought of.

As far as blending truly unstructured content such as social media and emails with enterprise data, large companies such as HP, Oracle, Dell, IBM, Salesforce and others are building or acquiring complementary technologies to provide an enterprise view across all data sources. Up-and-coming startups aiming to provide an all-encompassing, cross-data view of Big Data for enterprises include Factual, Clearstory and Reltio.