When Public Data = Big Data – 3 Tips for Competitive Intelligence Teams: B2B Market Research podcast
Episode 80: – When Does Public Data Equal Big Data – 3 Tips for Competitive Intelligence Professionals
- Why “public” data isn’t the same as “Internet data.”
- How the Internet is like big data with a nice UI
- How you can get quant from the Internet and aggregate it in an almost “big data,” powerful way.
Welcome to another episode of The B2B Market Research podcast. In this podcast episode, we’re going to talk about the term ‘public data’. We’re going to talk about why some folks say public data almost with a sneer and other folks seem to say public data with a smile and how you can know when to treat public data one way or the other.
Before we get into that, a few brief programming notes. First, Cascade Insights, the firm I co-own, focuses exclusively on the needs of the B2B technology sector when it comes to competitive intelligence efforts. We work with technology marketers, product management teams, strategy teams, and with centralized CI and research teams.
With that, let’s go ahead and get into the podcast. What do I mean when I say that some people say public data with a sneer, almost like they’re deriding it the second the words get out of their mouth, while other people seem to say it with a smile on their face?
The reason for this is temporal. People who think of public data as it existed in the ’70s and ’80s, and even the early ’90s, tend to think of it as something that exists in a library, in a dusty stack of magazines, periodicals, and books.
Today, public data is very, very different. It still exists in the libraries and those types of institutions, but when we think of public data, as a firm, we think of Internet data. We think of what we call getting ‘quant’ from the Internet. In a lot of ways, I think this is illustrated by a story and an experience that I had.
I went to my first SCIP conference in 2006. SCIP is the Strategic and Competitive Intelligence Professionals society. I noticed that no one was talking about the Internet. It was almost as if the Internet didn’t exist. This seemed very strange to me, being a technology professional: I was a certified programmer, a network engineer, and a database administrator. My first company was focused on the tech sector in a very in-depth way. We did everything from implementations to evangelism work. When we sold that company and founded Cascade Insights, I had a lot of technology background.
So I go into this industry conference, of which the technology industry is a part, along with other industries like pharma and manufacturing, and no one is really talking about mining the Internet. At this point, in 2006, the Internet was not an unknown quantity.
I found this really interesting, but there was this other phrase that kept getting mentioned at the conference, ‘public data’. It was almost said with a bit of a snort in a derisive tone of voice. I couldn’t quite understand it because I couldn’t reconcile it. I understood that they meant library data was maybe dusty and old, and last year’s data, so to speak. What I couldn’t understand is that, to me, the Internet actually had some fairly up-to-date data.
That started us down a path where we built a variety of assets to leverage Internet data and to get quant from the Internet. One of those was publishing the first edition of Going Beyond Google: Gathering Internet Intelligence. We published that right around ’07 or ’08, and we’ve continued to update it ever since. The sixth edition comes out early next year. What I find is that people sometimes need really clear examples of the difference.
That’s what we’re going to do in the remainder of this podcast. I want to talk about a couple of discrete differences, actually three specific ways that public data isn’t exactly the same thing as Internet data, or the kind of quant that you can get from the Internet.
The first example is, “How long would it take you to read through three hundred million book pages?” If you really think about that question, you would say, “Well, Sean, it would take a really, really long time to do that.”
The short answer is, it’s going to take a long time to read three hundred million pages. But if you go to LinkedIn, they have three hundred million profiles there. In aggregate, those LinkedIn profiles can provide a great many insights on competitor sales teams, marketing efforts, and product development futures.
Pause right there. That data is public. Not all of an individual’s profile data is public, but the things I just mentioned, competitor sales teams and even some product development futures, can be divined from LinkedIn data, at least in part and sometimes almost in whole. Why wouldn’t you look at that?
I think the primary issue is that you first have to see certain types of Internet data as the databases they really are. They’re almost big data. They may be big data with a really nice UI on top, so it doesn’t feel like you’re dealing with big data and doing big data analysis, but, in many ways, you are.
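To make the "database first" idea a little more concrete, here is a minimal sketch in Python. It assumes you already have a list of public job titles for one competitor, gathered by whatever means you are permitted to use. The sample titles, the keyword buckets, and the `bucket_titles` function are all hypothetical; they only illustrate the aggregation step, not any particular tool.

```python
from collections import Counter

# Hypothetical keyword buckets; tune these to the competitor and market.
BUCKETS = {
    "sales": ("sales", "account executive", "business development"),
    "marketing": ("marketing", "demand generation", "content"),
    "product": ("product manager", "engineer", "developer"),
}

def bucket_titles(titles):
    """Count how many titles fall into each functional bucket."""
    counts = Counter()
    for title in titles:
        lowered = title.lower()
        for bucket, keywords in BUCKETS.items():
            if any(keyword in lowered for keyword in keywords):
                counts[bucket] += 1
                break  # count each title once, in the first matching bucket
        else:
            counts["other"] += 1  # no bucket matched this title
    return counts

# Made-up sample titles, for illustration only.
sample_titles = [
    "Enterprise Account Executive",
    "Sales Engineer",
    "Product Manager, Analytics",
    "Demand Generation Manager",
    "Senior Software Engineer",
    "Office Manager",
]
title_counts = bucket_titles(sample_titles)
print(title_counts.most_common())
```

With thousands of titles instead of six, the same counts start to hint at how a competitor splits headcount between sales, marketing, and product. Simple keyword matching will misfile some titles, so treat the result as a rough picture.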
The second example is another question: how many salespeople and customers would you need to talk to if you wanted to understand all the countries in the world where customers were talking and thinking about a given competitor? You’d need to talk to a lot of people. You probably think, “Well, at a minimum, I’d need to talk to one person per country, and I’d probably even want to talk to more people than that.” One person per country might be fairly misleading.
In a lot of ways, isn’t this sometimes how companies do it? They have a country lead or regional lead and they ask that one person, “What have you heard about the competitor?” This country or regional lead might then be commenting on competitor activity across maybe five or six countries and tens of millions of potential buyers.
In that scenario, there is a different data set you could look at: tools like SimilarWeb, Positionly, and Google Trends. With them, you could build out a map of search activity related to competitor product names and services. This stuff is relatively easy to do.
In most markets, we know that search activity is an increasing part of the sales cycle, and customers are paring down the vendors and suppliers they’re going to pick simply through the searching they do in the beginning. Search activity is a fairly good indicator of where the early stages of the sales cycle, and definitely the marketing cycle, are happening for a competitor.
You’re again looking at something that lets you get quant from the Internet, and it lets you get it in an aggregate, almost big data-like way. This isn’t public data said in that sneering, snorting way. This is very powerful data.
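As a rough illustration of that map of search activity, here is a minimal Python sketch. It assumes you have exported search-interest figures into a CSV with `country`, `term`, and `interest` columns. That layout, the sample rows, and the function names are assumptions made for this example, not the export format of any of the tools mentioned above.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows standing in for an exported file; the columns are assumptions.
SAMPLE_EXPORT = """country,term,interest
US,competitor product a,80
US,competitor product b,40
DE,competitor product a,30
DE,competitor product b,60
BR,competitor product a,10
"""

def interest_by_country(csv_text):
    """Sum the interest score per country across all tracked terms."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["country"]] += int(row["interest"])
    return dict(totals)

def rank_countries(totals):
    """Return (country, total) pairs, highest total first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

country_totals = interest_by_country(SAMPLE_EXPORT)
ranking = rank_countries(country_totals)
for country, total in ranking:
    print(f"{country}: {total}")
```

Scores from different tools, or from separate queries in the same tool, may not be on the same scale, so the totals are best read as a rough ranking of where attention is concentrated rather than as absolute volumes.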
The third example: how many users would you need to talk to before you had a clear sense of migration issues or integration issues with a complex set of B2B software? Most people would say that it depends on a lot of variables that we’re not going to get into: how many countries, how many products, how many variations on deployment scenarios, and things like this. But if you took a really simple, straightforward approach, you’d probably say that on a qualitative study you may be looking at twenty or thirty interviews or more.
On a quantitative study, you might start at a hundred survey respondents and you might scale that up dramatically beyond that. If you were to access a competitor’s support forum, or an industry forum, where a lot of products were being discussed in in-depth ways, and this happens all the time in the technology industry, you might literally have access to thousands or tens of thousands of users. You might even be able to get reasonable longitudinal data from that forum. You have to have the right tools and techniques to do it. Those are the things that we’ve been able to build up over time.
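To show what longitudinal data from a forum could look like, here is a minimal Python sketch. It assumes you already have forum posts as (date, text) pairs, collected in a way the forum permits. The keyword list, the sample posts, and the `monthly_mentions` function are hypothetical and only illustrate counting mentions over time.

```python
from collections import Counter
from datetime import date

# Hypothetical keywords for migration and integration issues.
KEYWORDS = ("migration", "migrate", "integration", "integrate")

def monthly_mentions(posts):
    """Count posts per month (YYYY-MM) that mention any tracked keyword."""
    counts = Counter()
    for posted_on, text in posts:
        lowered = text.lower()
        if any(keyword in lowered for keyword in KEYWORDS):
            counts[posted_on.strftime("%Y-%m")] += 1
    # Sort by month so the result reads as a simple time series.
    return dict(sorted(counts.items()))

# Made-up sample posts, for illustration only.
sample_posts = [
    (date(2014, 1, 5), "Migration from v2 to v3 fails on large tables"),
    (date(2014, 1, 20), "How do I change my password?"),
    (date(2014, 2, 3), "Integration with our CRM times out"),
    (date(2014, 2, 17), "Cannot migrate user accounts"),
]
mentions = monthly_mentions(sample_posts)
print(mentions)
```

Run over thousands of posts, a month-by-month count like this can show whether migration or integration complaints are growing or fading after a release.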
In short, public data isn’t really the same as Internet data or, as we say, getting quant from the Internet. It’s really important to know the difference. With that, I want to wrap up this podcast. Thank you for listening, and I hope to have you along for the next one.