Big data can be used to build a powerful supporting argument for nearly anything. Wielded foolishly or maliciously, it can do great harm. It has never been more important to use math responsibly.
That’s the core premise of “Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy” by Cathy O’Neil. The book demonstrates the importance of looking critically at big data analytics, especially now that analytics increasingly sway the decisions made by powerful institutions in business, technology, and government. I highly recommend it to anyone who leverages big data or works in tech.
This article is based on a B2B Market Research Podcast episode.
You can listen to the episode or read the article below.
Business Loves Big Data
We have access to more data than we’ve ever had before, and organizations of all types have embraced it: retailers, healthcare providers, schools, political campaigns… and, of course, tech companies.
Consider the huge amount of data in an average marketing technology stack. You can:
- Track clicks on your site.
- See which sites visitors were on before coming to yours.
- Determine how often visitors came to your site before clicking “add to cart” or filling out a Contact Us form.
- View the content visitors download from your site.
- Gather vast amounts of data from social media, content analytics, brand engagement, and more.
Think of the amount of data that cloud services product teams have to consider. A typical SaaS product manager will look at:
- Sign-in frequency.
- The percentage of users that take a certain action within the product.
- How often a customer persona uses certain features.
- Retention stats on customers.
- Churn metrics.
- A customer’s lifetime value.
By the nature of the job, market researchers also add to the piles of data company leaders have to consider. Every in-depth interview, focus group, survey, and usability study adds to the pile.
Things To Consider
So far, I doubt I’ve said anything controversial.
You probably have a sense of the breadth of big data your company collects and relies on to do business.
It’s less likely that you’ve deeply considered the validity of your data.
Are you confident that your company uses big data ethically? How about sensibly?
Bad Big Data
O’Neil argues that there is a dark side to big data collection. To be clear, O’Neil is not advocating an end to all big data efforts. Instead, she asks us to acknowledge both the good and the bad sides of big data.
O’Neil is more than qualified to comment on such a complex subject. She has a Ph.D. in math from Harvard and has worked as a Barnard College professor, a quant in the financial industry, and a data scientist, among other roles. In addition to writing “Weapons of Math Destruction,” a 2016 National Book Award nominee, O’Neil wrote “On Being a Data Skeptic” and co-authored “Doing Data Science: Straight Talk from the Frontline.”
She gives numerous examples of “weapons of math destruction.” The stories span many kinds of businesses, from social media to education and from travel to finance. She also tackles how weapons of math destruction are affecting the political process.
Here are a few of the “weapons of math destruction” that she describes.
O’Neil explains how insurance companies use big data analytics to generate prices that vary dramatically from one consumer to another. In many cases, prices are based on proxy data such as the neighborhood a person drives through or their credit score. Shockingly, proxy data sometimes holds more sway than the individual’s driving record.
This is quite harmful. Consumers are not graded on merit (driving behavior). Further, consumers may have little to no personal control over the proxy data they’re being graded on.
Insurance companies also leverage big data to develop byzantine pricing structures. Such structures are far too complex for most individual consumers to understand.
As O’Neil puts it in the book, “These pricing tiers are based on how much each group can be expected to pay. Consequently, some receive discounts of up to 90 percent off the average rate, while others face an increase of 800 percent.”
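To make the spread in that quote concrete, here is a small worked sketch. The $1,000 average rate is a hypothetical figure chosen only for illustration, not a number from the book, and the sketch reads the quoted increase as 800 percent.

```python
# Hypothetical average annual premium; not a figure from the book.
AVERAGE_RATE = 1_000.00

def tier_price(average: float, change_pct: float) -> float:
    """Apply a percentage change (negative for a discount) to the average rate."""
    return average * (1 + change_pct / 100)

cheapest = tier_price(AVERAGE_RATE, -90)  # 90 percent discount off the average
priciest = tier_price(AVERAGE_RATE, 800)  # 800 percent increase over the average

print(f"Cheapest tier: ${cheapest:,.2f}")
print(f"Priciest tier: ${priciest:,.2f}")
print(f"Spread: {priciest / cheapest:.0f}x")
```

At the extremes the quote describes, two drivers measured against the same average rate could pay prices that differ by a factor of about 90, which is why it matters so much whether the tiers rest on driving behavior or on proxies.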
This proxy data may have little real connection to the number of insurance claims a driver files over a lifetime. The result is a system that is less than fair.
Elsewhere, O’Neil covers how Facebook filters users’ news feeds. Unfortunately, this filtering often leads to dramatically polarized results.
Rankings Run Amok
Perhaps the most compelling example in the book focuses on college rankings.
Since 1983, students and parents have relied on US News & World Report’s college rankings when choosing colleges. The report was designed to evaluate 1,800 colleges and universities throughout the US and rank them for excellence.
However, there was a problem early in the process.
“They had no direct way to quantify how a four-year process affected one single student, much less tens of millions of them,” O’Neil writes.
Sadly, US News & World Report didn’t have data on how students fared once they graduated. Nor did it know how to define success. Would success mean that graduates earned a certain amount of money over their lifetimes, maintained a good credit rating, or contributed to society in some way?
Since they didn’t have more relevant data, US News & World Report relied on proxy data.
As O’Neil puts it, “They looked at SAT Scores, Student-Teacher ratios, and acceptance rates. They analyzed the percentage of incoming freshman who made it to sophomore year and the percentage of those who graduated. They calculated the percentage of living alumni who contributed money to their alma mater, surmising that if they gave a college money, there was a good chance they appreciated the education there. Three-quarters of the ranking would be produced by an algorithm – an opinion formulized in code – that incorporated these proxies. In the other quarter, they would factor in the subjective views of college officials throughout the country.”
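The structure O’Neil describes is a weighted composite: three-quarters from an algorithm over proxy metrics, one-quarter from a survey of college officials. The sketch below illustrates only that shape. The individual proxy weights, the assumption that every input is pre-normalized to a 0..1 scale, and the sample colleges are all hypothetical; this is not US News & World Report’s actual formula.

```python
from dataclasses import dataclass

@dataclass
class College:
    # Each proxy is assumed to be pre-normalized to 0..1, where higher is "better".
    name: str
    sat_scores: float
    student_teacher_ratio: float  # assumed already inverted, so a lower ratio scores higher
    selectivity: float            # assumed to be derived from the acceptance rate
    freshman_retention: float     # share of freshmen who make it to sophomore year
    graduation_rate: float
    alumni_giving: float
    reputation_survey: float      # subjective views of college officials

# Hypothetical weights for the algorithmic proxies; they sum to 1.0.
PROXY_WEIGHTS = {
    "sat_scores": 0.20,
    "student_teacher_ratio": 0.15,
    "selectivity": 0.15,
    "freshman_retention": 0.15,
    "graduation_rate": 0.25,
    "alumni_giving": 0.10,
}

ALGORITHM_SHARE = 0.75  # "Three-quarters of the ranking would be produced by an algorithm"
SURVEY_SHARE = 0.25     # "In the other quarter ... the subjective views of college officials"

def composite_score(c: College) -> float:
    """Blend the weighted proxy score with the reputation survey.

    Note that tuition and fees appear nowhere among the inputs.
    """
    proxy_score = sum(w * getattr(c, attr) for attr, w in PROXY_WEIGHTS.items())
    return ALGORITHM_SHARE * proxy_score + SURVEY_SHARE * c.reputation_survey

colleges = [
    College("Hypothetical A", 0.9, 0.7, 0.8, 0.95, 0.90, 0.4, 0.85),
    College("Hypothetical B", 0.7, 0.8, 0.5, 0.90, 0.80, 0.2, 0.60),
]
for c in sorted(colleges, key=composite_score, reverse=True):
    print(f"{c.name}: {composite_score(c):.3f}")
```

The specific numbers matter less than the design: every input is a proxy, and anything left out of the model cannot move the score, no matter how large it is.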
Cheating By Proxy
Colleges have been racing to game the rankings ever since. O’Neil makes this point quite clearly by examining the actions of Baylor, Texas Christian University, Iona College, and other institutions. (Read the book for these scandalous stories.)
In another case, a close look at the US News & World Report rankings revealed that a Saudi university had contacted a host of highly regarded mathematicians and offered them $72,000 to serve as adjunct faculty.
O’Neil writes that the deal “…stipulated that the mathematicians had to work three weeks a year in Saudi Arabia. The university would fly them there in business class and put them up at a five-star hotel. Conceivably, their work in Saudi Arabia added value locally. But the university also required them to change their affiliation on the Thomson Reuters academic citation website, a key reference for the U.S. News rankings. That meant the Saudi university could claim the publications of their new adjunct faculty as its own.”
The Data That’s Missing
Importantly, O’Neil points out that US News & World Report didn’t include tuition and fees in its ranking criteria. The absence of this meaningful data had a huge impact: a college’s ranking didn’t suffer no matter how large or disproportionate a cost it imposed on students.
Between 1985 and 2013, college tuition skyrocketed by 500 percent. While some of this can be attributed to other factors, the US News & World Report rankings are at least partly to blame.
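A 500 percent increase is easy to misread as “five times as much.” The quick calculation below shows that it means six times the 1985 level, and converts that into an implied average annual growth rate over the 28 years from 1985 to 2013. It assumes steady compound growth, which is a simplification of how tuition actually rose.

```python
START_YEAR, END_YEAR = 1985, 2013
PERCENT_INCREASE = 500  # the figure cited above

# A 500 percent increase adds five times the original amount on top of it.
growth_factor = 1 + PERCENT_INCREASE / 100
years = END_YEAR - START_YEAR

# Implied compound annual growth rate, assuming the same growth every year.
annual_rate = growth_factor ** (1 / years) - 1

print(f"Growth factor: {growth_factor:.1f}x over {years} years")
print(f"Implied annual growth: {annual_rate:.1%}")
```

Under that assumption, a hypothetical $5,000 annual tuition in 1985 would have reached $30,000 by 2013, growing roughly 6.6 percent every year.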
Consequently, many students based their college choice on rankings that did not account for cost. Those rankings may lead students to pick expensive colleges with dubious payoffs and to remain in debt for their education for years to come.
O’Neil states, “…many poisonous assumptions are camouflaged by math and go largely untested and unquestioned.”
Misinformation That Keeps On Misinforming
You may be thinking, “Assumptions get fixed, right? Good programmers fix bad code. Good data scientists fix bad algorithms.”
Not always. Perhaps not even all that often.
In the case of US News & World Report, some assumptions haven’t been challenged by real data in 29 years. The report still relies on proxy data, which makes it one powerful weapon of math destruction.
This book teaches us that proxies are dangerous when not handled correctly.
Proxy Data Isn’t Always Bad
When real data is inaccessible, companies are often forced to rely on proxy data. Perhaps the appropriate research sample is too costly to obtain. In other cases, there is not enough time to gather real data before a decision needs to be made.
Is this an inherently bad practice? No. Sometimes it’s quite expedient.
For example, if you want to know the size of a tech company’s workforce, you might look at LinkedIn data, which shows how many members list the company as their employer. Obviously, that number won’t be 100 percent accurate, because LinkedIn can only count employees who have profiles. Luckily, for tech companies, LinkedIn’s employee estimate will likely be off by only a few percentage points. For companies in industries that use LinkedIn less, that margin of error will probably grow. Still, LinkedIn data is a reasonable source of proxy data when tech companies don’t publicly release their employee counts.
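One way to make that margin of error explicit is to treat the LinkedIn count as a lower bound and divide it by an assumed coverage rate, meaning the share of employees you believe have profiles. The 950-profile count and both coverage ranges below are hypothetical assumptions, not LinkedIn figures, and this adjustment is an illustration rather than a method the article prescribes.

```python
def headcount_range(profile_count: int, coverage_low: float, coverage_high: float) -> tuple[int, int]:
    """Estimate a total headcount range from a LinkedIn profile count.

    coverage_low and coverage_high bound the assumed fraction of employees
    with LinkedIn profiles. Higher coverage implies a smaller total headcount.
    """
    if not (0 < coverage_low <= coverage_high <= 1):
        raise ValueError("coverage bounds must satisfy 0 < low <= high <= 1")
    low_estimate = round(profile_count / coverage_high)
    high_estimate = round(profile_count / coverage_low)
    return low_estimate, high_estimate

PROFILE_COUNT = 950  # hypothetical count shown on a company page

# Hypothetical coverage assumptions for two kinds of companies.
tech_range = headcount_range(PROFILE_COUNT, 0.90, 0.98)
other_range = headcount_range(PROFILE_COUNT, 0.40, 0.70)

print(f"Tech company:      {tech_range[0]}-{tech_range[1]} employees")
print(f"Low-LinkedIn firm: {other_range[0]}-{other_range[1]} employees")
```

With these assumptions, the tech estimate spans less than 10 percent, while the low-coverage estimate spans more than a thousand employees: the same profile count becomes far less informative as coverage drops.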
Here is another reasonable use of proxy data: you might look at search traffic to a competitor’s site to estimate its growth.
Essentially, you just need to recognize when you’re looking at proxy data. Wherever you can, add real data to your evaluations.
Are You Using a Weapon of Math Destruction?
These are warning signs that you’re using proxy data dangerously:
- Proxy data models influence decision making at scale.
- Unclear criteria form the basis of accepted data models.
- There is huge potential for damage to individual consumers, citizens, or businesses.
We Need To Think Hard About The Validity of Our Data
This book forces us to ask hard questions about big data ethics. We should be asking ourselves:
- Am I looking at proxy data? If so, am I okay with that?
- Is the model I’m using fair? Does it need more real or qualitative data?
- Does the data model I’m using qualify as a weapon of math destruction?
- Do I need to build a new data model?
To be able to answer these questions, every tech sales, marketing, or product development leader should read this book.
In closing, I’ll let O’Neil have the final word.
She writes, “Data is not going away. Nor are computers—much less mathematics. Predictive models are, increasingly, the tools we will be relying on to run our institutions, deploy our resources, and manage our lives. But as I’ve tried to show throughout this book, these models are constructed not just from data but from the choices we make about which data to pay attention to—and which to leave out. Those choices are not just about logistics, profits, and efficiency. They are fundamentally moral.”
This podcast is brought to you by Cascade Insights. We specialize in market research and competitive intelligence for B2B technology companies. Our focus allows us to deliver detailed insights that generalist firms simply can’t match. Got a B2B tech sector question? We can help.