
Big Data Analysis: When It Comes To Attribution, It's All Or Nothing

Volume 2, Issue 4 - April, 2012

Editor, IQ Advisor

“Big Data” is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable amount of time. Big Data sets range from a few dozen terabytes to many petabytes in a single data set. And if you believe everything you read these days, Big Data is going to change the way everything is done, from bill payments to renewable green energy. That’s why so many providers of products and services are racing to claim that they are “Big-Data-capable.” The marketing attribution management arena is no exception.

To Sample or Not to Sample

Traditional data analytics methodologies use representative sample groups to uncover hidden patterns and trends within the entire, larger universe of available data. Defining these sample groups is often the most time-consuming task: it requires multiple iterations, eats up valuable time in delivering results to marketers, and, frankly, risks producing a sample that is not representative of the entire universe of users exposed to your marketing efforts.

With recent advances in computing power and the mass production of very large storage devices, handling Big Data has become fairly routine, and there are no longer any compelling reasons to take shortcuts in performing attribution or any other data analytics. Today, the entire universe of available data should be used in the analysis, yielding insights that translate into actionable steps to optimize your marketing performance.

Two Different Approaches

In traditional analytical sampling, a subset of the data (i.e., a test group) is subjectively created by applying a number of filtering criteria. For example, if you wanted to identify a group of potential customers who will churn within the next 30 days, you could limit the data set by including only accounts:

  • With a delinquent status
  • With an outstanding amount of $20 or more
  • With a credit score less than 700
  • Whose last customer touchpoint was within 10 days of delinquency
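The filtering criteria above can be sketched as a simple predicate. This is a minimal illustration only; the account records, field names, and values are hypothetical, not drawn from any real schema.

```python
# Hypothetical account records; field names and values are illustrative.
accounts = [
    {"id": 1, "status": "delinquent", "outstanding": 45.00,
     "credit_score": 640, "days_since_touchpoint": 7},
    {"id": 2, "status": "current", "outstanding": 0.00,
     "credit_score": 720, "days_since_touchpoint": 30},
    {"id": 3, "status": "delinquent", "outstanding": 15.00,
     "credit_score": 610, "days_since_touchpoint": 3},
    {"id": 4, "status": "delinquent", "outstanding": 80.00,
     "credit_score": 710, "days_since_touchpoint": 5},
]

def in_sample(acct):
    """Apply the four filtering criteria from the list above."""
    return (acct["status"] == "delinquent"
            and acct["outstanding"] >= 20
            and acct["credit_score"] < 700
            and acct["days_since_touchpoint"] <= 10)

# Everyone who fails any one criterion is dropped from the analysis.
sample = [a for a in accounts if in_sample(a)]
```

Note how each criterion silently discards accounts: account 3 churns out of the sample over a $5 difference, and account 4 over a 10-point credit-score difference, even though both might matter to the churn question.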

As you narrow down a subset of the entire universe, you are essentially limiting the data used in the analysis and risk missing the elements and factors you are not yet aware of. And if the filtered subset does not yield the desired results, a new filtering group must be defined and the data reprocessed.

The Big Data approach to answering the same question would be to separate all the users in your entire universe into different buckets, each divided into ranges, and then analyze what is discovered about each range (and combination of ranges) within each bucket. Some of the buckets would include:

  • Delinquent dollar range
  • Last touchpoint recency
  • Last touchpoint frequency
  • Last touchpoint type (e.g., service, bill payment, complaint)
  • Account status in the last 30 days
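A sketch of the bucketing step: every user is assigned to a range within each bucket, so no one is filtered out of the analysis. The range boundaries and field names here are illustrative assumptions, not prescribed cut points.

```python
def amount_bucket(outstanding):
    # Illustrative dollar ranges; real boundaries would be data-driven.
    if outstanding < 20:
        return "under $20"
    if outstanding < 100:
        return "$20-$99"
    return "$100+"

def recency_bucket(days):
    # Illustrative recency ranges for the last touchpoint.
    if days <= 10:
        return "0-10 days"
    if days <= 30:
        return "11-30 days"
    return "over 30 days"

# Hypothetical users; note that no record is excluded.
users = [
    {"id": 1, "outstanding": 45.0, "days_since_touchpoint": 7},
    {"id": 2, "outstanding": 0.0, "days_since_touchpoint": 30},
    {"id": 3, "outstanding": 150.0, "days_since_touchpoint": 45},
]

for u in users:
    u["amount_range"] = amount_bucket(u["outstanding"])
    u["recency_range"] = recency_bucket(u["days_since_touchpoint"])
```

Unlike the filtered sample, the bucketed universe keeps every user, so a surprising pattern in, say, the "over 30 days" range is still there to be found.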

Once you have created multiple groups using all of the available data and processed this information, you can answer more advanced questions, such as:

  • Which individual buckets yield the highest churn?
  • Which combination of delinquency period, amount, and credit score yields the highest churn?
  • Did those who churned have a particular type of touchpoint (e.g., a complaint) with high frequency (> 3) within a certain time (< 10 days) of cancellation?
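Questions like these reduce to tallying churn rates over combinations of bucket ranges. The sketch below assumes users have already been tagged with their ranges and a churn flag; all record values are hypothetical.

```python
from collections import defaultdict

# Hypothetical users already tagged with bucket ranges and a churn flag.
users = [
    {"amount_range": "$20-$99",   "recency_range": "0-10 days",  "churned": 1},
    {"amount_range": "$20-$99",   "recency_range": "0-10 days",  "churned": 1},
    {"amount_range": "$20-$99",   "recency_range": "11-30 days", "churned": 0},
    {"amount_range": "under $20", "recency_range": "0-10 days",  "churned": 0},
]

# Tally churns and totals per combination of ranges.
counts = defaultdict(lambda: [0, 0])  # key -> [churned, total]
for u in users:
    key = (u["amount_range"], u["recency_range"])
    counts[key][0] += u["churned"]
    counts[key][1] += 1

# Churn rate per combination, and the worst-performing combination.
churn_rate = {key: churned / total for key, (churned, total) in counts.items()}
worst = max(churn_rate, key=churn_rate.get)
```

Because every user stays in the analysis, the same tallies answer all three questions at once; adding another bucket (e.g., touchpoint type) just widens the key.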