Data Collection Blind Spots

CiKATA Lean Six Sigma 4.0 Data Analysis Design

In a previous article, “Are You Ready for AI?”, I asked whether organizations were ready for artificial intelligence or data analytics, based on how they would sequence the five Post-Its in Figure 1 below. That was a bit of a trick question, designed to reveal potential blind spots in data science and artificial intelligence.

Figure 1

First, let’s discuss the solution to the five Post-It sequence from the previous article. Imagine you and your team are working to determine the top sales drivers for the “Kiosk Channel” of Redbox DVD rentals. These drivers will then be used to build a strategy for top-line growth in this channel. The title image shows two people at a Redbox kiosk in a grocery store. If you were on the team, which Post-It would you use to start this study?

Let’s use the following questions to work through this. Remember, you must sequence all of the Post-Its.

  1. Can I start with the Data and forget the other 4 Post-Its? Is this possible? Let’s check the logic.

  2. Can I get data before observing the activity I’m trying to understand?

  3. Can I get data before measuring the activity I’m trying to understand?

  4. Can I get data before quantifying (measuring) a description (fact) of the activity I’m trying to understand?

  5. Can I describe (fact) the Kiosk image in multiple ways? Absolutely! The key here is: what’s the question?

  6. Can I get data before observing the activity I’m trying to understand? Yes, but would it be meaningful?

Based on these questions, you can probably guess the Post-It sequence I would use to understand the drivers of DVD rental sales in the Kiosk channel. Figure 2 below suggests that to get good, representative data, I first need to quantify (measure) a fact (description) of an observed process or activity, and then capture it reliably and repeatably, before I can call it “Data”. But that’s just me. I could be wrong.
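The requirement that a fact be captured reliably and repeatably before it counts as data can be checked directly. Below is a minimal sketch, assuming two observers independently classify the same kiosk visits using a hypothetical three-category scheme (rent, browse, return); the category names and data are invented for illustration. It computes percent agreement and Cohen's kappa, a common inter-rater statistic that corrects agreement for chance.

```python
from collections import Counter


def percent_agreement(obs_a, obs_b):
    """Share of observations on which two observers recorded the same category."""
    if len(obs_a) != len(obs_b) or not obs_a:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(a == b for a, b in zip(obs_a, obs_b))
    return matches / len(obs_a)


def cohens_kappa(obs_a, obs_b):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    p_observed = percent_agreement(obs_a, obs_b)
    n = len(obs_a)
    counts_a, counts_b = Counter(obs_a), Counter(obs_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement: probability both observers pick the same category independently.
    p_chance = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    if p_chance == 1.0:
        return 1.0  # both observers used one identical category throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical data: two observers classify the same eight kiosk visits.
observer_1 = ["rent", "browse", "rent", "return", "rent", "browse", "rent", "return"]
observer_2 = ["rent", "browse", "browse", "return", "rent", "browse", "rent", "rent"]

print(percent_agreement(observer_1, observer_2))       # 0.75
print(round(cohens_kappa(observer_1, observer_2), 2))  # 0.6
```

Raw agreement of 75% drops to a kappa of 0.6 once chance matches are removed. Where to set the acceptance threshold is a judgment call for the team; the point is that the measurement itself gets tested before anyone calls the result “Data”.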

Figure 2

Figure 3 below may be what an observer sees during a typical study. However, the yellow callouts need to be teased out to understand the context of any activity or process being studied. Even when you observe an activity or process directly, it takes multiple observations, interviews and empathy studies to capture the essence of Figure 3. It should be no surprise, then, that the Post-Its to the left of the Data Post-It in Figure 2 are extremely important to the original question: what are the top factors influencing DVD rentals in the Kiosk channel?

Figure 3

Let’s now pivot to the core question of this article: “Are we ready for artificial intelligence?” If we tasked a team with discovering which factors most influence revenue in the Kiosk channel at Redbox, what approach or framework would the data science or machine learning world use? As it turns out, the approach varies. But what is strikingly different in the data science and machine learning world, compared with the Lean Six Sigma or Continuous Improvement approach described above, is that they start with the Data Post-It. Yikes!!

In an article published in KDnuggets yesterday, the author discusses the various machine learning project checklists that appear to be in vogue today. What is interesting, or disturbing, is that there is little to no mention of the business context or business case that should be driving the project. Nor does it discuss return on investment. This essentially suggests that the existing data used in a predictive or machine learning project is being accepted at face value, regardless of whether it is meaningful or precise.

“Houston, we have a problem.”

In fact, most of what I read in that article is limited to accessing, cleaning, deriving and deploying data so it can be used to build an analytics base table for predictive modeling. That is neither a good thing nor a bad thing. Most AI startups, or companies that claim to do AI, are great at exactly that. But make sure you have data that is representative and meaningful first. Remember: garbage in, garbage out.
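To make “analytics base table” concrete: it typically has one row per entity being modeled and one column per feature, plus the target. The sketch below is a minimal illustration, assuming hypothetical raw rental records with `kiosk_id`, `week` and `amount` fields (names invented here, not from any real Redbox system). It shows how mechanically such a table can be assembled, and that nothing in the pipeline asks whether the fields mean what we think they mean.

```python
from collections import defaultdict

# Hypothetical raw records: one dict per rental transaction (field names invented).
raw_rentals = [
    {"kiosk_id": "K1", "week": 1, "amount": 1.99},
    {"kiosk_id": "K1", "week": 1, "amount": 2.99},
    {"kiosk_id": "K2", "week": 1, "amount": 1.99},
    {"kiosk_id": "K1", "week": 2, "amount": None},  # missing value
]


def build_abt(records):
    """Aggregate transactions into one ABT row per (kiosk_id, week)."""
    grouped = defaultdict(list)
    for rec in records:
        if rec["amount"] is None:
            continue  # "cleaning": silently drops the record, and with it K1's week 2
        grouped[(rec["kiosk_id"], rec["week"])].append(rec["amount"])

    rows = []
    for (kiosk_id, week), amounts in sorted(grouped.items()):
        rows.append({
            "kiosk_id": kiosk_id,
            "week": week,
            "rental_count": len(amounts),       # derived feature
            "revenue": round(sum(amounts), 2),  # modeling target
        })
    return rows


for row in build_abt(raw_rentals):
    print(row)
```

Every step runs without error, yet the code never asks whether a missing amount reflects a failed transaction or a recording fault. Answering that requires going back to the observed process, not further into the data.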

 
Figure 5

Figure 5 summarizes the blind spot I see in many data science projects. They skip the business understanding steps I’ve modeled above and jump straight into preparing, cleaning and piping data to create the Analytics Base Table, which is then used to build a model for analysis. I call this approach:

“cold calling” for insights.

Skipping the business understanding phase is a major reason why up to 90% of machine learning projects fail or underperform.

Figure 6

Here’s the punchline. The approach in Figure 6 above does not have to be used in every data science or machine learning project, but I highly recommend it. It’s important to remember that data sources were not designed for predictive analytics or machine learning. Careful study of the business domain is vital to understanding the nuances of business behavior, so you can select data that properly represents the features needed for predictive analytics or machine learning.
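One way to put that domain study into practice is sketched below, under stated assumptions. Before a column enters the analytics base table, require a written operational definition (how the fact is observed and measured) and evidence that the measurement was checked for repeatability. The feature names, the `CandidateFeature` type and the two criteria are hypothetical illustrations, not a prescribed standard.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CandidateFeature:
    name: str
    operational_definition: Optional[str]  # how the fact is observed and measured
    measurement_checked: bool              # e.g. passed a repeatability/agreement check


def screen_features(candidates: List[CandidateFeature]) -> Tuple[List[str], List[str]]:
    """Split candidates into features ready for the ABT and ones needing more business study."""
    ready, needs_study = [], []
    for feature in candidates:
        if feature.operational_definition and feature.measurement_checked:
            ready.append(feature.name)
        else:
            needs_study.append(feature.name)
    return ready, needs_study


# Hypothetical candidates for the Kiosk-channel question.
candidates = [
    CandidateFeature("rental_count", "Completed rentals per kiosk per week, from kiosk logs", True),
    CandidateFeature("store_foot_traffic", None, False),
    CandidateFeature("kiosk_wait_time", "Seconds from arrival to first screen touch", False),
]

ready, needs_study = screen_features(candidates)
print("ready:", ready)
print("needs study:", needs_study)
```

The gate is deliberately conservative: a column without a definition or a checked measurement goes back to the observe, describe and measure steps instead of into the model.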

We walk participants through this and similar topics in all of our Lean Six Sigma and Analytics courses.

If you enjoyed this article, click the like button and share it with your network. If you would like more information on Statistical Process Control or related initiatives, contact us at succeed@cikata.com. Don’t forget to click on our social links for the latest.
