Sports Fun Yields Analytics Teaching Moment

Always run the numbers, even when it seems the answer’s clear.


In the throes of Chicago’s bleak winter months early this year, two of my OpenBI colleagues and I set out to have some geeky sports fun. College basketball was going strong, and the consensus among most TV “experts” at the time was that the Big Ten was the best of the many conferences “top-to-bottom.” The three of us, each with affiliations to Big Ten schools, decided we’d test that conjecture by comparing conference performance using rankings readily available on the Web, at the same time honing our expertise in several of the visualization packages OpenBI works with.

There’s no shortage of formulae for ranking the 347 Division I college basketball teams organized into 32 conferences. For years, the Rating Percentage Index, or RPI, was the standard. Now there are so many competitors you can’t sort them out without a scorecard. Not surprisingly, the different ranking schemes correlate pretty highly. For last February’s exercise, the OpenBI team settled on RPI and Pomeroy. Our task was then to summarize and compare the performance of the conferences by the rankings of their constituent schools.
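
For reference, the classic RPI is a fixed 25/50/25 blend of a team’s winning percentage, its opponents’ winning percentage and its opponents’ opponents’ winning percentage. A one-line R sketch of that standard weighting (the example inputs are made up):

    # Standard RPI: 25% winning percentage (WP), 50% opponents' WP (OWP),
    # 25% opponents' opponents' WP (OOWP).
    rpi <- function(wp, owp, oowp) 0.25 * wp + 0.50 * owp + 0.25 * oowp
    rpi(0.80, 0.60, 0.55)  # a team winning 80% against a solid schedule: 0.6375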

After I assembled the dataset, colleague Marina built a dashboard in Tableau, Leonard developed one in Pentaho and I coded mine in R with lattice graphics. Our methodologies differed. Marina rated the conferences using the median rankings of their constituent teams. Leonard’s approach involved “matching” teams across conferences: how did the highest-ranking team in the Big Ten compare to the highest-ranking team in the Big 12, and so on. My thought was to compute quantiles of the overall rankings by conference. For each conference, I computed a statistic summing the rankings of the top (0%) team, the 25% team, the 50% (median) team and the 75% team. I then ranked the conferences by that statistic, lower being better.
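
A minimal R sketch of that computation, assuming a simple data frame of conferences and ranks; the column names and toy numbers below are illustrative, not our actual dataset:

    # Toy stand-in for the scraped rankings: one row per team, with its
    # conference and overall rank (1 = best); the real data had 347 teams.
    rankings <- data.frame(
      conference = factor(rep(c("Big Ten", "Big East", "ACC"), each = 4)),
      rank       = c(2, 9, 25, 60, 3, 7, 30, 55, 1, 14, 40, 90)
    )

    # For each conference, sum the 0% (top), 25%, 50% (median) and 75%
    # quantiles of its teams' ranks, then order conferences by that sum.
    conf_stat <- sapply(split(rankings$rank, rankings$conference),
                        function(r) sum(quantile(r, probs = c(0, 0.25, 0.5, 0.75))))
    sort(conf_stat)  # lower is better

Note that stopping at the 75% quantile means a conference’s weakest quarter never enters the statistic.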

Two hypotheses, divined from discussions with college sports friends, drove my approach. First, “top-to-bottom” is generally a misnomer. Of the 12 teams in the expert-acknowledged best Big Ten last year, two had pathetic records while two more would charitably be described as mediocre. Indeed, it’s almost impossible for a conference to be good top-to-bottom. So instead of using all conference teams, I considered only the top 75% of performers in each conference for my computations. Second, having very highly-ranked teams helps a conference’s stature more than having low-ranked teams hurts it. For a conference with an average ranking of 65, it’s psychologically better to have a #2, #4, #118 and #136 than a #55, #60, #65 and #80; both quartets average exactly 65. I call that the Duke/Carolina halo.
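
A quick check confirms the two quartets really are arithmetically identical:

    mean(c(2, 4, 118, 136))  # 65
    mean(c(55, 60, 65, 80))  # 65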

One finding all three analysts shared was that the data didn’t support the expert rush to anoint the Big Ten the top conference. Based on mid-February rankings, we felt it a dead heat between the Big Ten and the Big East for top conference honors. Six weeks later, Louisville from the Big East beat Michigan from the Big Ten for the national title. Affirmation.

Another consistent finding with each of our methodologies was sub-par performance of the Atlantic Coast Conference (ACC), home to Duke and North Carolina and arguably the top hoops conference over the last 35 years. I couldn’t resist tweaking my daughter, then a freshman at ACC mainstay Wake Forest, on the ACC’s sad predicament.

Last Monday, my daughter texted me to crow about the ACC’s stature in the just-announced NCAA women’s soccer tournament pairings. No fewer than 8 ACC teams were included in the field of 64 and, impressively, all four top seeds were from the ACC. What did I think of the ACC now, she deadpanned.

I was astonished, actually, and quickly scraped the latest RPI data to confirm the ACC’s ascendant status. Alas, the data again told a different story. Figure 1 details the findings. The conferences are ranked from left to right, with the Big Ten edging out the ACC for top honors. For each conference, the jittered grey dots depict the rankings of its teams. The lower vertical bar spans the top (0%) and 25% teams; the hatch between the bars marks the median (50%); the upper bar spans the 75% and 100% teams. The overall conference statistic is the sum of the 0%, 25%, 50% and 75% rankings, lower being better.
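
For the curious, a figure in this spirit can be roughed out in lattice along the following lines. This is an approximation using the toy rankings data above, not the code behind Figure 1:

    library(lattice)

    # Order conferences left to right by the quantile-sum statistic.
    rankings$conference <- reorder(rankings$conference, rankings$rank,
      FUN = function(r) sum(quantile(r, probs = c(0, 0.25, 0.5, 0.75))))

    stripplot(rank ~ conference, data = rankings, jitter.data = TRUE,
      col = "grey50", ylab = "RPI rank (lower is better)",
      panel = function(x, y, ...) {
        panel.stripplot(x, y, ...)  # jittered grey dots, one per team
        for (i in unique(as.numeric(x))) {
          q <- quantile(y[as.numeric(x) == i], probs = c(0, 0.25, 0.5, 0.75, 1))
          panel.segments(i, q[1], i, q[2], lwd = 3)  # lower bar: 0% to 25%
          panel.points(i, q[3], pch = "-", cex = 2)  # hatch at the median
          panel.segments(i, q[4], i, q[5], lwd = 3)  # upper bar: 75% to 100%
        }
      })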

That the ACC’s top 25% and 50% teams are distinguished is clear. Had my calculation stopped at 50% rather than 75%, the ACC would have won handily. Not surprisingly, when I shared the graphic, my daughter demanded that I change the methodology to include only the top half of teams!
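
Her variant is a one-line change to the earlier sketch: stop the quantile sum at the median, and the bottom half of each conference drops out of the math entirely.

    # Daughter-approved methodology: sum only the 0%, 25% and 50% quantiles.
    sort(sapply(split(rankings$rank, rankings$conference),
                function(r) sum(quantile(r, probs = c(0, 0.25, 0.5)))))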

So what’s all this have to do with analytics for business? Not a whole lot, really. Perhaps, if anything, that experts and analytics don’t necessarily jibe, and that different analytic formulae will yield disparate results. One should always run the numbers, even when the answer seems clear. The data might surprise.

Steve Miller is co-founder of OpenBI, LLC, a Chicago-based business intelligence (BI) services firm that specializes in delivering analytic solutions with both open source and commercial technologies. You can reach him at steve.miller@openbi.com. This story originally appeared on Information Management.