Thursday, January 18, 2024

Using Historical Tournament Trends to Fill Out My Bracket Projection

Over winter break, I was fascinated by an article by Will Warren (@statsbywill on Twitter) that used historical trends and stats to pick the 2023 NCAA Tournament. I’d recommend reading it if you’re interested. He also wrote articles for 2021 and 2022 (both linked in his article), but his 2023 picks particularly stood out to me, especially given what happened during the tournament.


While he did have Houston over Purdue in the national championship, his Final Four was rounded out by San Diego State and UConn. Those two teams, of course, met in last year’s national championship game, and picking San Diego State over the likes of Alabama, Arizona, and Baylor was especially impressive to me. In the end, Will was essentially three picks away from having a great bracket. Other picks in the article that stood out to me were Arkansas over Kansas and Michigan State over Marquette.


After reading this article, I was inspired. Will spoke of a magical complete tournament data document, which I devoted hours to recreating. I downloaded all of the KenPom pre-tournament data since the 2001 season and plugged in tournament results to create a complete dataset containing trends, statistics, and results from the previous 22 NCAA men’s basketball tournaments.


From there, I compiled rating systems and metrics to project the best possible selections, folding the trends from Will’s article into one complete, customizable output. To validate the various outputs, I tested the rankings against the results of previous tournaments, and they were highly successful at identifying some of the best selections. For example, the rating system correctly predicted 13 of the 16 Sweet Sixteen teams from the 2023 tournament, including a perfect Sweet Sixteen for both the East (FAU, Tennessee, Kansas State, Michigan State) and the West (Arkansas, UConn, Gonzaga, UCLA) Regions.
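To make that validation step concrete, here is a minimal sketch of how predicted Sweet Sixteen teams could be checked against the actual ones. The function name and data layout are my own assumptions for illustration, not the app's real code. Only the East and West Regions are included, since those are the two the post says were predicted perfectly.

```python
def sweet_sixteen_hits(predicted, actual):
    """Count, per region, how many predicted Sweet Sixteen teams actually advanced.

    Both arguments map a region name to a list of team names.
    (Hypothetical helper; the real app's structure is not shown in the post.)
    """
    hits = {}
    for region, teams in actual.items():
        hits[region] = len(set(predicted.get(region, [])) & set(teams))
    return hits


# Actual 2023 Sweet Sixteen teams for the two regions named in the post.
actual_2023 = {
    "East": ["FAU", "Tennessee", "Kansas State", "Michigan State"],
    "West": ["Arkansas", "UConn", "Gonzaga", "UCLA"],
}

# The post reports both regions as perfect, so the predictions match exactly.
predicted_2023 = {region: list(teams) for region, teams in actual_2023.items()}

hits = sweet_sixteen_hits(predicted_2023, actual_2023)
print(hits, "total:", sum(hits.values()))
```

Running the same count over all four regions and over every past tournament in the dataset would give a simple backtest score for any candidate rating system.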


Here’s a sample of the code’s output using 2023 tournament data, to give a better sense of what we’re working with. For the first pod in the South Region, it reports that both Alabama and West Virginia are good choices. It also gives the probability that each team in the pod reaches the Sweet Sixteen, the Elite Eight, and the Final Four. Then it prompts the user to make a selection for each game, showing the probability that each team wins. In this example, Alabama had a 97 percent chance of beating Texas A&M-CC. The user continues this way until the bracket is complete, with various kinds of information along the way. At the start, the output ranks the most likely upsets based on historical trends, and those games are highlighted again when the user has to pick them. It also provides advice for “toss-up games”: those between 7- and 10-seeds or between 8- and 9-seeds.
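As a rough illustration of where a pod's Sweet Sixteen probabilities can come from, the sketch below chains single-game win probabilities together: a team has to win its first game, then beat whichever opponent comes out of the other game. This is an assumption about the general approach, not the app's actual method. The only number taken from the post is Alabama's 97 percent against Texas A&M-CC; every other value in `TABLE` is a made-up placeholder.

```python
# Single-game win probabilities, keyed as (team, opponent).
# Only the 0.97 comes from the post; the rest are placeholder values.
TABLE = {
    ("Alabama", "Texas A&M-CC"): 0.97,
    ("Maryland", "West Virginia"): 0.45,       # placeholder
    ("Alabama", "Maryland"): 0.80,             # placeholder
    ("Alabama", "West Virginia"): 0.78,        # placeholder
    ("Texas A&M-CC", "Maryland"): 0.15,        # placeholder
    ("Texas A&M-CC", "West Virginia"): 0.14,   # placeholder
}


def win_prob(x, y):
    """Probability that x beats y, using the complement for reversed pairs."""
    if (x, y) in TABLE:
        return TABLE[(x, y)]
    return 1.0 - TABLE[(y, x)]


def sweet_sixteen_probs(pod, prob):
    """pod is [a, b, c, d] with first-round games a-b and c-d.

    Returns each team's chance of winning both games in the pod.
    """
    a, b, c, d = pod
    result = {}
    for team, opp, (x, y) in ((a, b, (c, d)), (b, a, (c, d)),
                              (c, d, (a, b)), (d, c, (a, b))):
        first = prob(team, opp)
        # Weight the second-round game by who is likely to be the opponent.
        second = prob(x, y) * prob(team, x) + prob(y, x) * prob(team, y)
        result[team] = first * second
    return result


pod = ["Alabama", "Texas A&M-CC", "Maryland", "West Virginia"]
probs = sweet_sixteen_probs(pod, win_prob)
for team, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {p:.3f}")
```

The four probabilities sum to 1, which is a handy sanity check. Extending the same idea one round at a time gives Elite Eight and Final Four probabilities.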


Obviously, no bracket will be perfect, even with this app. However, by using it as a supplement, I hope to produce selections that put users (i.e., myself and some friends) in the best position to win bracket pools in March. We’ll see what happens in practice, but since I’ve put in the work to create the project, I want to put it to use. So every week, a couple of days after my weekly bracket projection update, I intend to fill out my bracket using the app as a supplement, and then randomize the results of the bracket to see how well I would do in a bracket pool.
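For anyone curious what "randomizing the results" could look like, here is a small sketch under stated assumptions. Each game is decided by a weighted coin flip, and the picks are then scored by counting how many match the simulated winner of the same game slot. The team names, the ratings, and the logistic win-probability formula are all hypothetical stand-ins, not the ratings or method the app uses.

```python
import math
import random


def simulate(field, prob, rng):
    """Play out a single-elimination bracket.

    field lists teams in bracket order (adjacent teams play each other).
    Returns a list of rounds, each a list of that round's winners.
    """
    rounds = []
    alive = list(field)
    while len(alive) > 1:
        winners = []
        for i in range(0, len(alive), 2):
            x, y = alive[i], alive[i + 1]
            winners.append(x if rng.random() < prob(x, y) else y)
        rounds.append(winners)
        alive = winners
    return rounds


def correct_picks(picks, results):
    """Count picks that match the simulated winner of the same game slot."""
    return sum(p == r
               for pick_round, result_round in zip(picks, results)
               for p, r in zip(pick_round, result_round))


# Hypothetical 64-team field with made-up ratings (higher is better).
field = [f"Team {i}" for i in range(64)]
ratings = {team: 64 - i for i, team in enumerate(field)}


def rating_prob(x, y):
    """Assumed logistic win probability from the rating gap."""
    return 1.0 / (1.0 + math.exp(-(ratings[x] - ratings[y]) / 10.0))


rng = random.Random(2024)                           # seeded so runs are repeatable
my_picks = simulate(field, lambda x, y: 1.0, rng)   # always pick the first-listed team
results = simulate(field, rating_prob, rng)         # one randomized tournament

print(correct_picks(my_picks, results), "of", sum(len(r) for r in results))
```

A 64-team bracket always has 63 games, so the score is out of 63. Repeating the simulation many times would show how much a single week's score depends on luck.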


So, here’s the first edition. I’m linking the Google Sheet here with my bracket because it might be easier to look at, but I’m also adding images of each region as I break it down. As a bit of a key, every team that is on the line was my selection. Green means that I was right, red means that I was wrong, and the team above a team in red is the team that actually ended up winning that game.


In the Midwest Region, Purdue vs. Texas A&M is an easy matchup. Nevada is worse than Texas A&M according to the rankings, and Purdue is easily the best team in the pod. Surely nothing will go wrong this year… right? From there, the code triggers an upset alert against both Kentucky (9th most likely) and Clemson (4th most likely) in this pod, but I decided to play the numbers and stick with them, and I ended up getting both picks wrong as a result. Kentucky and Clemson are given as the most likely teams from the pod, with S16 ratings of 3.70 and 3.21 (out of 5). Ratings that low should generally be a red flag that the algorithm isn’t too sure about the pod, which is exactly why I put Purdue through to the Elite 8 on the top side of the Midwest Region. Duke is the heavy favorite out of Pod 3, with an S16 rating of 5.26, so that’s an easy pick. Nebraska is the 3rd most likely upset, so I followed the algorithm on that one. According to the output, Utah is the most likely team from Pod 4, so that’s my S16 team. It ends up not mattering at all, because I selected the wrong R32 teams from this pod. Western Kentucky was flagged as the 12th most likely upset in the field, so it was certainly a possibility. From there, I went with Purdue and Duke as my E8 teams, both of which had the best probabilities for their seeds. The algorithm actually told me to choose Purdue, but I picked Duke on the assumption that many other brackets would select Purdue, making Duke a great value pick. Unfortunately, in this simulation, I clearly should’ve listened to the algorithm for the Midwest Region winner.

The West Region was a very strong region for me. The algorithm gave Villanova better odds as the team from Pod 1, but North Carolina was also indicated as a team likely to make it to the Sweet 16, so I went against the algorithm, in a way, and put North Carolina in the Sweet 16. Even though Indiana State was the 7th most likely upset, I went with BYU over Marquette in the Round of 32. My first slip-up in this region was going against Oregon, which was the 8th most likely upset in the field. But Baylor to the Sweet 16 came true. From there, Arizona over Michigan State was a sound pick and one I was willing to make. Again, I went against the algorithm in choosing BYU. My reasoning was that Villanova was the more likely team to make it to the S16 according to the algorithm, so I would take my chances with BYU in case North Carolina was tripped up. Then I went with Arizona in the E8 and F4, both of which were recommended by the algorithm. Unfortunately, the F4 pick was simply wrong in the simulation. All in all, nothing too major to be upset about in the West Region.

The South Region was my joint-worst performance for R32 selections, but my selections after that were perfect. For the toss-up games, I went with what the algorithm gave me; sometimes those are just wrong, and there’s not much you can do about that. Saint Mary’s and Boise State were flagged as upset alerts over Alabama and Iowa State, respectively. Alabama and Iowa State were actually given the edge in those pods, but because of the possible upsets, I went with my gut and chose Creighton and Illinois instead. From there, it was simply chalk, and that’s what the algorithm put out. Overall, a really good region, and my first correct Final Four selection.

The East Region was similarly very strong. UConn, Texas Tech, and Wake Forest were all spit out as possibilities for the first pod, so I went with the strongest S16 probability, which was clearly held by UConn. Texas Tech was the only slip-up in the R32 for this region. The Dayton-Grand Canyon-Memphis-Samford pod was a very intriguing one, and I’m sure it would be thoroughly entertaining if it were to happen in March. Grand Canyon was flagged as the most likely upset in the field and Samford as the second most likely, which is absolutely wild. Both Dayton and Grand Canyon also had better S16 ratings than Memphis and Samford. It was very difficult to see how this pod would end up, so I went with my gut and took Dayton to the S16. New Mexico was also flagged as the 6th most likely upset, and Auburn was really a no-brainer, with a 92 percent win probability, so Auburn went to the Sweet Sixteen. TCU was marked as the most likely team to reach the S16, but I stuck with Wisconsin, which worked out in this case. Of the bunch, UConn was the most likely to make it to the Final Four, even though they were the 3rd most likely to make it out of their pod in general. As a result, I went with Auburn to the Final Four and, in hindsight, should’ve probably gone with Dayton too, but that pod was volatile as it was. At the end of the day, Auburn became my second correct Final Four selection. Not bad at all.

This is where it falls apart. Coming into Final Four weekend, I would probably be pretty happy: 2 teams in the Final Four and likely a chance at winning my bracket pool. Unfortunately, my Arizona pick didn’t matter, and my Houston pick was just flat wrong.

41 out of 63 picks is really good. I had 14 correct S16 picks and 6 correct E8 picks. If something similar happened in March with my bracket, I would probably be pretty pleased. Sometimes that’s just how the cookie crumbles, and you can’t win them all. I will say, though, that if this result panned out in March, I wouldn’t care about my bracket, since I would’ve seen the Boilers win it all in Phoenix. One fan’s pretty decent bracket is another fan’s pure elation.

The plan is to repeat this process weekly until March hits. If you have any questions, hit me up @TSBBracketology on Twitter and I might give some more info in the next edition.
