BE Analytics has been on a bit of a hiatus the past few weeks, which has surely been concerning for the handful of people who enjoy watching fights while doing math at the same time. I’ve been working on a new prediction model and preparing for the biggest academic sports econ conference of the year. Sleep and morning jiu-jitsu have suffered, but it’s been worth it. My current MMA research project and the upcoming conference will be discussed later. Today’s piece is an introduction to a new preview and prediction series based on analyses of fight statistics.
I use a model which takes historical fight data – some standard statistics and some created by yours truly – and estimates each fighter’s odds of success in an upcoming bout. The process is entirely data-driven. I set up and test the model, but there’s no human element to the subsequent predictions. They’re purely mechanical, with no personal opinions attached. You might see this as a good or a bad thing, but either way, it’s a different offering from other fight previews and predictions out there.
Results from the very first version of the model were published at FOXSports.com earlier this year under the headlines "Insider Trading: Sure-fire bets…" and "Insider Trading: Guaranteed money-makers…", which was a little frustrating. Predictions based on probabilities other than zero or 100 percent will sometimes be wrong. It’s an inescapable fact of life. The idea is to use information to improve one’s ability to get things right or to find good edges against the money line. Nothing in the fight game is guaranteed or sure-fire, so you won’t find that type of talk here at Bloody Elbow.
How It Works
For each matchup, the model produces two numbers representing each fighter’s odds of winning. For example, Ross Pearson had a 68.4 percent chance of beating Diego Sanchez at UFC Fight Night 42 last Saturday, leaving The Dream-Nightmare with 31.6 percent plus hometown, confused or tipsy judges. When a fighter has greater than 50 percent odds, he’s the predicted winner. Sometimes the estimated odds make perfect sense, sometimes they make you think and every once in a while they make you think the model’s broken.
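The model itself isn’t published here, but what it hands back is simple: one probability per fighter, summing to 100 percent. Below is a minimal Python sketch of the predicted-winner rule, plus the no-vig American money line each probability corresponds to. The function names are hypothetical, and the money-line conversion is a standard one rather than anything specified above. It’s the kind of number you’d compare against a sportsbook’s line when looking for edges.

```python
def predicted_winner(name_a: str, prob_a: float, name_b: str):
    """Return (fighter, win probability) for whoever the model favors, or None at exactly 50/50."""
    if prob_a == 0.5:
        return None
    return (name_a, prob_a) if prob_a > 0.5 else (name_b, 1 - prob_a)


def fair_moneyline(prob: float) -> float:
    """American money line with no vig that corresponds to a win probability."""
    if not 0 < prob < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    if prob >= 0.5:
        return -100 * prob / (1 - prob)  # favorite: stake this much to win 100
    return 100 * (1 - prob) / prob       # underdog: win this much on a 100 stake


# The Pearson/Sanchez estimate from UFC Fight Night 42
print(predicted_winner("Ross Pearson", 0.684, "Diego Sanchez"))  # ('Ross Pearson', 0.684)
print(round(fair_moneyline(0.684), 1), round(fair_moneyline(0.316), 1))  # -216.5 216.5
```

In other words, a 68.4 percent estimate only looks like value if the actual line on Pearson was shorter than roughly -216.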
Instead of explaining the odds by discussing regressions, variables and specifications – kinda boring stuff – I comb through the fight statistics to find the story I believe the model is predicting. So while the estimated odds are purely data-driven, the story formed from the fight stats is my interpretation of what the model is seeing.
Predictions will only be made when both fighters have at least three prior bouts in major MMA. The reason is simple. It’s kind of hard to use historical data to make reliable predictions when a fighter has very little historical data. This means the focus will tend to be on main card fights and those towards the top of the prelims, but this is where most people’s interest is anyway.
Think of all the times we’re quoted general statistics like strikes landed per minute or takedowns landed and defended. You’re not going to find those stats in this series. Why? A fighter could have excellent takedown defense from distance and crappy defense in the clinch. His overall takedown defense statistics might appear average, but that would be masking the truth. As for strikes landed per minute, what kind of strikes, where do they land and where do they take place?
I'm not disparaging general statistics. They're quoted because they're user-friendly, concise and most people don't care about the details. But for those who do, there's much better information available. There’s a huge difference between a fighter who lands shots by getting takedowns and following up with ground and pound, one who grinds in the clinch on the cage with dirty boxing and one who stays at distance and picks opponents apart. Fighters can have advantages and disadvantages in a wide variety of MMA positions, and it’s important to examine the details whenever possible.
My statistical stories are always broken down into the three key positional categories: at distance, in the clinch and on the ground. I analyze what fighters tend to do in each position, as well as how often they end up in each position and how well they get out of it.
Each fight has a spreadsheet of 322 statistics. No, that’s not a typo and, yes, it was a bitch to make. After running the prediction model, I comb through all of these stats and find the story I think the model is seeing. It’s a time-consuming endeavor, but also fun and fascinating to learn the statistical ins and outs of so many fighters.
How to Think About Odds
It’s tempting to think that 50/50 odds mean a fight will be very close while 85/15 odds mean one fighter will crush the other. Try to avoid this temptation. Odds represent the likelihood, before the fight takes place, that one fighter’s perceived advantages win out over the other’s.
Take the classic striker vs. grappler matchup. Will the grappler’s takedown offense, clinch and ground control rule the day, or will the striker’s takedown defense, standup abilities and (of course) striking win out? The odds can be close to 50/50 while the fight ends up being a one-sided beatdown. Once the fighters are actually locked in the cage, the information set changes: we get to see whose advantages win out, and it may turn into a lopsided affair. So don’t think of the odds as how close a fight should be. Think of them as the degree of certainty about the fighters’ advantages and disadvantages on any given night, before the cage doors lock.
How to Think About Predictions
For any single event, the accuracy of the predictions tells you very little. Long-term performance is all that matters when making decisions based on probabilities. If you play hold’em poker, sometimes your pocket aces get cracked and other times they hold up. Sometimes you suck out on the river and other times you don’t. Sometimes you play great poker and leave a big winner, and other times you play great and get hammered. (And now I'm getting excited for Fight Week in Vegas!) Random things happen in the short term, but over the long term randomness averages out.
A similar idea holds for predicting fights. Don’t sweat the short term; focus on having a sound analytical framework fight-by-fight and event-by-event. Historically, around two-thirds of the model’s picks have been favorites and one-third have been underdogs. I’ll track its accuracy and bet recommendations and we’ll see how things go over time. A summary will be reported every January and July, starting in January 2015, since this July is too soon.
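Tracking results over time boils down to two running numbers: how often the predicted winner actually wins, and the return on the bets placed. The exact bookkeeping isn’t described here, so this is only a minimal Python sketch. It assumes flat one-unit stakes and American money lines; the `Pick` record and the example log are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pick:
    """One prediction in a hypothetical tracking log."""
    picked_won: bool  # did the model's predicted winner actually win?
    bet: bool         # was a bet placed on this pick?
    moneyline: int    # American odds on the picked fighter, e.g. -150 or +120


def profit_per_unit(moneyline: int) -> float:
    """Profit from a winning one-unit stake at American odds."""
    return moneyline / 100 if moneyline > 0 else 100 / -moneyline


def accuracy(picks: list[Pick]) -> float:
    """Share of all picks where the predicted winner won."""
    return sum(p.picked_won for p in picks) / len(picks)


def roi(picks: list[Pick]) -> float:
    """Net profit divided by total amount staked (flat one-unit bets only)."""
    bets = [p for p in picks if p.bet]
    if not bets:
        return 0.0
    profit = sum(profit_per_unit(p.moneyline) if p.picked_won else -1.0 for p in bets)
    return profit / len(bets)


# Hypothetical log: four picks, three of them bet.
picks = [
    Pick(picked_won=True, bet=True, moneyline=-150),
    Pick(picked_won=False, bet=True, moneyline=+120),
    Pick(picked_won=True, bet=False, moneyline=-110),
    Pick(picked_won=True, bet=True, moneyline=+150),
]
print(f"accuracy {accuracy(picks):.2f}, ROI {roi(picks):.1%}")  # accuracy 0.75, ROI 38.9%
```

With only a handful of picks, one result swings both numbers wildly. That’s why they only mean something when accumulated over many events.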
You can expect a BE Analytics preview and prediction article for every UFC pay-per-view and FOX event as well as any other events with matchups that are sufficiently interesting. The model’s predictions will be included in the Bloody Elbow staff picks under my name for all events. Estimated odds will also go out on Twitter at @MMAanalytics for fights with enough data.
UFC 174 isn’t the most riveting fight card, but the series will start with an analysis of Johnson/Bagautinov and MacDonald/Woodley later this week. Other matchups will be added if I have time.
This is Revenge of the Nerds meets Bloodsport. We already know Ogre lost to Chong Li, but the real question is could Louis and Gilbert have accurately calculated his probability of success?
DISCLAIMER: If you’re thinking of using this information to gamble, do so at your own risk. While the model has been backtested on historical fights and earned a 59.1 percent ROI selecting bets with good edges from 2009 through the present, I make no guarantees regarding the future. If you see that a fighter has a -150 line with an estimated 64 percent win probability and you decide to bet that four percent edge, you’re probably being unwise. All models have error and betting lines have vig buffers, so be smart and be careful. Sports betting can be dangerous to your financial future and should mostly be for fun.
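The "four percent edge" comes from converting the -150 line into the win probability it implies (60 percent) and comparing it with the model’s 64 percent. Here is a minimal Python sketch of that arithmetic. The function names are hypothetical, the +130 price on the other fighter is an assumed number for illustration, and proportional normalization is just one common way to strip out the vig.

```python
def implied_probability(moneyline: int) -> float:
    """Break-even win probability implied by an American money line."""
    if moneyline < 0:
        return -moneyline / (-moneyline + 100)
    return 100 / (moneyline + 100)


def expected_profit(win_prob: float, moneyline: int) -> float:
    """Expected profit on a one-unit stake, if win_prob is the true probability."""
    payout = moneyline / 100 if moneyline > 0 else 100 / -moneyline
    return win_prob * payout - (1 - win_prob)


def no_vig_probabilities(line_a: int, line_b: int) -> tuple[float, float]:
    """Scale both implied probabilities so they sum to 1 (proportional vig removal)."""
    p_a, p_b = implied_probability(line_a), implied_probability(line_b)
    total = p_a + p_b  # exceeds 1 by the bookmaker's margin
    return p_a / total, p_b / total


model_prob, line = 0.64, -150
print(implied_probability(line))                          # 0.6
print(round(model_prob - implied_probability(line), 3))   # 0.04
print(round(expected_profit(model_prob, line), 3))        # 0.067
print(round(expected_profit(0.58, line), 3))              # -0.033
# Hypothetical +130 price on the other fighter:
print([round(p, 3) for p in no_vig_probabilities(-150, +130)])  # [0.58, 0.42]
```

If the true probability is 60 percent, the bet only breaks even, and at 58 percent it loses money. A model error of a few points is enough to wipe out the whole edge, which is the point of the warning above.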