Research Challenge: Eyeballing it

I am not a statistician. I am someone who finds scatterplots informative.

All I've done here is take the FightMetric data and compile it by reach advantage, which I'm defining as the winner's reach minus the loser's reach. Naturally, this results in a negative reach advantage in cases where the longer fighter loses.

Within any given reach advantage, the number of winning results is compared to the number of total results to produce a Winning Ratio.  This is often called the winning percentage, but it's not a percentage, and I'm pedantic like that.
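For anyone who wants to reproduce the compilation, here is a minimal sketch of one way to read the step above. It assumes the data has been boiled down to one number per fight (winner's reach minus loser's reach), and it treats a win at an advantage of d as a loss at an advantage of -d. The function name, the input layout and the sample numbers are all made up for illustration; this is not necessarily how the charts here were built.

```python
from collections import Counter

def winning_ratios(reach_diffs):
    """Map each reach advantage to wins / total results at that advantage.

    reach_diffs: one number per fight, winner's reach minus loser's reach
    (an assumed layout, not the actual FightMetric format).
    """
    wins = Counter(reach_diffs)
    ratios = {}
    # Every win at d is also a loss at -d, so cover both signs.
    for d in set(wins) | {-d for d in wins}:
        if d == 0:
            # Equal reach: each fight has one winner and one loser at 0.
            ratios[d] = 0.5
            continue
        total = wins[d] + wins[-d]  # Counter returns 0 for missing keys
        ratios[d] = wins[d] / total
    return ratios

# Made-up sample, in inches
sample = [2, 2, -2, 1, -1, 0, 0, 3]
ratios = winning_ratios(sample)
```

Under this reading the ratios at d and -d always add up to 1, and a value seen in only one fight (3 in the sample) comes out as a perfect record.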

Then I dropped these two values into a scatterplot, plotting Reach Advantage against Winning Ratio.


You can see a clear correlation there, but if you ignore the trendline you'll notice the distribution gets a little flat closer to a Reach Advantage of zero. There are many more fights contested with smaller Reach Advantages (advantages over 10" are very rare), so I think what we're seeing is the stronger signal at smaller (closer to zero) Reach Advantage values drowning out the noise we see at less common Reach Advantage values.

Let's see if we can refine that by dropping all the Reach Advantage values represented in only one fight.
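The filtering step can be sketched like this, again assuming one winner-minus-loser reach number per fight. The function name, the min_fights parameter and the sample values are hypothetical, and whether d and -d should be counted together is a judgment call that this sketch does not make.

```python
from collections import Counter

def drop_rare_advantages(reach_diffs, min_fights=2):
    """Keep only fights whose reach advantage value occurs at least min_fights times."""
    counts = Counter(reach_diffs)
    return [d for d in reach_diffs if counts[d] >= min_fights]

# Made-up sample, in inches
sample = [2, 2, -2, 1, 1, 1, 0, 0, 7]
at_least_two = drop_rare_advantages(sample, min_fights=2)
at_least_three = drop_rare_advantages(sample, min_fights=3)
```

Raising min_fights is the same knob as moving the minimum from one fight to two, or from two to three.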


That's a flatter line. We've only removed seven Win events from the data, but that's a visibly flatter line. I'm sure someone can calculate (or already has calculated) the r-squared values, but even without them we can see that this is a flatter line.
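For anyone who does want the numbers, here is a rough sketch of an ordinary least-squares fit in plain Python. The slope is what tells you how steep or flat the trendline is; r-squared tells you how tightly the points follow it. The function name and the sample points are made up, and the fit is unweighted, so a Reach Advantage seen in two fights counts as much as one seen in two hundred.

```python
def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) for an unweighted least-squares line.

    Assumes xs and ys have the same length and that neither is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a straight-line fit, r-squared is the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Made-up points: Reach Advantage (inches) against Winning Ratio
xs = [-2, -1, 0, 1, 2]
ys = [0.3, 0.4, 0.5, 0.6, 0.7]
slope, intercept, r_squared = fit_line(xs, ys)
```

Comparing the slope across the filtered datasets would put a number on "flatter" and "steeper" without having to eyeball charts drawn at different scales.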

But there is still clearly some noise in the signal, as we have a few perfect records still scattered through the dataset.  Let's try again, raising the minimum from two fights to three.


Now we've removed 12 more fights from the dataset, and the line got steeper again (I foolishly changed the scale of this chart, but if you extrapolate the line out it is clearly steeper).  So some of that statistical noise was actually hiding a stronger correlation.

Can we show a causal relation?  No.  But these charts do offer some justification for the difference in reach being called an "advantage".



I noticed that a Reach Advantage of 0 has a perfect record in all of these charts, but that's obviously not true. If you remove that datapoint (as I had intended before I made these charts), it has a small effect on charts 2 and 3, making the trendline very slightly steeper in both cases.

The FanPosts are solely the subjective opinions of Bloody Elbow readers and do not necessarily reflect the views of Bloody Elbow editors or staff.
