I am not a statistician. I am someone who finds scatterplots informative.
All I've done here is take the FightMetric data and compile it by reach advantage, which I'm defining as the difference in reach between the winner and the loser. Naturally, this results in a negative reach advantage in cases where the fighter with the longer reach loses.
For any given reach advantage, I divide the number of winning results by the total number of results to get a Winning Ratio. This is often called the winning percentage, but it's not a percentage, and I'm pedantic like that.
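For anyone who wants to reproduce the tally, here is a minimal Python sketch of one way to do it. The input layout (a list of winner reach, loser reach pairs in inches) and the name winning_ratios are assumptions for illustration, not anything taken from FightMetric. It also assumes that a loss at a given Reach Advantage is simply a win recorded at the opposite sign.

```python
from collections import Counter

def winning_ratios(fights):
    """Map each Reach Advantage to (Winning Ratio, number of fights).

    fights: iterable of (winner_reach, loser_reach) pairs in inches.
    This layout is hypothetical; adapt it to however your data is stored.
    """
    # Reach Advantage is winner's reach minus loser's reach, so every
    # fight is recorded once, from the winner's side.
    wins = Counter(w - l for w, l in fights)

    ratios = {}
    for adv in sorted(set(wins) | {-a for a in wins}):
        # Assumption: a loss at +adv is a win recorded at -adv.
        # At adv == 0 there is no opposite sign, so no losses are counted.
        losses = wins[-adv] if adv != 0 else 0
        total = wins[adv] + losses
        ratios[adv] = (wins[adv] / total, total)
    return ratios
```

Note that with this bookkeeping a Reach Advantage of 0 can only ever show wins, so its ratio comes out as a perfect 1.0 no matter what the data says.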
Then I dropped these two values into a scatterplot, plotting Reach Advantage against Winning Ratio.
You can see a clear correlation there, but if you ignore the trendline you'll notice the distribution gets a little flat closer to a Reach Advantage of zero. There are many more fights contested with smaller Reach Advantages (advantages over 10" are very rare), so I think what we're seeing is the increased signal at smaller (closer to zero) Reach Advantage values drowning out the noise we see at less common Reach Advantage values.
Let's see if we can refine that by dropping all the Reach Advantage values represented in only one fight.
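The filter itself is simple. A minimal sketch, assuming the tallies are held in a dictionary mapping each Reach Advantage to a (Winning Ratio, number of fights) pair. The dictionary layout, the name drop_sparse, and the numbers in the example are all hypothetical.

```python
def drop_sparse(ratios, min_fights=2):
    """Keep only Reach Advantage values backed by at least min_fights fights."""
    return {
        adv: (ratio, fights)
        for adv, (ratio, fights) in ratios.items()
        if fights >= min_fights
    }

# Made-up tallies for illustration only, not the FightMetric data.
example = {-3: (0.4, 5), 0: (1.0, 2), 7: (0.5, 2), 12: (1.0, 1)}

# min_fights=2 drops every value represented in only one fight.
filtered = drop_sparse(example, min_fights=2)
```

Raising min_fights makes the cut stricter without changing anything else.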
That's a flatter line. We've only removed seven Win events from the data, but that's a visibly flatter line. I'm sure someone can calculate (or already has calculated) the r-squared values, but even without them we can see that this is a flatter line.
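If you'd rather put numbers on "flatter" than eyeball it, here is a rough sketch that fits a straight line and reports its slope and r-squared. It assumes the trendline is an ordinary, unweighted least-squares fit through the (Reach Advantage, Winning Ratio) points, which may not match how the charts were drawn. The name fit_line is hypothetical.

```python
def fit_line(points):
    """Least-squares fit of y = slope * x + intercept.

    points: list of (reach_advantage, winning_ratio) pairs with at least
    two distinct reach_advantage values.
    Returns (slope, intercept, r_squared).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r-squared is undefined when every ratio is identical.
    r_squared = (sxy * sxy) / (sxx * syy) if syy else float("nan")
    return slope, intercept, r_squared
```

Comparing the slope before and after each filtering step gives a direct measure of whether the line really got flatter or steeper, independent of the chart's scale.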
But there is still clearly some noise in the signal, as we have a few perfect records still scattered through the dataset. Let's try again, raising the minimum from two fights to three.
Now we've removed 12 more fights from the dataset, and the line got steeper again (I foolishly changed the scale of this chart, but if you extrapolate the line out it is clearly steeper). So some of that statistical noise was actually hiding a stronger correlation.
Can we show a causal relation? No. But these charts do offer some justification for the difference in reach being called an "advantage".
I noticed that a Reach Advantage of 0 has a perfect record in all of these charts, but that's obviously not true. If you remove that datapoint (as I had intended before I made these charts), it has a small effect on charts 2 and 3, making the trendline very slightly steeper in both cases.