Tuesday, 26 March 2013

Lies, Damned Lies, and PDO

I'll preface this entry by saying that no, I'm not going to rail against the PDO machine. Unsustainable percentages due to randomness are a part of hockey, and regression toward the mean has been proven and demonstrated by people much smarter than myself (like the 2011-12 Minnesota Wild. Minus the "smarter" part). The purpose of this post is to address some inaccuracies I think surround PDO, as well as look at some of the concerns I have with leaning so heavily on one metric as an indicator of "luck." With that out of the way, let's get down to business.

This post was inspired in part by something Justin Bourne wondered yesterday:
Which provoked this response from one Rob Pizzola:
Well, to me at least, there are a couple of things wrong with this. The first being that while PDO may be a good indicator of team luck, it's not necessarily a good indicator of individual player luck. This is primarily because on-ice shooting% and on-ice save% are completely unrelated to one another. This should just be common sense. How a goalie performs at one end of the ice should have no bearing on how the goalie at the other end of the ice performs since they're 200 feet apart at basically all times. If you're unconvinced, I have a graph to prove it!

Relationship between team Shooting% and Save% for all teams from 2007-2008 to 2011-2012
As shown by the graph, your shooting and goaltending have nothing to do with one another. Your goalie is going to have his mean save%, and your shooters are going to shoot at some mean shooting%. On a team level, this really is irrelevant. PDO is meant to measure how much of a team's goal differential is due to random variance, rather than more demonstrably sustainable abilities like puck possession, and for the most part, others have shown it does that quite nicely.

The problem starts when PDO is used to explain individual output, like the Pizzola quote above. Comparing Seguin's behindthenet.ca stat lines over his young career, you'll quickly see why his PDO this season is so high:

Top row is 2013, bottom row is rookie season in 2010-2011
What Seguin's benefiting from this year is an unusually high on-ice save%. Now, Tuukka Rask sports an impressive .927 at evens for his career (that's Roberto Luongo territory), so a higher on-ice save% than normal should be expected. Compared to last year, Seguin's higher PDO is being driven by a factor he has basically no control over.

So this season, he's experiencing a boost in Corsi, a higher on-ice shooting%, an elevated PDO, but a decline in production from 2.69 EV pts/60 last year to 2.26 EV pts/60 this season. Just from looking at PDO, Seguin should be having good luck, and instead his actual even-strength output is down across the board. PDO does not capture why, nor does it show that he's actually getting lucky in the offensive end of the ice. Therefore, it cannot be concluded that his production will change either way based solely on a high PDO.

(Since the purpose of this post is to demonstrate that PDO as it is frequently used doesn't always capture good and bad luck on an individual player level, exactly why Seguin's production is down isn't really relevant to this discussion. I suspect though that it's because his individual shooting% is below what his true talent mean could be. Behind the Net shows that he's shooting below 6% this year, compared to the 8.3% he shot last year. Of course, there's also the possibility that he's David Booth 2.0. Also note his frequent linemate Brad Marchand shooting near 14% in each of the last two years. That's probably the source of the high on-ice shooting% number Seguin's PDO has enjoyed.)

The second real problem I have with PDO on an individual level is the idea that all players will have an individual mean PDO of 1.000 and any deviation from that is due to variance. We already know that on-ice shooting% and save% are unrelated, so let's ignore the part of PDO that a player can't control and assume that on-ice save% will regress towards your goalie's true-talent mean save%. This leaves a player's on-ice shooting% as the aspect of PDO a player can control to some degree.

If I remember what I read a while ago, Gabe Desjardins argued that there was no such thing as discernible shooting skill, and all offensive output was the result of either puck possession, luck, or a combination thereof. Someone else (I want to say it was David Johnson of HockeyAnalysis.com, but I'm really not sure and I don't want to put words in anyone's mouth) disagreed, arguing that elite players demonstrate consistently superior shooting abilities. I tend to agree with this dissenting opinion, largely because I live in a market that sees David Booth play more frequently than anyone else and holy hell that man can't shoot the puck in the net. As a result, I have a tough time believing that there is no difference in skill between him, a 4th line crash-and-banger, and say, career 17.3% shooter Steven Stamkos or career 15% shooter Sidney Crosby.

In fact, this discrepancy between "talented" and "stone-handed" forwards is illustrated quite nicely in some analysis Tyler Dellow posted this morning as I was writing the first part of this up. Also looking at stuff inspired by Justin Bourne (basing the good player/bad player distinction on Corsi), he posted this table:
Courtesy of Tyler Dellow at mc79hockey.com. Go read his things and laugh at his beloved Oilers' plight.
The biggest thing I took from this table is that the shooting% of first liners does exceed that of fourth liners by quite a significant margin (note: it looks like Tom Awad found this distinction too, as Eric Tulsky pointed out in the comments of Dellow's analysis). For the purposes of this argument, let's make the assumption that the average individual mean on-ice shooting% for each first line player is 9.22%, and the average individual mean on-ice shooting% for each fourth liner is 6.24%, as Dellow found in his analysis. This means that a Boston Bruins first liner like Tyler Seguin can reasonably expect a mean PDO of 9.22% + Tuukka Rask's 92.7% save%, for an expected individual PDO of roughly 1.02, which is nearly exactly what his PDO was last season. By comparison, a Bruins fourth liner can expect a PDO of around 0.989, with the difference being attributed to finishing skill.
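To make that arithmetic explicit, here's a minimal sketch. It assumes PDO is simply on-ice shooting% plus on-ice save% (both as fractions), and it plugs in the numbers quoted above: Dellow's 9.22% and 6.24% line averages and Rask's .927 at evens. The function and variable names are my own, purely for illustration.

```python
def expected_pdo(on_ice_sh_pct, on_ice_sv_pct):
    """Expected PDO: on-ice shooting% plus on-ice save%, both as fractions."""
    return on_ice_sh_pct + on_ice_sv_pct


# Figures quoted in the post (names are illustrative, not from any library).
RASK_EV_SV = 0.927       # Tuukka Rask's career even-strength save%
FIRST_LINE_SH = 0.0922   # Dellow's average on-ice shooting% for first liners
FOURTH_LINE_SH = 0.0624  # Dellow's average on-ice shooting% for fourth liners

first_line_pdo = expected_pdo(FIRST_LINE_SH, RASK_EV_SV)
fourth_line_pdo = expected_pdo(FOURTH_LINE_SH, RASK_EV_SV)

print(f"First liner:  {first_line_pdo:.3f}")   # 1.019
print(f"Fourth liner: {fourth_line_pdo:.3f}")  # 0.989
print(f"Gap:          {first_line_pdo - fourth_line_pdo:.3f}")  # 0.030
```

Same goalie, same team, and the two players' "expected" PDOs sit about 30 points apart before any luck enters the picture.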

As this is the case, 1.000+ PDOs for first line forwards are not due simply to good luck. Similarly, sub-1.000 PDOs for third and fourth line players are not necessarily due to poor luck either. A player with a PDO of exactly 1.000 - which we consider to be operating at exactly his expected output with no good or bad bounces - may actually be experiencing a fair amount of variance and can still regress to his individual mean either way, depending on his talent.

So, from all this stuff, there are two main takeaways:
1) PDO does not capture luck on an individual skater level.
2) The mean that PDO can be expected to regress to is unique to each player, and is influenced mainly by that player's role and shooting ability.

With these points in mind, I don't think using PDO to make judgments on an individual player's point production is particularly wise. It's a relatively small and seemingly obvious distinction to make, but still an important one nonetheless. It's why I tweeted this to Justin Bourne yesterday:
PDO is good for the broad-brush analysis on a macro-scale ("the Ducks' record is unsustainable!"), but its "predictive" power doesn't really carry over to a micro-scale, because you're not dealing with nice, round numbers anymore and PDO's two components are completely unrelated to each other. Regression will always happen, but it's not as cut-and-dried as the familiar "it will regress to 1.000!" mantra, at least not on an individual level.

---

As an aside from the above discussion, something Thomas Drance wrote today got me thinking. Mostly, it was this quote:
But that capital is evaporating as quickly as Vancouver's stranglehold on the Northwest Division. Part of the reason that the court of public opinion has slowly begun to turn against Mike Gillis is, in part, that for the first season since 2008-09 Vancouver's PDO (PDO is the sum of a team's shooting percentage and save percentage, and functions as a shorthand measurement of puck luck) isn't two standard deviations above the mean
Emphasis is mine. Basically, and this is idle speculation on my part, why can't Vancouver maintain an elevated PDO? On the surface, this is a dumb question, since the answer is "regression." But, as Tyler Dellow and Tom Awad have found, more skilled players have the ability to maintain higher on-ice shooting%, and in Vancouver, Alain Vigneault has done nothing but feed the Sedin twins offensive zone starts over the past three seasons. In fact, both Dellow and Awad identify Daniel Sedin as one of the few players in the NHL that has elite puck possession and finishing skills.

With the Sedins seeing the lion's share of Vancouver's offensive zone starts, it means that the Canucks will see a higher proportion of their shots on goal taken by 9%-10% shooters than other teams do. Of course, this also means less offensive zone time for the below average finishers on their roster. This would mean, just from a simple weighted average, the Canucks and Alain Vigneault can effectively inflate their team shooting% above what it would normally be under more traditional player deployments (I also think AV is the main driver behind the Sedins' strong Corsis, but that's a topic for another day). This, coupled with elite even strength goaltending provided by Roberto Luongo the past few years, would lead to a higher than 1.000 mean PDO.
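Here's a rough sketch of that weighted average. To be clear about what's assumed: the roster is collapsed into just two groups, the shooting percentages are borrowed from Dellow's first-line and fourth-line averages (9.22% and 6.24%), and the shot shares (30% vs. 40% of shots coming from the top finishers) are numbers I made up to show the mechanism. They are not actual Canucks figures.

```python
def team_shooting_pct(groups):
    """Shot-weighted team shooting%.

    groups: list of (share_of_team_shots, shooting_pct) pairs, as fractions.
    """
    total_share = sum(share for share, _ in groups)
    return sum(share * pct for share, pct in groups) / total_share


TOP_SH = 0.0922   # Dellow's first-line average, standing in for elite finishers
REST_SH = 0.0624  # Dellow's fourth-line average, standing in for everyone else

# HYPOTHETICAL shot shares, invented for illustration only.
traditional = [(0.30, TOP_SH), (0.70, REST_SH)]
sheltered = [(0.40, TOP_SH), (0.60, REST_SH)]

trad_pct = team_shooting_pct(traditional)
shel_pct = team_shooting_pct(sheltered)

print(f"Traditional deployment: {trad_pct * 100:.2f}%")  # 7.13%
print(f"Sheltered deployment:   {shel_pct * 100:.2f}%")  # 7.43%
print(f"Lift: {(shel_pct - trad_pct) * 100:.2f} points")  # 0.30 points
```

Under these made-up shares, shifting a tenth of the team's shots to the better finishers adds about 0.3 percentage points of team shooting%, and since PDO is just shooting% plus save%, that flows straight into the team's mean PDO as roughly +0.003.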

Of course, this could mean that the "elevated" 1.011 PDO Vancouver is operating at so far may in fact be below the Canucks' actual mean, and there could be room to regress upwards. It's an exciting thought for a Canucks fan, and an interesting one too, especially considering the recent (and favourite of Thomas Drance) quote Lawrence Gilman gave Elliotte Friedman:
But believe me when I tell you there are percentage results that allow you to coach and manage your team to hedge bets in certain events.
Could this PDO manipulation through player deployment be the influencing of "percentage results" that Gilman's talking about here? It's possible, but the Canucks are understandably coy about their analytic black box. That's not going to change anytime soon, either. I mean, these aren't the Toronto Raptors we're talking about.

---

One more thing, I still think PDO should stand for "Percentage-Driven Output." Give it a real fancystat name already.
