When it comes to hockey, I’ve been playing goalie since 2012. While I may have had a subpar save percentage and an unreasonably high goals against average throughout my career, the nuances of goaltending never escaped me. One of the most important of those nuances is rebound control (stopping the puck comes first, of course). So now that I am old enough to tie my own skates and understand statistical models (no comment on which came first), I decided to see which goalies had the best rebound control using a regression approach. However, as I was working on the project, a realization came to me: unlike in my first passion, Pokémon, a goalie can’t “catch ’em all.” Sometimes they leave a juicy rebound that a defenseman fails to clear away, and a forward then gets to hammer in a shot. Because of this, I decided to include the players on the ice as well and try to isolate their impact on rebound shot creation.
Using data from the MoneyPuck shot files and play-by-play data from hockey_scraper for the 2018-19 season, I estimated the probability that a shot would generate a rebound shot after it. I model this probability using a logistic regression with the following features as inputs:
- Shot Distance
- Shot Angle
- An indicator for whether the shot came off a rush
- An indicator for whether the shot itself was a rebound
- An indicator for whether the shooter was on their off wing
- An indicator for whether the shooting team was trailing
- An indicator for whether the shooting team was leading
- An indicator for the goalie
- Indicators for every defensive player on the ice
- Indicators for every non-shooting offensive player on the ice
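As a rough illustration of how one shot becomes a row of model inputs, here is a minimal Python sketch. The `Shot` record, its field names, and the `G:`/`D:`/`O:` prefixes are hypothetical and are not the actual MoneyPuck or hockey_scraper column names. Score state is encoded so that a tied game is the baseline (both the trailing and leading indicators are 0).

```python
from dataclasses import dataclass, field


@dataclass
class Shot:
    # Hypothetical record layout; real MoneyPuck / hockey_scraper columns differ.
    distance: float          # feet from the net
    angle: float             # degrees from the centre line
    is_rush: bool
    is_rebound: bool
    off_wing: bool
    score_diff: int          # shooting team's goals minus the opponent's goals
    goalie: str
    defenders: list[str] = field(default_factory=list)  # defending skaters on the ice
    teammates: list[str] = field(default_factory=list)  # non-shooting attackers on the ice


def feature_row(shot: Shot) -> dict[str, float]:
    """Build one sparse row: shot features plus goalie and skater indicators."""
    row = {
        "distance": shot.distance,
        "angle": shot.angle,
        "rush": float(shot.is_rush),
        "rebound": float(shot.is_rebound),
        "off_wing": float(shot.off_wing),
        "trailing": float(shot.score_diff < 0),  # tied game -> both score flags are 0
        "leading": float(shot.score_diff > 0),
        f"G:{shot.goalie}": 1.0,
    }
    # Separate prefixes give each skater a defensive and an offensive indicator.
    for name in shot.defenders:
        row[f"D:{name}"] = 1.0
    for name in shot.teammates:
        row[f"O:{name}"] = 1.0
    return row
```

Because a skater's name gets a `D:` prefix when defending and an `O:` prefix when attacking, the same player ends up with two separate coefficients, one defensive and one offensive. The shooter is deliberately left out of the `O:` indicators, matching the "non-shooting offensive player" feature above.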
To evaluate a player’s rebound control, I look at their coefficient, which captures their impact on the chance that a shot is followed by a rebound shot.
To control for sample size, I impose L2 (ridge) regularization. This penalizes large coefficients, pulling every player’s coefficient towards 0 (roughly the average player), so a coefficient only moves far from 0 when the player has a large enough sample to support it.
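The shrinkage effect can be seen in a small sketch. This is not the post's actual fitting code: it is a hypothetical, dependency-free Python version that minimizes the mean log loss plus (λ/2)·‖w‖² by plain gradient descent, leaving the intercept unpenalized. In practice a library such as scikit-learn's `LogisticRegression`, which applies an L2 penalty by default, would do this job. Rows are assumed to be dicts mapping feature names to values.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l2_logistic(rows, labels, lam=1.0, lr=0.1, epochs=3000):
    """Fit P(rebound) = sigmoid(b + sum_j w_j * x_j) by full-batch gradient descent.

    rows:   list of dicts mapping feature name -> value (sparse indicators)
    labels: list of 0/1 outcomes (1 = a rebound shot followed)
    lam:    L2 penalty strength on w; the intercept b is not penalized
    """
    names = sorted({k for r in rows for k in r})
    w = {k: 0.0 for k in names}
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = {k: lam * w[k] for k in names}  # gradient of (lam / 2) * ||w||^2
        grad_b = 0.0
        for r, y in zip(rows, labels):
            err = sigmoid(b + sum(w[k] * v for k, v in r.items())) - y
            grad_b += err / n
            for k, v in r.items():
                grad_w[k] += err * v / n  # gradient of the mean log loss
        b -= lr * grad_b
        for k in names:
            w[k] -= lr * grad_w[k]
    return b, w


# Toy data: goalie A faces 50 shots (5 rebounds), goalie B only 10 (3 rebounds).
rows = [{"G:A": 1.0}] * 50 + [{"G:B": 1.0}] * 10
labels = [1] * 5 + [0] * 45 + [1] * 3 + [0] * 7
for lam in (0.1, 2.0):
    _, w = fit_l2_logistic(rows, labels, lam=lam)
    print(lam, round(w["G:A"], 3), round(w["G:B"], 3))
```

In this toy example goalie B allows rebounds more often, so B's coefficient comes out positive and A's negative; as `lam` grows, B's coefficient, which rests on only 10 shots, is pulled toward 0.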
Because a logistic regression coefficient measures a change in the log-odds of a rebound rather than in the probability itself (exponentiating it gives the multiplicative change in the odds), and the logistic curve is non-linear, you can’t say that player X’s coefficient lowers the chance of a rebound by Y percent; how much the probability moves depends on the shot’s baseline rebound probability. You can, however, compare one player’s coefficient to another’s: the coefficients are most meaningful relative to each other (for example, one goalie’s coefficient versus the other goalies’), with more negative values meaning a lower rebound probability. Holding everything else fixed, a positive coefficient always makes the event more likely and a negative coefficient always makes it less likely.
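To make the log-odds point concrete, the short Python sketch below applies one hypothetical coefficient of -0.2 (not a value from the model) to two made-up baseline rebound probabilities. The odds are multiplied by exp(-0.2) ≈ 0.82 in both cases, but the drop in probability is roughly 0.9 percentage points at a 5% baseline and roughly 4 percentage points at a 30% baseline.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def shifted_probability(baseline: float, coef: float) -> float:
    """Probability after adding `coef` to the log-odds of `baseline`."""
    return 1.0 / (1.0 + math.exp(-(logit(baseline) + coef)))


coef = -0.2  # hypothetical coefficient for illustration only
for baseline in (0.05, 0.30):
    new_p = shifted_probability(baseline, coef)
    odds_ratio = (new_p / (1 - new_p)) / (baseline / (1 - baseline))
    print(f"baseline {baseline:.2f} -> {new_p:.4f} (odds ratio {odds_ratio:.3f})")
```

This is why the coefficients are compared to each other rather than read as a fixed percentage change.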
IMPORTANT TO REMEMBER: Because I only use one season of data, most of these numbers are subject to high variance and shouldn’t be taken as definitive. The main purpose of this post is to demonstrate how a regression approach could approximate a player’s impact on team rebounding.
Below are all the values for goalies. Negative values are good and positive values are bad, which is a little counter-intuitive.
Below are all the values for forwards. Because forwards are on the ice both when their team is shooting and when it is defending, each player has both a defensive impact and an offensive impact. On offence a positive impact is good, because we want to generate rebounds; on defence a negative impact is good, because we want to suppress rebounds. To make the graph easier to read, I multiplied the defensive impact by -1 so that both impacts can be read as helping (positive) or hurting (negative) your team compared to the average player.
Below are all the values for defensemen. Just like forwards, defensemen have separate defensive and offensive impact values, and the defensive value has again been multiplied by -1.
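The sign flip for skaters is simple bookkeeping. A minimal sketch, assuming the offensive and defensive coefficients have already been pulled out of the fitted model into two hypothetical dictionaries keyed by player name; a player missing from one side is given 0.0, which is where the L2 penalty leaves a player with no data anyway.

```python
def readable_impacts(
    offence: dict[str, float], defence: dict[str, float]
) -> dict[str, tuple[float, float]]:
    """Return {player: (offensive impact, flipped defensive impact)}.

    offence: coefficient on the player's offensive indicator (positive = more rebounds created)
    defence: coefficient on the player's defensive indicator (positive = more rebounds allowed)
    Flipping the defensive sign makes positive mean "helps his team" on both sides.
    """
    players = sorted(set(offence) | set(defence))
    return {p: (offence.get(p, 0.0), -defence.get(p, 0.0)) for p in players}
```

Goalies need no flip: their single coefficient is already read as "more negative is better."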
Using a regularized logistic regression with shot information and the players on the ice as inputs, I train a model to predict the probability that a shot generates a rebound shot after it. I then look at each player’s and goalie’s coefficient to determine their impact on rebound shot generation, both offensively and defensively; goalies only have one impact value since they don’t play offence.
If you haven’t already, I encourage you to look at Micah Blake McCurdy’s shooter and goalie talent model, as I based much of my methodology on his post.