Sunday, November 29, 2009

Do Catchers Affect Balls and Strikes?


Catcher defense is a “known unknown” in quantitative defensive evaluation. Some good work has been done with traditional catcher statistics such as caught stealing percentage, wild pitches/passed balls, and errors (Tango, and many others). Catcher ERA has also been heavily examined, with the consensus being no detectable difference between catchers (Woolner, James, and many others). With the public availability of PitchFX data, another reputed catcher skill can be examined, the ability to "work" an umpire, through deft mitt movement or otherwise, to gain more favorable ball/strike calls.

Some initial work in this arena has been done by Dan Turkenkopf of Beyond the Box Score on the pitchFX data from 2007. Dan's piece does not attempt statistical rigor, but it is one of the main inspirations for this study: the core concept of looking at pitcher, batter, catcher, and umpire effects, as well as the method of run value evaluation, are his.

The question of whether or not catchers can influence balls and strikes can be broken into three sub questions. Are catchers' results consistent from year to year? Is the difference between catchers statistically significant? Is the difference between catchers practically significant? Before exploring the answers to those questions, first, some definitions.

Called Pitch and Called Pitch Percentage (CP%)
All pitches not swung at. This does not include balls in the dirt (as there is no way to fool the umpire on this) and intentional balls. Percentage is called pitches divided by total pitches.

Pitches In and Out of Strike Zone, and Percentages (nSZ% and oSZ%, respectively)
A called pitch is determined to be in the strike zone if the coordinates provided by pitchFX are within the rulebook strike zone. Top and bottom of the strike zone are provided in the pitchFX data, and sides are the width of home plate. An allowance of one ball radius is added in each direction, as only part of the ball needs to be within the strike zone. Percentages are out of called pitches as defined above. nSZ% and oSZ% necessarily sum to 100%.

Extra Strike and Extra Strike Percentage (ES%)
A called strike which crossed the plate outside of the rulebook strike zone - it should have been called a ball. Extra Strike Percentage is extra strikes per called pitch outside the strike zone.

Extra Ball (EB) and Extra Ball Percentage (EB%)
A called ball which crossed the plate inside of the rulebook strike zone - it should have been called a strike. Extra Ball Percentage is extra balls per called pitch inside the strike zone.
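
The definitions above can be sketched in code. This is only an illustration, not the code used for the study: the field names (`px`, `pz`, `sz_top`, `sz_bot`), the 17 inch plate width, and the 1.45 inch ball radius are assumptions that the article does not state.

```python
from dataclasses import dataclass

# Assumed constants (not given in the article), converted from inches to feet.
HALF_PLATE_FT = (17.0 / 2.0) / 12.0
BALL_RADIUS_FT = 1.45 / 12.0


@dataclass
class CalledPitch:
    px: float            # horizontal location at the plate, feet from center
    pz: float            # height at the plate, feet
    sz_top: float        # top of the batter's strike zone, feet
    sz_bot: float        # bottom of the batter's strike zone, feet
    called_strike: bool  # True if the umpire called a strike


def in_rulebook_zone(p: CalledPitch) -> bool:
    """True if any part of the ball touches the rulebook zone."""
    inside_x = abs(p.px) <= HALF_PLATE_FT + BALL_RADIUS_FT
    inside_z = (p.sz_bot - BALL_RADIUS_FT) <= p.pz <= (p.sz_top + BALL_RADIUS_FT)
    return inside_x and inside_z


def extra_rates(pitches):
    """Return (EB%, ES%) as fractions for a list of called pitches."""
    in_zone = [p for p in pitches if in_rulebook_zone(p)]
    out_zone = [p for p in pitches if not in_rulebook_zone(p)]
    extra_balls = sum(1 for p in in_zone if not p.called_strike)
    extra_strikes = sum(1 for p in out_zone if p.called_strike)
    eb = extra_balls / len(in_zone) if in_zone else float("nan")
    es = extra_strikes / len(out_zone) if out_zone else float("nan")
    return eb, es
```

Note that EB% and ES% have different denominators (called pitches inside and outside the zone, respectively), so they do not sum to anything meaningful.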

Are catchers' results consistent from year to year?
With PitchFX data available for both the 2008 and 2009 seasons, there are two years to compare; 2007 is omitted because it is incomplete. Each of the four relevant actors on every pitch was examined (pitcher, batter, catcher, umpire). Those included in the study were every player or umpire involved in at least 1000 total pitches in each of the two seasons. (For the statistically inclined, 1000 pitches is used to ensure sufficient sample size to use the normal approximation to the binomial distribution for all metrics.)

For each group of players/umpires and each of the four statistics introduced earlier, the correlation between 2008 and 2009 results is calculated. This allows examination of the relative levels of year to year reliability for each.
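
As a rough sketch of that calculation, assuming the Pearson coefficient (the article does not say which correlation measure was used) and hypothetical dictionaries mapping a player or umpire ID to a rate for each season:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def year_to_year_r(rates_2008, rates_2009):
    """Correlate a metric across seasons for IDs present in both years.

    Both arguments are hypothetical dicts of {id: rate}, already limited
    to those who met the 1000-pitch minimum in that season.
    """
    common = sorted(set(rates_2008) & set(rates_2009))
    return pearson_r([rates_2008[k] for k in common],
                     [rates_2009[k] for k in common])
```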

CP%

| Group | Correlation Coefficient |
| --- | --- |
| Batters | 0.817 |
| Pitchers | 0.767 |
| Catchers | 0.401 |
| Umpires | -0.149 |


Most of the called pitch table makes intuitive sense. Pitchers may control called pitch percentage by "nibbling" or going after hitters, and batters can control called pitch percentage by being more or less selective. Umpires have a year to year correlation close to zero; they have no means of directly controlling whether or not a pitch is taken or swung at. The catchers' correlation, however, illustrates a problem with this methodology. Catchers only catch a small subset of total pitchers, and pitchers pitch to only a small subset of total catchers. A means of further separating catcher and pitcher is necessary - this analysis cannot tell us if the correlation of each group is only due to the unique pitcher/catcher relationship or not.

nSZ%/oSZ%

| Group | Correlation Coefficient |
| --- | --- |
| Batters | 0.791 |
| Pitchers | 0.614 |
| Catchers | 0.378 |
| Umpires | 0.058 |

In and out of strike zone percentages behave similarly to CP% - decent year to year consistency for pitchers and batters, none for umpires, and less correlation for catchers than for pitchers.

EB%

| Group | Correlation Coefficient |
| --- | --- |
| Catchers | 0.715 |
| Pitchers | 0.461 |
| Batters | 0.452 |
| Umpires | 0.374 |

Extra ball percentage is where things begin to get interesting. Batters have a moderate amount of consistency year to year. This fits with accounts of batters' reputations influencing balls and strikes. Umpires begin to make their mark as well. Catchers have the most consistent results from year to year, showing evidence of influencing balls and strikes.

ES%

| Group | Correlation Coefficient |
| --- | --- |
| Batters | 0.688 |
| Pitchers | 0.657 |
| Catchers | 0.477 |
| Umpires | 0.550 |


Catcher year to year correlation for ES% is less than for EB%, and less than pitcher correlation for ES%. This is not an expected result. While it is not a surprise that CP% and nSZ%/oSZ% are less correlated for catchers than pitchers, it is a bit unexpected that only EB% is more correlated for catchers while ES% is not. Two possible explanations spring to mind. One is the fact that umpires tend to call strike zones smaller than the rulebook zone. Both 2008 and 2009 have higher average EB% than ES%. What could be happening is that catchers are able to get pitches which are within the rulebook strike zone but outside the umpire's strike zone called strikes at varying rates. Relative to the umpire's zone, these occurrences should contribute to ES%, but relative to the rulebook zone they instead reduce EB%. Constructing umpire-specific strike zones, and calculating EB% and ES% in reference to those zones, would help confirm or deny that theory.

Another explanation is that umpires are too savvy to be fooled on a regular basis by moving the glove into the strike zone after the catch. A catcher can only mess things up by not presenting a steady target, or by antagonizing the umpire into intentional missed calls by attempting to be too tricky. This theory is corroborated by Brent Mayne's Art of Catching newsletter. More data is needed to determine the whys.

Is the difference between catchers statistically significant?
In order to separate pitchers and catchers further, pitcher-catcher pairs were found, where the same pitcher threw at least 10 extra balls or extra strikes to the same catcher in a single season (again, for the statistically inclined, this is the minimum needed to do statistical tests on proportions). These data points were paired up in two different ways - constant pitcher, and constant catcher. The constant pitcher data set paired up each pitcher-catcher pair with every other pitcher-catcher pair with the same pitcher. Likewise for the constant catcher data set, except the same catcher was in each pitcher-catcher pair. The two years of data were paired up separately, then combined. This gave four data sets:
1. Same catcher, different pitcher, qualified for EB%
2. Same catcher, different pitcher, qualified for ES%
3. Same pitcher, different catcher, qualified for EB%
4. Same pitcher, different catcher, qualified for ES%
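
A minimal sketch of how such pairings could be built, assuming each qualifying pitcher-catcher pair is represented as a `(pitcher, catcher)` tuple and that each unordered pairing is counted once. Both the representation and the function name `build_pairings` are hypothetical; the article does not describe its implementation.

```python
from collections import defaultdict
from itertools import combinations


def build_pairings(batteries, hold_constant):
    """Pair up qualifying batteries that share a pitcher or a catcher.

    batteries: iterable of (pitcher, catcher) tuples from ONE season that
        already met the minimum of 10 extra balls (or extra strikes).
    hold_constant: "pitcher" or "catcher".
    Returns a list of (battery_a, battery_b) tuples.
    """
    if hold_constant not in ("pitcher", "catcher"):
        raise ValueError("hold_constant must be 'pitcher' or 'catcher'")
    index = 0 if hold_constant == "pitcher" else 1

    # Group batteries by the player being held constant.
    groups = defaultdict(list)
    for battery in batteries:
        groups[battery[index]].append(battery)

    # Every pair of batteries within a group becomes one comparison.
    pairings = []
    for members in groups.values():
        pairings.extend(combinations(members, 2))
    return pairings
```

Running this once per season and concatenating the results mirrors the "paired up separately, then combined" step, and keeps a 2008 battery from being compared with a 2009 one.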


For each pair in each data set, a two-proportion Z test was performed for the relevant metric and a Z score was calculated. Then, the distribution of Z scores was compared to the normal distribution. If there are significantly more high Z scores than would be expected due to randomness, then there is likely a real difference in abilities to influence ball/strike calls. The results:

Constant Catcher, Different Pitcher, EB%
There were 10,048 qualifying pairs between 2008 and 2009.

| Z score greater than | Count | % | Expected %, normal distribution |
| --- | --- | --- | --- |
| Z>1 | 3685 | 18.3370% | 31.7311% |
| Z>2 | 819 | 4.0754% | 4.5500% |
| Z>3 | 115 | 0.5723% | 0.2700% |
| Z>4 | 6 | 0.0299% | 0.0063% |
| Z>5 | 0 | 0.0000% | 0.0001% |

The probability of having 6 or more Z scores greater than 4 by chance is only 0.0054%. Pitchers have a statistically significant difference in EB%, holding catcher constant.
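
A tail probability of this kind can be reproduced with a binomial model. The sketch below treats every pairing as an independent trial, which is a simplification since many pairings share a pitcher-catcher pair; whether the original figure was computed this way is an assumption.

```python
import math


def two_sided_tail(z):
    """P(|Z| > z) for a standard normal variable."""
    return math.erfc(z / math.sqrt(2.0))


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), via the complement of P(X < k)."""
    below = sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i)
                for i in range(k))
    return 1.0 - below


# Chance of 6 or more |Z| > 4 results among 10,048 pairings.
chance = prob_at_least(6, 10048, two_sided_tail(4.0))
print(f"{chance:.4%}")
```

With 10,048 pairings and a threshold of 4, this comes out close to the 0.0054% quoted above.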

Constant Catcher, Different Pitcher, ES%
There were 8,544 qualifying pairs between 2008 and 2009.

| Z score greater than | Count | % | Expected %, normal distribution |
| --- | --- | --- | --- |
| Z>1 | 3826 | 22.3900% | 31.7311% |
| Z>2 | 1356 | 7.9354% | 4.5500% |
| Z>3 | 352 | 2.0599% | 0.2700% |
| Z>4 | 72 | 0.4213% | 0.0063% |
| Z>5 | 12 | 0.0702% | 0.0001% |
| Z>6 | 0 | 0.0000% | 0.0000% |

The odds of having 12 z scores greater than 5 are so small as to not be worth calculating. Pitchers have a statistically significant difference in ES%, holding catcher constant.

Constant Pitcher, Differing Catcher, EB%
There were 891 qualifying pairs between 2008 and 2009.

| Z score greater than | Count | % | Expected %, normal distribution |
| --- | --- | --- | --- |
| Z>1 | 333 | 18.6869% | 31.7311% |
| Z>2 | 64 | 3.5915% | 4.5500% |
| Z>3 | 14 | 0.7856% | 0.2700% |
| Z>4 | 2 | 0.1122% | 0.0063% |
| Z>5 | 0 | 0.0000% | 0.0001% |

The odds of having 2 or more Z scores greater than 4 by chance is only 0.1532%. Catchers have a statistically significant difference in EB%, holding pitcher constant. Comparing to the results for pitchers for EB%, catchers have a slightly higher percentage of Z scores above 4. Catchers likely have a little more to do with EB% than pitchers.

Constant Pitcher, Differing Catcher, ES%
There were 776 qualifying pairs between 2008 and 2009.

| Z score greater than | Count | % | Expected %, normal distribution |
| --- | --- | --- | --- |
| Z>1 | 299 | 19.2655% | 31.7311% |
| Z>2 | 55 | 3.5438% | 4.5500% |
| Z>3 | 7 | 0.4510% | 0.2700% |
| Z>4 | 0 | 0.0000% | 0.0063% |
| Z>5 | 0 | 0.0000% | 0.0001% |

The odds of having 7 or more Z scores above 3 is 0.5714%. While this is statistically significant, it is much, much less so than it is for pitchers, even accounting for the differences in sample size. ES% is likely mostly pitcher controlled.
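
For reference, the two-proportion Z test applied to each pairing in this section can be sketched as follows. The pooled standard error is an assumption, as the article does not say whether a pooled or unpooled version was used.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Z score for the difference between two proportions x1/n1 and x2/n2.

    For EB%, x is the number of extra balls and n is the number of called
    pitches inside the zone for one pitcher-catcher pair.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    variance = pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2)
    if variance == 0.0:
        raise ValueError("Z is undefined when the pooled rate is 0 or 1")
    return (p1 - p2) / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided p-value for a Z score under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

For example, 60 extra balls in 200 in-zone called pitches against 40 in 200 gives a Z score of about 2.31.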

Is the difference between catchers practically significant?
The distinction between statistical and practical significance is an important one. Statistical significance only determines the level of certainty of a result; practical significance determines whether the result is relevant. In order to determine whether the result of differing EB% is practically significant, first the value of an extra ball needs to be determined.

The same methodology used by Dan Turkenkopf in his original study is used here. League batting data after each count was obtained from baseballreference.com. For each, average run value per PA was calculated using the same linear weights used to calculate wOBA. Then, for each possible starting count, the difference in runs per PA between adding one strike and adding one ball was calculated. Finally, a weighted average by PAs in which a count occurred was calculated. This methodology yields values of 0.165 runs per extra ball in 2009, and 0.161 runs per extra ball in 2008.

One issue with this calculation is the fact that called pitches are more likely to occur on some counts than others. Turkenkopf was able to correct for this by further weighting by called pitches per count, but, with that data unavailable, a more simplistic solution is to eliminate counts with 2 strikes from the weighted average. Hitters typically swing at more pitches with 2 strikes, and these counts are also where an extra ball has the 1st, 2nd, 4th and 5th highest values. This yields values of 0.120 runs per extra ball in 2009, and 0.115 runs per extra ball in 2008.
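
The weighting described above can be sketched as below. The data layout is hypothetical: `after_count` maps a `(balls, strikes)` count to the average run value per PA after that count, `pa_through` maps a count to the number of PAs in which it occurred, and the walk and strikeout run values are passed in separately for the terminal cases. None of these names come from the original study.

```python
def run_value(state, after_count, walk_value, strikeout_value):
    """Run value of a count, treating ball four and strike three as terminal."""
    balls, strikes = state
    if balls == 4:
        return walk_value
    if strikes == 3:
        return strikeout_value
    return after_count[state]


def extra_ball_value(after_count, pa_through, walk_value, strikeout_value,
                     skip_two_strike=False):
    """Weighted average of (value after a ball) - (value after a strike)."""
    weighted_sum = 0.0
    total_weight = 0.0
    for (balls, strikes), weight in pa_through.items():
        if skip_two_strike and strikes == 2:
            continue  # low estimate: drop counts where hitters rarely take
        with_ball = run_value((balls + 1, strikes), after_count,
                              walk_value, strikeout_value)
        with_strike = run_value((balls, strikes + 1), after_count,
                                walk_value, strikeout_value)
        weighted_sum += weight * (with_ball - with_strike)
        total_weight += weight
    return weighted_sum / total_weight
```

Calling it with `skip_two_strike=False` corresponds to the high estimate and with `skip_two_strike=True` to the low estimate.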

To be certain that the high and low estimates of the run value of an extra ball bracket the true value contributed by the catcher, the low value was multiplied by 0.5, assigning "half credit" to the catcher and leaving the remainder with the pitcher. This likely underestimates the catcher's influence.

Finally, each catcher's runs prevented above or below average was calculated using both the high and low estimates. This was accomplished by taking the league average called pitches in the strike zone per game (as CP% and nSZ% are not controlled by the catcher) and multiplying by the difference between the catcher's EB% and the league average EB%. Then, this value is multiplied by the run value of an extra ball and by 120 games, a reasonable total for a full time catcher.
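
The calculation described above can be written out as a short function. The sign convention (positive means fewer extra balls than average) and the example inputs are assumptions for illustration, not figures from the study.

```python
def runs_prevented_per_120(catcher_eb, league_eb, in_zone_called_per_game,
                           run_value_per_extra_ball, games=120):
    """Runs prevented relative to an average catcher over `games` games.

    catcher_eb, league_eb: EB% expressed as fractions (0.25 for 25%).
    in_zone_called_per_game: league average called pitches in the zone
        per game.
    run_value_per_extra_ball: high or low estimate of an extra ball's value.
    """
    # Extra balls avoided per game compared with a league-average catcher.
    saved_per_game = (league_eb - catcher_eb) * in_zone_called_per_game
    return saved_per_game * run_value_per_extra_ball * games


# Hypothetical catcher: 20% EB% in a 25% league, 25 in-zone called pitches
# per game, using a high value of 0.16 and a low value of 0.5 * 0.12.
high = runs_prevented_per_120(0.20, 0.25, 25, 0.16)
low = runs_prevented_per_120(0.20, 0.25, 25, 0.5 * 0.12)
```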

Runs prevented had a range of 104 runs for the high estimate, and 37 runs for the low estimate in 2008. In 2009, the range was 79 runs for the high estimate and 29 runs for the low estimate. In 2008, there was a low sample size catcher at both the top and bottom, causing the larger range. Catcher to catcher differences are practically significant, even with the low estimate.

Interesting Data

League Average Rates

| Year | CP% | nSZ% | oSZ% | EB% | ES% |
| --- | --- | --- | --- | --- | --- |
| 2008 | 52.05% | 33.52% | 66.48% | 25.44% | 11.79% |
| 2009 | 52.53% | 33.85% | 66.15% | 23.65% | 11.21% |
| TOTAL | 52.29% | 33.69% | 66.31% | 24.52% | 11.50% |

Top 10 Catchers, 2008

| Catcher | Runs/120, High Estimate | Runs/120, Low Estimate |
| --- | --- | --- |
| Johnny Estrada | 43.90 | 15.71 |
| Miguel Montero | 33.15 | 11.86 |
| Jose Molina | 31.71 | 11.35 |
| Yadier Molina | 29.69 | 10.62 |
| Wil Nieves | 24.13 | 8.63 |
| Chad Moeller | 23.49 | 8.40 |
| Russell Martin | 22.87 | 8.18 |
| Ryan Hanigan | 22.07 | 7.90 |
| Robby Hammock | 20.94 | 7.49 |
| Jeff Mathis | 19.54 | 6.99 |


Top 10 Catchers, 2009

| Catcher | Runs/120, High Estimate | Runs/120, Low Estimate |
| --- | --- | --- |
| David Ross | 35.88 | 13.02 |
| Brian McCann | 32.16 | 11.67 |
| Mike Rivera | 29.22 | 10.61 |
| Miguel Montero | 27.71 | 10.06 |
| Yadier Molina | 24.69 | 8.96 |
| Jose Molina | 21.53 | 7.81 |
| Paul Bako | 20.55 | 7.46 |
| Eli Whiteside | 18.83 | 6.84 |
| Gregg Zaun | 18.43 | 6.69 |
| Geovany Soto | 18.40 | 6.68 |

Bottom 10 Catchers, 2008

| Catcher | Runs/120, High Estimate | Runs/120, Low Estimate |
| --- | --- | --- |
| Max Ramirez | -59.78 | -21.39 |
| Ryan Doumit | -46.28 | -16.56 |
| Rob Johnson | -38.19 | -13.67 |
| Luke Montz | -34.20 | -12.24 |
| Nick Hundley | -33.39 | -11.95 |
| Rob Bowen | -30.59 | -10.95 |
| Gerald Laird | -28.46 | -10.18 |
| Paul Hoover | -27.28 | -9.76 |
| Jarrod Saltalamacchia | -24.10 | -8.63 |
| Dioner Navarro | -23.59 | -8.44 |

Bottom 10 Catchers, 2009

| Catcher | Runs/120, High Estimate | Runs/120, Low Estimate |
| --- | --- | --- |
| Ryan Doumit | -42.98 | -15.60 |
| George Kottaras | -26.54 | -9.63 |
| Rob Johnson | -23.02 | -8.35 |
| Landon Powell | -22.73 | -8.25 |
| Kenji Johjima | -21.62 | -7.85 |
| Lou Marson | -21.36 | -7.75 |
| Omir Santos | -17.19 | -6.24 |
| Raul Chavez | -16.12 | -5.85 |
| Nick Hundley | -13.61 | -4.94 |
| Jorge Posada | -13.17 | -4.78 |

Further Work
1. Repeat this work with strike zones which better reflect how the zone is actually called rather than the rulebook zone.
2. Tighten the range of run values through a better estimate of the value of an extra ball and more accurate assignment of credit between pitcher and catcher.
3. Integrate run values with other measures of catcher defense.
4. Compare results to differences in catcher ERA.