"Support-Neutral" Statistics -- A Method of
Evaluating the True Quality of a Pitcher's Start
Michael Wolverton
870 E. El Camino Real, #168
Mountain View, CA 94040
appeared in _By_The_Numbers_, SABR's Statistical Analysis
subcommittee's newsletter, Volume 5, Number 4, Dec. 1993.
Motivation
----------
In recent years, we've seen the development and growing use of two
measurements designed to evaluate starting pitchers on a game-by-game
basis: Quality Starts and Game Score. Both measures are attempting in
some way to look at the quality of each outing the starter has, rather
than looking at the average or cumulative performance over the course
of a year, as ERA does. But both measures have their weaknesses as
total measures of pitching performance.
The arguments against Quality Starts are well known. Detractors point
out that the worst qualifying outing -- 6 innings and 3 earned
runs -- is not "quality" at all. A related objection is that Quality
Starts makes no attempt to quantify the degree of quality a start
has -- 6 innings, 3 runs is the same as 8 innings, 2 runs which is the
same as a 9-inning shutout.
Partly in answer to these objections, Bill James developed the Game
Score, which combines a starter's box score numbers (IP, H, ER, R, BB,
K) using weights, where the weights are assigned such that the league
average score is around 50, the best imaginable score is around 100,
and the worst imaginable score (by someone outside the state of
Colorado) is around 0. Game Score is acknowledged as an interesting
measure of "game domination" by a starter, but it has weaknesses as a
total measure of starter quality (i.e., his contribution to team
victories): it's too dependent on strikeouts, possibly too dependent
on hits and walks (after all, the number of runs given up is really
the only thing that matters), and it isn't park-adjusted.
Despite the weaknesses of these two measures, looking at a pitcher's
starts game-by-game is still a good idea. Looking at each start's
contribution to winning, rather than cumulative run-prevention over
the course of a year (ERA or Pitching Runs), can help us answer
questions like: Given equal ERAs, do some pitchers pitch in a way that
will tend to win more games than other pitchers? In particular, is it
better for a starter to be flaky -- either very good or very bad on a
given day -- or consistently average? Does the park have a smaller
influence on the value of the start when the start is very good or
very bad?
So here's what we'd like out of a stat measuring the quality of a
start:
- it should depend only on numbers appearing in a box score.
- it should be independent of a pitcher's support, both from his
team's offense and from his team's relievers.
- it should be park-adjusted.
- the resulting measurement should be in terms of some kind of
meaningful unit, such as games or runs, rather than being a unitless
index (and, ideally, it should be obvious to any baseball fan what a
good or bad score in those units is).
- most importantly, it should reflect the contribution that a start
had toward winning the game.
I've developed a couple of measurements that meet these five
requirements. (Actually, the ideal stat would also be very easy to
compute, but hey, 5 out of 6 isn't bad, right?). Support-Neutral Wins
and Support-Neutral Losses (SNW and SNL) measure the expected number
of wins and losses a pitcher would have with his outings, if he got
average support from his offense and his bullpen. Support-Neutral
Value Added (SNVA) measures the total number of games that an average
team would win given the pitcher's starts, over the number of games
they'd win with a league average starter. All of these stats are
computed using only the number of innings pitched, number of runs
given up, and the park the game was pitched in. SNVA may be a
slightly more accurate measure of a starter's actual value compared to
league average, but the SNW/L record has the advantage of being
flexible and more understandable. Both of them, in my opinion,
constitute an improvement over Thorn and Palmer's Pitching Runs as a
total measure of starter worth.
Support-Neutral Wins and Losses
-------------------------------
Support-Neutral Wins is calculated by determining the probability that
a pitcher would get the win for each start he has, and then summing up
the individual probabilities over all of his starts. The sum gives you
the number of wins a pitcher could expect to get for an average team,
given his performances. A "performance" here consists only of the
number of innings pitched, the number of runs (not earned runs) given
up, the park in which the game was played, and whether the pitcher was
at home or on the road -- SNW assumes that these are the only things
which influence whether the pitcher wins or loses.
The rest of this section describes the formulas that are used to
calculate SNW; readers who aren't interested in the specific methods
of calculation are welcome to skim or skip to the next section.
To calculate the probability that a pitcher wins the game, we just
need to look at the definition of a win: A starting pitcher wins the
game if his team has the lead when he's taken out of the game, and
they never relinquish that lead. So, for a given outing by the
starter, the probability that he gets the win is just the probability
that his team will take the lead (score more runs than the starter
gives up) by the time he's removed times the probability that they'll
hold that lead until the game is over.
To put this into a formula, we just need to determine and add up the
probabilities of all the different ways his team can take and hold a
lead:
SNW(i, r) = sum [j = (r+1) to INFINITY] of
PScore(i, j) * PHold(j-r, 9-i),
where
SNW(i, r) is the probability a starter who goes i innings and gives up
r runs will get the win, given an average team playing behind him.
PScore(i, r) is the probability that an average team will score r runs
in i innings.
PHold(k, i) is the probability that an average team will hold a k-run
lead (without ever relinquishing it) for the i remaining innings until
the end of the game.
The above formula is actually a simplification of the formula I use in
my software to calculate SNW (I'll refer to the formula in my software
as the "real" SNW formula). In order to make it easier to explain, I
made a few assumptions to get the formula above. First, that formula
assumes that the starter comes out of the game after pitching a full
inning (i.e., he pitches no extra thirds of an inning). The formula
is complicated somewhat when thirds of an inning are taken into
account, but the same general idea applies: his team must be leading
when he comes out, and his team must hold the lead for the extra
thirds in the inning he leaves, plus all the rest of the remaining
innings. The real SNW formula does take thirds of an inning into
account.
Second, the above formula doesn't explicitly take the park into
account. To take park effects into account, we need to make SNW,
PScore, and PHold be functions of the park in which the game is
played. A hitter's park should inflate the probabilities that an
average team will score a high number of runs, and a pitcher's park
should do the opposite. The real SNW formula does take park into
account. I talk a little more about my handling of park effects in
the Appendix.
Third, the above formula doesn't take into account whether the starter
is pitching at home or on the road. Maybe contrary to intuition, this
does make a difference. Consider a starter who leaves after pitching
the 7th inning: if he's at home, he's pitched the top of the 7th, so
he gets credit for the runs his team scored in the first 6 innings,
plus the runs they score in the bottom of the 7th; if he's on the
road, however, he pitched the bottom of the 7th, so he gets credit for
the runs his team scored in the first 7 innings, plus the runs they
score in the top of the 8th. So, all other things being equal, it's
easier for pitchers to get wins (and harder for them to get losses)
when they pitch on the road. The formula above is for a pitcher
pitching at home, and the road formula is slightly different. The
real SNW formula does take home/road status into account.
Finally, the above formula doesn't quite reflect the full definition
of a pitcher's win -- a starter can't get the win unless he goes 5
innings or more. Presumably, this extra condition was put into the
win rule to reduce the number of undeserving starters getting lucky
wins. But when you're assigning fractions of a win, rather than 1 win
or 0 wins, there's no possibility of getting lucky. So, the real SNW
formula does not take the five-inning condition into account,
although, for the purposes of comparison, I do calculate an expected
win (E(W)) number which is equal to 0 if the pitcher goes less than 5
innings and equal to SNW otherwise.
Let's finish off the formula above. PScore is easy to find
recursively, provided you know an average team's single-inning scoring
distribution, PInningScore:
PScore(i, r) = sum [j = 0 to r] of
PInningScore(j) * PScore(i-1, r-j), i > 1
PScore(1, r) = PInningScore(r)
where
PInningScore(r) is the probability that an average team will score r
runs in an inning.
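
As an illustration only, the PScore recursion above can be sketched in
Python. The distribution P_INNING below is made up for the example (it
is not the linescore data described later in this section), and the
function names are placeholders rather than anything from the software
mentioned in the text.

```python
from functools import lru_cache

# Hypothetical single-inning scoring distribution (assumed, not real data):
# P_INNING[r] is the chance an average team scores r runs in one inning.
P_INNING = (0.73, 0.15, 0.07, 0.03, 0.015, 0.005)


def p_inning_score(r):
    """PInningScore(r); zero outside the tabulated range."""
    return P_INNING[r] if 0 <= r < len(P_INNING) else 0.0


@lru_cache(maxsize=None)
def p_score(i, r):
    """PScore(i, r): chance of scoring exactly r runs in i full innings."""
    if i == 1:
        return p_inning_score(r)
    # Split on the j runs scored in one inning; the other i-1 innings
    # must then account for the remaining r-j runs.
    return sum(p_inning_score(j) * p_score(i - 1, r - j) for j in range(r + 1))
```

The cache keeps the recursion cheap, since the same (i, r) pairs are
requested many times when the probabilities are summed over a game.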
PHold is a little more complicated, since you have to see to it that
the pitcher's team never relinquishes the lead. Still, it's not too
hard to reduce it to the following (below, "tr" stands for the number
of runs the pitcher's team scores in an inning, and "or" stands for
the number the opposing team scores in an inning):
PHold(k, i) = sum [tr = 0 to INFINITY] of
sum [or = 0 to k + tr - 1] of
PInningScore(tr) * PInningScore(or) *
PHold(k+tr-or, i-1), i > 0
PHold(k, 0) = 1
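
Here is a rough Python sketch of PHold and of the simplified SNW sum
built on top of it (full innings only, no park or home/road handling).
The scoring distribution is again a made-up placeholder, so the
"infinite" sums simply stop where that table ends. The opponent's runs
are called "opp" because "or" is a reserved word in Python.

```python
from functools import lru_cache

# Hypothetical single-inning scoring distribution (assumed, not real data).
P_INNING = (0.73, 0.15, 0.07, 0.03, 0.015, 0.005)


def p_score_dist(i):
    """Return d with d[r] = PScore(i, r), built by repeated convolution."""
    dist = [1.0]  # zero innings: certainly zero runs
    for _ in range(i):
        new = [0.0] * (len(dist) + len(P_INNING) - 1)
        for a, pa in enumerate(dist):
            for b, pb in enumerate(P_INNING):
                new[a + b] += pa * pb
        dist = new
    return dist


@lru_cache(maxsize=None)
def p_hold(k, i):
    """PHold(k, i): chance a k-run lead (k >= 1) survives i more innings."""
    if i == 0:
        return 1.0
    total = 0.0
    for tr, p_tr in enumerate(P_INNING):        # runs by the pitcher's team
        for opp, p_opp in enumerate(P_INNING):  # runs by the opposing team
            if opp <= k + tr - 1:               # lead is still at least 1
                total += p_tr * p_opp * p_hold(k + tr - opp, i - 1)
    return total


def snw(i, r):
    """Simplified SNW(i, r): i full innings pitched, r runs allowed."""
    dist = p_score_dist(i)
    return sum(dist[j] * p_hold(j - r, 9 - i) for j in range(r + 1, len(dist)))
```

With these definitions, snw(9, 0) reduces to the chance that the
offense scores at all in nine innings, which is a handy sanity check.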
The only remaining unknown is the single-inning scoring distribution,
PInningScore. But that's readily available from linescores of past
games. The scoring distribution (separate distributions for each
league) I'm using right now was taken from a few weeks of linescores
in USA TODAY from late-April and early-May of 1992. I'll probably be
able to get a more accurate distribution someday, but I'm sure that
this one is close enough.
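
For readers who want to build their own distribution, a minimal Python
sketch of the tallying step might look like the following. The two
linescores are invented for the example, and the function name is a
placeholder.

```python
from collections import Counter


def inning_score_distribution(linescores):
    """Estimate PInningScore from per-inning run lists, one list per team-game."""
    counts = Counter(runs for line in linescores for runs in line)
    total = sum(counts.values())
    # Index r of the result is the observed share of innings with r runs.
    return [counts.get(r, 0) / total for r in range(max(counts) + 1)]


# Made-up linescores; the second team did not bat in the ninth.
sample = [
    [0, 0, 1, 0, 0, 2, 0, 0, 0],
    [0, 3, 0, 0, 0, 0, 1, 0],
]
dist = inning_score_distribution(sample)
```

A real tally would use many more games, and would be kept separately
for each league as described above.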
The SNL value for a single start is calculated analogously to SNW.
Support-Neutral Value Added
---------------------------
SNW and SNL give us a nice way of getting a "fair" W/L record for a
starter, which can then be used to compare to his actual W/L record,
or a replacement-level winning percentage, etc. (see the Results
section). But these numbers calculate how likely it is that the
pitcher will win or lose the game -- i.e., get the "W" or "L" next to
his name in the box score. A related but slightly different notion is
the likelihood that the team will win when a pitcher takes the mound.
In measuring the starter's contribution to team victories, we'd like
to evaluate how much the outing by the starter changes the team's
chance of winning from what it was at the beginning of the game (which
I'll assume to be 50%). This is what SNVA is designed to measure.
Not surprisingly, the formula for SNVA looks pretty similar to the
formula for SNW:
SNVA(i, r) = -0.5 +
sum [j = 0 to INFINITY] of
PScore(i, j) * PATWin(j-r, 9-i)
where
SNVA(i, r) is the difference between an average team's chance of
winning after the starter has left after pitching i innings and giving
up r runs, and their chance of winning at the beginning of the game
(50%).
PScore(i, r) was defined above
PATWin(r, i) is the chance that an average team will eventually win
the game given that there are i innings left and the difference
between their score and their opponents' score is r.
Also not surprisingly, PATWin looks a lot like PHold:
PATWin(r, i) = sum [tr = 0 to INFINITY] of
sum [or = 0 to INFINITY] of
PInningScore(tr) * PInningScore(or) *
PATWin(r+tr-or, i-1), i > 0
PATWin(r, 0) = 1, r > 0
PATWin(0, 0) = 0.5
PATWin(r, 0) = 0, r < 0
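
A small Python sketch of PATWin and the simplified SNVA sum follows.
As before, the scoring distribution is a made-up placeholder and the
names are hypothetical. The recursion is applied for every i > 0 so
that it bottoms out in the three i = 0 cases.

```python
from functools import lru_cache

# Hypothetical single-inning scoring distribution (assumed, not real data).
P_INNING = (0.73, 0.15, 0.07, 0.03, 0.015, 0.005)


def p_score_dist(i):
    """Return d with d[r] = PScore(i, r), built by repeated convolution."""
    dist = [1.0]
    for _ in range(i):
        new = [0.0] * (len(dist) + len(P_INNING) - 1)
        for a, pa in enumerate(dist):
            for b, pb in enumerate(P_INNING):
                new[a + b] += pa * pb
        dist = new
    return dist


@lru_cache(maxsize=None)
def p_at_win(diff, i):
    """PATWin(diff, i): chance of winning when ahead by diff with i innings left."""
    if i == 0:
        # A tie after nine innings is treated as a coin flip.
        return 1.0 if diff > 0 else (0.5 if diff == 0 else 0.0)
    return sum(
        p_tr * p_opp * p_at_win(diff + tr - opp, i - 1)
        for tr, p_tr in enumerate(P_INNING)
        for opp, p_opp in enumerate(P_INNING)
    )


def snva(i, r):
    """Simplified SNVA(i, r): change in win probability from the 50% start."""
    dist = p_score_dist(i)
    return -0.5 + sum(p * p_at_win(j - r, 9 - i) for j, p in enumerate(dist))
```

Because the result is a win probability minus 0.5, a single start can
never be worth more than +0.5 or less than -0.5 games.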
What SNVA gives us (when summed over all a pitcher's starts) is the
number of games in the standings he's worth to his team above the
average starter. Of course, this is exactly the same unit (games above
the average player) that all of Total Baseball's[1] measurements are
in. So it'll be interesting to compare SNVA to Thorn and Palmer's
Adjusted Pitching Runs to see how well they correlate and also where
the differences lie.
Results
-------
Best, worst, luckiest, and unluckiest starters of 1992
------------------------------------------------------
That's enough of the gory details of the calculation of the stats.
Let's look at the fun stuff -- what the stats tell us about real
pitchers. I tracked all starting pitchers in the majors over the 1992
season, and Tables 1 and 2 show the top pitchers in both leagues for
1992. Each table shows the pitcher's Support-Neutral Wins (SNW),
Losses (SNL), and Winning Percentage (SNPct), followed by his actual
win-loss record (W, L), his runs allowed per 9 innings (RA), his
Adjusted Pitching Runs(*1) (APR), and his Support-Neutral Value Added
(SNVA). Interestingly, Greg Maddux, with the fabulous year he had
pitching in Wrigley, was the only pitcher in either league who came
close to "deserving" to win 20 games.
Pitcher      Team    SNW   SNL  SNPct   W   L     RA     APR    SNVA
--------------------------------------------------------------------
Mussina      BAL    17.2   7.8   .688  18   5   2.61    47.0    4.60
Clemens      BOS    17.5   8.5   .674  18  11   2.92    43.8    4.39
Appier       KCR    15.2   6.6   .698  15   8   2.55    42.6    4.08
Guzman,Ju    TOR    13.4   6.4   .679  16   5   2.79    32.3    3.34
Nagy         CLE    16.3   9.9   .623  17  10   3.25    33.3    3.11
Eldred       MIL     8.2   2.4   .776  11   2   1.88    28.1    2.81
McDowell     CHI    16.3  10.7   .602  20  10   3.28    30.5    2.53
Smiley       MIN    16.0  10.5   .603  16   9   3.47    28.3    2.75
Navarro      MIL    15.8  10.8   .595  17  11   3.59    22.8    2.45
Abbott,J     CAL    13.6   8.6   .612   7  15   3.11    27.7    2.36
Viola        BOS    15.8  11.2   .586  13  12   3.74    21.4    2.35
Fleming      SEA    15.1  10.7   .586  17  10   3.73    19.7    2.10
Perez,M      NYY    14.9  10.5   .586  13  16   3.42    26.3    1.90
Wegman       MIL    15.6  11.5   .576  13  14   3.58    24.4    2.06
Erickson     MIN    13.3   9.8   .574  13  12   3.65    20.8    1.75
Bosio        MIL    14.3  11.1   .563  16   6   3.89    13.6    1.52
Key          TOR    13.3  10.4   .561  13  13   3.66    18.0    1.42
Brown,K      TEX    15.5  12.9   .545  21  11   3.96    14.5    1.18
Welch        OAK     8.0   5.8   .580  11   7   3.42     9.7    0.91
Rasmussen    KCR     3.0   0.8   .785   4   1   1.67    11.4    1.09
Table 1: Top 20 AL Starters in 1992, ranked by SNW-SNL
Pitcher      Team    SNW   SNL  SNPct   W   L     RA     APR    SNVA
--------------------------------------------------------------------
Maddux,G     CHI    19.5   7.4   .724  20  11   2.28    53.9    5.75
Tewksbury    STL    16.1   7.3   .687  15   5   2.45    38.5    4.12
Schilling    PHI    13.9   6.8   .670  12   9   2.59    31.1    3.37
Morgan       CHI    16.3   9.5   .632  16   8   3.00    30.4    3.22
Rijo         CIN    13.9   8.1   .632  15  10   2.86    28.5    2.57
Smoltz       ATL    16.6  11.0   .601  15  12   3.28    25.1    2.67
Glavine      ATL    15.1   9.8   .608  20   8   3.24    23.9    2.71
Martinez,D   MON    14.5   9.1   .613  16  11   2.98    24.0    2.50
Swindell     CIN    13.8   8.5   .619  12   7   3.05    24.5    2.56
Swift        SFG    10.4   5.1   .670   9   3   2.36    23.6    2.51
Drabek       PIT    15.9  10.8   .595  15  11   2.95    26.7    2.32
Fernandez,S  NYM    13.6   8.8   .608  14  11   2.81    24.7    2.25
Hill         MON    14.0  10.0   .583  16   9   3.14    19.3    1.93
Leibrandt    ATL    13.5   9.5   .586  15   7   3.68    11.8    2.02
Smith,P      ATL     5.6   2.1   .724   7   0   2.22    16.2    1.69
Wakefield    PIT     6.4   3.3   .656   8   1   2.54    13.8    1.42
Rivera       PHI     6.5   3.7   .639   7   3   2.95    10.8    1.34
Benes        SDP    14.0  11.3   .553  13  14   3.50    10.7    1.22
Portugal     HOU     6.6   4.0   .621   5   3   2.69    12.5    1.18
Avery        ATL    14.0  11.5   .549  11  11   3.66    14.8    1.15
Table 2: Top 20 NL Starters in 1992, ranked by SNW-SNL
On the flip-side, Tables 3 and 4 show the worst(*2) 10 starting pitchers
in 1992 for each league. Not surprisingly, many of these guys showed
up in different uniforms in 1993, several on expansion teams.
Pitcher      Team    SNW   SNL  SNPct   W   L     RA     APR    SNVA
--------------------------------------------------------------------
Armstrong    CLE     5.2  11.5   .313   3  15   6.37   -28.4   -3.08
Milacki      BAL     4.5   9.5   .320   6   8   6.18   -21.8   -2.32
Terrell      DET     2.9   7.5   .280   3   6   6.98   -22.9   -2.26
Slusarski    OAK     2.5   6.9   .265   5   5   6.25   -18.7   -2.05
Sanderson    NYY     9.9  14.0   .414  12  11   5.40   -22.4   -2.02
Aldred       DET     2.4   6.5   .273   2   7   7.63   -21.7   -1.89
McCaskill    CHI    10.2  14.2   .417  12  13   5.00   -16.1   -1.92
Wells        TOR     3.4   6.9   .332   6   7   7.70   -27.7   -1.81
Stieb        TOR     3.4   6.7   .337   3   6   5.92   -13.3   -1.50
Otto         CLE     3.9   7.2   .354   5   9   6.75   -19.8   -1.57
Table 3: Bottom 10 AL Starters in 1992, ranked by SNW-SNL
Pitcher      Team    SNW   SNL  SNPct   W   L     RA     APR    SNVA
--------------------------------------------------------------------
Bowen        HOU     0.6   6.1   .094   0   7  12.22   -31.3   -2.61
Wilson,T     SFG     7.0  11.3   .384   8  14   4.79   -18.5   -2.03
Abbott,K     PHI     4.5   8.3   .352   1  14   4.92   -11.4   -1.84
Martinez,R   LAD     7.3  11.1   .397   8  11   4.90   -19.1   -1.84
Henry,B      HOU     8.3  11.7   .414   6   9   4.40   -12.4   -1.57
Young,A      NYM     2.8   6.2   .313   1   7   5.79   -16.8   -1.63
Black        SFG     8.6  11.9   .420  10  12   4.47   -14.7   -1.54
Hershiser    LAD    10.2  13.3   .434  10  15   4.31   -12.5   -1.60
Hammond      CIN     7.0  10.0   .409   7  10   4.61    -7.4   -1.36
Blair        HOU     1.4   4.5   .241   1   5   7.51   -16.8   -1.52
Table 4: Bottom 10 NL Starters in 1992, ranked by SNW-SNL
This method also allows you to evaluate the level of luck a pitcher
experienced in his W/L record -- i.e. it allows you to look at how
much a pitcher's actual W/L record differs from his expected W/L
record given the way he pitched. Tables 5 through 8 show the luckiest
and unluckiest starters in each league in 1992. No one should be
surprised that Jack Morris, who compiled a 21-6 record despite a 4+
ERA, was far and away the luckiest starter in either league last year.
SNW/L evaluation shows that you'd expect his 1992 performance to
produce a 13-13 mark if he had gotten average support. Equally
unsurprising is the result that Jim Abbott was the unluckiest pitcher
in either league. The Angels gave him enough support only for a
miserable 7-15 record, while his pitching actually merited something
closer to 13-9.
Pitcher      Team   E(W)  E(L)    W    L   Diff.
------------------------------------------------
Morris       TOR    13.3  13.1   21    6    14.7
Brown,K      TEX    15.5  12.9   21   11     7.4
Moore        OAK    12.1  14.3   17   12     7.3
Bosio        MIL    14.2  11.1   16    6     6.9
Hibbard      CHI     8.1  11.3   10    7     6.2
Darling      OAK    11.5  12.5   15   10     6.0
Sanderson    NYY     9.7  14.0   12   11     5.3
Wickman      NYY     2.7   2.9    6    1     5.2
Slusarski    OAK     2.3   6.9    5    5     4.6
McDowell     CHI    16.2  10.7   20   10     4.5
Table 5: Luckiest 10 AL Starters in 1992, ranked
by W-E(W)+E(L)-L
Pitcher      Team   E(W)  E(L)    W    L   Diff.
------------------------------------------------
Abbott,J     CAL    13.3   8.6    7   15   -12.7
Perez,M      NYY    14.6  10.5   13   16    -7.0
Hanson       SEA     9.6  12.8    7   17    -6.8
Armstrong    CLE     5.2  11.5    3   15    -5.8
Wegman       MIL    15.6  11.5   13   14    -5.1
Valera       CAL    10.4   9.6    7   11    -4.8
Kamieniecki  NYY     8.8  12.0    6   14    -4.7
Ryan         TEX     9.3   8.7    5    9    -4.6
Chiamparino  TEX     1.5   1.3    0    4    -4.2
Reed         KCR     5.1   6.0    2    7    -4.1
Table 6: Unluckiest 10 AL Starters in 1992,
ranked by W-E(W)+E(L)-L
Pitcher      Team   E(W)  E(L)    W    L   Diff.
------------------------------------------------
Burkett      SFG     9.6  13.0   13    9     7.5
Glavine      ATL    15.0   9.8   20    8     6.8
Seminara     SDP     5.1   6.3    9    4     6.2
Lefferts     SDP     8.4  10.4   13    9     6.0
Tomlin       PIT    11.5  11.3   14    9     4.8
Hurst,B      SDP    12.4  12.1   14    9     4.7
Cone         NYM    11.3   9.9   13    7     4.6
Leibrandt    ATL    13.2   9.5   15    7     4.3
Osborne      STL     9.2  11.4   10    8     4.2
Wakefield    PIT     6.3   3.3    8    1     4.0
Table 7: Luckiest 10 NL Starters in 1992, ranked
by W-E(W)+E(L)-L
Pitcher      Team   E(W)  E(L)    W    L   Diff.
------------------------------------------------
Abbott,K     PHI     4.5   8.3    1   14    -9.2
Candiotti    LAD    11.8  10.5   10   15    -6.3
Gross,Ke     LAD    10.9  10.9    8   13    -5.0
Clark,M      STL     5.4   8.0    3   10    -4.4
Schilling    PHI    13.9   6.8   12    9    -4.0
Benes        SDP    13.8  11.3   13   14    -3.5
Carter       SFG     1.5   2.4    1    5    -3.1
Boskie       CHI     3.2   7.1    3   10    -3.1
Maddux,G     CHI    19.5   7.4   20   11    -3.1
Whitehurst   NYM     2.3   3.3    1    5    -3.0
Table 8: Unluckiest 10 NL Starters in 1992,
ranked by W-E(W)+E(L)-L
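
The ranking quantity in Tables 5 through 8 is simple enough to check by
hand. A minimal Python sketch, using two rows copied from the tables
(the function name is a placeholder):

```python
def luck(w, l, exp_w, exp_l):
    """W - E(W) + E(L) - L: positive means a better record than deserved."""
    return (w - exp_w) + (exp_l - l)


# (pitcher, E(W), E(L), W, L) as printed in Tables 5 and 6.
rows = [
    ("Morris", 13.3, 13.1, 21, 6),
    ("Abbott,J", 13.3, 8.6, 7, 15),
]
diffs = {name: luck(w, l, ew, el) for name, ew, el, w, l in rows}
```

Recomputing from the rounded E(W) and E(L) columns can differ from the
printed Diff. column by a tenth, presumably because the tables were
built from unrounded values.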
League total numbers
--------------------
In theory, the support-neutral record of the entire league should come
close to the actual win-loss record of the league, and in fact, in
1992, SNW/L did appear to predict league W/L pretty well. Table 9
shows both the expected and actual W/L totals for each league in 1992.
The National League's record corresponded very well to the record
expected by the model, with no-decisions being underpredicted only
slightly by SNW/L. The American League is predicted a little less
successfully -- there were nearly 30 more wins in the league than
expected, and nearly 10 fewer losses than expected. I believe that
part of the discrepancy between expected record and actual record can
be explained by the fact that relief pitchers prevented runs better
than starters in 1992. Since starters are competing for the (actual)
decision primarily with the other starter, it makes sense that
starters would get a few more (actual) wins than predicted by a model
which has them competing with league average pitching for the
decision.
      E(W)    E(L)  E(Pct.)     W     L   Pct.
NL   660.9   690.3     .489   655   678   .491
AL   776.1   846.7     .478   805   837   .490
Table 9: Expected and Actual records of all starters in the leagues
Value of "flaky" and "steady" pitchers
--------------------------------------
Do the Support-Neutral stats tell us anything that Thorn and Palmer's
Adjusted Pitching Runs weren't already telling us? Since both APR and
SNVA are trying to measure exactly the same thing (albeit by different
methods), we'd expect there to be a pretty strong correlation between
them. There is. For most pitchers, SNVA (whose unit is "games above
average") is approximately equal to one-tenth of APR (whose unit is
"runs above average"). This is what you'd expect given the well-known
result that each 10 runs prevented (or gained) leads on average to
about 1 extra win in the standings (see, e.g., [2]). However, there
are plenty of cases where APR and SNVA give significantly different
evaluations. Look at the 1992 records of Charlie Leibrandt and Melido
Perez:
              APR  SNVA
-----------------------
Leibrandt    11.8  2.02
Perez,M      26.3  1.90
APR evaluates Perez as being 14.5 runs -- about one-and-a-half
games -- better than Leibrandt. However, SNVA shows that, when the
pitchers' performance is evaluated game-by-game, Leibrandt was
actually a little better than Perez.
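
To put the two stats on the same scale, APR can be divided by the
10-runs-per-win rule of thumb and set beside SNVA. A minimal Python
sketch with the two rows above (the names are placeholders, and 10.0 is
only the approximate conversion cited in the text):

```python
RUNS_PER_WIN = 10.0  # rule-of-thumb conversion, approximate

# (APR, SNVA) for the two pitchers compared above.
pitchers = {"Leibrandt": (11.8, 2.02), "Perez,M": (26.3, 1.90)}


def apr_in_games(apr):
    """Convert runs above average into approximate games above average."""
    return apr / RUNS_PER_WIN


# Positive gap: SNVA rates the pitcher higher than APR does.
gaps = {name: snva - apr_in_games(apr) for name, (apr, snva) in pitchers.items()}
```

On this scale Leibrandt comes out roughly 0.8 games better by SNVA than
by APR, and Perez roughly 0.7 games worse.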
The key to this discrepancy between the two measurements is found in
the amount of consistency the two pitchers exhibited in their starts.
Perez was a model of consistency last year; he rarely got bombed, but
he also was rarely dominating. Leibrandt, on the other hand, was one
of the least consistent pitchers in the majors. And that is the most
surprising result I've seen so far from these SN stats: run-prevention
stats such as ERA and APR tend to undervalue flaky pitchers, and
overvalue consistent ones, at least when you consider them pitching
for an average team. Tables 10 through 13 show the "flakiest" (most
inconsistent) and "steadiest" (most consistent) pitchers in the
leagues last year, as evaluated by the variance of the SNVA of their
individual starts. You can see from those tables that APR pretty
consistently underestimates a pitcher's value when the pitcher is
flaky, and pretty consistently overestimates his value when he's
steady. 9 of the 10 flakiest pitchers in both the NL and AL were
underestimated by APR, and 8 of the 10 steadiest in the NL and 10 of
the 10 steadiest in the AL were overestimated by APR. And the
pitchers for whom there were really large discrepancies between APR
and SNVA -- Leibrandt, Kyle Abbott, Gooden, Hammond, Sutcliffe,
Perez, Kamieniecki, McDowell -- all showed up near the top of the
predicted list.
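
The "flakiness" ranking can be sketched in Python as the variance of a
pitcher's per-start SNVA values. The per-start numbers below are
invented for illustration, and the use of the population variance
(rather than the sample variance) is an assumption, since the text
does not say which was used.

```python
from statistics import pvariance

# Hypothetical per-start SNVA values for two pitchers with the same total.
steady = [0.05, 0.10, -0.05, 0.08, 0.02, 0.06]
flaky = [0.33, -0.25, 0.30, -0.27, 0.35, -0.20]


def flakiness(per_start_snva):
    """Population variance of per-start SNVA; higher means less consistent."""
    return pvariance(per_start_snva)


ranking = sorted(
    {"steady": steady, "flaky": flaky}.items(),
    key=lambda item: flakiness(item[1]),
    reverse=True,
)
```

Both made-up pitchers add up to the same season SNVA, so the variance
is the only thing separating them in this toy ranking.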
The reason for this undervaluing is that APR counts all runs as equal,
while in fact all runs do not contribute an equal amount toward
winning/losing a game. In particular, Bill James did a study that
showed that runs scored by a team after they've already scored 5 in a
game do not contribute the same amount toward the probability of
winning as those first 5 runs did[3]. So, pitchers who give up more
than 5 runs in a couple of games will be undervalued by ERA and APR,
because those really crummy outings probably weren't quite as crummy
as ERA and APR would have you believe.
Pitcher      Team     APR    SNVA  SNVA Var.
--------------------------------------------
Smith,Z      PIT      3.7    0.70      0.088
Smoltz       ATL     25.1    2.67      0.083
Saberhagen   NYM      4.5    0.73      0.082
Leibrandt    ATL     11.8    2.02      0.082
Osborne      STL    -12.6   -0.97      0.079
Glavine      ATL     23.9    2.71      0.076
Hurst,B      SDP     -1.5    0.12      0.075
Cone         NYM      8.5    0.67      0.074
Belcher      CIN      1.9    0.53      0.074
Benes        SDP     10.7    1.22      0.068
Table 10: Flakiest 10 NL Starters in 1992,
ranked by variance of SNVA (15 starts
minimum)
Pitcher      Team     APR    SNVA  SNVA Var.
--------------------------------------------
Abbott,K     PHI    -11.4   -1.84      0.022
Rijo         CIN     28.5    2.57      0.032
Browning     CIN     -8.7   -1.17      0.035
Gooden       NYM     -6.1   -1.33      0.036
Hammond      CIN     -7.4   -1.36      0.041
Tewksbury    STL     38.5    4.12      0.042
Maddux,G     CHI     53.9    5.75      0.042
Fernandez,S  NYM     24.7    2.25      0.043
Boskie       CHI    -12.5   -1.47      0.044
Gardner      MON     -9.5   -1.20      0.044
Table 11: Steadiest 10 NL Starters in 1992,
ranked by variance of SNVA (15 starts
minimum)
Pitcher      Team     APR    SNVA  SNVA Var.
--------------------------------------------
Sutcliffe    BAL     -8.3   -0.33      0.089
Smiley       MIN     28.3    2.75      0.078
Krueger      MIN     -0.1    0.14      0.078
Johnson,R    SEA      1.7    0.24      0.077
Gubicza      KCR      7.3    0.78      0.075
Langston     CAL      5.6    0.77      0.073
Fleming      SEA     19.7    2.10      0.073
Viola        BOS     21.4    2.35      0.073
Rhodes       BAL      6.7    0.77      0.071
Darling      OAK     -4.8   -0.31      0.070
Table 12: Flakiest 10 AL Starters in 1992,
ranked by variance of SNVA (15 starts
minimum)
Pitcher      Team     APR    SNVA  SNVA Var.
--------------------------------------------
Armstrong    CLE    -28.4   -3.08      0.034
Darwin       BOS      8.2    0.45      0.036
Milacki      BAL    -21.8   -2.32      0.037
Kamieniecki  NYY     -8.9   -1.57      0.038
Perez,M      NYY     26.3    1.90      0.039
Reed         KCR     -0.3   -0.20      0.040
Appier       KCR     42.6    4.08      0.040
Cook         CLE     -3.6   -0.56      0.042
Hibbard      CHI    -10.7   -1.28      0.045
McDowell     CHI     30.5    2.53      0.045
Table 13: Steadiest 10 AL Starters in 1992,
ranked by variance of SNVA (15 starts
minimum)
As an example of this, consider a David Wells outing from 1992: he
gave up 13 runs in 4+ innings. APR just subtracts his 13 runs from
the number of runs a league average pitcher would have given up in
those same 4 innings (about 2), and concludes that Wells was worth
about -11 runs, or -1.1 games, in that start. Did Wells really cost
the Blue Jays more than a game in the standings with that awful start?
Of course not. He guaranteed them a loss, of course, but they had
some chance of losing the game to begin with anyway -- about a 50%
chance if you make the simplifying assumption that they're an average
team. SNVA gives a far more reasonable value for Wells's start: it
was worth about -0.5 games. That's as much as a single start can cost
you. Wells didn't have the requisite 15 starts to show up in Table
12, but you can see from his record in Table 3 how much he was
underestimated by APR.
Effect of the park on win probability
-------------------------------------
One other question I've been looking at is how the value of starts is
influenced by park effects. Figure 1 shows the SNVA for a 9-inning
complete game in both Wrigley Field (an extreme hitters' park) and the
Astrodome (an extreme pitchers' park). We can see from the figure
that the effect of the park on the value of the start is far less at
the two extremes of start quality than it is for middle-of-the-road
starts. The difference between Wrigley and the Astrodome for the
value of a 9-inning, 5-run start is about four times as large as the
difference between Wrigley and the Astrodome for the value of a
shutout.
[If this were PostScript, there'd be a graph here]
Figure 1: SNVA for Wrigley Field (top line)
and the Astrodome (bottom line), given that
the starter pitched 9 innings
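
In place of the missing graph, the shape of the effect can be explored
with a short Python sketch. Everything numeric here is assumed: the
base scoring distribution is made up, the park factors of 1.10 and 0.90
are placeholders rather than real figures for Wrigley Field or the
Astrodome, and the park adjustment simply scales every nonzero-run
probability by the factor. For a 9-inning complete game the simplified
SNVA reduces to P(offense scores more than r) + 0.5 * P(offense scores
exactly r) - 0.5.

```python
# Hypothetical park-neutral single-inning scoring distribution.
BASE = (0.73, 0.15, 0.07, 0.03, 0.015, 0.005)


def park_adjust(dist, factor):
    """Scale all nonzero-run probabilities by factor; P(0) absorbs the rest."""
    scaled = [p * factor for p in dist[1:]]
    return [1.0 - sum(scaled)] + scaled


def nine_inning_dist(inning):
    """Distribution of runs scored over nine innings, by repeated convolution."""
    dist = [1.0]
    for _ in range(9):
        new = [0.0] * (len(dist) + len(inning) - 1)
        for a, pa in enumerate(dist):
            for b, pb in enumerate(inning):
                new[a + b] += pa * pb
        dist = new
    return dist


def snva_complete_game(r, inning):
    """Simplified SNVA for 9 innings and r runs allowed (a tie counts as 0.5)."""
    dist = nine_inning_dist(inning)
    return sum(dist[r + 1:]) + 0.5 * dist[r] - 0.5


hitters_park = park_adjust(BASE, 1.10)   # assumed factor
pitchers_park = park_adjust(BASE, 0.90)  # assumed factor
# gaps[r]: how much more an r-run complete game is worth in the hitters' park.
gaps = [
    snva_complete_game(r, hitters_park) - snva_complete_game(r, pitchers_park)
    for r in range(16)
]
```

With these placeholder inputs the gap is smallest for shutouts and
blowouts and largest for starts in the middle of the range, which is
the qualitative pattern described above; the exact ratios depend on the
distribution and park factors used.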
This would imply that methods of park adjustments which simply
multiply a pitcher's "raw" value by a park factor might be over- or
underestimating the park's actual effect on his value. Since the
park's effect on very good or very bad starts is much less than on
average starts, a reasonable hypothesis would be that very good or
very bad pitchers deserve less of a boost (or less diminishment) to
their rating than current park adjustment methods give them.
However, the preliminary investigation of this hypothesis I have done
on real starting pitchers (with 1992 data) has failed to find much
support for it. I'd still like to do some more work on this issue.
Weaknesses of the Approach
--------------------------
Here are a few of the problems with these measurements:
- They assume that scoring distributions of an inning are independent
from the distributions of surrounding innings.
- They (like most other measures of pitching) don't account for
situational pitching. A pitcher who gets a big lead is likely to
start throwing all fastballs, and he may give up a few meaningless
runs that he wouldn't have given up without the big lead. I'm not too
worried about this, because I don't think those big-lead situations
are common enough for any pitcher to make much of a difference.
- They don't account for differences in the ways pitchers are used by
their managers. Some pitchers get left in the game to get pounded,
some are routinely yanked early, etc. Note however that SN stats do a
better job than other methods of mitigating the manager's effect. If
Cito Gaston leaves David Wells in the game to give up 13 runs, SNVA
produces a rating which is not much different than if Gaston had
yanked Wells after giving up "only" 7 or 8 runs.
- They don't account for the defense playing behind the pitcher.
Suffice it to say that this is a very hard problem.
Conclusion
----------
I've presented Support-Neutral Wins, Losses, and Value Added, three
park- and league-adjusted measurements of the value of individual
starts, and of starting pitchers. I feel these are a valuable
addition to existing measurement methods, both because they can
provide a measurement of pitcher worth in units which are familiar to
all baseball fans (pitcher wins and losses) and because they seem to
be a slightly more accurate measure of the true value of a start than
existing methods.
Special thanks to Greg Spira, whose discussion sparked many of the
ideas presented here. Thanks to David Tate and others on the Internet
newsgroup rec.sport.baseball, who provided valuable feedback on the
method. And thanks to my wife, Cindy, for reading this paper and
giving me many useful suggestions.
References
----------
[1] Thorn, J. and Palmer, P. (eds.), Total Baseball, 3rd edition,
Harper Collins, New York, 1993.
[2] Thorn, J. and Palmer, P., The Hidden Game of Baseball, Doubleday
Books, New York, 1985.
[3] James, B., The 1986 Bill James Baseball Abstract, Ballantine
Books, New York, 1986, pp. 172-175.
Appendix: Park Effects
----------------------
One possible way of incorporating park effect numbers into these
measurements would be to take whatever final value the above formulas
produce (SNW, SNL, or SNVA) and multiply it by some park effect
constant for the pitcher's home park. This is essentially the approach
Thorn and Palmer use in Total Baseball. But the method of calculating
the Support-Neutral stats allows a potentially more informative use of
park effects. Since park effects (as printed in Elias, e.g.) reflect
how a park inflates or deflates average scoring ability, it makes
sense to have the "average team" playing behind the pitcher affected
by the park, and then calculate the likelihood that the pitcher's
outing plus this park-adjusted average team will lead to a win. So for
any game, the PInningScore (league average scoring) distribution is
adjusted to reflect the park's effect on run scoring. The resulting
number then reflects the park's effect on winning rather than
cumulative run scoring/prevention.
The question then becomes: how do you translate a single park effect
percentage like the ones in Elias (the only source of park effects I
have) into an adjusted PInningScore distribution? There are an
infinite number of ways to do this. The way I'm doing it now is to
change the probability of scoring 0 runs by one factor, and change the
probability of scoring i runs for i>1 by another factor, such that the
total number of expected runs scored in an inning is increased/reduced
by the Elias number. For example, if the Astrodome decreases scoring
by 10%, I increase PInningScore(0) for the Astrodome by one factor,
and decrease PInningScore(i) for i>1 by another factor, such that the
expected single-inning score reflected by PInningScore is reduced by
10% from the park-neutral scoring distribution. If that isn't clear
(and I'm sure it isn't), I should say that I don't think the exact
method used makes much difference.
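
One way to read the two-factor adjustment is sketched below in Python.
This is an interpretation, not necessarily the exact scheme used in the
software: it assumes the second factor applies to every nonzero run
total, in which case that factor is just the park effect itself and the
probability of a scoreless inning is whatever is left over. The base
distribution is made up for the example.

```python
# Hypothetical park-neutral single-inning scoring distribution.
BASE = (0.73, 0.15, 0.07, 0.03, 0.015, 0.005)


def expected_runs(dist):
    """Mean runs per inning implied by a scoring distribution."""
    return sum(r * p for r, p in enumerate(dist))


def park_adjust(dist, park_factor):
    """Scale P(r) for r >= 1 by park_factor and let P(0) take up the slack."""
    scaled = [p * park_factor for p in dist[1:]]
    p_zero = 1.0 - sum(scaled)
    if p_zero < 0.0:
        raise ValueError("park factor too large for this distribution")
    return [p_zero] + scaled


# A park that cuts scoring by 10%, as in the Astrodome example above.
astrodome = park_adjust(BASE, 0.90)
```

The adjusted distribution still sums to 1, its expected runs per inning
are 90% of the park-neutral figure, and the chance of a scoreless
inning goes up, matching the description in the text.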
*1 Adjusted Pitching Runs is the basic metric which Thorn and Palmer
(the authors of Total Baseball) use to evaluate pitchers. APR is the
number of runs a pitcher prevented, compared with what a league
average pitcher would've given up. The APR that I'm using in this
paper differs from
Thorn and Palmer's statistic in two ways: 1) I'm using runs where
Thorn and Palmer use earned runs, and 2) the method of park adjustment
I use is a simplification of the one used in Total Baseball. It is
included here for comparison with SNVA.
*2 Actually, it's probably inaccurate to use the word "worst" here,
since the method of ranking the pitchers -- ranking them according to
SNW-SNL -- sets the baseline for comparison at league average (anyone
below .500 gets a negative rating). Of course, it's quite possible
for a below-average pitcher to still be valuable to his team. A
better method of producing this list might have been to compare a
pitcher's SN record to a lower baseline, e.g., a .450 pitcher. This
would have left pitchers like Hershiser and McCaskill, who pitched a
lot of innings at somewhat below-league-average performance, off of
the lists in favor of other pitchers who pitched fewer innings but at
further-below-average performance.