
The Sagarin number, simply!

by Tom Fontaine
8 Mar 1996 


An Alternative Derivation of the Sagarin Number from AVG, OBP, and SLG
----------------------------------------------------------------------
Sometime in August '95, someone posted some Sagarin numbers from USA
Today to the Red Sox mailing list. The Sagarin number essentially tells
you how many runs per game would be scored by a lineup consisting of
nine copies of the same player. I was very curious about where the
numbers came from. Unfortunately, Sagarin says only that he does a
"Markov Chain Analysis" based on AB, H, 2B, etc.; he certainly does not
publish enough information to replicate his work.

Around the same time, I was fiddling with a lineup simulator that I had
written to help myself understand how AVG, OBP, and SLG affect a team's
run scoring. For an arbitrary lineup, the simulator gave me Total Bases
and OBP over an entire season at 25 batting outs per game. I plugged
these results into Bill James' Runs Created formula, RC ~= TB * OBP. It
immediately occurred to me that I could recreate the Sagarin effect
simply by repeating the same batter in all nine lineup slots. For
instance, 1941 Ted Williams:
                Hits     BB      TB      PA     AVG     OBP     SLG
                ----     --      --      --     ---     ---     ---
Williams         336    263      608    1091    406     549     734
Williams         330    257      597    1069    406     549     735
Williams         322    252      583    1046    406     549     734
Williams         314    245      568    1019    406     549     734
Williams         309    241      559    1001    407     549     736
Williams         300    235      543     974    406     549     735
Williams         294    229      532     952    407     549     736
Williams         285    223      516     925    406     549     735
Williams         278    217      503     902    406     549     734
                ----    ---     ----    ----    ---     ---     ---
Total:          2768    2162    5009    8979    406     549     735  RC: 2750

Sagarin came up with 16.99 runs per game for 1941 Ted Williams, while
2750 runs / 162 games = 16.98. Hmmm... and I don't even have any Markov
Chains!

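The per-game figure can be reproduced from the table totals. The Python sketch below is a minimal check, assuming (as the table's columns imply) that OBP = (H + BB)/PA with no HBP, and using the 162-game season from the text; the function name runs_created is illustrative, not taken from the simulator.

```python
# Nine lineup slots of 1941 Ted Williams, copied from the simulator table.
hits = [336, 330, 322, 314, 309, 300, 294, 285, 278]
walks = [263, 257, 252, 245, 241, 235, 229, 223, 217]
total_bases = [608, 597, 583, 568, 559, 543, 532, 516, 503]
plate_apps = [1091, 1069, 1046, 1019, 1001, 974, 952, 925, 902]

GAMES = 162  # season length used in the text


def runs_created(h, bb, tb, pa):
    """Basic Runs Created for the whole lineup: RC ~= TB * OBP."""
    obp = (sum(h) + sum(bb)) / sum(pa)  # assumes PA = AB + BB (no HBP)
    return sum(tb) * obp


rc = runs_created(hits, walks, total_bases, plate_apps)
per_game = rc / GAMES
print(f"RC = {rc:.0f}, runs per game = {per_game:.2f}")  # RC = 2750, runs per game = 16.98
```
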
There are some other interesting results. The AL hit .270/.344/.427
this (1995) season. Let's call this line MLV-AL and plug it into my
program:
                Hits     BB      TB      PA     AVG     OBP     SLG
                ----     --      --      --     ---     ---     ---
MLV-AL           182     76      288     750    270     344     427
MLV-AL           178     75      282     735    270     344     427
MLV-AL           174     73      275     719    269     344     426
MLV-AL           170     71      269     701    270     344     427
MLV-AL           167     70      264     688    270     344     427
MLV-AL           162     68      256     670    269     343     425
MLV-AL           159     66      251     654    270     344     427
MLV-AL           154     65      244     636    270     344     427
MLV-AL           150     63      237     620    269     344     425
                ----    ---     ----    ----    ---     ---     ---
Total:          1496    627     2366    6173    270     344     427  RC: 814

Sagarin # = 814/162 = 5.025 RPG

I estimated the league run average by adding up the runs scored by each
AL team and dividing by the number of games (144) and the number of
teams (14). I ended up with an estimated RA of 5.07 runs per game. The
Sagarin number of 5.025 differs from this by about 0.9%, which I think
is pretty good for an estimate based on an approximation.

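The same check for the league-average lineup, plus the percentage error against the estimated run average, as a minimal Python sketch (data copied from the MLV-AL table; the variable names are illustrative):

```python
# League-average 1995 AL hitter (MLV-AL) in all nine slots, from the table.
hits = [182, 178, 174, 170, 167, 162, 159, 154, 150]
walks = [76, 75, 73, 71, 70, 68, 66, 65, 63]
total_bases = [288, 282, 275, 269, 264, 256, 251, 244, 237]
plate_apps = [750, 735, 719, 701, 688, 670, 654, 636, 620]

GAMES = 162          # season length used for the Sagarin number
ESTIMATED_RA = 5.07  # estimated 1995 AL runs per team-game, from the text

obp = (sum(hits) + sum(walks)) / sum(plate_apps)
rc = sum(total_bases) * obp            # Runs Created, RC ~= TB * OBP
sagarin = rc / GAMES                   # runs per game for the lineup
error_pct = abs(sagarin - ESTIMATED_RA) / ESTIMATED_RA * 100

print(f"RC = {rc:.0f}, Sagarin # = {sagarin:.3f}, error = {error_pct:.1f}%")
# RC = 814, Sagarin # = 5.023, error = 0.9%
```

Working from the unrounded RC gives 5.023 rather than the 5.025 obtained by dividing the rounded 814; the error stays at about 0.9%.
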
After thinking about it for a while, I realized that with all nine
batters sharing the same AVG, OBP, and SLG, I could derive a simple
equation for the Sagarin number from the equations in my lineup
simulator.

The derivation...
-----------------
Bill James' Runs Created formula describes team scoring. The Sagarin
number also describes team scoring; it just so happens that Sagarin's
team consists entirely of identical players. The only difference is
that James describes total runs while Sagarin describes runs per game.
After normalizing for the number of games played, one would expect
identical results from each.

Key
---
AB  - At Bats
AVG - Batting Average
BO  - Batting Outs
BOG - Batting Outs per game
H   - Hits
OB  - Number of times On Base (H + WHP)
OBP - On Base Percentage
PA  - Plate Appearances (AB + WHP)
SLG - Slugging Percentage
TB  - Total Bases
WHP - Walks plus HBP
(0)     TB * OBP = Runs_Created == Sagarin #
Well, OBP is obvious, but how can we figure out TB...
....First we need to figure out the number of PAs...
        (a) OBP = OB/PA  &&  (1 - OBP) = BO/PA (b)
....Because you either get on base or you make an out
We don't know OB, so let's try...
        PA = BO/(1 - OBP)  (from b)
Batting Outs can be derived empirically, but I haven't seen any stats
services that provide either PAs or HBP (for PA = AB + BB + HBP), so...
....We know that...
        OBP = OB/PA = (H + WHP)/(AB + WHP)  (from a)
        OBP*(AB + WHP) = H + WHP
        OBP*AB - H = WHP - OBP*WHP
        OBP*AB - H = WHP*(1 - OBP)
        (c) WHP = (OBP*AB - H)/(1 - OBP)
Now back to Batting Outs...
        BO  = PA*(1 - OBP) (from b)
        BO  = (AB + WHP)*(1 - OBP)
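
One consequence worth spelling out: substituting (c) for WHP, the OBP terms cancel, so under these definitions batting outs reduce to at-bats minus hits.

```latex
\begin{aligned}
\mathrm{BO} &= (\mathrm{AB} + \mathrm{WHP})(1 - \mathrm{OBP}) \\
            &= \mathrm{AB}(1 - \mathrm{OBP}) + \mathrm{WHP}(1 - \mathrm{OBP}) \\
            &= \mathrm{AB}(1 - \mathrm{OBP}) + (\mathrm{OBP}\cdot\mathrm{AB} - \mathrm{H}) \qquad \text{by (c)} \\
            &= \mathrm{AB} - \mathrm{H}
\end{aligned}
```

Every BO entry in the 1995 table below equals AB - H for its team.
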
Using 1995 AL statistics...
        OBP      AB       H     WHP      BO      BOG
        ---      --       -     ---      --      ---
BAL     .342    4837    1267    589     3570    24.79
BOS     .357    4997    1399    599     3598    24.99
CAL     .352    5019    1390    581     3629    25.03 (145 games)
CHI     .354    5060    1417    579     3643    25.12 (145 games)
CLE     .361    5028    1461    554     3567    24.77
DET     .327    4865    1204    575     3661    25.42
KC      .328    4903    1275    496     3628    25.20
MIL     .336    5000    1329    529     3671    25.50
MIN     .346    5005    1398    510     3607    25.05
NYY     .357    4947    1365    624     3582    24.70 (145 games)
OAK     .341    4915    1296    577     3619    25.13
SEA     .350    4996    1377    572     3619    24.96 (145 games)
TEX     .338    4913    1304    539     3609    25.06
TOR     .328    5036    1309    510     3727    25.88
(d) Mean BOG = 25.12
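
The team table can be regenerated from its OBP, AB, and H columns with (c) and the batting-out equation above. This is a minimal Python sketch: the team data and game counts are copied from the table, and the helper names are illustrative. Recomputed this way, each BOG agrees with the printed column to within about 0.01, and the mean comes out near 25.11 (the table shows 25.12).

```python
# (team, OBP, AB, H, games) for the 1995 AL, copied from the table.
TEAMS = [
    ("BAL", .342, 4837, 1267, 144),
    ("BOS", .357, 4997, 1399, 144),
    ("CAL", .352, 5019, 1390, 145),
    ("CHI", .354, 5060, 1417, 145),
    ("CLE", .361, 5028, 1461, 144),
    ("DET", .327, 4865, 1204, 144),
    ("KC", .328, 4903, 1275, 144),
    ("MIL", .336, 5000, 1329, 144),
    ("MIN", .346, 5005, 1398, 144),
    ("NYY", .357, 4947, 1365, 145),
    ("OAK", .341, 4915, 1296, 144),
    ("SEA", .350, 4996, 1377, 145),
    ("TEX", .338, 4913, 1304, 144),
    ("TOR", .328, 5036, 1309, 144),
]


def walks_plus_hbp(obp, ab, h):
    """Equation (c): WHP = (OBP*AB - H) / (1 - OBP)."""
    return (obp * ab - h) / (1 - obp)


def batting_outs(obp, ab, h):
    """BO = (AB + WHP) * (1 - OBP), from (b) with PA = AB + WHP."""
    return (ab + walks_plus_hbp(obp, ab, h)) * (1 - obp)


bogs = []
for team, obp, ab, h, games in TEAMS:
    whp = walks_plus_hbp(obp, ab, h)
    bo = batting_outs(obp, ab, h)
    bogs.append(bo / games)  # batting outs per game
    print(f"{team:4} WHP={whp:4.0f} BO={bo:5.0f} BOG={bo / games:5.2f}")

mean_bog = sum(bogs) / len(bogs)
print(f"Mean BOG = {mean_bog:.2f}")  # Mean BOG = 25.11
```
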
Now for one game in the Sagarin case...
(1)     PA = BOG/(1 - OBP) ~= 25/(1 - OBP)
We have PAs, but we want TB, let's get H first...
        PA * OBP = BOG/(1 - OBP) * OBP = OB
That's not quite what we need, so...
        OB * H/OB = H
....but what is H/OB?
        H/OB = H/(H + WHP)
        H = AVG*AB
        (e) H/OB = AVG*AB/(AVG*AB + WHP)
Let's get rid of WHP...
        H/OB = AVG*AB/[AVG*AB + (OBP*AB - H)/(1 - OBP)] (from c & e)
        H/OB = AVG*AB/[AVG*AB + (OBP*AB - AVG*AB)/(1 - OBP)]
AB cancels from all terms!!!...
        H/OB = AVG/[AVG + (OBP - AVG)/(1 - OBP)]
Simplifying H/OB...
        H/OB = AVG*(1 - OBP) / [AVG*(1 - OBP) + OBP - AVG]
        H/OB = AVG*(1 - OBP) / (AVG - OBP*AVG + OBP - AVG)
        H/OB = AVG*(1 - OBP) / (OBP - OBP*AVG)
        (f) H/OB = AVG*(1 - OBP) / [OBP*(1 - AVG)]
So now we have...
        PA * OBP * H/OB = OB * H/OB = H
....or...
        BOG/(1 - OBP) * OBP * AVG*(1 - OBP)/[OBP*(1 - AVG)] = H (from 1 & f)
(2)     H = BOG*AVG/(1 - AVG)
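
The simplified H/OB expression and equation (2) can be spot-checked numerically. A minimal Python sketch, using the 1995 BAL line from the table and Williams' 1941 AVG; hits_per_game is an illustrative name, and its default of 25 batting outs per game follows (1).

```python
# Spot check of the simplified H/OB using the 1995 BAL line from the table.
OBP, AB, H = .342, 4837, 1267
AVG = H / AB

whp = (OBP * AB - H) / (1 - OBP)                   # equation (c)
direct = H / (H + whp)                             # H/OB = H/(H + WHP)
closed_form = AVG * (1 - OBP) / (OBP * (1 - AVG))  # simplified H/OB
print(f"direct = {direct:.6f}, closed form = {closed_form:.6f}")


def hits_per_game(avg, bog=25.0):
    """Equation (2): H = BOG*AVG/(1 - AVG) for a lineup of identical hitters."""
    return bog * avg / (1 - avg)


print(f"{hits_per_game(.406):.2f} hits per 25-out game at a .406 AVG")  # 17.09
```
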
With H we can get TB...
        SLG = TB/AB = TB/(H/AVG) = TB*AVG/H
        TB = H*SLG/AVG
        TB = BOG*AVG/(1 - AVG) * SLG/AVG (from 2)
(3)     TB = BOG*SLG/(1 - AVG)
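
Equation (3) can be checked the same way against the intermediate step TB = H*SLG/AVG. A minimal Python sketch using Williams' 1941 AVG and SLG at 25 batting outs per game; total_bases_per_game is an illustrative name.

```python
def total_bases_per_game(avg, slg, bog=25.0):
    """Equation (3): TB = BOG*SLG/(1 - AVG)."""
    return bog * slg / (1 - avg)


avg, slg, bog = .406, .735, 25.0     # 1941 Ted Williams, 25 outs per game
tb = total_bases_per_game(avg, slg, bog)

hits = bog * avg / (1 - avg)         # equation (2)
tb_via_hits = hits * slg / avg       # TB = H*SLG/AVG, the step before (3)
print(f"TB per game = {tb:.2f}, via H = {tb_via_hits:.2f}")  # both 30.93
```
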
Now plug TB back into Bill James' formula...
                        BOG*SLG*OBP
        RC = TB * OBP = -----------  (from 0 & 3)
                         (1 - AVG)
Thus...
************************************************************************
*                                                                      *
*                                      SLG * OBP                       *
*                    Sagarin # ~= 25 * ---------                       *
*                                      (1 - AVG)                       *
*                                                                      *
************************************************************************
Checking our math...
Ted Williams hit .406/.549/.735 in 1941: 
        25*.735*.549/(1 - .406) = 16.98  vs 16.99 claimed by Sagarin!
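
The boxed formula is simple enough to wrap in a one-line function. A minimal Python sketch; the name sagarin and the bog parameter (batting outs per game, defaulting to 25) are illustrative.

```python
def sagarin(avg, obp, slg, bog=25.0):
    """Closed-form estimate of runs per game for nine identical hitters.

    Sagarin # ~= BOG * SLG * OBP / (1 - AVG); bog defaults to 25 batting
    outs per game, the rounded 1995 AL mean.
    """
    return bog * slg * obp / (1 - avg)


print(f"1941 Williams: {sagarin(.406, .549, .735):.2f}")  # 16.98
print(f"1995 MLV-AL:   {sagarin(.270, .344, .427):.2f}")  # 5.03
```

For the MLV-AL line this gives about 5.03 runs per game, close to the 5.025 from the simulator run.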


Last Updated: Contact webmaster@stathead.com for corrections or problems

Copyright 1997-2001 by Keith Woolner. All included authors retain the copyrights to their original works.