8 Mar 1996

An Alternative Derivation of the Sagarin Number from AVG, OBP, and SLG
----------------------------------------------------------------------

Sometime in August '95 someone posted some Sagarin numbers from USA Today to the Red Sox mailing list. The Sagarin number essentially tells you how many runs per game would be scored by a lineup that consisted of nine copies of the same player. I was very curious about where the numbers came from. Unfortunately, Sagarin says only that he does a "Markov Chain Analysis" based on AB, H, 2B, etc. He certainly does not publish enough information to replicate his work.
Around the same time I was fiddling with a lineup simulator that I had written to help myself understand how AVG, OBP, and SLG affect run scoring for a team. It gave me Total Bases and OBP for an entire season of 25 batting outs per game with an arbitrary lineup, and I fed those two numbers into Bill James' Runs Created formula, RC ~= TB * OBP. It immediately occurred to me that I could recreate the Sagarin effect by simply repeating the same batter in all nine lineup slots. For instance, 1941 Ted Williams:
            Hits    BB    TB    PA   AVG   OBP   SLG
            ----  ----  ----  ----   ---   ---   ---
Williams     336   263   608  1091   406   549   734
Williams     330   257   597  1069   406   549   735
Williams     322   252   583  1046   406   549   734
Williams     314   245   568  1019   406   549   734
Williams     309   241   559  1001   407   549   736
Williams     300   235   543   974   406   549   735
Williams     294   229   532   952   407   549   736
Williams     285   223   516   925   406   549   735
Williams     278   217   503   902   406   549   734
            ----  ----  ----  ----   ---   ---   ---
Total:      2768  2162  5009  8979   406   549   735   RC: 2750
Sagarin came up with a total of 16.99 runs per game for 1941 Ted Williams while 2750 Runs/162 Games = 16.98. Hmmm... and I don't even have any Markov Chains!
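The arithmetic behind that comparison can be checked in a few lines of Python. This is only a sketch: the function name is my own, it uses the simplified RC ~= TB * OBP form quoted above, and it is fed the totals from the table rather than a rerun of the simulator.

```python
def runs_created(total_bases, obp):
    """Simplified Runs Created as used in this article: RC ~= TB * OBP."""
    return total_bases * obp

# Totals from the nine-Williams table (an OBP printed as 549 means .549).
rc = round(runs_created(5009, 0.549))  # rounded to whole runs, as in the table
rpg = rc / 162                         # 162-game season, as in the text

print(f"RC = {rc}, runs per game = {rpg:.2f}")
```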
There are some other interesting results. The AL hit .270/.344/.427 this (1995) season. Let's call this MLV-AL, and plug it into my program.
            Hits    BB    TB    PA   AVG   OBP   SLG
            ----  ----  ----  ----   ---   ---   ---
MLV-AL       182    76   288   750   270   344   427
MLV-AL       178    75   282   735   270   344   427
MLV-AL       174    73   275   719   269   344   426
MLV-AL       170    71   269   701   270   344   427
MLV-AL       167    70   264   688   270   344   427
MLV-AL       162    68   256   670   269   343   425
MLV-AL       159    66   251   654   270   344   427
MLV-AL       154    65   244   636   270   344   427
MLV-AL       150    63   237   620   269   344   425
            ----  ----  ----  ----   ---   ---   ---
Total:      1496   627  2366  6173   270   344   427   RC: 814
Sagarin # = 814/162 = 5.025 RPG
I estimated the league run average by adding up the runs scored by each team in the AL and dividing by the number of games (144) and the number of teams (14). I ended up with an estimated RA of 5.07. That is an error of about 0.9%, which I think is pretty good for an estimate based on an approximation.
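For anyone who wants to redo the comparison, here is a small sketch of the error calculation. The helper name is hypothetical, and the 5.07 league figure is simply taken from the paragraph above, not recomputed from team totals.

```python
def percent_error(estimate, reference):
    """Absolute error of `estimate` relative to `reference`, in percent."""
    return abs(estimate - reference) / reference * 100

# MLV-AL totals from the table: TB = 2366, OBP = .344, 162 simulated games.
estimated_rpg = round(2366 * 0.344) / 162  # 814 / 162
league_ra = 5.07                           # league runs per game, from the text

error = percent_error(estimated_rpg, league_ra)
print(f"{estimated_rpg:.3f} vs {league_ra:.2f}: {error:.1f}% off")
```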
After thinking about it for a while, I realized that when all nine batters have the same AVG, OBP, and SLG, I could derive a simple equation for the Sagarin number from the equations in my lineup simulator.
The derivation...
-----------------
Bill James' Runs Created formula describes team scoring. The Sagarin number also describes team scoring. It just so happens that Sagarin's team consists of all identical players. The only difference is that James describes total runs while Sagarin describes runs per game. After normalizing for the number of games played, one would expect identical results from each.
Key
---
AB  - At Bats
AVG - Batting Average
BO  - Batting Outs
BOG - Batting Outs per Game
H   - Hits
OB  - Number of times On Base (H + WHP)
OBP - On Base Percentage
PA  - Plate Appearances (AB + WHP)
SLG - Slugging Percentage
TB  - Total Bases
WHP - Walks plus HBP
(0) TB * OBP = Runs_Created == Sagarin #
Well, OBP is obvious, but how can we figure out TB...
....First we need to figure out the number of PAs...
(a) OBP = OB/PA && (1 - OBP) = BO/PA (b)
....Because you either get on base or you make an out
We don't know OB, so let's try...
PA = BO/(1 - OBP) (from b)
Batting Outs can be derived empirically, but I haven't seen any stats services that provide either PAs or HBP (for PA = AB + BB + HBP), so...
....We know that...
OBP = OB/PA = (H + WHP)/(AB + WHP) (from a)
OBP*(AB + WHP) = H + WHP
OBP*AB - H = WHP - OBP*WHP
OBP*AB - H = WHP*(1 - OBP)
(c) WHP = (OBP*AB - H)/(1 - OBP)
Now back to Batting Outs...
BO = PA*(1 - OBP) (from b)
BO = (AB + WHP)*(1 - OBP)
Using 1995 AL statistics...
       OBP     AB     H   WHP    BO    BOG
       ---     --     -   ---    --    ---
BAL   .342   4837  1267   589  3570  24.79
BOS   .357   4997  1399   599  3598  24.99
CAL   .352   5019  1390   581  3629  25.03  (145 games)
CHI   .354   5060  1417   579  3643  25.12  (145 games)
CLE   .361   5028  1461   554  3567  24.77
DET   .327   4865  1204   575  3661  25.42
KC    .328   4903  1275   496  3628  25.20
MIL   .336   5000  1329   529  3671  25.50
MIN   .346   5005  1398   510  3607  25.05
NYY   .357   4947  1365   624  3582  24.70  (145 games)
OAK   .341   4915  1296   577  3619  25.13
SEA   .350   4996  1377   572  3619  24.96  (145 games)
TEX   .338   4913  1304   539  3609  25.06
TOR   .328   5036  1309   510  3727  25.88
(d) Mean BOG = 25.12
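The WHP, BO, and BOG columns can be regenerated from OBP, AB, and H with equation (c) and BO = (AB + WHP)*(1 - OBP). The sketch below does that in Python; the function and variable names are my own, and the game counts (144, or 145 where noted) are the ones given in the table.

```python
def walks_plus_hbp(obp, ab, hits):
    """Equation (c): WHP = (OBP*AB - H)/(1 - OBP)."""
    return (obp * ab - hits) / (1 - obp)

def batting_outs(obp, ab, hits):
    """BO = (AB + WHP)*(1 - OBP), with WHP taken from equation (c)."""
    return (ab + walks_plus_hbp(obp, ab, hits)) * (1 - obp)

# (team, OBP, AB, H, games) copied from the 1995 AL table above.
TEAMS = [
    ("BAL", .342, 4837, 1267, 144), ("BOS", .357, 4997, 1399, 144),
    ("CAL", .352, 5019, 1390, 145), ("CHI", .354, 5060, 1417, 145),
    ("CLE", .361, 5028, 1461, 144), ("DET", .327, 4865, 1204, 144),
    ("KC",  .328, 4903, 1275, 144), ("MIL", .336, 5000, 1329, 144),
    ("MIN", .346, 5005, 1398, 144), ("NYY", .357, 4947, 1365, 145),
    ("OAK", .341, 4915, 1296, 144), ("SEA", .350, 4996, 1377, 145),
    ("TEX", .338, 4913, 1304, 144), ("TOR", .328, 5036, 1309, 144),
]

bog_values = []
for team, obp, ab, hits, games in TEAMS:
    whp = walks_plus_hbp(obp, ab, hits)
    bo = batting_outs(obp, ab, hits)
    bog = bo / games
    bog_values.append(bog)
    print(f"{team:4s} WHP {whp:4.0f}  BO {bo:5.0f}  BOG {bog:5.2f}")

mean_bog = sum(bog_values) / len(bog_values)
print(f"Mean BOG = {mean_bog:.2f}")
```

One thing the code makes easy to see: substituting (c) into BO = (AB + WHP)*(1 - OBP) collapses algebraically to BO = AB - H, which is why every entry in the BO column equals AB minus H. The unrounded mean comes out within about 0.01 of the figure in (d).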
Now for one game in the Sagarin case...
(1) PA = BOG/(1 - OBP) ~= 25/(1 - OBP)
We have PAs, but we want TB, let's get H first...
PA * OBP = BOG/(1 - OBP) * OBP = OB
That's not quite what we need, so...
OB * H/OB = H
....but what is H/OB?
H/OB = H/(H + WHP)
H = AVG*AB
(e) H/OB = AVG*AB/(AVG*AB + WHP)
Let's get rid of WHP...
H/OB = AVG*AB/[AVG*AB + (OBP*AB - H)/(1 - OBP)] (from c & e)
H/OB = AVG*AB/[AVG*AB + (OBP*AB - AVG*AB)/(1 - OBP)]
AB cancels from all terms!!!...
H/OB = AVG/[AVG + (OBP - AVG)/(1 - OBP)]
Simplifying H/OB...
H/OB = AVG*(1 - OBP) / [AVG*(1 - OBP) + OBP - AVG]
H/OB = AVG*(1 - OBP) / (AVG - OBP*AVG + OBP - AVG)
H/OB = AVG*(1 - OBP) / (OBP - OBP*AVG)
(e) H/OB = AVG*(1 - OBP) / [OBP*(1 - AVG)]
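A quick numerical sanity check of the simplified H/OB expression, using the Baltimore row (OBP .342, AB 4837, H 1267) from the 1995 table. The function names are illustrative only. Note that OBP*(1 - AVG) as a whole is the denominator.

```python
import math

def walks_plus_hbp(obp, ab, hits):
    """Equation (c): WHP = (OBP*AB - H)/(1 - OBP)."""
    return (obp * ab - hits) / (1 - obp)

def hits_per_on_base(avg, obp):
    """Simplified form: H/OB = AVG*(1 - OBP) / (OBP*(1 - AVG))."""
    return avg * (1 - obp) / (obp * (1 - avg))

ab, hits, obp = 4837, 1267, 0.342          # BAL, 1995
whp = walks_plus_hbp(obp, ab, hits)

direct = hits / (hits + whp)               # H/OB straight from its definition
closed_form = hits_per_on_base(hits / ab, obp)

print(direct, closed_form)
print(math.isclose(direct, closed_form, rel_tol=1e-9))
```

The two values agree because AB cancels out, exactly as in the derivation.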
So now we have...
PA * OBP * H/OB = OB * H/OB = H
....or...
BOG/(1 - OBP) * OBP * AVG*(1 - OBP)/[OBP*(1 - AVG)] = H (from 1 & e)
(2) H = BOG*AVG/(1 - AVG)
With H we can get TB...
SLG = TB/AB = TB/(H/AVG) = TB*AVG/H
TB = H*SLG/AVG
TB = BOG*AVG/(1 - AVG) * SLG/AVG (from 2)
(3) TB = BOG*SLG/(1 - AVG)
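Equations (1), (2), and (3) can be tested against the simulator output from the start of the article. The sketch below scales them from one game to a 162-game season of 25 outs per game and applies them to 1941 Ted Williams (.406/.549/.735). The names are my own, and this is a closed-form check, not the simulator itself.

```python
OUTS_PER_GAME = 25   # the article's round figure for BOG
GAMES = 162

def season_totals(avg, obp, slg, outs_per_game=OUTS_PER_GAME, games=GAMES):
    """Season PA, H, and TB implied by equations (1), (2), and (3)."""
    outs = outs_per_game * games
    pa = outs / (1 - obp)            # (1) PA = BOG/(1 - OBP)
    hits = outs * avg / (1 - avg)    # (2) H  = BOG*AVG/(1 - AVG)
    tb = outs * slg / (1 - avg)      # (3) TB = BOG*SLG/(1 - AVG)
    return pa, hits, tb

pa, hits, tb = season_totals(0.406, 0.549, 0.735)
print(f"PA {pa:.0f}  H {hits:.0f}  TB {tb:.0f}")
```

The results land within a few units of the simulator's totals for the nine-Williams lineup (PA 8979, H 2768, TB 5009). The small gaps are what you would expect from the three-digit rounding of the rate stats.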
Now plug TB back into Bill James' formula...
                BOG*SLG*OBP
RC = TB * OBP = -----------     (from 0 & 3)
                 (1 - AVG)
Thus...
************************************************************************
*                                                                      *
*                                      SLG * OBP                       *
*                    Sagarin # ~= 25 * ---------                       *
*                                      (1 - AVG)                       *
*                                                                      *
************************************************************************
Checking our math...
Ted Williams hit .406/.549/.735 in 1941:
25*.735*.549/(1 - .406) = 16.98 vs 16.99 claimed by Sagarin!
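The boxed formula is a one-liner in code. Here is a minimal Python sketch. The function name and the `outs_per_game` parameter are my own choices, and the default of 25 is the rounded mean BOG from (d), so treat the output as an approximation of the Sagarin number, not as Sagarin's own method.

```python
def sagarin_estimate(avg, obp, slg, outs_per_game=25):
    """Approximate runs per game for nine copies of one hitter:
    outs_per_game * SLG * OBP / (1 - AVG)."""
    return outs_per_game * slg * obp / (1 - avg)

williams_1941 = sagarin_estimate(0.406, 0.549, 0.735)
al_1995 = sagarin_estimate(0.270, 0.344, 0.427)

print(f"1941 Ted Williams: {williams_1941:.2f} runs per game")
print(f"1995 AL average:   {al_1995:.2f} runs per game")
```

The league-average line gives roughly 5.03, close to both the simulator's 5.025 and the 5.07 league run average estimated earlier.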
Copyright 1997-2001 by Keith Woolner. All included authors retain the copyrights to their original works.