Monday, March 26, 2012

An Alternative Batting Statistic For Century Counts And Batting Average

In late 2011, I published two new statistics that measure Test batsmanship: Performance and Value. Performance measures when a batsman has scored his Test runs, while Value measures the value of a batsman to his team. Two batsmen with similar or even identical batting averages could have very different Performance and Value scores depending on when they scored their runs and how strong their respective teams are. The recent interest in the milestone of 100 international hundreds has also brought to light the largely arbitrary nature of the century as a milestone. Sachin Tendulkar has made 100 international hundreds and 161 international half centuries. 28 of those 161 half centuries are scores between 90 and 99. So, if we considered 90 to be a milestone instead of 100, Tendulkar would, statistically at least, have to be seen as being even further ahead of his contemporaries (and predecessors) than we currently imagine him to be. His 50-to-90 conversion rate is significantly better than his 50-to-100 conversion rate (which is already among the best of all time).

My friend Amogh told me about the Hirsch index (h-index), which measures the productivity and impact of a scholar through his citations, and suggested that it could be used as a model for a measure of batting. A scholar's h-index is the largest number h such that he has published h papers that have each been cited at least h times. We decided to use this basic idea to develop a similar score for Test batsmen.



A Test batsman's H score is the largest number h such that he has played h innings of at least h runs each. Suppose a batsman has played 10 Test innings and his scores in these innings are 8, 0, 72, 14, 45, 148, 30*, 3, 90 and 1. This sequence includes exactly 7 scores of at least 7 (but not 8 scores of at least 8), giving the batsman an H score of 7.
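The definition can be checked mechanically. The following is a minimal Python sketch, not the code used to produce the charts in this post; the function names `parse_score` and `h_score` are hypothetical, and it assumes scores are written as scorecard strings with a trailing `*` for not out, which the H score simply ignores.

```python
def parse_score(entry: str) -> int:
    """Turn a scorecard entry such as "30*" into runs; a not out counts like any other score."""
    return int(entry.rstrip("*"))


def h_score(runs: list[int]) -> int:
    """Largest h such that at least h innings have scores of at least h runs."""
    h = 0
    # Walk the scores from highest to lowest; the i-th best score must be at least i.
    for i, score in enumerate(sorted(runs, reverse=True), start=1):
        if score < i:
            break
        h = i
    return h


innings = ["8", "0", "72", "14", "45", "148", "30*", "3", "90", "1"]
runs = [parse_score(entry) for entry in innings]
print(h_score(runs))  # 7
```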

The longer a player plays (i.e., the more innings he plays), the higher his H score is likely to be. The H score also has the property that it can never decrease. For example, if the batsman in the above paragraph followed this sequence of scores with 6 consecutive innings between 0 and 3, his H score would still be 7 after those six consecutive failures. This aspect of the H score also illustrates why it needs to be normalized. We normalize the H score by innings. As the chart below shows, Sir Donald Bradman has an H value of 44 in 80 innings, while Sachin Tendulkar has an H value of 76 in 302 innings.
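To make these two properties concrete, here is a small Python sketch using the example career above. The six failures are made-up scores between 0 and 3, and `h_score` is a hypothetical helper, not a published implementation. It records H after every innings, checks that it never goes down, and shows H/Innings falling from 0.7 to 0.4375 while H stays at 7.

```python
def h_score(runs):
    """Largest h such that at least h innings have scores of at least h runs."""
    ranked = sorted(runs, reverse=True)
    # In descending order, score >= rank holds for a prefix of the list, so counting it gives h.
    return sum(1 for rank, score in enumerate(ranked, start=1) if score >= rank)


career = [8, 0, 72, 14, 45, 148, 30, 3, 90, 1]  # the example above, with 30* counted as 30
failures = [0, 3, 1, 2, 0, 3]                    # hypothetical: six innings between 0 and 3
runs = career + failures

# H after each innings of the 16-innings career.
history = [h_score(runs[:n]) for n in range(1, len(runs) + 1)]
assert all(a <= b for a, b in zip(history, history[1:]))  # H never decreases

for n in (10, 16):
    h = history[n - 1]
    print(f"after {n} innings: H = {h}, H/Innings = {h / n:.4f}")
```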

While the H score grows as a batsman plays more innings, the H/Innings score typically declines. There are at least two reasons for this. First, longer careers are more likely to have phases where a player is out of form. Second, and more pertinently, a higher H score is harder to improve. For Tendulkar to improve his H score, he will need to score at least 77, while for Bradman to have improved his, his 81st innings (had he played it) would have had to produce at least 45 for him even to have a chance. In Tendulkar's case, a single score of 77 would not guarantee an improvement in his H score. To improve it, he would have to score at least 77 often enough to bring his number of innings of 77 or more up to 77, replacing any scores of exactly 76 that currently count towards his H score.
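The threshold argument can be sketched the same way. The helper below (a hypothetical name, written for illustration only) returns the score needed to lift H by one, H + 1, and how many more innings of at least that score are still required. For the example career one more innings of 8 or more is enough, but a made-up career whose H of 7 rests on seven scores of exactly 7 needs eight further innings of 8 or more, which mirrors the situation described above for scores of exactly 76.

```python
def h_score(runs):
    """Largest h such that at least h innings have scores of at least h runs."""
    ranked = sorted(runs, reverse=True)
    return sum(1 for rank, score in enumerate(ranked, start=1) if score >= rank)


def innings_needed_to_improve(runs):
    """Return (target, extra): H + 1, and how many more innings of at least H + 1 runs are needed.

    Lower scores in between never reduce H, so only innings reaching the target matter.
    """
    target = h_score(runs) + 1
    already = sum(1 for score in runs if score >= target)
    return target, target - already


career = [8, 0, 72, 14, 45, 148, 30, 3, 90, 1]  # the example career (30* counted as 30)
sevens = [7] * 7 + [0, 2, 1]                    # hypothetical: H = 7 built on scores of exactly 7

print(innings_needed_to_improve(career))  # (8, 1)
print(innings_needed_to_improve(sevens))  # (8, 8)
```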

As the final H-statistic, we propose to use H*H/Innings. This is a measure of consistency and quality that neither privileges nor penalizes a player for the length of his career (contemporary players play Tests far more frequently than earlier generations did). To illustrate how this might work, see the following charts, which compare the H*H/Innings scores of Jacques Kallis (H = 66), Ricky Ponting (H = 67) and Rahul Dravid (H = 68).
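As a rough Python sketch (hypothetical helper names, and no claim that this is how the charts were generated), the final statistic is simply H squared divided by innings played. Applied to the worked example and to the career figures quoted in this post for Bradman and Tendulkar, it prints 4.90, 24.20 and 19.13 respectively.

```python
def h_score(runs):
    """Largest h such that at least h innings have scores of at least h runs."""
    ranked = sorted(runs, reverse=True)
    return sum(1 for rank, score in enumerate(ranked, start=1) if score >= rank)


def h2_per_innings(h, innings):
    """The proposed statistic: H * H / Innings."""
    return h * h / innings


example = [8, 0, 72, 14, 45, 148, 30, 3, 90, 1]  # worked example, 30* counted as 30
print(f"example: {h2_per_innings(h_score(example), len(example)):.2f}")

# (H, innings) figures as quoted in this post.
for name, h, inns in [("Bradman", 44, 80), ("Tendulkar", 76, 302)]:
    print(f"{name}: {h2_per_innings(h, inns):.2f}")
```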


This chart shows how the H*H/Innings score develops over a player's career. The chart below shows the development of each player's H score over consecutive 80-innings splits (in homage to Sir Donald Bradman, who ended his first and only 80 innings in Test Cricket with an H score of 44, unmatched by any player over the first 80 innings of his career). Ricky Ponting's best 80-innings split has an H score of 45. This type of statistic shows the trajectory of a player's career: Ponting has had one extremely strong phase, while Kallis and Dravid have had steadier careers.
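The 80-innings splits can be computed by taking the H score of each run of 80 consecutive innings. The sketch below assumes the splits form a sliding window (innings 1-80, 2-81, and so on); the post does not say whether the splits overlap, so non-overlapping blocks are an equally valid reading. `window_h_series` is a hypothetical name, and the usage example shrinks the window to 5 so that it runs on the 10-innings example career.

```python
def h_score(runs):
    """Largest h such that at least h innings have scores of at least h runs."""
    ranked = sorted(runs, reverse=True)
    return sum(1 for rank, score in enumerate(ranked, start=1) if score >= rank)


def window_h_series(runs, window=80):
    """(first innings of the split, H score of the split) for every run of `window` consecutive innings.

    Returns an empty list when the career is shorter than the window.
    """
    return [
        (start + 1, h_score(runs[start:start + window]))
        for start in range(len(runs) - window + 1)
    ]


career = [8, 0, 72, 14, 45, 148, 30, 3, 90, 1]  # worked example, 30* counted as 30
series = window_h_series(career, window=5)
print(series)                     # [(1, 4), (2, 4), (3, 5), (4, 4), (5, 4), (6, 3)]
print(max(h for _, h in series))  # best split: 5
```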


We propose that this statistic can replace both the batting average and the century and half-century counts in Test Cricket. Currently, Sachin Tendulkar's career would have to be summarized by saying that he has made 51 centuries and 65 half centuries, with an average of 55.44, in over 300 Test innings. It would be far more complete to say simply that Tendulkar has an H score of 76 in over 300 Test innings.

In our view, the H score is also a much more reliable measure of a batsman's quality, because it rewards how often he scores the first runs of an innings (an innings counts towards an H score of h as soon as it reaches h runs). Batting average, by contrast, is often inflated by undefeated innings and by big centuries in which the later runs are made in easy batting conditions on flat wickets.
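The contrast with batting average can be illustrated on the worked example. In this sketch (hypothetical helpers; each innings is a `(runs, not_out)` pair built from the example scores), turning the 148 into a 248 raises the average by about 11 runs but leaves the H score at 7, because the extra runs come after that innings has already counted towards the H score.

```python
def h_score(runs):
    """Largest h such that at least h innings have scores of at least h runs."""
    ranked = sorted(runs, reverse=True)
    return sum(1 for rank, score in enumerate(ranked, start=1) if score >= rank)


def batting_average(innings):
    """Runs per dismissal; not-out innings add runs but no dismissal. None if never dismissed."""
    total = sum(runs for runs, _ in innings)
    dismissals = sum(1 for _, not_out in innings if not not_out)
    return total / dismissals if dismissals else None


example = [(8, False), (0, False), (72, False), (14, False), (45, False),
           (148, False), (30, True), (3, False), (90, False), (1, False)]
# Hypothetical variant: the same career with the 148 extended to 248.
bigger_hundred = [(248, False) if runs == 148 else (runs, not_out) for runs, not_out in example]

for label, innings in [("as played", example), ("148 -> 248", bigger_hundred)]:
    avg = batting_average(innings)
    h = h_score([runs for runs, _ in innings])
    print(f"{label}: average = {avg:.2f}, H = {h}")
```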

Top 50 Test batsmen of all time according to H*H/Innings
The adjoining chart shows the top 50 batsmen of all time sorted by their H*H/Innings scores. The batsmen marked in yellow are current players. Column "H" gives the H index at the end of a player's career (for current players, at the end of 2011, since our dataset does not yet include 2012 Tests). "H/Inns" normalizes this by the number of innings played (given in "Inns"), while "H2/Inns" is the final score (H*H/Inns) for each player.

Here are some salient facts about the H2/Inns score:

The only player to have had an H*H/Inns score over 24 at some point in his career is Sir Donald Bradman. As the chart shows, he also ended his career with a score over 24.

Only three players have had an H*H/Inns score over 23 at some point in their careers - Bradman, Sir Jack Hobbs (1930) and Ken Barrington (1965).

Only six players have broken 22 - Bradman, Hobbs, Barrington, Sunil Gavaskar (1979), Herbert Sutcliffe (1932) and Everton Weekes (1955).

Only seven have broken 21 - Michael Hussey (2008) joins the six mentioned above. In his first 45 innings in Test Cricket, Hussey reached at least 37, 37 times!

Only 11 players have had an H*H/Inns score over 20 at some point in their careers - Bradman, Hobbs, Barrington, Gavaskar, Sutcliffe, Weekes, Hussey, Tendulkar (2002), Lara (1997), Sir Len Hutton (1954) and Mohammad Yousuf (2007).
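Statements such as "over 20 at some point in their careers" refer to the peak of the running H*H/Inns score rather than its final value. A minimal sketch of that calculation follows; `peak_h2_per_innings` is a hypothetical name, and the post does not say whether a minimum number of innings was required before a peak counted, so no such cut-off is applied here. On the worked example, the peak (5.44 after innings 9) is higher than the final score (4.90).

```python
def h_score(runs):
    """Largest h such that at least h innings have scores of at least h runs."""
    ranked = sorted(runs, reverse=True)
    return sum(1 for rank, score in enumerate(ranked, start=1) if score >= rank)


def peak_h2_per_innings(runs):
    """Highest running H*H/Innings over a career, and the innings number at which it was reached.

    No minimum-innings cut-off is applied (an assumption; the post does not specify one).
    """
    best, best_at = 0.0, 0
    for n in range(1, len(runs) + 1):
        score = h_score(runs[:n]) ** 2 / n
        if score > best:
            best, best_at = score, n
    return best, best_at


career = [8, 0, 72, 14, 45, 148, 30, 3, 90, 1]  # worked example, 30* counted as 30
best, at = peak_h2_per_innings(career)
print(f"peak {best:.2f} after innings {at}; final {h_score(career) ** 2 / len(career):.2f}")
```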

For comparison, our last chart shows the top 50 Test batsmen of all time as they stood on December 31, 2000. As in the adjoining chart, the players marked in yellow were active Test players at that date.

Please comment on, critique and share this idea. Please also feel free to suggest groups of players that you would like to see compared using this method.

9 comments:

  1. KD,

    Very interesting thoughts. I wonder if we can tie the H-score to Test victory conversion in some form.
    This is my brief idea:

    I heard Michael Holding say once that Viv never had to score 150+ or 200s in the innings he played, because he had already scored enough for the bowlers to rip apart the opposition. When you have 4 or 5 genuine fast bowlers, it makes perfect sense to score enough for the bowlers to defend the total, as opposed to scoring 200s and reducing the time the bowlers have to take the wickets.

  2. Great and very comprehensive post. As a cricket lover, I found it very interesting.

  3. Hi KD,

    One thing I wanted to mention is that while the H-Index counts x innings in which a batsman has scored at least x runs, x still represents only a small fraction of the total innings a batsman has played. e.g. for SRT, the H-score is 76 in 302 innings, and 76 is only ~25% of the innings he has played in his career. Another very interesting way to put it would be that 75% of the time he hasn't performed up to this index ;)

    But this might undermine some knocks (such as the 52 he scored against Pak in the Asia Cup recently), because this index doesn't account for them since they are <76. If you could couple this with a strength-of-opponent index, that would make this even more brilliant!

    Also, let's compare this to H Sutcliffe (right above SRT in Table 1). His H-Index is 40 from 82 innings. This doesn't seem as impressive as SRT's, simply because SRT has 76 scores of at least 76, which is most of the length of Sutcliffe's whole career, yet the H^2/I index appears to make Sutcliffe a slightly more valuable player than SRT. Do correct me if I'm wrong.

    Also, it would be great to come up with something to complement the H-Index that accounts for the remaining 75% of SRT's knocks (and so on for other players) because if you look at most players, the H-Index represents (on an average) only 25-35% of their total knocks.

    Just my two cents. I really applaud you for the use of stats and the systematic approach you adopt towards addressing any issue, and for going places others usually don't venture. Kudos :)

    Ajit Bhaskar (@ajit_bhaskar on twitter).

  4. This is interesting stuff, Kartikeya, and I will write a post on it. Before I do, I'm just trying to work out what happened to your stats. Hobbs had 102 innings, not 100. Barrington had 131, not 130. Sutcliffe 84 and not 82, etc.

    I'm also guessing that you've excluded Bangladesh Tests, but I'm not sure.

  5. Thanks David. I look forward to your post.

    I haven't excluded Bangladesh Tests. My data had DNB's and TDNB's as 0 not outs. I got rid of those, and may have gotten rid of some actual 0 not outs as well.

  6. Very interesting piece David! Great job. Really enjoyed reading it!

  7. I think you might have subtracted those DNB's from an innings total which didn't include them in the first place. But I can't get all the numbers to match up properly, so I don't know.

    In any case I've got my own code running again so I can find h-indices myself. My post is about half-written, and it'll probably be at least another day before I work everything out - every time I think I've understood what an h-index means in cricket, I find another wrinkle....

  8. David,
    I think you are right about the data. I'm going to try and fix the data this weekend.

    One thing to note is that DNBs do not affect the Hirsch index itself, but obviously do affect the normalization.

    I'll look forward to your post.

  9. You have a real talent for putting your thoughts into clear, original content. Your article is easy to read and understand. You have brought forth some really good points that I agree with and appreciate.
