| Fun, Data Mining | 23 Mar 2011 |
| Predictive Analytics and Cricket by Ingo Mierswa |
I am not really deep into cricket myself. However, I found this interesting blog entry which discusses some of the factors behind winning cricket matches, discovered by data mining. It is not hard to tell that the author favors the Indian team :-)
The first thing to do is some basic statistics: how often did the Indian cricket team win against certain other teams in the past? For example, over the last 5 years India won 66% of all matches it played against England. Against Australia, however, India won only 40% of those matches.
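To make the head-to-head statistic concrete, here is a minimal Python sketch that computes a win rate from a list of match records. The `Match` record, the `head_to_head_win_rate` function and the toy data are hypothetical illustrations, not the data or code behind the original blog post; in this sketch draws and no-results simply count as non-wins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Match:
    team_a: str
    team_b: str
    winner: Optional[str]  # None for a draw or no result


def head_to_head_win_rate(matches, team, opponent):
    """Share of matches between `team` and `opponent` that `team` won."""
    relevant = [m for m in matches if {m.team_a, m.team_b} == {team, opponent}]
    if not relevant:
        raise ValueError("no matches between these teams")
    wins = sum(1 for m in relevant if m.winner == team)
    return wins / len(relevant)


# Hypothetical toy data, not the real match history.
matches = [
    Match("India", "Australia", "India"),
    Match("India", "Australia", "Australia"),
    Match("Australia", "India", "Australia"),
    Match("India", "Australia", "India"),
    Match("India", "Australia", "Australia"),
    Match("India", "England", "India"),
]

print(f"{head_to_head_win_rate(matches, 'India', 'Australia'):.0%}")  # prints 40%
```

With these made-up records India won 2 of its 5 matches against Australia, so the function returns 0.4; the order of the two teams in a record does not matter because the comparison uses sets.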
So the important question is: under which circumstances did India win those 40%? This is where RapidMiner was used: the matches were described by attributes like "partnership", "pace bowlers", or "slow bowlers", and a decision tree was learned from this data.
The model was built on all matches between India and Australia from the last 5 years. It is easy to see that partnerships play the most significant role. In particular,
- India need to have 2 significant partnerships worth at least 77 runs
- If not, the bowlers, specifically pace bowlers, have to step into the breach and take more than 7 wickets
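Read as plain rules, the two branches above can be written down as a small function. This is a sketch assuming the tree reduces to exactly these two checks (at least two partnerships worth 77 or more runs, otherwise more than 7 wickets taken by the pace bowlers); the function and parameter names are hypothetical, and the actual tree may contain further branches.

```python
def india_likely_to_win(partnerships_77_plus: int, pace_wickets: int) -> bool:
    """Hypothetical rule encoding of the decision tree described above.

    partnerships_77_plus: number of Indian partnerships worth at least 77 runs
    pace_wickets: wickets taken by India's pace bowlers
    """
    # First branch: at least two partnerships of 77+ runs.
    if partnerships_77_plus >= 2:
        return True
    # Otherwise the pace bowlers must take more than 7 wickets.
    return pace_wickets > 7


print(india_likely_to_win(2, 3))  # True
print(india_likely_to_win(1, 8))  # True
print(india_likely_to_win(1, 7))  # False
```

Writing a tree out as nested if/else rules like this is one of the reasons decision trees are popular: the learned model can be read and discussed directly, even by people without cricket (or data mining) expertise.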
Without any knowledge about cricket, I have hardly any idea what this actually means. I suppose that those two strong partnerships of 77 runs or more are two pairs of batsmen playing well with each other. If you don't have those, it seems that fast bowlers taking down the wooden "goals" more than 7 times helps a lot.
This is what data mining is actually about: finding insights in data without needing prior domain knowledge (of course, you still have to validate the findings!). Such a validation is missing from the blog post, but it may be part of the full report, which can be downloaded from the website. Anyway, a fun read and a nice data mining application!
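One common way to do such a validation is k-fold cross-validation: learn the model on k-1 parts of the data, measure its accuracy on the held-out part, and average over all k rounds. The post does not say which validation, if any, was used, so the following Python sketch is only an illustration under stated assumptions: instead of a full decision tree it learns a one-attribute threshold rule (a decision stump) on a hypothetical "number of 77+ run partnerships" attribute, and the data are made up.

```python
import random

# Each row: (number of 77+ run partnerships, did India win?)
# Hypothetical toy data, not real match results.
ROWS = [(0, False), (1, False), (2, True), (3, True), (0, False),
        (2, True), (1, False), (3, True), (1, True), (0, False)]


def accuracy(t, rows):
    """Share of rows where the rule 'win if x >= t' matches the outcome."""
    return sum((x >= t) == won for x, won in rows) / len(rows)


def learn_stump(rows):
    """Return the threshold t that maximises training accuracy of 'win if x >= t'."""
    best_t, best_acc = None, -1.0
    for t in sorted({x for x, _ in rows}):
        acc = accuracy(t, rows)
        if acc > best_acc:  # ties keep the smaller threshold
            best_t, best_acc = t, acc
    return best_t


def cross_validate(rows, k=5, seed=0):
    """Average held-out accuracy of learn_stump over k folds."""
    if k < 2 or len(rows) < k:
        raise ValueError("need k >= 2 and at least k rows")
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # round-robin split into k folds
    scores = []
    for i in range(k):
        # Train on every fold except fold i, then test on fold i.
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        t = learn_stump(train)
        scores.append(accuracy(t, folds[i]))
    return sum(scores) / k


print(f"5-fold CV accuracy: {cross_validate(ROWS):.2f}")
```

The fixed `seed` makes the fold assignment reproducible; changing it shows how much the estimate moves when only a handful of matches is available.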


