
Choice Words/Phrases
Super Crunchers is a book about how number crunching can be used to make a wide variety of predictions. The power of super crunching may surprise you: both the kinds of predictions it can make and the accuracy with which it makes them are worth considering. You may not be able to predict a stock price and make quick money from it, but you can use crunching to predict a great many other things.
The wine prediction:
Orley Ashenfelter challenged the wine quality prediction techniques (swishing and spitting) that had been in place for ages. How? With regression: Wine Quality = 12.145 + 0.00117 × winter rainfall + 0.0614 × average growing season temperature − 0.00386 × harvest rainfall. Orley found that low levels of harvest rain and high average summer temperatures produce the greatest wines. In years when the summer is particularly hot, grapes ripen fully, which lowers their acidity, while in years with below-average rainfall the fruit becomes concentrated. So it is in the hot, dry years that you tend to get the legendary vintages.
The point is the use of number crunching and statistical analysis to make complex predictions. That is what “Super Crunchers” is about. Super crunching brings together size (terabytes, each 1,000 gigabytes, and petabytes, each 1,000 terabytes), speed and scale.
Data is power; call them data-empowered businesses. Companies that can leverage data will have an edge over their less data-friendly peers. Super crunching leads to data-driven decision making.
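As a rough illustration, the wine regression can be written as a small Python function. This is only a sketch: the function and parameter names are made up for this example, the last coefficient is assumed to multiply harvest rainfall (in line with the finding that low harvest rain makes better wine), and the input units are assumed to be whatever the original regression used. The sample inputs are invented numbers, not real vintage data.

```python
def wine_quality(winter_rainfall, avg_growing_season_temp, harvest_rainfall):
    """Predicted wine quality from the regression quoted in the text.

    Assumption: the -0.00386 coefficient applies to harvest rainfall.
    """
    return (12.145
            + 0.00117 * winter_rainfall
            + 0.0614 * avg_growing_season_temp
            - 0.00386 * harvest_rainfall)


# Invented inputs: same winter rain, but one hot dry season and one cool wet one.
hot_dry_year = wine_quality(600, 17.5, 50)
cool_wet_year = wine_quality(600, 15.5, 250)

print("hot, dry year: ", round(hot_dry_year, 3))
print("cool, wet year:", round(cool_wet_year, 3))
```

With these made-up inputs the hot, dry year gets the higher score, which is the pattern the regression is meant to capture.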
Let’s take a look at how data crunching affects our day-to-day lives:
1) iTunes lists the top downloaded songs
2) Delicious lists the most popular internet bookmarks
3) Netflix can recommend different movies to different people
4) eHarmony recommends people that are very similar to you
5) Google pre-picks web pages based on your previous browsing habits
6) CapitalOne’s proactive call center and algorithms enable it to answer some of your questions even before you ask them. When a customer calls to cancel an account, the call is redirected either to a cancellation system or person or to a retention specialist, based on the customer’s profitability and myriad other factors.
7) Visa, with a little mining of my credit card charges, can guess whether I’ll divorce in the next five years
Generating Data:
One can generate random data using the =RAND() function in MS Excel. Random data generated this way can be used to gather insights.
Randomized testing can be done with any policy that can be applied to some people and withheld from the rest. It cannot be applied to things such as a space shuttle launch or the setting of federal interest rates, where we cannot randomly pick some people and assign them high interest rates while keeping rates low for the rest. However, there are dozens of other business and government situations to which randomized testing can be applied, with better predictions made from the trial results. Should Google buy YouTube? This kind of question cannot readily be answered by super crunching either, because super crunching requires analysis of the results of repeated decisions.
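The idea of applying a policy to some people and not to the rest can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the book: the function names, the customer labels and the fixed seed are all hypothetical, and the even split into two groups is an assumption.

```python
import random


def assign_groups(people, seed=None):
    """Randomly split people into a treatment group and a control group."""
    rng = random.Random(seed)       # a seed makes the split repeatable
    shuffled = list(people)         # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def difference_in_means(treated_outcomes, control_outcomes):
    """Average outcome of the treated group minus that of the control group."""
    treated_avg = sum(treated_outcomes) / len(treated_outcomes)
    control_avg = sum(control_outcomes) / len(control_outcomes)
    return treated_avg - control_avg


# Hypothetical customers: the policy would be applied to `treatment` only.
customers = [f"customer_{i}" for i in range(10)]
treatment, control = assign_groups(customers, seed=42)

print("treatment:", treatment)
print("control:  ", control)
```

Once outcomes have been measured for both groups, difference_in_means gives a simple estimate of the policy's effect. Because chance alone decided who received the policy, the two groups should be alike in everything else on average.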
Evidence-based medicine: Misdiagnosis accounts for about one-third of all medical errors.
Marketing crunchers predict what products you will want to buy. Randomized studies predict how you’ll respond to a prescription drug.
Why are humans bad at making predictions? The human mind suffers from a number of cognitive failings and biases that distort our ability to predict accurately. For example, people give too much weight to unusual events that seem salient (such as murders over more common and more dangerous causes of death).
So how about both? Traditional experts make better decisions when supplemented with results from a statistical prediction. In several studies, the most accurate way to exploit traditional experience is to merely add the expert evaluation as an additional factor in the statistical algorithm.
Fine then, what’s left for us to do? Hypothesize. What is left for us and our intuition is to determine the variables that need to be used in the statistical analysis. Human hunches are still crucial in deciding what to test and what not to test. Humans also play a vital role in collecting the right data for the analysis. Intuition can be treated as a precursor to super crunching.
Epagogix has a neural network to try to predict a movie’s receipts based primarily on the characteristics of the script.
Understand numbers, understand standard deviation, get smart, and say more than yes or no. Understand that super crunching is a complement to intuition. At the end of the day, the aim is to be able to make better choices in life, by intuition or otherwise.
SD: There’s a 95% probability that a normally distributed variable will lie within two standard deviations of the mean.
e.g. For every 100 people who work in the IT sector, an average of 40 suffer from high cholesterol, with an SD of 10. That is, in 95% of cases (different lots of 100 people), between 20 and 60 people suffer from high cholesterol.
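The two-standard-deviation rule in this example can be checked with Python's standard library. This is a small sketch that assumes the count of people with high cholesterol is roughly normally distributed; the mean of 40 and SD of 10 come from the example, and the variable names are made up.

```python
from statistics import NormalDist

mean, sd = 40, 10                   # figures from the cholesterol example

low = mean - 2 * sd                 # lower end of the two-SD range
high = mean + 2 * sd                # upper end of the two-SD range

# Share of a normal distribution that falls between low and high.
dist = NormalDist(mu=mean, sigma=sd)
coverage = dist.cdf(high) - dist.cdf(low)

print(f"range: {low} to {high}")
print(f"share within two SDs: {coverage:.4f}")
```

The computed share comes out slightly above 95%, which is why "two standard deviations" is used as a convenient rule of thumb.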