Thursday, October 31, 2013

Sample to Population - The Big Data Leap

Technology's capacity to process huge amounts of data has ushered in a new era in statistics.

Let's revisit the definition of sampling: "sampling is concerned with the selection of a subset of individuals from within a statistical population to estimate characteristics of the whole population". 'Subset' is the term that expresses the statistician's workaround for getting as close as possible to the population's spread. As we know, getting the sample right is of paramount importance.
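To make the definition concrete, here is a minimal sketch of estimating a population characteristic (the mean) from a simple random sample. The synthetic population, the sample size of 500 and the helper name `sample_vs_population` are all assumptions made for illustration, not part of any particular dataset or library.

```python
import random
import statistics


def sample_vs_population(population, sample_size, seed=0):
    """Return (population_mean, sample_mean) for a simple random sample."""
    rng = random.Random(seed)
    # Draw without replacement, so every individual appears at most once.
    sample = rng.sample(population, sample_size)
    return statistics.fmean(population), statistics.fmean(sample)


# Hypothetical population: 100,000 synthetic values (assumed, not real data).
rng = random.Random(42)
population = [rng.gauss(50.0, 10.0) for _ in range(100_000)]

pop_mean, sample_mean = sample_vs_population(population, 500)
print(f"population mean: {pop_mean:.3f}")
print(f"sample mean:     {sample_mean:.3f}")
```

The two numbers will be close but not identical; that gap is the sampling error the statistician works to keep small.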

But what if we could get the data for the whole population? With Big Data moving in fast, that is a clear possibility. With the massively parallel processing of the Map-Reduce framework on a distributed file system, it is all possible. To top it off, tools like Revolution R and Mahout are already available.
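As a rough illustration of the map-reduce idea, the sketch below computes a mean over the whole population by summarizing each partition separately and then merging the summaries. It runs in plain Python on one machine and does not use any real Hadoop, Revolution R or Mahout API; the function names and the toy partitions are assumptions.

```python
from functools import reduce


def map_partition(partition):
    """Map step: summarize one partition as (count, total)."""
    return (len(partition), sum(partition))


def combine(a, b):
    """Reduce step: merge two partial (count, total) summaries."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(partitions):
    """Mean over all partitions, using only the per-partition summaries."""
    count, total = reduce(combine, map(map_partition, partitions), (0, 0.0))
    if count == 0:
        raise ValueError("no data in any partition")
    return total / count


# Toy stand-in for data blocks stored on different nodes (assumed).
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
print(distributed_mean(partitions))  # 3.5
```

Because each partition is reduced to a small (count, total) pair, the map step can run wherever the data lives and only the summaries need to travel.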

Statistics as we know it today is on its journey toward accepting the population, rather than the sample, as the base data for analysis. Better estimates and predictions are on their way...
