Big data and statistics: A statistician’s perspective
DOI: https://doi.org/10.7203/metode.0.3590
Keywords: Big Data, statistics, case studies, pitfalls, challenges
Abstract
Big Data brings unprecedented power to address scientific, economic and societal issues, but also amplifies the risk of certain pitfalls. These include using purely data-driven approaches that disregard an understanding of the phenomenon under study, aiming at a dynamically moving target, ignoring critical data collection issues, summarizing or preprocessing the data inadequately, and mistaking noise for signal. We review some success stories and illustrate how statistical principles can help obtain more reliable information from data. We also touch upon current challenges that require active methodological research, such as strategies for efficient computation, integration of heterogeneous data, extending the underlying theory to increasingly complex questions and, perhaps most importantly, training a new generation of scientists to develop and deploy these strategies.
References
Berry, D., 2012. «Adaptive Clinical Trials in Oncology». Nature Reviews Clinical Oncology, 9: 199-207. DOI: <10.1038/nrclinonc.2011.165>.
Curtice, J. and D. Firth, 2008. «Exit Polling in a Cold Climate: the BBC-ITV Experience Explained». Journal of the Royal Statistical Society A, 171(3): 509-539. DOI: <10.1111/j.1467-985X.2007.00536.x>.
Fan, J.; Han, F. and H. Liu, 2014. «Challenges of Big Data Analysis». National Science Review, 1(2): 293-314. DOI: <10.1093/nsr/nwt032>.
Font-Burgada, J.; Reina, O.; Rossell, D. and F. Azorín, 2013. «ChroGPS, a Global Chromatin Positioning System for the Functional Analysis and Visualization of the Epigenome». Nucleic Acids Research, 42(4): 1-12. DOI: <10.1093/nar/gkt1186>.
Gorton, G., 2009. «Information, Liquidity, and the (Ongoing) Panic of 2007». American Economic Review, 99(2): 567-572. DOI: <10.1257/aer.99.2.567>.
Hilbert, M., 2012. «How Much Information Is There in the “Information Society”?». Significance, 9(4): 8-12. DOI: <10.1111/j.1740-9713.2012.00584.x>.
International Business Machines Corporation, 2011. IBM Big Data Success Stories. International Business Machines Corporation. Armonk, NY. Available at: <http://public.dhe.ibm.com/software/data/sw-library/big-data/ibm-big-data-success.pdf>.
Jordan, M., 2013. «On Statistics, Computation and Scalability». Bernoulli, 19(4): 1378-1390. DOI: <10.3150/12-BEJSP17>.
King, G. et al., 2009. «Public Policy for the Poor? A Randomized Assessment of the Mexican Universal Health Insurance Programme». The Lancet, 373: 1447-1454. DOI: <10.1016/S0140-6736(09)60239-7>.
Lazer, D.; Kennedy, R.; King, G. and A. Vespignani, 2014. «The Parable of Google Flu: Traps in Big Data Analysis». Science, 343(6176): 1203-1205. DOI: <10.1126/science.1248506>.
Lewis, M., 2003. Moneyball. The Art of Winning an Unfair Game. W. W. Norton & Company. New York.
Lohr, S., 2012. «The Age of Big Data». The New York Times, 11 February 2012. Available at: <www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html>.
Müller, P.; Parmigiani, G.; Robert, C. and J. Rousseau, 2004. «Optimal Sample Size for Multiple Testing: the Case of Gene Expression Microarrays». Journal of the American Statistical Association, 99(468): 990-1001. DOI: <10.1198/016214504000001646>.
Nuzzo, R., 2014. «Scientific Method: Statistical Errors». Nature, 506: 150-152. DOI: <10.1038/506150a>.
Rossell, D.; Stephan-Otto Attolini, C.; Kroiss, M. and A. Stöcker, 2014. «Quantifying Alternative Splicing from RNA-Sequencing Data». The Annals of Applied Statistics, 8(1): 309-330. DOI: <10.1214/13-AOAS687>.
Shaw, J., 2014. «Why “Big Data” Is a Big Deal». Harvard Magazine, 3: 30-35, 74-75. Available at: <http://harvardmag.com/pdf/2014/03-pdfs/0314-30.pdf>.
Silver, N., 2012. The Signal and the Noise: Why So Many Predictions Fail – but Some Don’t. Penguin Press. New York.
Student, 1931. «The Lanarkshire Milk Experiment». Biometrika, 23(3-4): 398-406. DOI: <10.2307/2332424>.
World Economic Forum, 2012. Big Data, Big Impact: New Possibilities for International Development. World Economic Forum. Cologny, Switzerland. Available at: <www3.weforum.org/docs/WEF_TC_MFS_BigDataBigImpact_Briefing_2012.pdf>.
License
All documents on the OJS platform are open access and the property of their respective authors.
Authors publishing in the journal agree to the following terms:
- Authors retain their rights and grant Metode Science Studies Journal the right of first publication of the document, licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License that allows others to share the work with an acknowledgement of authorship and of its publication in the journal.
- Authors are allowed and encouraged to disseminate their work electronically through personal or institutional websites (institutional open archives, personal websites, or profiles on professional and academic networks) once the text has been published.