Tidy Data
Wickham, Hadley
Journal of Statistical Software 59 (2014): 1-23.
https://doi.org/10.18637/jss.v059.i10
“It is often said that 80% of data analysis is spent on the process of cleaning and preparing the data (Dasu and Johnson 2003). Data preparation is not just a first step, but must be repeated many times over the course of analysis as new problems come to light or new data is collected. Despite the amount of time it takes, there has been surprisingly little research on how to clean data well. Part of the challenge is the breadth of activities it encompasses: from outlier checking, to date parsing, to missing value imputation. To get a handle on the problem, this paper focuses on a small, but important, aspect of data cleaning that I call data tidying: structuring datasets to facilitate analysis.”
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton
Nature 521, no. 7553 (2015): 436-444.
Mandelbrot, Benoit
Science 156, no. 3775 (1967): 636-638.
Nuzzo, Regina
Nature News 506, no. 7487 (2014): 150.
Bennett, Charles H.
In Randomness and Complexity, from Leibniz to Chaitin, pp. 3-12. 2007.
Bennett, Charles H., Gilles Brassard, and N. David Mermin
Physical Review Letters 68, no. 5 (1992): 557.
Minsky, Marvin
Communications of the ACM 43, no. 8 (2000): 66-73.
Turing, Alan
Proceedings of the London Mathematical Society 2, no. 1 (1937): 230-265.
Lloyd, Seth
Nature 406, no. 6799 (2000): 1047-1054.
Knuth, Donald E.
The Computer Journal 27, no. 2 (1984): 97-111.
Brooks, Frederick P.
IEEE Computer 20, no. 4 (1987): 10-19.
Lamport, Leslie, Robert Shostak, and Marshall Pease
ACM Transactions on Programming Languages and Systems (TOPLAS) 4, no. 3 (1982): 382-401.
Lindley, Sam, Philip Wadler, and Jeremy Yallop
Journal of Functional Programming 20, no. 1 (2010): 51-69.
Wadler, Philip
ACM SIGPLAN Notices 47, no. 9 (2012): 273-286.
Richards, Blake A., Timothy Lillicrap, et al.
Nature Neuroscience 22, no. 11 (2019): 1761-1770.