Tidy Data

Wickham, Hadley
Journal of Statistical Software 59 (2014): 1-23.
https://doi.org/10.18637/jss.v059.i10

It is often said that 80% of data analysis is spent on the process of cleaning and preparing the data (Dasu and Johnson 2003). Data preparation is not just a first step, but must be repeated many times over the course of analysis as new problems come to light or new data is collected. Despite the amount of time it takes, there has been surprisingly little research on how to clean data well. Part of the challenge is the breadth of activities it encompasses: from outlier checking, to date parsing, to missing value imputation. To get a handle on the problem, this paper focuses on a small, but important, aspect of data cleaning that I call data tidying: structuring datasets to facilitate analysis.
— Hadley Wickham
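
The abstract only names data tidying. In the paper, a tidy dataset is one where each variable forms a column, each observation forms a row, and each type of observational unit forms a table. The sketch below is a minimal plain-Python illustration of the most common fix, under stated assumptions. It uses a small hypothetical table, and the names `messy`, `melt`, `person`, `treatment` and `result` are illustrative, not taken from the paper. A variable hidden in the column headers is turned back into a column. This wide-to-long reshape is often called melting (compare `pandas.melt` in Python or `reshape2::melt` in R).

```python
from typing import Dict, List, Optional

# Hypothetical messy table (illustrative data, not from the paper):
# the variable "treatment" is hidden in the column headers.
messy: Dict[str, Dict[str, Optional[int]]] = {
    "person_a": {"treatment_a": None, "treatment_b": 4},
    "person_b": {"treatment_a": 12, "treatment_b": 9},
    "person_c": {"treatment_a": 7, "treatment_b": 1},
}


def melt(table: Dict[str, Dict[str, Optional[int]]],
         id_name: str, var_name: str, value_name: str) -> List[Dict[str, object]]:
    """Reshape a wide table into long form: one row per (row id, column) pair."""
    rows: List[Dict[str, object]] = []
    for row_id, cells in table.items():
        for column, value in cells.items():
            # Each output row is one observation; each key is one variable.
            rows.append({id_name: row_id, var_name: column, value_name: value})
    return rows


tidy = melt(messy, id_name="person", var_name="treatment", value_name="result")
for row in tidy:
    print(row)
```

Each of the six output rows now holds one observation (one person under one treatment), and `person`, `treatment` and `result` are each a single column. The missing value for `person_a` becomes an explicit row rather than an empty cell, which makes it easy to filter or impute later.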