Data Cleansing with SQL and R


By some estimates, data scientists can spend upwards of 80% of their time on a given project preparing, cleaning, and correcting data. In this session, we will look at different data cleansing and preparation techniques using both SQL Server and R. We will investigate the concept of tidy data and see how tools in both languages can simplify research and analysis of a small but realistic data set.


On August 16, 2017, I gave a version of this talk at NDC Sydney. You can watch the recording on the NDC YouTube channel.


Click here to access demo code for this presentation. This includes all of the SQL and R code, the data sources used in the demos, and a notebook for tidyr.

The source code is licensed under the terms of the GPL.