09 — Large structured data
Billions.

Slack channel: #09-large-structured-data
This week is our first look at real “big” data: conventional data with many observations and variables that are not always as nicely formatted as you would like them to be.
Lecture slides
Code
To be added.
Further recommended resources
tidyverse
For more information on the tidyverse, check out the following links:
- The chapter on data transformation in Hadley Wickham’s book “R for Data Science”: https://r4ds.had.co.nz/transform.html (PS: The whole “book” is worth a read)
- This website that visualizes each step in a chained (piped) tidyverse transformation: https://tidydatatutor.com. The tidylog package prints the changes resulting from the transformation in your console.
- Gábor Békés and Gábor Kézdi’s great book “Data Analysis for Business, Economics, and Policy”, especially chapters 2 and 3
- Grant McDermott’s slides: https://raw.githack.com/uo-ec607/lectures/master/05-tidyverse/05-tidyverse.html
data.table
For more information on data.table, check out the following links:
- The official data.table vignette: https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html
- Grant McDermott’s slides: https://raw.githack.com/uo-ec607/lectures/master/05-datatable/05-datatable.html
- This neat introduction to data.table by atrebas: https://atrebas.github.io/post/2020-06-17-datatable-introduction/
ggplot2
We have only just started with ggplot2, but if you already want to know more, check out the following links:
- Kieran Healy’s “Data Visualization — A practical introduction”, especially the chapter “3 Make a plot.”
- The chapter on data visualization in “R for Data Science”: https://r4ds.had.co.nz/data-visualisation.html