04 — Large structured data
Billions.
Slack channel: #04-large-structured-data
This week offers a first look at real “big” data: conventional data with many observations and variables that is not always as nicely formatted as you would like it to be.
Lecture slides
Code
Check the course repository for the application.
Further recommended resources
tidyverse
For more information on the tidyverse, check out the following links:
- The chapter on data transformation in Hadley Wickham’s book “R for Data Science”: https://r4ds.had.co.nz/transform.html (PS: The whole “book” is worth a read)
- This website that visualizes each step in a chained (piped) tidyverse transformation: https://tidydatatutor.com. The tidylog package prints the changes resulting from the transformation in your console.
- Gábor Békés and Gábor Kézdi’s great book “Data Analysis for Business, Economics, and Policy”, especially chapters 2 and 3
- Grant McDermott’s slides: https://raw.githack.com/uo-ec607/lectures/master/05-tidyverse/05-tidyverse.html
data.table
For more information on data.table, check out the following links:
- The official data.table vignette: https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html
- Grant McDermott’s slides: https://raw.githack.com/uo-ec607/lectures/master/05-datatable/05-datatable.html
- This neat introduction to data.table by atrebas: https://atrebas.github.io/post/2020-06-17-datatable-introduction/
ggplot2
We only just started with ggplot2, but if you want to know more already, check out the following links:
- Kieran Healy’s “Data Visualization — A practical introduction”, especially the chapter “3 Make a plot.”
- The chapter on data visualization in “R for Data Science”: https://r4ds.had.co.nz/data-visualisation.html