Projects include Neurofeedback, Neuroscience of Meditation, cognitive flexibility, Neurodegeneration

--

Other projects

- data visualization, UX/UI, software development for neuroscientists and psychologists
- Programming education

---

<center>
<img src="img/r_first_then.png" width="500"/>
</center>

---

# Workshop aims

Introduce the main components of the Tidyverse

- readr (read files)
- dplyr, tidyr (manipulate data)
- ggplot2 (make awesome graphs)

I have to assume you have a basic knowledge of R

We don't really have time to cover all of the tidyverse (it is a huge universe!)

---
class: inverse, middle, center

# Day I: Data wrangling with tidyverse

---
background-image: url(img/messymeme.jpeg)
background-position: 50% 50%
background-size: cover
class: center, bottom, inverse

---
background-image: url(img/tidyversepackages.jpg)
background-position: 50% 50%
background-size: cover
class: center, bottom, inverse

---

# The tidyverse workflow!!

![](img/workflow.jpg)

---

# Functions we will cover today

--

.pull-left[
- read_csv()
- skim()
- filter()
- select()
]

--

.pull-right[
- arrange()
- mutate()
- group_by() %>% summarize()
]

--

There are many more functions in the tidyverse, but these should be enough to get you going with data analysis!

If you have not done so already, please install all the packages in the tidyverse by running `install.packages("tidyverse")` in RStudio. We will then load the package using `library(tidyverse)`.

---

# Importing data

--

With the `readr`, `haven`, and `readxl` packages, we can load various types of data

--

typical usage: `read_*()` where * can be csv, excel, spss

--

```r
library(tidyverse)

# Load data
df_csv <- read_csv("./../data/sample_data1.csv")
*df_spss <- haven::read_spss("./../data/sample_data3.sav")
df_excel <- readxl::read_excel("./../data/sample_data3_datadictionary.xlsx")
```

--

*package*`::`*function* calls a function directly from a package, without attaching the package first.
--

```r
df_spss <- haven::read_spss("./../data/sample_data3.sav")
```

is the same as:

--

```r
library(haven)

# Load data
df_spss <- read_spss("./../data/sample_data3.sav")
```

---

# Understanding your data
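
--

A first look at a freshly imported data set usually starts with its column types and a per-column summary. The sketch below is one way to do that, not the only one. It uses a small made-up table (`df_demo`, a hypothetical stand-in for `df_csv`) so that it runs without the workshop files. Note that `skim()` comes from the `skimr` package, which is assumed to be installed separately because it is not part of the tidyverse.

```r
library(dplyr)

# Hypothetical stand-in for df_csv: the column names mirror the workshop
# data, but the values are made up for illustration.
df_demo <- tibble(
  ID   = 1:6,
  Dx   = c("AD", "AD", "bvFTD", "CONTROL", "CONTROL", "PSP"),
  Sex  = c("Female", "Male", "Male", "Female", "Male", "Female"),
  MMSE = c(19, 21, 22, 29, 30, NA)
)

# Compact overview: one line per column with its type and first values
glimpse(df_demo)

# Per-column summary (missing values, mean, quantiles, ...).
# skim() lives in skimr, so only call it when skimr is available.
if (requireNamespace("skimr", quietly = TRUE)) {
  skimr::skim(df_demo)
}
```

With the real data, replace `df_demo` with `df_csv`.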
---

# Selecting variables: **`select()`**

--

We use `select()` to keep only the variables/columns we want to work with (your data may be huge).

--

.pull-left[
```r
df_csv %>%
  select(ID, Dx, Sex) %>%
  head()
```

```
# # A tibble: 6 x 3
#      ID Dx     Sex
#   <dbl> <chr>  <chr>
# 1     1 nfvPPA Male
# 2     2 bvFTD  Male
# 3     3 PSP    Male
# 4     4 PSP    Male
# 5     5 bvFTD  Male
# 6     6 svPPA  Male
```
]

--

.pull-right[
![](img/fx_select.JPG)
]

---

# Filtering rows: **`filter()`**

--

We use `filter()` to remove or select rows depending on their values.

--

.pull-left[
```r
df_csv %>%
  select(ID, Dx, Sex) %>%
  filter(Dx == "CONTROL" & Sex == "Male")
```

```
# # A tibble: 13 x 3
#       ID Dx      Sex
#    <dbl> <chr>   <chr>
#  1    20 CONTROL Male
#  2    22 CONTROL Male
#  3    24 CONTROL Male
#  4    37 CONTROL Male
#  5    38 CONTROL Male
#  6    44 CONTROL Male
#  7    50 CONTROL Male
#  8    56 CONTROL Male
#  9    62 CONTROL Male
# 10    91 CONTROL Male
# 11   144 CONTROL Male
# 12   179 CONTROL Male
# 13   197 CONTROL Male
```
]

--

.pull-right[
![](img/fx_filter.JPG)
]

---

# Arranging rows: **`arrange()`**

--

We use `arrange()` to change the order of the rows in our data.

--

.pull-left[
```r
df_csv %>%
  select(ID, Dx, Sex, MMSE) %>%
  filter(Dx == "CONTROL") %>%
  arrange(desc(MMSE))
```

```
# # A tibble: 42 x 4
#       ID Dx      Sex     MMSE
#    <dbl> <chr>   <chr>  <dbl>
#  1    20 CONTROL Male      30
#  2    51 CONTROL Female    30
#  3    60 CONTROL Female    30
#  4    98 CONTROL Female    30
#  5   101 CONTROL Female    30
#  6   197 CONTROL Male      30
#  7    17 CONTROL Female    29
#  8    19 CONTROL Female    29
#  9    38 CONTROL Male      29
# 10    48 CONTROL Female    29
# # ... with 32 more rows
```
]

--

.pull-right[
![](img/fx_arrange.JPG)
]

---

# Creating variables: **`mutate()`**

--

The job of `mutate()` is to add new columns that are functions of existing columns.
--

.pull-left[
```r
df_csv %>%
  filter(Dx == "CONTROL") %>%
  select(ID, q1:q6) %>%
  mutate(q_total = q1 + q2 + q3 + q4 + q5 + q6)
```

```
# # A tibble: 42 x 8
#       ID    q1    q2    q3    q4    q5    q6 q_total
#    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
#  1    16     3     2     1     4     1     3      14
#  2    17     2     2     2     3     5     2      16
#  3    19     4     5     5     2     1     2      19
#  4    20     4     1     5     2     3     2      17
#  5    21     3     1     3     5     1     5      18
#  6    22     3     2     2     3     1     3      14
#  7    23     3     5     2     2     4     3      19
#  8    24     4     2     4     1     4     5      20
#  9    25     4     4     5     1     3     5      22
# 10    37     1     2     2     3     4     2      14
# # ... with 32 more rows
```
]

--

.pull-right[
![](img/fx_mutate.JPG)
]

---

# Renaming variables: **`rename()`**

--

The job of `rename()` is to rename existing columns.

--

.pull-left[
```r
df_csv %>%
  filter(Dx == "CONTROL") %>%
  select(ID, q1:q6) %>%
  rename(WB_1 = q1,
         WN_2 = q2)
```

```
# # A tibble: 42 x 7
#       ID  WB_1  WN_2    q3    q4    q5    q6
#    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#  1    16     3     2     1     4     1     3
#  2    17     2     2     2     3     5     2
#  3    19     4     5     5     2     1     2
#  4    20     4     1     5     2     3     2
#  5    21     3     1     3     5     1     5
#  6    22     3     2     2     3     1     3
#  7    23     3     5     2     2     4     3
#  8    24     4     2     4     1     4     5
#  9    25     4     4     5     1     3     5
# 10    37     1     2     2     3     4     2
# # ... with 32 more rows
```
]

--

.pull-right[
![](img/fx_rename.JPG)
]

---

# Grouping variables and summarizing data: **`group_by()`** and **`summarize()`**

--

These two usually come as a pair, since we use them together to create group summaries.
--

.pull-left[
```r
df_csv %>%
  group_by(Dx, Sex) %>%
  summarize(Edu_mean = mean(Education),
            MMSE_mean = mean(MMSE)) %>%
  rename(Education = Edu_mean,
         MMSE = MMSE_mean)
```

```
# # A tibble: 12 x 4
# # Groups:   Dx [6]
#    Dx      Sex    Education  MMSE
#    <chr>   <chr>      <dbl> <dbl>
#  1 AD      Female      17.0  19.0
#  2 AD      Male        17.7  19.6
#  3 bvFTD   Female      17.5  21.5
#  4 bvFTD   Male        17.5  21.3
#  5 CONTROL Female      16.3  27.1
#  6 CONTROL Male        16.4  27.9
#  7 nfvPPA  Female      17.4  19.8
#  8 nfvPPA  Male        16.9  19.1
#  9 PSP     Female      16.7  19.6
# 10 PSP     Male        17    18
# 11 svPPA   Female      17.8  20.2
# 12 svPPA   Male        16.9  15.8
```
]

--

.pull-right[
![](img/fx_group_summarize.JPG)
]

---

# Transforming between wide and long data: **`gather()`**

--

`gather()` turns wide data into long format

--

.pull-left[
```r
df_csv %>%
  filter(Dx %in% c("AD", "bvFTD", "svPPA")) %>%
  group_by(Dx) %>%
  summarize(Edu_mean = mean(Education),
            MMSE_mean = mean(MMSE))
```

```
# # A tibble: 3 x 3
#   Dx    Edu_mean MMSE_mean
#   <chr>    <dbl>     <dbl>
# 1 AD        17.3      19.2
# 2 bvFTD     17.5      21.4
# 3 svPPA     17.4      18.2
```

<img src="img/fx_gather.JPG" width="300">
]

--

.pull-right[
```r
df_csv %>%
  filter(Dx %in% c("AD", "bvFTD", "svPPA")) %>%
  group_by(Dx) %>%
  summarize(Edu_mean = mean(Education),
            MMSE_mean = mean(MMSE)) %>%
  gather(key = "Cog", value = "Score",
         Edu_mean, MMSE_mean)
```

```
# # A tibble: 6 x 3
#   Dx    Cog       Score
#   <chr> <chr>     <dbl>
# 1 AD    Edu_mean   17.3
# 2 bvFTD Edu_mean   17.5
# 3 svPPA Edu_mean   17.4
# 4 AD    MMSE_mean  19.2
# 5 bvFTD MMSE_mean  21.4
# 6 svPPA MMSE_mean  18.2
```
]

---

# Transforming between wide and long data: **`spread()`**

--

`spread()` turns long data into wide format

--

.pull-left[
```r
df_csv %>%
  filter(Dx %in% c("AD", "bvFTD", "svPPA")) %>%
  group_by(Dx) %>%
  summarize(Edu_mean = mean(Education),
            MMSE_mean = mean(MMSE)) %>%
  gather(key = "Cog", value = "Score",
         Edu_mean, MMSE_mean)
```

```
# # A tibble: 6 x 3
#   Dx    Cog       Score
#   <chr> <chr>     <dbl>
# 1 AD    Edu_mean   17.3
# 2 bvFTD Edu_mean   17.5
# 3 svPPA Edu_mean   17.4
# 4 AD    MMSE_mean  19.2
# 5 bvFTD MMSE_mean  21.4
# 6 svPPA MMSE_mean  18.2
```
]

--

.pull-right[
```r
df_csv %>%
  filter(Dx %in% c("AD", "bvFTD", "svPPA")) %>%
  group_by(Dx) %>%
  summarize(Edu_mean = mean(Education),
            MMSE_mean = mean(MMSE)) %>%
  gather(key = "Cog", value = "Score",
         Edu_mean, MMSE_mean) %>%
  spread(key = "Cog", value = "Score")
```

```
# # A tibble: 3 x 3
#   Dx    Edu_mean MMSE_mean
#   <chr>    <dbl>     <dbl>
# 1 AD        17.3      19.2
# 2 bvFTD     17.5      21.4
# 3 svPPA     17.4      18.2
```

<img src="img/fx_spread.JPG" width="300">
]

---
class: center, middle

# Thanks!

# Now let's get hands-on!
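
---

# Appendix: **`pivot_longer()`** and **`pivot_wider()`**

--

Since tidyr 1.0.0, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. The older functions still work, so everything shown earlier remains valid. The sketch below repeats the wide/long reshaping with the newer functions. It uses a small hand-typed table (`df_means`, a hypothetical name) holding the group means from the earlier output, so it runs without the workshop data files.

```r
library(dplyr)
library(tidyr)

# Group means copied from the summarize() output on the gather() slide
df_means <- tibble(
  Dx        = c("AD", "bvFTD", "svPPA"),
  Edu_mean  = c(17.3, 17.5, 17.4),
  MMSE_mean = c(19.2, 21.4, 18.2)
)

# Wide -> long: counterpart of gather(key = "Cog", value = "Score", ...)
df_long <- df_means %>%
  pivot_longer(cols = c(Edu_mean, MMSE_mean),
               names_to = "Cog",
               values_to = "Score")

# Long -> wide: counterpart of spread(key = "Cog", value = "Score")
df_wide <- df_long %>%
  pivot_wider(names_from = Cog,
              values_from = Score)
```

One visible difference: `pivot_longer()` keeps the rows of each `Dx` together (AD, AD, bvFTD, ...), whereas `gather()` stacks one column after the other.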