This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
R code goes in code chunks, denoted by three backticks. Executing code chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Crtl+Shift+Enter (Windows) or Cmd+Shift+Enter (Mac).
Along the workshop, I will try to share as many best practices as possible. These will be denoted by an fyi box like this:
This is a best practice in R coding (or at least i think it is)
This is an exercise for you to complete
This is a is a hint for the exercise
The first chunk in an R Notebook is usually titled “setup,” and by convention includes the R packages you want to load. Remember, in order to use an R package you have to run some library()
code every session. Execute these lines of code to load the packages.
To install a package, you have to type install.packages("packagename")
.
I’ve actually set the packages up in a way that if the package is not installeed yet if(!require(packagename))
(!require()
means package does not exist), it will automatically install it install.packages("packagename")
, and run load the package library(packagename)
.
Here is an example: if (!require(knitr)) install.packages("knitr"); library(knitr)
.
if (!require(knitr)) install.packages("knitr"); library(knitr)
## Loading required package: knitr
knitr::opts_chunk$set(comment = "#", warning = FALSE, message = FALSE)
if (!require(kableExtra)) install.packages("kableExtra"); library(kableExtra)
if (!require(tidyverse)) install.packages("tidyverse") ; library(tidyverse)
if (!require(ggrepel)) install.packages("ggrepel") ; library(ggrepel)
if (!require(skimr)) install.packages("skimr") ; library(skimr)
We are going to load the sample_data1.csv
dataset located in the datadir. Let’s load the data first.
# Set directories
root_dir <- "../"
data_dir <- file.path(root_dir, "data")
# Set files
sample_csvfile <- "sample_data2.csv"
# Load data
df <- read_csv(file.path(data_dir, sample_csvfile))
Let’s get a rough idea of how our data looks like using R. Use skim()
to see our data
# Type in your code after this line!
skim(df)
Name | df |
Number of rows | 156 |
Number of columns | 10 |
_______________________ | |
Column type frequency: | |
character | 2 |
numeric | 8 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Country or region | 0 | 1 | 4 | 24 | 0 | 156 | 0 |
Continent | 0 | 1 | 4 | 13 | 0 | 6 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Overall rank | 0 | 1 | 78.50 | 45.18 | 1.00 | 39.75 | 78.50 | 117.25 | 156.00 | ▇▇▇▇▇ |
Score | 0 | 1 | 5.41 | 1.11 | 2.85 | 4.54 | 5.38 | 6.18 | 7.77 | ▂▇▇▇▃ |
GDP per capita | 0 | 1 | 0.91 | 0.40 | 0.00 | 0.60 | 0.96 | 1.23 | 1.68 | ▃▅▇▇▃ |
Social support | 0 | 1 | 1.21 | 0.30 | 0.00 | 1.06 | 1.27 | 1.45 | 1.62 | ▁▁▂▆▇ |
Healthy life expectancy | 0 | 1 | 0.73 | 0.24 | 0.00 | 0.55 | 0.79 | 0.88 | 1.14 | ▁▃▃▇▅ |
Freedom to make life choices | 0 | 1 | 0.39 | 0.14 | 0.00 | 0.31 | 0.42 | 0.51 | 0.63 | ▁▃▆▇▆ |
Generosity | 0 | 1 | 0.18 | 0.10 | 0.00 | 0.11 | 0.18 | 0.25 | 0.57 | ▆▇▆▁▁ |
Perceptions of corruption | 0 | 1 | 0.11 | 0.09 | 0.00 | 0.05 | 0.09 | 0.14 | 0.45 | ▇▅▁▁▁ |
GGplot is a powerful data visualization package in R!
Advantages of ggplot2
Understanding the grammar of ggplot2 is fundamental to being able to use it effectively. There are many layers of plotting with ggplot. Ggplot uses independent building blocks and combine them to create just about any kind of graphical display you want. These layers include:
The first thing in ggplot is reference a data to plot. You can do this by using the pipe, or typing it in the code itself.
df %>% ggplot()
is the same as ggplot(data = df)
df %>% ggplot()
Use ggplot(data = df)
to see how the plot looks like.
# Type your code below this line!
ggplot(data = df)
The results are the same!
ggplot(data = df)
puts the ggplot graphing canvas out and loads the data onto ggplot, so you won’t have to call the variable from the data every time.
In ggplot, aesthetics are something you can see. They include:
Each type of geom accepts only a subset of all aesthetics-refer to the geom help pages to see what mappings each geom accepts. Aesthetic mappings are set with the aes()
function.
df %>% ggplot(aes(x = Generosity, # Set the X variable
y = Score # Set the Y variable
))
Notice that now you can see the X and Y axes. However, did you notice that the data points are missing? It is because these points are geometric objects, or geoms
.
:::puzzle Set the X axis to GDP per capita and y axis to Healthy life expectancy. :::puzzle
*Hint: use `Variable Name`
when there are spaces in your variable names.
# Type your code below this line!
df %>% ggplot(aes(x = `GDP per capita`,
y = `Healthy life expectancy`))
Geometric objects are the actual marks we put on a plot. Examples include:
geom_point
, for scatter plots, dot plots, etc)geom_line
, for time series, trend lines, etc)geom_boxplot
, for, well, boxplots!)A plot must have at least one geom; there is no upper limit. You can add a geom to a plot using the + operator
ggplot provides a number of geoms:
geom_abline geom_density_2d geom_linerange geom_rug
geom_area geom_density2d geom_map geom_segment
geom_bar geom_dotplot geom_path geom_sf
geom_bin2d geom_errorbar geom_point geom_sf_label
geom_blank geom_errorbarh geom_pointrange geom_sf_text
geom_boxplot geom_freqpoly geom_polygon geom_smooth
geom_col geom_hex geom_qq geom_spoke
geom_contour geom_histogram geom_qq_line geom_step
geom_count geom_hline geom_quantile geom_text
geom_crossbar geom_jitter geom_raster geom_tile
geom_curve geom_label geom_rect geom_violin
geom_density geom_line geom_ribbon geom_vline
Below are examples of the geoms:
More information about ggplot can be found in the ggplot page
Additional geoms are available in packages like ggbeewsarm
and ggridges
df %>% ggplot(aes(x = Generosity, # Set the X variable
y = Score # Set the Y variable
)) +
geom_point()
Now we get to see the data points in the plot!
Add the points for your GDP v healthy life expectancy plot
# Type your code below this line!
df %>% ggplot(aes(x = `GDP per capita`,
y = `Healthy life expectancy`)) +
geom_point()
Now, you have just successfully created your first plot using ggplot. Great job!
Let’s make it more neat by adding a line using geom_smooth()
.
df %>% ggplot(aes(x = Generosity, # Set the X variable
y = Score # Set the Y variable
)) +
geom_point() +
geom_smooth()
Now, we normally want to see a linear trend, so we have to adjust the line to make it linear. We can do this by setting the properties of geom_smooth()
to geom_smooth(method = "lm")
.
:::fyi It is good to check the documentation of ggplot and its functions to learn more. :::fyi
df %>% ggplot(aes(x = Generosity, # Set the X variable
y = Score # Set the Y variable
)) +
geom_point() +
geom_smooth(method = "lm")
Add the regression line to your GDP v healthy life expectancy plot In addition, remove the confidence interval.
Hint: Check se
in the documentation.
# Type your code below this line!
df %>% ggplot(aes(x = `GDP per capita`,
y = `Healthy life expectancy`)) +
geom_point() +
geom_smooth(method = "lm",
se = FALSE)
Sometimes you might want to add your regression model into the plot. You can do this in ggplot too!
reg_model <- lm(Score ~ Continent + `Social support` + Generosity , data = df) # Fit your regression model
predicted_df <- data.frame(predicted_vals = predict(reg_model, df),
Generosity = df$Generosity)
df %>% ggplot(aes(x = Generosity, # Set the X variable
y = Score # Set the Y variable
)) +
geom_point() +
geom_smooth(data = predicted_df,
aes(x = Generosity,
y = predicted_vals),
method = "lm",
color = "red")
I want you to try this first. Create a bar plot for Continent vs Score. Use geom_bar(position = "dodge", stat = "identity")
to add it to ggplot.
# Type your code below this line!
df %>% ggplot(aes(x = Continent,
y = Score)) +
geom_bar(position = "dodge", stat = "identity")
Keep this plot in mind! Now, create mean Score for each Continent.
Hint: You need to use group_by() %>% summarize()
to create the mean score.
# Type your code below this line!
df %>% group_by(Continent) %>% summarize(Score = mean(Score))
Do you notice the difference between the scores on the plot and the mean scores? They are different!
This is one problem with ggplot when you want to plot bar charts.
It is advisable to summarize the data using group_by
and summarize
first before plotting if you are doing group plots.
Now, create a mean score data called ContinentScore
and create the bar chart. Plot the ContinentScore on the y-axis and Continent on the x-axis
# Type your code below this line!
df %>% group_by(Continent) %>% summarize(ContinentScore = mean(Score)) %>%
ggplot(aes(x = Continent, y = ContinentScore)) +
geom_bar(position = "dodge", stat = "identity")
Now this is the correct bar graph!
There are various types of box plots, including the standard box plots, violin plots, and jittered dot plots (beeswarm
package).
Let’s explore these plots!
For box plot:
df %>%
ggplot(aes(x = Continent, y = `GDP per capita`)) +
geom_boxplot() +
geom_point()
With box plots, you cannot see the distribution, even when you add the points on top of the box plots.
Let’s try to jitter the points using geom_jitter()
.
df %>%
ggplot(aes(x = Continent, y = `GDP per capita`)) +
geom_boxplot() +
geom_jitter()
You still don’t really see the sample distribution with this.
Hence we can use violin plots to see the means and distribution at the same time.
For violin plot:
df %>%
ggplot(aes(x = Continent, y = `GDP per capita`)) +
geom_violin()
For a nice dotted box plot, you can use the beeswarm
package.
if (!require(ggbeeswarm)) install.packages("ggbeeswarm"); library(ggbeeswarm)
df %>%
ggplot(aes(x = Continent, y = `GDP per capita`)) +
geom_quasirandom()
The shape looks a lot like the violin plot! Let’s stack the geoms to see how it looks!
df %>%
ggplot(aes(x = Continent, y = `GDP per capita`)) +
geom_violin() +
geom_quasirandom()
Note that the order of aesthetics and geometrics affect the layout of your plot!
df %>%
ggplot(aes(x = Continent, y = `GDP per capita`)) +
geom_quasirandom() +
geom_violin()
Now we only have a basic plot. What if you want to look at where the groups are in the dimensional space? We can add more aesthetics!
We can add color to different categories in the plot inside the aesthetics.
df %>% ggplot(aes(x = Generosity, # Set the X variable
y = Score # Set the Y variable
)) +
geom_point(aes(color = Continent)) +
geom_smooth(method = "lm")
If we want the overall of the points to change, set the color outside aesthetics.
df %>% ggplot(aes(x = Generosity, # Set the X variable
y = Score # Set the Y variable
)) +
geom_point(color = "Red") + # You can only set Continent to colors inside aesthetics
geom_smooth(method = "lm")
Set the points to colors of different amounts of social support to your GDP v healthy life expectancy plot.
# Type your code after this line!
df %>% ggplot(aes(x = `GDP per capita`,
y = `Healthy life expectancy`)) +
geom_point(aes(color = `Social support`)) +
geom_smooth(method = "lm",
se = FALSE)
Now the points look a little small to the eyes. We can increase the size using geom_*point*(size = N)
where N is the size in integer.
df %>% ggplot(aes(x = Generosity, # Set the X variable
y = Score # Set the Y variable
)) +
geom_point(aes(color = Continent),
size = 6) + # You can only set Continent to colors inside aesthetics
geom_smooth(method = "lm",
size = 3,
color = "black")
Try to play around with the size of the grouped bar plot.
size
, use width
.# Type your code after this line!
df %>% group_by(Continent) %>% summarize(ContinentScore = mean(Score)) %>%
ggplot(aes(x = Continent, y = ContinentScore)) +
geom_bar(aes(color = Continent),
position = "dodge",
stat = "identity",
width = 0.5)
With bar charts, you have to use fill
to fill the bars. Remember that colors are for the outside colors (borders) and fill are for the inside colors.
df %>% group_by(Continent) %>% summarize(GDP = mean(`GDP per capita`)) %>%
ggplot(aes(x = Continent, y = GDP)) +
geom_bar(aes(fill = Continent),
color = "black",
stat = "identity",
width = 0.3)
When you use size in geom_bar
, you are changing the size of the border.
df %>% group_by(Continent) %>% summarize(GDP = mean(`GDP per capita`)) %>%
ggplot(aes(x = Continent, y = GDP)) +
geom_bar(aes(fill = Continent),
color = "black",
stat = "identity",
size = 2,
width = 0.3)
Changing the shapes of the points can be a good way to identify different groups. There are many shapes in ggplot, which can be identified by shape
or pch
if used in aesthetics.
df %>% ggplot(aes(x = Generosity, # Set the X variable
y = Score # Set the Y variable
)) +
geom_point(aes(fill = Continent),
size = 6,
shape = 23) + # See how the coloring has changed!
geom_smooth(method = "lm",
size = 3,
color = "black")
Now the points has changed from circles to diamonds. But only the outside color remains, while the inner color is gone. Can you guess why?
With GDP vs Healthy life expectancy, add different shapes to continents, and retain the colors of each Continent.
Hint: Remember that ordering matters
# Type your code after this line!
df %>% ggplot(aes(x = `GDP per capita`,
y = `Healthy life expectancy`)) +
geom_point(aes(color = Continent,
pch = Continent)) +
geom_smooth(method = "lm",
se = FALSE)
Sometimes you want to add text to your data! Well, ggplot can do it too, using geom_text
or geom_text_repel
.
if (!require(ggrepel)) install.packages("ggrepel") ; library(ggrepel)
pointsToLabel <- c("Russia", "Venezuela", "Iraq", "Myanmar", "Sudan",
"Afghanistan", "Congo", "Greece", "Argentina", "Brazil",
"India", "Italy", "China", "South Africa", "Ukraine",
"Botswana", "Costa Rica", "Bhutan", "Rwanda", "France",
"United States", "Germany", "United Kingdom", "Barbados", "Norway", "Japan",
"New Zealand", "Singapore")
df %>% ggplot(aes(x = Generosity, # Set the X variable
y = Score # Set the Y variable
)) +
geom_point(aes(fill = Continent), # I used fil because i set the shape to transparent circle
shape = 21, # Shape set to transparent circle
size = 6) +
geom_smooth(method = "lm",
size = 1) +
geom_text_repel(aes(label = `Country or region`),
color = "gray20",
data = filter(df, `Country or region` %in% pointsToLabel),
force = 10)
When you want to plot in separate groups, you can use facets. Facetting facilitates comparison among plots, not just of geoms within a plot. ggplot2
offers two functions for creating smaller graphs:
facet_wrap()
: define subsets as the levels of a single grouping variablefacet_grid()
: define subsets as the crossing of two grouping variablesUsing facet_wrap
:
df %>% ggplot(aes(x = Generosity, # Set the X variable
y = Score # Set the Y variable
)) +
geom_point(aes(fill = Continent), # I used fil because i set the shape to transparent circle
shape = 21, # Shape set to transparent circle
size = 6) +
geom_smooth(aes(color = Continent,
fill = Continent), # Color the CI also
method = "lm",
size = 1) +
geom_text_repel(aes(label = `Country or region`),
color = "gray20",
data = filter(df, `Country or region` %in% pointsToLabel),
force = 10) +
facet_wrap(. ~ Continent, ncol = 3, nrow = 2)
Use facet_wrap()
to facet separate your GDP vs Healthy life expectancy graph by continent (2 rows and 3 columns)
# Type your code after this line!
df %>% ggplot(aes(x = `GDP per capita`,
y = `Healthy life expectancy`)) +
geom_point(aes(color = Continent)) +
geom_smooth(method = "lm",
se = FALSE) +
facet_wrap(.~Continent, ncol = 2, nrow = 3)
We can also do the same with bar chart. Let’s experiment a little.
We will create high vs low Happiness groups using cuts. Then we will gather the happiness factors into long data format. Then we have to create summaries of the data.
df %>% mutate(Category = cut(Score, # Use cut to break cotinuous score to categories
breaks = c(0, 4.5, 9),
labels = c("Low","High"))) %>%
gather(key = HappinessFactors, value = Score,
`GDP per capita`:`Perceptions of corruption`) %>% # Use : to gather consecutive
group_by(Category, Continent, HappinessFactors) %>% # Group and summarize
summarize(meanScore = mean(Score)) %>%
rename(Score = meanScore)
Now you can pipe them to facet_grid using facet_grid(Category~Continents)
df %>% mutate(Category = cut(Score, # Use cut to break continuous score to categories
breaks = c(0, 4.5, 9),
labels = c("Low","High"))) %>%
gather(key = HappinessFactors, value = Score,
`GDP per capita`:`Perceptions of corruption`) %>% # Use : to gather consecutive columns
group_by(Category, Continent, HappinessFactors) %>% # Group and summarize
summarize(meanScore = mean(Score)) %>%
rename(Score = meanScore) %>%
ggplot(aes(x = HappinessFactors, y = Score)) +
geom_bar(aes(fill = HappinessFactors),
stat = "identity") +
facet_grid(Continent ~ Category)
These are the steps to create the bar graph:
Hint: Use min(.$
Healthy life expectancy) - 1
, mean(.$
Healthy life expectancy)
, and max(.$
Healthy life expectancy) + 1
to create the breaks
# Type your code after this line!
df %>% mutate(Category = cut(`Healthy life expectancy`, # Use cut to break continuous score to categories
breaks = c(min(.$`Healthy life expectancy`) - 1,
mean(.$`Healthy life expectancy`),
max(.$`Healthy life expectancy`) + 1),
labels = c("Low","High"))) %>%
gather(key = HappinessFactors,
value = Score,
`Social support`, Generosity, `Freedom to make life choices`,
) %>% # Use : to gather consecutive columns
group_by(Category, Continent, HappinessFactors) %>% # Group and summarize
summarize(Score = mean(Score)) %>%
ggplot(aes(x = HappinessFactors, y = Score)) +
geom_bar(aes(fill = HappinessFactors),
stat = "identity") +
facet_grid(Continent ~ Category)
All geoms use a statistical transformation (stat) to convert raw data to the values to be mapped to the objects features.
The available stats are
stat_bin stat_density2d stat_smooth
stat_bin_2d stat_ecdf stat_spoke
stat_bin_hex stat_ellipse stat_sum
stat_bin2d stat_function stat_summary
stat_binhex stat_identity stat_summary_2d
stat_boxplot stat_qq stat_summary_bin
stat_contour stat_qq_line stat_summary_hex
stat_count stat_quantile stat_summary2d
stat_density stat_sf stat_unique
stat_density_2d stat_sf_coordinates stat_ydensity
Each geom has a default stat, and each stat has a default geom.
stat_identity
.stat_count
.Coordinate system functions include
coord_cartesian coord_flip coord_polar coord_trans
coord_equal coord_map coord_quickmap
coord_fixed coord_munch coord_sf
Let’s use coord_flip to flip the axes.
df %>% mutate(Category = cut(Score, # Use cut to break continuous score to categories
breaks = c(0, 4.5, 9),
labels = c("Low","High"))) %>%
gather(key = HappinessFactors, value = Score,
`GDP per capita`:`Perceptions of corruption`) %>% # Use : to gather consecutive columns
group_by(Category, Continent, HappinessFactors) %>% # Group and summarize
summarize(meanScore = mean(Score)) %>%
rename(Score = meanScore) %>%
ggplot(aes(x = HappinessFactors, y = Score)) +
geom_bar(aes(fill = HappinessFactors),
stat = "identity") +
facet_grid(Continent ~ Category) +
coord_flip()
To add a title to your plot, set the title with ggtitle("Title goes here")
.
df %>% mutate(Category = cut(Score, # Use cut to break continuous score to categories
breaks = c(0, 4.5, 9),
labels = c("Low","High"))) %>%
gather(key = HappinessFactors, value = Score,
`GDP per capita`:`Perceptions of corruption`) %>% # Use : to gather consecutive columns
group_by(Category, Continent, HappinessFactors) %>% # Group and summarize
summarize(meanScore = mean(Score)) %>%
rename(Score = meanScore) %>%
ggplot(aes(x = HappinessFactors, y = Score)) +
geom_bar(aes(fill = HappinessFactors),
stat = "identity") +
facet_grid(Continent ~ Category) +
coord_flip() +
ggtitle("Figure 1. Happiness Factors vs Happiness Score")
Another neat trick is to have your plot ready as an object, and then add further ggplot components to it.
fig1 <- df %>% mutate(Category = cut(Score, # Use cut to break continuous score to categories
breaks = c(0, 4.5, 9),
labels = c("Low","High"))) %>%
gather(key = HappinessFactors, value = Score,
`GDP per capita`:`Perceptions of corruption`) %>% # Use : to gather consecutive columns
group_by(Category, Continent, HappinessFactors) %>% # Group and summarize
summarize(meanScore = mean(Score)) %>%
rename(Score = meanScore) %>%
ggplot(aes(x = HappinessFactors, y = Score)) +
geom_bar(aes(fill = HappinessFactors),
stat = "identity") +
facet_grid(Continent ~ Category) +
coord_flip()
fig1 + ggtitle("Figure 1. Happiness Factors vs Happiness Score")
ggtitle()
to fig2.# Type your code after this line!
fig2 <- df %>% ggplot(aes(x = `GDP per capita`,
y = `Healthy life expectancy`)) +
geom_point(aes(color = Continent)) +
geom_smooth(method = "lm",
se = FALSE) +
facet_wrap(.~Continent, ncol = 2, nrow = 3)
fig2 + ggtitle("Figure 2. GDP vs Healthy life expectancy")
labs()
To change the labels of your axes, you can use labs(x = "", y = "")
. Note: This also works for your title!! labs(x = "", y = "", title = "")
.
For your GDP vs Healthy life expectancy graph (fig2), set: 1. x axis label as Gross Domestic Product per Capita 2. y axis label as Life Expectancy 3. Title as Figure 2: GDP vs Healthy Life Expectancy
# Type your code after this line!
fig2 + labs(x = "Gross Domestic Product per Capita",
y = "Life Expectancy",
title = "Figure 2: GDP vs Healthy Life Expectancy")
Now, we have come to the final part: Setting the overall theme.
ggplot2 supports the notion of themes for adjusting non-data appearance aspects of a plot, such as
Theme elements can be customized in several ways:
The full documentation of the theme
function lists many customizable elements.
There are some available themes provided by ggplot2
:
theme_bw theme_gray theme_minimal theme_void
theme_classic theme_grey theme_dark theme_light
fig1 + ggtitle("Figure 1. Happiness Factors vs Happiness Score") +
theme_minimal()
Things we need to fix and change: 1. y-axis label has no space 2. Facet headers are not right 3. Legend has no meaning in this plot 4. Modify x and y axes! 5. I don’t like the colors!
Item 1 is relatively quick fix since we have gone through them earlier! So, let’s do this first!
fig1 + labs(x = "Happiness Factors",
y = "Happiness Score",
title = "Figure 1. Happiness Factors vs Happiness Score") +
theme_minimal()
For item 2, we can rotate Continent so that it becomes horizontal. Furthermore, we can also change the elements in the facet using strip.text.x = element_text()
or strip.text.y = element_text()
.
We can use angle = 0
to set the rotation of the text to horizontal.
fig1 + labs(x = "Happiness Factors",
y = "Happiness Score",
title = "Figure 1. Happiness Factors vs Happiness Score") +
theme_minimal() +
theme(strip.text.x = element_text(size = 10),
strip.text.y = element_text(size = 10, angle = 0))
For item 3, we can remove the legend with legend.position = "none"
.
fig1 + labs(x = "Happiness Factors",
y = "Happiness Score",
title = "Figure 1. Happiness Factors vs Happiness Score") +
theme_minimal() +
theme(strip.text.x = element_text(size = 10, face = "bold"),
strip.text.y = element_text(size = 10, face = "bold", angle = 0),
legend.position = "none")
Item 4 is similar to item 3 that we have to change these components: 1. axis.text.x = element_text()
and axis.text.y = element_text()
for the axis values 2. axis.title.x = element_text()
and axis.title.y = element_text()
for the axis titles
fig1 + labs(x = "Happiness Factors",
y = "Happiness Score",
title = "Figure 1. Happiness Factors vs Happiness Score") +
theme_minimal() +
theme(strip.text.x = element_text(size = 10),
strip.text.y = element_text(size = 10, angle = 0),
# Remove legend
legend.position = "none",
# Axis values
axis.text.x = element_text(size = 8),
axis.text.y = element_text(size = 8),
# Axis title
axis.title.x = element_text(size = 12, face = "bold"),
axis.title.y = element_text(size = 12, face = "bold")
)
Changing colors of your plot is always fun, especially if you like UX and UI stuff!! Aesthetically pleasing graphs make reviewers like your results more!
To change the colors of your graph, we first have to look at where was the color implemented: Was it at fill
or color
? This will determine what code we will use to change the colors!
In the bar chart, we used fill = HappinessFactors
for the colors of the bars, hence fill is our target!
Now, to change the colors, we first have to set the colors in an object. Since we have 6 levels, we need at least 6 colors.
To add the code in, we can use scale_fill_manual()
cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73",
"#F0E442", "#0072B2", "#D55E00", "#CC79A7")
fig1 + labs(x = "Happiness Factors",
y = "Happiness Score",
title = "Figure 1. Happiness Factors vs Happiness Score") +
theme_minimal() +
theme(strip.text.x = element_text(size = 10),
strip.text.y = element_text(size = 10, angle = 0),
# Remove legend
legend.position = "none",
# Axis values
axis.text.x = element_text(size = 8),
axis.text.y = element_text(size = 8),
# Axis title
axis.title.x = element_text(size = 12, face = "bold"),
axis.title.y = element_text(size = 12, face = "bold")
) +
scale_fill_manual(values = cbbPalette)
In general, the code for scaling colors is scale_*_+
where * is the geom you want to modify, for example, fill or color, and + is what the modifier. Here, we are setting the properties manually. Of course you can also try out pre-defined colors:
fig1 + labs(x = "Happiness Factors",
y = "Happiness Score",
title = "Figure 1. Happiness Factors vs Happiness Score") +
theme_minimal() +
theme(strip.text.x = element_text(size = 10),
strip.text.y = element_text(size = 10, angle = 0),
# Remove legend
legend.position = "none",
# Axis values
axis.text.x = element_text(size = 8),
axis.text.y = element_text(size = 8),
# Axis title
axis.title.x = element_text(size = 12, face = "bold"),
axis.title.y = element_text(size = 12, face = "bold")
) +
scale_fill_brewer(palette = "Set2")
It is also important to note if you expect people who may be color-blind to read your graphs. Then, you may want to search more color-blind friendly colors!
For your GDP vs Healthy life expectancy graph: 1. use labs()
to set your x, y and title 2. remove the legend 3. set the theme to theme_minimal()
4. Change the text size and bold the axis titles 5. Change the colors of your Continents
# Type your code after this line!
fig2 + labs(x = "Gross Domestic Product per Capita",
y = "Life Expectancy",
title = "Figure 2: GDP vs Healthy Life Expectancy") +
theme_minimal() +
theme(strip.text.x = element_text(size = 10),
strip.text.y = element_text(size = 10, angle = 0),
# Remove legend
legend.position = "none",
# Axis values
axis.text.x = element_text(size = 8),
axis.text.y = element_text(size = 8),
# Axis title
axis.title.x = element_text(size = 12, face = "bold"),
axis.title.y = element_text(size = 12, face = "bold")
) +
scale_fill_brewer(palette = "Set2")
There are many resources online for ggplot themes! Here are some of them:
ggpubr
: Useful functions to prepare plots for publication
BBC style
: BBC style theme ggthemes
: Set of extra themes
ggthemr
: More themes
ggsci
: Color palettes for scales
paletteer
: A package of packages of themes
After you have finished plotting, it is important to save your plot!!
ggsave()
is a convenient function for saving a plot. It defaults to saving the last plot that you displayed, using the size of the current graphics device. It also guesses the type of graphics device from the extension.
First we have to set our plot as an object before saving.
fig1_save <- fig1 + labs(x = "Happiness Factors",
y = "Happiness Score",
title = "Figure 1. Happiness Factors vs Happiness Score") +
theme_minimal() +
theme(strip.text.x = element_text(size = 10),
strip.text.y = element_text(size = 10, angle = 0),
# Remove legend
legend.position = "none",
# Axis values
axis.text.x = element_text(size = 8),
axis.text.y = element_text(size = 8),
# Axis title
axis.title.x = element_text(size = 12, face = "bold"),
axis.title.y = element_text(size = 12, face = "bold")
) +
scale_fill_brewer(palette = "Set2")
Now that we have set our plot as an object, we can now save it!
ggsave(
filename = "fig1_save.tiff",
plot = fig1_save,
path = "img",
scale = 1,
width = NA,
height = NA,
units = c("in", "cm", "mm"),
dpi = 300
)
You can also save it as a pdf!!
ggsave(
filename = "fig1_save.pdf",
plot = fig1_save,
path = "img",
scale = 1,
width = NA,
height = NA,
units = c("in", "cm", "mm"),
dpi = 300
)
For your GDP vs Healthy life expectancy graph: 1. Set it as an object called fig2_save
. 2. save the plot as a pdf, jpg, and tiff file.
# Type your code below this line!
fig2_save <- fig2 + labs(x = "Gross Domestic Product per Capita",
y = "Life Expectancy",
title = "Figure 2: GDP vs Healthy Life Expectancy") +
theme_minimal() +
theme(strip.text.x = element_text(size = 10),
strip.text.y = element_text(size = 10, angle = 0),
# Remove legend
legend.position = "none",
# Axis values
axis.text.x = element_text(size = 8),
axis.text.y = element_text(size = 8),
# Axis title
axis.title.x = element_text(size = 12, face = "bold"),
axis.title.y = element_text(size = 12, face = "bold")
) +
scale_fill_brewer(palette = "Set2")
ggsave(
filename = "fig2_save.pdf",
plot = fig2_save,
path = "img",
scale = 1,
width = NA,
height = NA,
units = c("in", "cm", "mm"),
dpi = 300
)
ggsave(
filename = "fig2_save.jpg",
plot = fig2_save,
path = "img",
scale = 1,
width = NA,
height = NA,
units = c("in", "cm", "mm"),
dpi = 300
)
ggsave(
filename = "fig2_save.tiff",
plot = fig2_save,
path = "img",
scale = 1,
width = NA,
height = NA,
units = c("in", "cm", "mm"),
dpi = 300
)
There are several ggplot extensions available in R. The list is inexhaustible, but here are some of them:
ggbeewsarm
: An alternative strip one-dimensional scatter plot like “stripchart”
ggridges
: Partially overlapping line plots that create the impression of a mountain range
cowplot
: Combining plots
ggrepel
: Advanced text labels including overlap control
ggmap
: Dedicated to mapping
ggraph
: Network graphs
ggiraph
: Converting ggplot2 to interactive graphics
theme()
And the most important of all: Reading the documentation in R to understand more about the code or function.
You can use this code template to make thousands of graphs with ggplot2.
ggplot(data = <DATA>) +
<GEOM_FUNCTION>(
mapping = aes(<MAPPINGS>),
stat = <STAT>,
position = <POSITION>
) +
<COORDINATE_FUNCTION> +
<FACET_FUNCTION> +
<LABS>
<THEME>