This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
R code goes in code chunks, denoted by three backticks. Executing code chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Crtl+Shift+Enter (Windows) or Cmd+Shift+Enter (Mac).
Along the workshop, I will try to share as many best practices as possible. These will be denoted by an fyi box like this:
This is a best practice in R coding (or at least i think it is)
This is an exercise for you to complete
This is a is a hint for the exercise
The first chunk in an R Notebook is usually titled “setup,” and by convention includes the R packages you want to load. Remember, in order to use an R package you have to run some library()
code every session. Execute these lines of code to load the packages.
To install a package, you have to type install.packages("packagename")
.
I’ve actually set the packages up in a way that if the package is not installeed yet if(!require(packagename))
(!require()
means package does not exist), it will automatically install it install.packages("packagename")
, and run load the package library(packagename)
.
Here is an example: if (!require(knitr)) install.packages("knitr"); library(knitr)
.
if (!require(knitr)){install.packages("knitr")}; library(knitr)
## Loading required package: knitr
## Warning: package 'knitr' was built under R version 4.0.5
knitr::opts_chunk$set(comment = "#", warning = FALSE, message = FALSE)
if (!require(tidyverse)){install.packages("tidyverse")}; library(tidyverse)
if (!require(skimr)){install.packages("skimr")}; library(skimr)
if (!require(ggrepel)){install.packages("ggrepel")}; library(ggrepel)
if (!require(visNetwork)){install.packages("visNetwork")}; library(visNetwork)
if (!require(ggpubr)){install.packages("ggpubr")}; library(ggpubr)
if (!require(viridis)){install.packages("viridis")}; library(viridis)
if (!require(gghighlight)){install.packages("gghighlight")}; library(gghighlight)
We are going to load the pokemon.csv
dataset located in the data directory. Let’s load the data first.
# Set directories
dirs <- list()
dirs$root <- "../"
dirs$data <- "data"
dirs$output <- "output"
# Set files
csvfile <- "pokemon.csv"
# Load data
filter <- c("Grass", "Fire", "Water", "Psychic", "Normal", "Dragon", "Flying")
df <- read_csv(file.path(dirs$root, dirs$data, csvfile))
# Filter only seven types of pokemon
df2 <- df %>% filter(type_1 %in% filter)
Changing colors of your plot is always fun, especially if you like UI stuff!! Aesthetically pleasing graphs make reviewers like your results more!
To change the colors of your graph, we first have to look at where was the color implemented: Was it at fill
or color
? This will determine what code we will use to change the colors!
Let’s use our Weight bar chart for pokemon type as an example here:
gg.wtbar <- df %>% group_by(type_1) %>%
summarize(weight_kg = mean(weight_kg, na.rm = TRUE)) %>%
ggplot(aes(x=type_1, y=weight_kg)) +
geom_bar(aes(fill=type_1),
color="black", # Color adds border
stat="identity", position="dodge") +
labs(x="Pokemon Type",
y = "Weight (kg)",
title= "Weight of pokemon by type",
fill = "Type")
gg.wtbar
In the bar chart, we used fill = type_1
for the colors of the bars, hence fill is our target!
Now, to change the colors, we first have to set the colors in an object. Since we have 18 types of pokemon, we need 18 colors.
To add the code in, we can use scale_fill_manual()
The <name>
= “
poke_colors <- c(`Normal` = "#A8A77A", `Fire` = "#EE8130", `Water` = "#6390F0",
`Electric` = "#F7D02C", `Grass` = "#7AC74C", `Ice` = "#96D9D6",
`Fighting` = "#C22E28", `Poison` = "#A33EA1", `Ground` = "#E2BF65",
`Flying` = "#A98FF3", `Psychic` = "#F95587", `Bug` = "#A6B91A",
`Rock` = "#B6A136", `Ghost` = "#735797", `Dragon` = "#6F35FC",
`Dark` = "#705746", `Steel` = "#B7B7CE", `Fairy` = "#D685AD")
random_colors <- c("#A8A77A", "#EE8130", "#6390F0", "#F7D02C", "#7AC74C", "#96D9D6",
"#C22E28", "#A33EA1", "#E2BF65", "#A98FF3", "#F95587", "#A6B91A",
"#B6A136", "#735797", "#6F35FC", "#705746", "#B7B7CE", "#D685AD" )
gg.wtbar + scale_fill_manual(values = poke_colors)
gg.wtbar + scale_fill_manual(values = random_colors)
If colors of each group level matters, then you might want to parse a label to the color like I did above in poke_colors
.
In general, the code for scaling colors is scale_*_+
where * is the geom you want to modify, for example, fill or color, and + is what the modifier. Here, we are setting the properties manually. Of course you can also try out pre-defined colors:
gg.wtbar + scale_fill_brewer(palette = "Set2")
It is also important to note if you expect people who may be color-blind to read your graphs. Then, you may want to search more color-blind friendly colors!
status
) to these four colors: #3B27BA, #E847AE, #13CA91, and #FF9472legendcolor
for the colorsgroup_by(status) %>% summarize(speed=mean(speed))
to create the meansgeom_bar(aes(), stat="identity", position="dodge")
to create bar chartlabs(x ="", y="", title="", fill="")
to change labelsscale_fill_manual(values=legendcolor)
# Type your code after this line!
legendcolor <- c("#3B27BA", "#E847AE", "#13CA91", "#FF9472")
gg.spdbar <- df %>% group_by(status) %>%
summarize(speed = mean(speed)) %>%
ggplot(aes(x=status, y=speed)) +
geom_bar(aes(fill=status), position="dodge", stat="identity") +
labs(x ="Legendary Status", y="Speed",
title="Speed of pokemon by legendary status", fill="") +
scale_fill_manual(values=legendcolor)
gg.spdbar
There are several statistics we can add to our plot, but today we are going to focus on error bars. Error bars inform you about the variation in your data. You could add standard deviations or standard errors.
Before adding error bars, we need to create these summary statistics first. We can use mean()
, median()
, sd()
for means, medians, and standard deviation. There is no function for standard error, but you can easily create one using a function like this:
se <- function(x){sd(x)/sqrt(length(x))}
To create means and SD, we can pipe our data to group_by() %>% summarize()
to group the scores by some variable, and summarize the scores for each group.
df.atk <- df %>% group_by(type_1) %>%
summarize(atk_mean = mean(attack, na.rm = TRUE),
atk_median = median(attack, na.rm = TRUE),
atk_sd = sd(attack, na.rm = TRUE))
df.atk
Now that we have created our summary statistics, we can start plotting!
gg.atk <- df.atk %>% ggplot(aes(x=type_1,y=atk_mean, fill=type_1)) +
geom_bar(stat="identity", position="dodge") +
coord_flip() +
scale_fill_manual(values = poke_colors)
gg.atk + geom_errorbar(aes(ymin=atk_mean-atk_sd,
ymax=atk_mean+atk_sd)) +
labs(title = "Bar plot with error bars")
There are different variants of error bars you can create!
gg.atk + geom_linerange(aes(ymin=atk_mean-atk_sd,
ymax=atk_mean+atk_sd)) +
labs(title = "Bar plot with line range")
gg.atk + geom_crossbar(aes(ymin=atk_mean-atk_sd,
ymax=atk_mean+atk_sd)) +
labs(title = "Bar plot with crossbar")
gg.atk + geom_pointrange(aes(ymin=atk_mean-atk_sd,
ymax=atk_mean+atk_sd)) +
labs(title = "Bar plot with point-range")
We can also plot half of the error bars, by specifying ymin
or ymax
as the mean.
gg.atk + geom_errorbar(aes(ymin=atk_mean,
ymax=atk_mean+atk_sd)) +
labs(title = "Bar plot with half the error bars")
See that the lower range is still visible? This is because of the layering issue ggplot has. We need to plot geom_errorbar()
before plotting geom_bar()
so that the bar plot will overlay the error bar.
df.atk %>% ggplot(aes(x=type_1,y=atk_mean, fill=type_1)) +
geom_errorbar(aes(ymin=atk_mean-atk_sd,
ymax=atk_mean+atk_sd)) +
geom_bar(stat="identity", position="dodge") +
coord_flip() +
scale_fill_manual(values = poke_colors) +
labs(title = "Bar plot with upper error bars")
Another way to go about this is to add border to your bars to mask the lower error bars:
df.atk %>% ggplot(aes(x=type_1,y=atk_mean, fill=type_1)) +
geom_bar(stat="identity", position="dodge", color="black") +
geom_errorbar(aes(ymin=atk_mean,
ymax=atk_mean+atk_sd)) +
coord_flip() +
scale_fill_manual(values = poke_colors) +
labs(title = "Bar plot with upper error bars")
Hint: Try geom_errorbar(width=0)
.
# Type your code after this line
df.spd <- df %>% group_by(status) %>%
summarize(speed_mean = mean(speed),
speed_sd = sd(speed))
df.spd %>% ggplot(aes(x=status, y=speed_mean)) +
geom_bar(aes(fill=status), stat="identity", position="dodge", color="black") +
geom_errorbar(aes(ymin=speed_mean-speed_sd,
ymax=speed_mean+speed_sd),
width=0)
##
df.speed <- df %>% group_by(status) %>%
summarize(speed_mean = mean(speed, na.rm = TRUE),
speed_median = median(speed, na.rm = TRUE),
speed_sd = sd(speed, na.rm = TRUE))
df.speed %>% ggplot(aes(x=status, y=speed_mean)) +
geom_bar(aes(fill=status), stat="identity", position="dodge", color="black") +
geom_crossbar(aes(ymin=speed_mean-speed_sd,
ymax=speed_mean+speed_sd)) +
labs(title = "Pokemon Speed by Status (with Crossbar")
We can also add text to the plots! Text geoms are useful for labeling plots. They can be used by themselves as scatterplots or in combination with other geoms, for example, for labeling points or for annotating the height of bars. geom_text()
adds only text to the plot. geom_label()
draws a rectangle behind the text, making it easier to read.
geom_text()
and geom_label()
take in the same aesthetics as any other geoms (x, y, color, fill, etc.). However, for the text to show, you need to supply it with label=??
. The label argument is where you put the text you want to show.
df %>% group_by(generation) %>%
summarize(atk_mean = mean(attack, na.rm = TRUE),
atk_median = median(attack, na.rm = TRUE),
atk_sd = sd(attack, na.rm = TRUE)) %>%
ggplot(aes(x=generation, y=atk_mean, fill=factor(generation))) +
geom_bar(stat="identity", position="dodge") +
geom_text(aes(y=atk_mean+5, # Remember to leave some gap because the y-argument is the y-coordinates where the text will be displayed.
label=round(atk_mean,0))) # Here, we show the mean attack on top of the bar for each generation.
geom_text()
or geom_label()
)Hint: Use geom_text(aes(y= atk_mean + atk_sd + 5))
to set the y-coordinates of the text above the error bar
# Type your code below this line
df.spd <- df %>% group_by(status) %>%
summarize(speed_mean = mean(speed),
speed_sd = sd(speed))
df.spd %>% ggplot(aes(x=status, y=speed_mean)) +
geom_bar(aes(fill=status), stat="identity", position="dodge", color="black") +
geom_errorbar(aes(ymin=speed_mean-speed_sd,
ymax=speed_mean+speed_sd),
width=0) +
geom_label(aes(y= speed_mean + speed_sd + 10, fill=status,
label=round(speed_mean, 2)))
Now, we have come to the most artistic part of the workshop: Setting the theme.
ggplot2 supports the notion of themes for adjusting non-data appearance aspects of a plot, such as
Theme elements can be customized in several ways:
The full documentation of the theme
function lists many customizable elements.
There are some available themes provided by ggplot2
:
theme_bw theme_gray theme_minimal theme_void
theme_classic theme_grey theme_dark theme_light
Here is how each theme looks like:
gg.basic <- df %>% ggplot(aes(x=attack, y=defense)) + geom_point() # basic scatterplot
gg.bw <- gg.basic + theme_bw() + labs(title= "Theme bw")
gg.gray <- gg.basic + theme_gray() + labs(title= "Theme gray")
gg.minimal <- gg.basic + theme_minimal() + labs(title= "Theme minimal")
gg.void <- gg.basic + theme_void() + labs(title= "Theme void")
gg.classic <- gg.basic + theme_classic() + labs(title= "Theme classic")
gg.grey <- gg.basic + theme_grey() + labs(title= "Theme grey")
gg.dark <- gg.basic + theme_dark() + labs(title= "Theme dark")
gg.light <- gg.basic + theme_light() + labs(title= "Theme light")
ggarrange(gg.bw, gg.gray, gg.minimal, gg.void, gg.classic, gg.grey, gg.dark, gg.light,
ncol = 4, nrow=2)
gg.wtbar +
scale_fill_manual(values = poke_colors) +
theme_minimal()
Things we need to fix and change: 1. Title is meaningless 2. Overlapping X-axis labels (can’t see the pokemon type) 3. Legend has no meaning in this plot 4. I don’t like grid lines 5. I want to change my fonts
Item 1 is relatively quick fix since we have gone through them earlier! So, let’s do this first!
# Set new object for your plot first.
gg.wtbar2 <- gg.wtbar + scale_fill_manual(values = poke_colors)
gg.wtbar2 +
labs(title = "") +
theme_minimal()
For item 2, we can rotate the axis so we can see the pokemon type more clearly. We can do this using coord_flip()
to flip the axis.
gg.wtbar2 +
labs(title = "") +
theme_minimal() +
coord_flip()
There are other coordinate system functions such as:
coord_cartesian coord_flip coord_polar coord_trans
coord_equal coord_map coord_quickmap
coord_fixed coord_munch coord_sf
For item 3, we can remove the legend with legend.position = "none"
in the theme setting.
gg.wtbar2 +
labs(title = "") +
theme_minimal() +
coord_flip() +
theme(legend.position = "none")
To remove grid lines, there are several options:
panel.grid.major.x panel.grid.minor.x
panel.grid.major.y panel.grid.minor.y
These options pertain to the major and minor grid lines on the x and y axes. To modify these, you need to set them like this: panel.grid.major.x = element_line()
. element_line()
is used to modify line elements like the grids or x/y axes. The properties of element_line()
can be found in the documentation. To remove the grid, we can simply use element_blank()
as blank default.
gg.wtbar2 +
labs(title = "") +
theme_minimal() +
coord_flip() +
theme(legend.position = "none",
# Modify vertical grid lines
panel.grid.major.x = element_line(size=1),
panel.grid.minor.x = element_line(linetype = "dashed", size=0.5),
# Set blank horizontal grid lines
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
)
Changing fonts can be complicated in ggplot, because there are so many elements where fonts or texts are used! We are going to some today:
axis.text axis.text.x axis.text.y
axis.title axis.title.x axis.title.y
Simply put, if there is no suffix at the end, the option applies to both x and y axes! We use element_text()
to edit the properties of texts.
gg.wtbar2 +
labs(title = "") +
theme_minimal() +
coord_flip() +
theme(legend.position = "none",
# Modify vertical grid lines
panel.grid.major.x = element_line(size=1),
panel.grid.minor.x = element_line(linetype = "dashed", size=0.5),
# Set blank horizontal grid lines
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
# tick labels along axes
axis.text = element_text(size = 8, family = "mono"),
# Axis title
axis.title = element_text(size = 12, face = "bold", family = "mono"),
)
Hint: for overall settings, you can use axis.text = element_text()
or axis.title = element_text()
.
# Type your code after this line!
gg.spdbar + labs(title="") +
theme(legend.position = "none",
# Modify vertical grid lines
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
# Modify horizontal grid lines
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
# tick labels along axes
axis.text.x = element_text(size = 14, family = "serif"),
axis.text.y = element_text(size = 14, family = "serif"),
# Axis title
axis.title = element_text(size = 16, face = "bold", family = "serif"),
)
There are many resources online for ggplot themes! Here are some of them:
ggpubr
: Useful functions to prepare plots for publication
BBC style
: BBC style theme ggthemes
: Set of extra themes
ggthemr
: More themes
ggsci
: Color palettes for scales
paletteer
: A package of packages of themes
Here are some tips and tricks for you to explore when you plot your data. The list is inexhaustive, but here are some of the common ones most people will use.
Sometimes, we have need to plot multiple plots. You might have many associations to explore, or finding some patterns in your data. If that is the case, we can merge these plots together using ggarrange
. ggarrange
comes from the ggpubr
package. There are also other packages out there for you to explore!
For example, these are the plots I wish to create in one single plot:
Notice, that one common variable is attack
? In that case, we can first create a sample plot, then make it into a function so we don’t have to copy-paste the code every single time.
df %>% ggplot(aes(x=attack, y=defense)) +
geom_smooth(method="lm", se=FALSE) +
geom_point(aes(color=status), size=2) +
labs(x="Attack", y="Defense") +
theme_bw() +
theme(legend.position = "top") +
scale_color_manual(values = c(`Normal` = "#DDDDDD",
`Sub Legendary` = "#96CAF7",
`Mythical` = "#FF51EB",
`Legendary` = "#AC27A4"))
Now that we have our plot, we need to turn it into a function! Don’t worry if you do not know how a function works or how to write one. We have been working with functions all this while actually. What actually goes into the function are arguments which the function will take and use it in a specific place.
Here, we want to modify the x
, y
, x title
, and y title
as these are the only parts where we want to plot for different combinations. So, we need to modify the plot a little bit to get this out. To create a function, we can parse the function(){}
into an object like this:
# we parse our arguments: df, x, y, x_title, y_title
# These are the parts we will tweak with the function when we plot next.
Scatter <- function(df, x, y, x_title, y_title){
df %>% mutate(status = factor(status, levels = c("Normal", "Sub Legendary", "Mythical", "Legendary"))) %>%
ggplot(aes_string(x=x, y=y)) + # we use aes_string for function-based ggplot
geom_jitter(aes(color=status), size=2) + # Use jitter to plots don't overlay on each other perfectly
geom_smooth(method="lm", se=FALSE) +
labs(x=x_title, y=y_title) +
theme_bw() +
scale_color_manual(values = c(`Normal` = "#DDDDDD",
`Sub Legendary` = "#96CAF7",
`Mythical` = "#FF51EB",
`Legendary` = "#AC27A4"))
}
Notice that I use aes_string()
instead of aes()
? This is because we are using strings in the argument, but ggplot takes in object as the argument. Now, we can finally plot four plots right now and merge them!
gg.atk.def <- Scatter(df, x="attack", y="defense", x_title="Attack", y_title="Defense")
gg.atk.hp <- Scatter(df, x="attack", y="hp", x_title="Attack", y_title="Hit Points")
gg.atk.spd <- Scatter(df, x="attack", y="speed", x_title="Attack", y_title="Speed")
gg.atk.cr <- Scatter(df, x="attack", y="catch_rate", x_title="Attack", y_title="Catch Rate")
ggarrange(gg.atk.def, gg.atk.hp, gg.atk.spd, gg.atk.cr,
ncol=2, nrow=2, common.legend = TRUE, legend="bottom")
Now that we got through most of the basic and some advanced plotting techniques, it is time to put everything together! This is a chance for you to be creative with your plots and showcase your personality.
Hint: for overall settings, you can use axis.text = element_text()
or axis.title = element_text()
.
# Type your code after this line!
gg.spdbar + labs(title="") +
geom_bar(aes(fill=status),
color= "black", position="dodge", stat="identity",
size=1) +
theme(legend.position = "none",
# Modify vertical grid lines
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
# Modify horizontal grid lines
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
# tick labels along axes
axis.text.x = element_text(size = 12, family = "serif"),
axis.text.y = element_text(size = 12, family = "serif"),
# Axis title
axis.title = element_text(size = 12, face = "bold", family = "serif"),
) +
scale_fill_manual(values = c("gray", "red", "purple", "orange"))
The important thing about data visualization is telling a story. There may be a lot to unpack in one single graph, so what is it you want to tell to your audience? This is important that emphasis directs the attention away from distractions and towards the impact of the data.
For example, we can plot Height vs Weight for Pokemon. There isn’t anything spectacular about weight and height plot you see below.
df %>% filter(generation==1) %>%
ggplot(aes(x=weight_kg, y=height_m)) +
geom_point(aes(fill=speed), size=4, pch=21, color="black") +
geom_smooth(method="glm", se=FALSE, color="black") +
labs(x="Weight (kg)", y= "Height (m)", fill = "Speed") +
theme_minimal()
There are just a couple of outliers. However, if you look closely, you’ll see an unusual pattern!
df %>% filter(generation==1) %>%
ggplot(aes(x=weight_kg, y=height_m)) +
geom_point(aes(fill=speed), size=4, pch=21, color="black") +
geom_smooth(method="glm", se=FALSE, color="black") +
labs(x="Weight (kg)", y= "Height (m)", fill = "Speed") +
# Add labels
geom_label_repel(data=subset(df %>% filter(generation==1),
(weight_kg > 200 & height_m >6) |
(weight_kg >250 & height_m < 3) |
(weight_kg <100 & height_m > 2.5)),
aes(label=name),
box.padding = 0.35,
point.padding = 0.5,
segment.color = 'grey50') +
theme_minimal()
Notice that there two different groups of outliers? 1) short and heavy and
2) Tall and heavy
By identifying these pokemon, the focus is now on them rather than the other 150 Generation 1 pokemons! You can go even further to highlight these pokemons using gghighlight()
.
df %>% filter(generation==1) %>%
select(weight_kg, height_m, speed) %>%
filter(!is.na(weight_kg)) %>%
ggplot(aes(x=weight_kg, y=height_m)) +
geom_smooth(method="glm", se=FALSE, color="black") +
geom_point(aes(fill=speed), size=4, pch=21, color="black") +
gghighlight(n=1, (weight_kg > 200 & height_m >6) |
(weight_kg >250 & height_m < 3) |
(weight_kg <100 & height_m > 2.5)) +
labs(x="Weight (kg)", y= "Height (m)", fill = "Speed") +
geom_label_repel(data=subset(df %>% filter(generation==1),
(weight_kg > 200 & height_m >6) |
(weight_kg >250 & height_m < 3) |
(weight_kg <100 & height_m > 2.5)),
aes(label=name),
box.padding = 0.35,
point.padding = 0.5,
segment.color = 'grey50') +
scale_fill_viridis() + # Change the color
theme_minimal()
Here are some demonstrations on domain-specific plots for research in different fields. The list is non-exhaustive, and you can search for even more plots in Google!
You can use polar plots to show how different stats rank up for different types of pokemon! This is just a variant of the bar chart or time series plot for highlighting different parts or making plots more presentable.
df %>% select(type_1, hp:speed) %>%
group_by(type_1) %>% summarize_all(mean) %>%
gather(key="Stats", value="Points", -type_1) %>%
rename(Type = type_1) %>%
ggplot(aes(x = Stats, y = Points, fill = Stats)) +
geom_col(position = "dodge") +
coord_polar() +
facet_wrap(.~Type, ncol=6) +
scale_fill_brewer(palette = "Set2") +
theme_bw()
Heat Maps are graphical representations of data that utilize color-coded systems. The primary purpose of Heat Maps is to better visualize the volume of locations/events within a dataset and assist in directing viewers towards areas on data visualizations that matter most. Here, Heat Maps can be great to examine effectiveness of every pokemon type. One way to visualize heatmap is to use geom_tile()
.
df %>% select(type_1, against_normal:against_fairy) %>%
group_by(type_1) %>%
summarize_all(median) %>%
reshape2::melt() %>%
mutate(Against = str_to_title(gsub("against_", "", variable))) %>%
ggplot(aes(x=type_1, y=Against)) +
geom_tile(aes(fill = value)) +
theme_minimal() +
scale_fill_gradient2() +
labs(x='Pokemon Type',
y='Against',
fill = "Effectiveness",
title = 'Damage taken of different types of Pokemons') +
theme(text = element_text(family= "mono"),
axis.text.x = element_text(angle = 45,
hjust = 1,
vjust = 1))
Network plots are interesting as data can also be interpreted as networks! Facebook friends, LinkedIn connections, and relationships among companies or people consist of networks. This is the same for pokemon. We can also plot the Effectiveness of pokemon as a network.
the against_*
variables denotes the amount of damage taken against an attack of a particular type. So if Fire against water is 2, then fire-type pokemon will take double damage if facing against a water-type pokemon.
First we will shift the values by -1 so that pokemon that are equally effective against each other (value=1) will now be 0. Then, any values falling below 0 are not effective, and any falling above 0 is effective. Let’s see what the resulting plot is.
# Set up data frame
df.graph <- df %>% select(type_1, against_normal:against_fairy) %>%
group_by(type_1) %>%
summarize_all(median) %>%
reshape2::melt() %>%
mutate(Against = type_1,
Type = str_to_title(gsub("against_", "", variable)),
Effectiveness = value) %>%
select(Type, Against, Effectiveness)
df.graph$Type[df.graph$Type=="Fight"] <- "Fighting" # Need to change Fight to Fighting
df.graph
After we set up our data frame, we need to set up our nodes and edges! Here’s an example:
# Set up nodes
path_to_images <- "https://raw.githubusercontent.com/winsonfzyang/RVisWorkshop/main/handouts/img/pokemon/"
nodes <- data.frame(id = unique(df.graph$Type))
# nodes$group <- nodes$id
nodes$label <- nodes$id
nodes$image <- paste0(path_to_images, nodes$id,".jpg")
nodes$color <- poke_colors[match(nodes$id, names(poke_colors))] # Match colors
nodes$shape <- "circularImage"
nodes$size <- 25
# Set up edges
edges <- df.graph %>% filter(Effectiveness>1)
names(edges) <- c("from", "to", "value")
edges$value <- NULL
edges$arrows <- "middle"
edges$width <- 2
# Compute node size base on in/out edges
from.size <- edges %>% group_by(from) %>% count(); names(from.size) <- c("type", "size")
to.size <- edges %>% group_by(to) %>% count(); names(to.size) <- c("type", "size")
size.df <- full_join(from.size, to.size, by = "type")
size.df[is.na(size.df)] <- 0
names(size.df) <- c("Type", "From", "To")
size.df$nodesize <- size.df$From - size.df$To
size.df$nodesize_scale <- size.df$nodesize + abs(min(size.df$nodesize)) + 1
size.df <- size.df[match(nodes$id, size.df$Type),] # Make sure that the id falls in line for both data frames
nodes$size <- size.df$nodesize_scale * 6
Now we can plot our network!
# Plot
visNetwork(nodes, edges, height = "500px", width = "100%") %>%
visNodes(font=list(size=20),
shapeProperties = list(useBorderWithImage = TRUE)) %>%
visEdges(arrows = list(to = list(enabled = TRUE, scaleFactor = 2)),
width=0.1) %>%
visInteraction(dragNodes = FALSE,
dragView = TRUE,
zoomView = TRUE) %>%
visPhysics(solver = "forceAtlas2Based",
forceAtlas2Based = list(gravitationalConstant = -60))%>%
visLayout(randomSeed = 2069)%>%
visIgraphLayout(layout = "layout_in_circle") %>%
visOptions(highlightNearest = list(enabled=TRUE,degree=0,hover=TRUE))
Ggplot can also plot geographical data!! Here is an example for volcanoes in the world.
volcano <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-12/volcano.csv")
# Load volcano data
volcano %>%
select(primary_volcano_type, longitude,
latitude, population_within_100_km) %>%
head()
# Load World map
world <- map_data("world")
world %>% head()
Now we have all the data to make world map. Let’s plot a simple world map using geom_map()
and see what we get. For plotting maps, we need two important information: longitude and lattitude.
ggplot() +
geom_map(data = world,
map = world,
aes(x=long, y=lat, map_id = region),
color = "black", fill = "lightgray", size = 0.1) +
theme_void() +
theme(legend.position = "none")
Now that we know how to make a simple world map, we can overlay data points on the world map to make the map more useful.
Here, we overlay different types of volcano from the volcano data on the world map. To do this, use geom_point()
and provide volcano data and longitude/latittude inside aes()
. We can also add colors for different primary_volcano_type
.
ggplot() +
geom_map(data = world,
map = world,
aes(x=long, y=lat, map_id = region),
color = "black", fill = "lightgray", size = 0.1) +
geom_point(data = volcano,
aes(x=longitude, y=latitude, color = primary_volcano_type),
alpha = 0.7) +
theme_void() +
theme(legend.position = "none")
Let us customize the world map with volcano locations further, by adding size variable based on the number of people affected by volcano. We can add size argument with population_within_100_km
to make bubbles on the world map.
ggplot() + geom_map(data = world,
map = world,
aes(x=long, y=lat, map_id = region),
color = "white", fill = "lightgray", size = 0.1) +
geom_point(data = volcano,
aes(x=longitude, y=latitude,
color=primary_volcano_type, size=population_within_100_km),
alpha = 0.5) +
theme_void() +
theme(legend.position = "none",
text=element_text(family="serif"))+
labs(title="Volcano Locations")
There are several ggplot extensions available in R. The list is inexhaustible, but here are some of them:
ggbeewsarm
: An alternative strip one-dimensional scatter plot like “stripchart”
ggridges
: Partially overlapping line plots that create the impression of a mountain range
cowplot
: Combining plots
ggrepel
: Advanced text labels including overlap control
ggmap
: Dedicated to mapping
ggraph
: Network graphs
ggiraph
: Converting ggplot2 to interactive graphics