1NDMC, EPHI; 2SPH, AAU; 3DMU & C4ED
October 14 - 17, 2025
A few packages for creating figures in R:
ggplot2: the core package for creating graphics based on the grammar of graphics
cowplot: for composing ggplots
ggtext: for advanced text rendering
ggthemes: for additional themes
grid: for creating graphical objects
gridExtra: additional functions for grid
patchwork: for multi-panel plots
ggiraph: interactive visualizations
highcharter: interactive visualizations
plotly: interactive visualizations
ggplot2 is a system for declaratively creating graphics, based on the Grammar of Graphics.
You provide the data, tell ggplot2 how to map variables to aesthetics and what graphical primitives to use, and it takes care of the details.
Why ggplot2?
A ggplot is built up from a few basic elements:
geom_: The geometric shapes that will represent the data.
aes(): Aesthetics of the geometric and statistical objects, such as position, color, size, shape, and transparency.
scale_: Maps between the data and the aesthetic dimensions, such as data range to plot width or factor values to colors.
stat_: Statistical summaries of the data, such as quantiles, fitted curves, and sums.
coord_: The transformation used for mapping data coordinates into the plane of the data rectangle.
facet_: The arrangement of the data into a grid of plots.
theme(): The overall visual defaults of a plot, such as background, grids, axes, default typeface, sizes and colors.
Aesthetics (aes()) (a.k.a. mapping)
x, y: variables mapped to the x and y axes
colour: colours the lines of geometries
fill: fill colour of geometries
group: groups based on the data
shape: shape of points, an integer value from 0 to 25, or NA
linetype: type of line, an integer value from 0 to 6 or a string
size: sizes of elements, a non-negative numeric value
alpha: transparency, a numeric value from 0 to 1
The general syntax is:
ggplot(data = data, mapping = aes(mappings)) + geom_function()
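As a concrete instance of the general syntax, here is a minimal sketch. It assumes the `mpg` example data set that ships with ggplot2; the variable choice (`displ`, `hwy`) and the object name `p_basic` are only illustrative.

```r
library(ggplot2)

# Assumption: `mpg` is the example data set bundled with ggplot2
# data    -> mpg
# mapping -> engine displacement on x, highway mileage on y
# geom    -> points (a scatter plot)
p_basic <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

p_basic  # printing the object draws the plot
```

Swapping `geom_point()` for another geom changes how the same mapping is drawn, without touching the data or the aesthetics.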
Geom Components
Geom | Description | Input |
---|---|---|
geom_histogram | Histograms | Continuous x |
geom_bar | Bar plot with frequencies | Discrete x |
geom_point | Points/scatterplots | Discrete/continuous x and y |
geom_boxplot | Box plot | Discrete x and continuous y |
geom_smooth | Adds a smoothed conditional mean / regression line | Continuous x and y |
geom_line | Line plots | Discrete/continuous x and y |
geom_abline | Reference line | Intercept and slope values |
geom_hline, geom_vline | Horizontal and vertical reference lines | yintercept or xintercept |
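Several geoms from the table can be layered in one plot. The sketch below again assumes the `mpg` data set bundled with ggplot2; the choice of a linear fit and of the mean as a reference line is purely illustrative.

```r
library(ggplot2)

# Assumption: `mpg` is the example data set bundled with ggplot2
p_geoms <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.6) +                                  # raw observations
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # linear regression line
  geom_hline(yintercept = mean(mpg$hwy),                     # horizontal reference line
             linetype = "dashed")

p_geoms
```

Each `+` adds one layer, and layers are drawn in the order they are added.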
palmerpenguins
We will use the palmerpenguins data set:
This data set contains size measurements for three penguin species observed on three islands in the Palmer Archipelago, Antarctica.
Let us take a look at the variables in the penguins data set:
Rows: 344
Columns: 8
$ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex <fct> male, female, female, NA, female, male, female, male…
$ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
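The listing above looks like the output of `glimpse()`. A minimal sketch to reproduce it, assuming the palmerpenguins and dplyr packages are installed:

```r
library(palmerpenguins)  # provides the `penguins` data set
library(dplyr)           # provides glimpse()

# Compact overview: one line per column with its type and first values
glimpse(penguins)
```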
ggplot2
ggplot2 requires you to prepare the data as an object of class data.frame or tibble (common in the tidyverse).
More complex plots in ggplot2 often require the data in long format.
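To illustrate the long format, here is a sketch that reshapes the two bill measurements with `tidyr::pivot_longer()`. The new column names `measure` and `value_mm` are hypothetical and chosen only for this example.

```r
library(palmerpenguins)
library(tidyr)

# Wide: one column per measurement (bill_length_mm, bill_depth_mm)
# Long: one row per penguin and measurement
penguins_long <- pivot_longer(
  penguins,
  cols      = c(bill_length_mm, bill_depth_mm),
  names_to  = "measure",   # hypothetical name: which measurement
  values_to = "value_mm"   # hypothetical name: its value in mm
)

head(penguins_long)
```

In long format, a single variable such as `measure` can be mapped to an aesthetic or used for faceting.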
Scientific questions
Is there a relationship between the length and the depth of bills?
Do bill size and flipper length vary together?
How are these measures distributed among the 3 penguin species?
How can we graphically address these questions with ggplot2?
Title and axes components: changing size, colour and face
How you visualize a variable’s distribution depends on its type: categorical or numerical.
The most common way to visualize a single categorical variable is with a bar chart, using geom_bar().
By default, bars are ordered alphabetically. For better comparisons, it’s often useful to order them by frequency using forcats::fct_infreq().
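library(tidyverse)  # loads ggplot2 and forcats, among others
library(haven)      # read_dta() for Stata files
library(labelled)   # to_factor() for labelled variables
library(forcats)    # already attached by tidyverse; kept for clarity

# Read the sick-child module and convert labelled variables to factors
sc_data <- read_dta("data/ESPA_2021/ETSC81FLSP.DTA")
sc_data_fct <- to_factor(sc_data)

# Bar chart of facility types, ordered by frequency, with count labels
ggplot(sc_data_fct, aes(x = fct_infreq(cfactype))) +
  geom_bar(fill = "#1f77b4", width = 0.5) +
  geom_text(stat = "count", aes(label = after_stat(count)),
            vjust = -0.5, size = 4) +
  labs(
    title = "Number of Sick-Child Consultations by Facility Type",
    x = "Facility type",
    y = "Count of consultations",
    caption = "Source: ESPA 2021 - Sick child module") +
  theme_classic(base_size = 15)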
# Horizontal bar chart of penguin species, most frequent species on top
library(palmerpenguins)  # provides the penguins data set

ggplot(penguins, aes(x = fct_rev(fct_infreq(species)), fill = species)) +
  geom_bar(show.legend = FALSE, width = 0.6) +
  coord_flip() +  # Flip coordinates for horizontal bars
  labs(title = "Penguin Species Count",
       x = "Species",
       y = "Count") +
  theme_bw(base_size = 14)
Histograms are the standard way to view the distribution of a single numerical variable. They show “bins” of data to reveal the underlying frequency.
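A minimal histogram sketch, assuming the palmerpenguins package is installed. The bin width of 200 g and the colours are illustrative choices, not recommendations.

```r
library(ggplot2)
library(palmerpenguins)

# Histogram of body mass; binwidth is in the units of x (grams)
p_hist <- ggplot(penguins, aes(x = body_mass_g)) +
  geom_histogram(binwidth = 200, fill = "#1f77b4", colour = "white",
                 na.rm = TRUE) +  # silently drop missing values
  labs(x = "Body mass (g)", y = "Count")

p_hist
```

Trying a few values of `binwidth` is worthwhile, because the apparent shape of the distribution can change with the bin size.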
A density plot is a smoothed version of a histogram. It’s great for comparing distributions between groups.
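A sketch of a grouped density plot, assuming the palmerpenguins package. Mapping `fill` to `species` draws one curve per species, and `alpha` keeps overlapping curves visible.

```r
library(ggplot2)
library(palmerpenguins)

# One semi-transparent density curve per species
p_density <- ggplot(penguins, aes(x = body_mass_g, fill = species)) +
  geom_density(alpha = 0.5, na.rm = TRUE) +
  labs(x = "Body mass (g)", y = "Density", fill = "Species")

p_density
```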
Multiple Groups
How does a numerical variable’s distribution change across different categories? Boxplots are perfect for this.
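A minimal boxplot sketch comparing body mass across species, assuming the palmerpenguins package:

```r
library(ggplot2)
library(palmerpenguins)

# Discrete x (species), continuous y (body mass)
p_box <- ggplot(penguins, aes(x = species, y = body_mass_g, fill = species)) +
  geom_boxplot(show.legend = FALSE, na.rm = TRUE) +
  labs(x = "Species", y = "Body mass (g)")

p_box
```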
A scatter plot is the classic way to show the relationship between two numerical variables.
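A minimal scatter plot sketch for the bill length versus bill depth question, assuming the palmerpenguins package:

```r
library(ggplot2)
library(palmerpenguins)

# Two numerical variables: bill length on x, bill depth on y
p_scatter <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point(na.rm = TRUE) +
  labs(x = "Bill length (mm)", y = "Bill depth (mm)")

p_scatter
```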
We can add more variables to a plot using aesthetics like color and shape, or by using facets.
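A sketch that adds two categorical variables to a scatter plot through aesthetics, assuming the palmerpenguins package. Mapping `island` to shape is an illustrative choice.

```r
library(ggplot2)
library(palmerpenguins)

# colour encodes species, shape encodes island
p_colour <- ggplot(penguins,
                   aes(x = bill_length_mm, y = bill_depth_mm,
                       colour = species, shape = island)) +
  geom_point(size = 2, alpha = 0.8, na.rm = TRUE) +
  labs(x = "Bill length (mm)", y = "Bill depth (mm)",
       colour = "Species", shape = "Island")

p_colour
```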
Facets, created with facet_wrap(), produce sub-plots for each category of a variable.
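A minimal faceting sketch, assuming the palmerpenguins package. Here `facet_wrap(~ species)` produces one panel per species.

```r
library(ggplot2)
library(palmerpenguins)

# One panel per species, sharing the same axes by default
p_facet <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point(na.rm = TRUE) +
  facet_wrap(~ species) +
  labs(x = "Bill length (mm)", y = "Bill depth (mm)")

p_facet
```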
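To save a plot, use ggsave(). By default it saves the most recently displayed plot, but you can also pass a plot object explicitly. You can specify the filename, dimensions, and resolution; the file format is taken from the filename extension.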
# First, create the plot
penguin_plot <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = species)) +
  theme_minimal()

# Now, save it to a file
ggsave(
  filename = "penguin_plot.png",
  plot = penguin_plot,
  width = 8,
  height = 6,
  dpi = 300  # Set resolution for high quality
)
This will save a file named penguin_plot.png in your working directory.
To visualize how a numerical variable changes over time, we use a line chart with geom_line(). The economics dataset from ggplot2 is perfect for this.
p <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line(color = "#1f77b4", linewidth = 0.6) +  # Classic blue color
  labs(
    title = "US Unemployment Over Time",
    subtitle = "Number of unemployed (in thousands)",
    x = "Year",
    y = "Unemployed (thousands)",
    caption = "Source: US Economic Time Series Data"
  ) +
  scale_x_date(date_breaks = "5 years", date_labels = "%Y") +
  scale_y_continuous(labels = scales::comma) +
  theme_bw() +
  theme(
    plot.title = element_text(size = 16, face = "bold"),
    plot.subtitle = element_text(size = 12, color = "gray50"),
    # panel.grid.major = element_line(color = "gray90", linewidth = 0.2),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_line(linewidth = 0.2)
  )
p
Interactive plots enhance user experience with dynamic, engaging graphics.
The plotly package easily converts ggplot2 plots into interactive versions.
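A minimal sketch of the conversion, assuming the plotly and palmerpenguins packages are installed. The object names `p_static` and `p_interactive` are only illustrative.

```r
library(ggplot2)
library(palmerpenguins)
library(plotly)

# Build an ordinary ggplot first
p_static <- ggplot(penguins,
                   aes(x = flipper_length_mm, y = body_mass_g,
                       colour = species)) +
  geom_point(na.rm = TRUE) +
  theme_minimal()

# ggplotly() converts it into an interactive htmlwidget
p_interactive <- ggplotly(p_static)

p_interactive  # hover, zoom, and pan in the viewer or a rendered document
```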
Task: Create a customized bar chart showing the count of penguin species.
Requirements:
Task: Create a customized bar chart showing the count of facility types using the ESPA facility data.
Requirements: