Data Visualization
NDMC, EPHI
July 21 - 25, 2025
What are the key principles, methods, and concepts required to visualize data for publications, reports, or presentations?
The effectiveness of data visualization depends on several factors:
What would you like to communicate?
Who is your audience? Researchers? Journalists? General public? Grant reviewers?
What is the best way to represent your data and your message?
A few packages to create figures in R are:
- ggplot2: grammar of graphics
- cowplot: for composing ggplots
- ggtext: for advanced text rendering
- ggthemes: for additional themes
- grid: for creating graphical objects
- gridExtra: additional functions for grid
- patchwork: for multi-panel plots
- ggiraph: interactive visualizations
- highcharter: interactive visualizations
- plotly: interactive visualizations

ggplot2 is a system for declaratively creating graphics, based on the Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics and which graphical primitives to use, and it takes care of the details.
Why ggplot2?
A ggplot is built up from a few basic elements:
- geom_: The geometric shapes that will represent the data.
- aes(): Aesthetics of the geometric and statistical objects, such as position, color, size, shape, and transparency.
- scale_: Maps between the data and the aesthetic dimensions, such as data range to plot width or factor values to colors.
- stat_: Statistical summaries of the data, such as quantiles, fitted curves, and sums.
- coord_: The transformation used for mapping data coordinates into the plane of the data rectangle.
- facet_: The arrangement of the data into a grid of plots.
- theme(): The overall visual defaults of a plot, such as background, grids, axes, default typeface, sizes and colors.

Aesthetics (aes(), a.k.a. mapping):
- x, y: variables mapped to the horizontal and vertical positions
- colour: colours the lines (outlines) of geometries
- fill: fill colour of geometries
- group: groups based on the data
- shape: shape of points, an integer value from 0 to 25, or NA
- linetype: type of line, an integer value from 0 to 6 or a string
- size: size of elements, a non-negative numeric value
- alpha: transparency, a numeric value from 0 (fully transparent) to 1 (opaque)

The general syntax is:
ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) + <GEOM_FUNCTION>()
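A minimal sketch of the template above, assuming the ggplot2 and palmerpenguins packages are installed (the data set is introduced later). The choice of variables is illustrative; it maps several of the listed aesthetics at once and sets others to fixed values.

```r
library(ggplot2)
library(palmerpenguins)

# data + aesthetic mappings + one geom layer
p <- ggplot(data = penguins,
            mapping = aes(x = flipper_length_mm, y = body_mass_g,
                          colour = species,   # colour mapped to a variable
                          shape = island)) +  # point shape mapped to a variable
  geom_point(size = 2, alpha = 0.7)           # fixed (unmapped) size and transparency

p
```

Aesthetics inside aes() vary with the data; values set outside aes() (here size and alpha) apply to every point.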
Geom Components
Geom | Description | Input |
---|---|---|
geom_histogram | Histograms | Continuous x |
geom_bar | Bar plot with frequencies | Discrete x |
geom_point | Points/scatterplots | Discrete/continuous x and y |
geom_boxplot | Box plot | Discrete x and continuous y |
geom_smooth | Smoothed line fitted to the data | x and y |
geom_line | Line plots | Discrete/continuous x and y |
geom_abline | Reference line | intercept and slope |
geom_hline, geom_vline | Reference lines | yintercept (hline) or xintercept (vline) |
Positions
geom_bar(position = "<position>"), where <position> is, for example, "stack" (the default), "dodge", or "fill".
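A short sketch comparing the three positions on the same bar chart, assuming ggplot2 and palmerpenguins are installed; counting penguins per species filled by island is an illustrative choice.

```r
library(ggplot2)
library(palmerpenguins)

# Counts of penguins per species, with bars split by island
base <- ggplot(penguins, aes(x = species, fill = island))

p_stack <- base + geom_bar(position = "stack")  # default: segments stacked on top of each other
p_dodge <- base + geom_bar(position = "dodge")  # segments placed side by side
p_fill  <- base + geom_bar(position = "fill")   # stacked, each bar rescaled to a total of 1

p_dodge
```

"fill" is useful for comparing proportions across groups of different sizes.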
facet_grid vs facet_wrap
- facet_grid() lays out panels in a grid defined by row and/or column faceting variables; it can facet in a single direction (horizontal or vertical) or in both.
- facet_wrap() simply places the facets next to each other and wraps them according to the provided number of columns and/or rows.

The following table describes how facet formulas work in facet_grid() and facet_wrap():
Type | Formula | Description |
---|---|---|
Grid | facet_grid(. ~ x) | Facet horizontally across x values |
Grid | facet_grid(y ~ .) | Facet vertically across y values |
Grid | facet_grid(y ~ x) | Facet 2-dimensionally |
Wrap | facet_wrap(~ x) | Facet across x values |
Wrap | facet_wrap(~ x + y) | Facet across x and y values |
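A minimal sketch of both facet functions on the same scatterplot, assuming ggplot2 and palmerpenguins are installed; faceting by island and species is an illustrative choice.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point()

# Grid: rows = island, columns = species (one panel per combination, empty ones included)
p_grid <- p + facet_grid(island ~ species)

# Wrap: one panel per species, wrapped into 2 columns
p_wrap <- p + facet_wrap(~ species, ncol = 2)

p_grid
```

Because not every species lives on every island, facet_grid() shows some empty panels here, while facet_wrap() only draws panels for levels that occur.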
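Statistical transformations (stat_*()) computed on the data.
stat_*()-like functions perform computations such as means, counts, linear models, and other statistical summaries of data.

Coordinate systems (coord_*()) establish the rules used to represent the data on the plot:
- coord_cartesian() for the Cartesian plane;
- coord_polar() for circular plots;
- coord_map() for different map projections.

A plot is built up in layers, and each step can be stored in an object:
library(ggplot2)
library(palmerpenguins)
plot.object <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm))  # ggplot()
plot.object <- plot.object + geom_point()                                 # + geom_*()
plot.object <- plot.object + coord_cartesian() + theme_bw()               # + coord_*() + theme()
plot.object  # or print(plot.object)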
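A sketch of a stat layer and a coordinate system together, assuming ggplot2 and palmerpenguins are installed. stat_summary() is used here as one example of a stat_*() function; the y-axis limits are illustrative.

```r
library(ggplot2)
library(palmerpenguins)

# stat_summary() computes a summary of y for each x: here the mean body mass per species
p <- ggplot(penguins, aes(x = species, y = body_mass_g)) +
  stat_summary(fun = mean, geom = "point", size = 3) +
  # mean_se() returns the mean plus/minus one standard error
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
  # coord_cartesian() zooms in without dropping data from the stat computation
  coord_cartesian(ylim = c(3000, 6000))

p
```

Zooming with coord_cartesian() keeps all observations in the summary, whereas setting limits on the scale would remove the out-of-range values before the means are computed.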
palmerpenguins
We will use the palmerpenguins data set:
This data set contains size measurements for three penguin species observed on three islands in the Palmer Archipelago, Antarctica.
Let us take a look at the variables in the penguins data set:
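One way to inspect the variables, assuming the palmerpenguins package is installed (base R's str() and summary() are used here; dplyr::glimpse() would work as well):

```r
# install.packages("palmerpenguins")  # if not yet installed
library(palmerpenguins)

str(penguins)      # column names and types: species, island, bill and flipper measures, body mass, sex, year
summary(penguins)  # per-column summaries, including counts of missing values (NA's)
```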
ggplot2
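ggplot2 requires you to prepare the data as an object of class data.frame or tibble (common in the tidyverse). More complex plots in ggplot2 often require the long data frame format, in which the values of several variables are stacked into one column and identified by a key column.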
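A sketch of converting penguins from wide to long format, assuming the tidyr, ggplot2, and palmerpenguins packages are installed; the column names "measurement" and "value" are illustrative.

```r
library(palmerpenguins)
library(tidyr)
library(ggplot2)

# Wide -> long: one row per (penguin, measurement) pair
penguins_long <- pivot_longer(
  penguins,
  cols = c(bill_length_mm, bill_depth_mm, flipper_length_mm),
  names_to = "measurement",
  values_to = "value"
)

# In long format, a single variable ("measurement") can drive the facets
p <- ggplot(penguins_long, aes(x = species, y = value)) +
  geom_boxplot() +
  facet_wrap(~ measurement, scales = "free_y")
p
```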
Scientific questions
Is there a relationship between the length and the depth of bills?
Do the sizes of the bill and flipper vary together?
How are these measures distributed among the 3 penguin species?
How can we graphically address these questions with ggplot2?
Title and axes components: changing size, colour and face
- labs(): sets the axis titles and other labels (plot title, legend titles) in a single call.
- xlab() and ylab(): set the x-axis and y-axis labels, respectively.

Increasing Space Between Axis and Axis Titles
- element_text(): While primarily used in theme() for overall theme customization, element_text() can be used to specify text properties such as size, color, and font face for axis labels.
- vjust controls the vertical alignment; it typically ranges between 0 and 1 but can extend beyond this range.
- The face argument can be set to "bold", "italic", or "bold.italic" (the default is "plain").
- angle rotates any text element; hjust and vjust adjust its position horizontally (0 = left, 1 = right) and vertically (0 = bottom, 1 = top).
- element_blank() is used to remove axis text and ticks.
- To remove an axis title, set it to NULL or " " in the labs() function.

💡 Using NULL removes the element, while a blank string " " keeps the space for the axis title but prints nothing.
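A sketch combining the axis options above, assuming ggplot2 and palmerpenguins are installed; the sizes, colours, and margins are illustrative values.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point() +
  labs(x = "Bill length (mm)", y = "Bill depth (mm)") +
  theme(
    axis.title = element_text(size = 14, colour = "grey20", face = "bold"),
    # more space between each axis and its title, via the margin of the title
    axis.title.x = element_text(margin = margin(t = 10)),
    axis.title.y = element_text(margin = margin(r = 10)),
    # rotate the x tick labels by 45 degrees and right-align them
    axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),
    axis.ticks.y = element_blank()  # remove the y-axis ticks
  )
p

p_no_y <- p + labs(y = NULL)  # drops the y-axis title and its space
```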
Titles can be added and customized with the ggtitle(), labs(), and theme() functions. Below is a list of the main functions and their key arguments for title customization:

Main Functions and Arguments
- ggtitle(): sets the text of the main title.
ggtitle("Main Title")
- labs(): sets several labels at once:
title: the text for the main title;
subtitle: the text for the subtitle;
caption: the text for the caption;
tag: the text for a tag.
labs(title = "Main Title", subtitle = "Subtitle", caption = "Caption", tag = "Fig. 1")
- theme(): customizes the appearance of the text elements. plot.title, plot.subtitle, plot.caption, and plot.tag customize the main title, subtitle, caption, and tag text, respectively. Examples:
theme(plot.title = element_text(face = "bold", size = 14, hjust = 0.5))
theme(plot.subtitle = element_text(size = 12, hjust = 0.5))
theme(plot.caption = element_text(size = 10, hjust = 0))
theme(plot.tag = element_text(size = 8, hjust = 1))
- element_text(face, size, family, hjust, vjust, margin, lineheight): controls the font face, size, family, alignment, margin, and line height.
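A sketch putting the title functions together, assuming ggplot2 and palmerpenguins are installed; the title texts are illustrative.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point() +
  labs(
    title    = "Flipper length vs. body mass",
    subtitle = "Palmer Archipelago penguins",
    caption  = "Data: palmerpenguins",
    tag      = "Fig. 1"
  ) +
  theme(
    plot.title    = element_text(face = "bold", size = 14, hjust = 0.5),  # centred, bold
    plot.subtitle = element_text(size = 12, hjust = 0.5),                 # centred
    plot.caption  = element_text(size = 10, hjust = 0)                    # left-aligned
  )
p
```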
One convenient feature of ggplot2 is that it adds a legend by default when mapping a variable to an aesthetic. By default, the legend title is the name of the variable given in the color argument. The main functions and methods to customize legends in ggplot2 are:
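To turn off the legend, use any of the following:
- `theme(legend.position = "none")`
- `guides(color = "none")`
- `scale_color_discrete(guide = "none")`

To remove the legend title:
- `theme(legend.title = element_blank())`
- `scale_color_discrete(name = NULL)`
- `labs(color = NULL)`

To change the legend position:
- `theme(legend.position = "top")` (or "bottom", "left", "right")
- `theme(legend.position = c(x, y), legend.background = element_rect(fill = "transparent"))` to place the legend inside the plot, with x and y between 0 and 1 (from ggplot2 3.5.0, use `legend.position = "inside"` together with `legend.position.inside = c(x, y)`)

To change the order of the legend keys, reorder the factor levels:
- `factor(penguins$species, levels = c("Chinstrap", "Gentoo", "Adelie"))`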
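A sketch combining three of the legend options, assuming ggplot2 and palmerpenguins are installed; the copy penguins2 and the legend title are illustrative.

```r
library(ggplot2)
library(palmerpenguins)

# Reorder the factor levels to control the order of the legend keys
penguins2 <- penguins
penguins2$species <- factor(penguins2$species,
                            levels = c("Chinstrap", "Gentoo", "Adelie"))

p <- ggplot(penguins2, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  labs(color = "Penguin species") +  # rename the legend title
  theme(legend.position = "top")     # move the legend above the panel
p
```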
theme()
Default theme: The default theme is theme_gray().
Complete themes accept arguments for the base font size (base_size) and font family (base_family).
plot + theme_gray()
plot + theme_bw()
plot + theme_linedraw()
plot + theme_light()
plot + theme_dark()
plot + theme_minimal()
plot + theme_classic()
plot + theme_void()
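A small sketch of the base_size and base_family arguments, assuming ggplot2 and palmerpenguins are installed and that the generic "serif" family is available on the graphics device.

```r
library(ggplot2)
library(palmerpenguins)

plot <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point()

# Complete theme with a larger base font size (in points) and a serif font
p_classic <- plot + theme_classic(base_size = 14, base_family = "serif")
p_classic
```

All text sizes in a complete theme are derived from base_size, so changing it rescales titles, axis text, and legend text together.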
theme() has many arguments to control and modify individual components of a plot theme, including the panel background, the panel border, and the grid lines. The background of the plot is customized by modifying these elements of the theme() function in ggplot2:

The panel background refers to the area where the data is plotted.
- panel.background: adjusts the background color and outline of the panel area.
theme(panel.background = element_rect(fill = "#64D2AA", color = "#64D2AA", linewidth = 2))

The panel border is an overlay on top of the panel.background that outlines the panel. Because it is drawn on top of the data, its fill should be NA or semi-transparent (like "#64D2AA99" below) so that the data remain visible.
- panel.border: sets the border properties of the panel.
theme(panel.border = element_rect(fill = "#64D2AA99", color = "#64D2AA", linewidth = 2))
Grid lines help in referencing the data points against the axes.
- panel.grid: changes properties for all grid lines.
- panel.grid.major: changes properties for major grid lines.
- panel.grid.minor: changes properties for minor grid lines.
- panel.grid.major.x and panel.grid.major.y: change properties for major grid lines on the x and y axes separately.
- panel.grid.minor.x and panel.grid.minor.y: change properties for minor grid lines on the x and y axes separately.

library(ggplot2)
library(palmerpenguins)
ggplot(data = penguins) +
  geom_point(aes(x = bill_length_mm, y = bill_depth_mm, color = body_mass_g)) +
  labs(x = "Bill length (mm)", y = "Bill depth (mm)") +
  theme(panel.grid.major = element_line(linewidth = .5, linetype = "dashed"),
        panel.grid.minor = element_line(linewidth = .25, linetype = "dotted"),
        # the axis-specific elements below inherit linewidth and linetype from the two above
        panel.grid.major.x = element_line(color = "red1"),
        panel.grid.major.y = element_line(color = "blue1"),
        panel.grid.minor.x = element_line(color = "red4"),
        panel.grid.minor.y = element_line(color = "blue4"))
Grid lines can be selectively removed.
- element_blank(): used to remove specific theme elements.
theme(panel.grid.minor = element_blank())
theme(panel.grid = element_blank())
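A sketch combining the background, border, and grid options, assuming ggplot2 and palmerpenguins are installed; the grey shades and line widths are illustrative.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point() +
  theme(
    panel.background = element_rect(fill = "grey97"),                        # panel fill
    panel.border = element_rect(fill = NA, color = "grey30", linewidth = 1), # outline only
    panel.grid.major = element_line(color = "grey85", linewidth = 0.3),
    panel.grid.minor = element_blank()                                       # drop minor grid lines
  )
p
```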
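When creating multi-panel plots in ggplot2, several functions and theme options are available to customize their appearance. The main functions and customization options are:

Creating Facets with facet_grid and facet_wrap
- facet_wrap(~ variable): creates one panel per level of variable and wraps the panels into rows and columns.
- ncol and nrow: set the number of columns and rows of panels in facet_wrap.
- scales: by default all panels share the same axis scales; let them vary with scales = "free", or free a specific axis with scales = "free_x" or scales = "free_y".
- Separate plots can also be arranged in one figure with the {gridExtra} package (e.g., grid.arrange()).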
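A sketch of facet_wrap() with nrow and free scales, followed by two independent plots arranged with gridExtra, assuming ggplot2, palmerpenguins, and gridExtra are installed; the variables and layout are illustrative.

```r
library(ggplot2)
library(palmerpenguins)
library(gridExtra)

# One panel per species in a single row; each panel gets its own x and y range
p_facets <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point() +
  facet_wrap(~ species, nrow = 1, scales = "free")
p_facets

# Two independent plots placed side by side
p_hist <- ggplot(penguins, aes(x = body_mass_g)) + geom_histogram(bins = 30)
p_box  <- ggplot(penguins, aes(x = species, y = body_mass_g)) + geom_boxplot()
combined <- grid.arrange(p_hist, p_box, ncol = 2)  # draws the figure and returns a gtable
```

Free scales make within-panel patterns easier to see but make comparisons across panels harder, so the choice depends on the message.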
Several functions can be used to customize colors in ggplot2 plots.
- color and fill arguments: define the outline color (color) and the fill color (fill) of plot elements. Point shapes 21 to 25 have both an outline and a fill.
geom_point(color = "steelblue", size = 2)
geom_point(shape = 21, size = 2, stroke = 1, color = "#3cc08f", fill = "#c08f3c")
- scale_color_* and scale_fill_* functions: modify colors when they are mapped to variables. These functions differ depending on whether the variable is categorical (qualitative) or continuous (quantitative).
- scale_color_manual and scale_fill_manual: manually specify colors for categorical variables.
scale_color_manual(values = c("dodgerblue4", "darkolivegreen4", "darkorchid3", "goldenrod1"))
- scale_color_brewer and scale_fill_brewer: use predefined color palettes from ColorBrewer.
- scale_color_gradient and scale_fill_gradient: apply a sequential gradient color scheme for continuous variables.
scale_color_gradient(low = "darkkhaki", high = "darkgreen")
- scale_color_viridis_c and scale_fill_viridis_c: use the viridis color palettes, which are perceptually uniform and suitable for colorblind viewers.
scale_color_viridis_c(option = "inferno")
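A sketch contrasting a manual scale for a categorical variable with a viridis scale for a continuous one, assuming ggplot2 and palmerpenguins are installed; the named colour vector is an illustrative choice.

```r
library(ggplot2)
library(palmerpenguins)

base <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm))

# Categorical variable -> one manually chosen colour per level (named by level)
p_manual <- base +
  geom_point(aes(color = species)) +
  scale_color_manual(values = c(Adelie = "dodgerblue4",
                                Chinstrap = "darkolivegreen4",
                                Gentoo = "darkorchid3"))

# Continuous variable -> perceptually uniform viridis gradient
p_viridis <- base +
  geom_point(aes(color = body_mass_g)) +
  scale_color_viridis_c(option = "inferno")

p_manual
```

Naming the values by factor level keeps each colour attached to the same species even if the level order changes.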
- geom_hline(): adds horizontal lines at specified y-axis values; yintercept is a numeric vector indicating where to draw the lines.
geom_hline(yintercept = c(12, 23))
- geom_vline(): adds vertical lines at specified x-axis values; xintercept is a numeric vector or an aesthetic mapping, and color, linewidth, and linetype customize the appearance of the line. Give a fixed value outside aes(); inside aes() the same line is drawn once for every row of the data.
geom_vline(xintercept = 45, linewidth = 1.5, color = "firebrick", linetype = "dashed")
- geom_abline(): adds lines with a specified intercept and slope; color and linewidth customize the appearance of the line. Here reg is a fitted linear model, for example:
reg <- lm(bill_depth_mm ~ bill_length_mm, data = penguins)
geom_abline(intercept = coefficients(reg)[1], slope = coefficients(reg)[2], color = "darkorange2", linewidth = 1.5)
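A sketch placing all three reference-line geoms on one scatterplot, assuming ggplot2 and palmerpenguins are installed; the intercept values and the model reg are illustrative.

```r
library(ggplot2)
library(palmerpenguins)

# Simple linear model; its coefficients feed geom_abline()
reg <- lm(bill_depth_mm ~ bill_length_mm, data = penguins)

p <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point() +
  geom_hline(yintercept = c(15, 20), linetype = "dotted") +                       # two horizontal lines
  geom_vline(xintercept = 45, linewidth = 1.5, color = "firebrick", linetype = "dashed") +
  geom_abline(intercept = coef(reg)[1], slope = coef(reg)[2],
              color = "darkorange2", linewidth = 1.5)                             # fitted line
p
```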
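geom_smooth() adds a smoothed conditional mean. Its default smoother is LOESS (for fewer than 1,000 observations) or a GAM (for larger data), but it is also easy to add a standard linear fit with method = "lm":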
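A sketch of a linear fit with geom_smooth(), assuming ggplot2 and palmerpenguins are installed. Passing formula = y ~ x explicitly only silences the message ggplot2 prints about the formula it chose.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point() +
  # ordinary least-squares line; se = TRUE draws the confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE)

# Mapping colour to species fits one line per species instead
p_species <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

p_species
```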
Interactive plotting libraries can be used together with ggplot2 or on their own to create interactive visualizations. There are different interactive plot libraries; the following are a few of them.
Plot.ly is a tool for creating online, interactive graphics and web apps. The plotly package in R allows you to easily convert your ggplot2 plots into interactive plots.
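A minimal sketch of that conversion with ggplotly(), assuming the plotly, ggplot2, and palmerpenguins packages are installed.

```r
library(ggplot2)
library(palmerpenguins)
library(plotly)

p <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, colour = species)) +
  geom_point()

# Convert the static ggplot into an interactive htmlwidget (hover, zoom, pan)
ip <- ggplotly(p)
ip
```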
Multiple ggplots can be combined into a single figure with patchwork:
library(ggplot2)
library(palmerpenguins)
library(patchwork)

p1 <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, colour = species)) +
  geom_point()
p2 <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, colour = species)) +
  geom_density2d()
p3 <- ggplot(penguins, aes(x = species, fill = island)) +
  geom_bar()
p4 <- ggplot(penguins, aes(x = species, y = bill_depth_mm, fill = species)) +
  geom_boxplot()

p1 + p2 + p3 + p4
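A sketch of finer layout control with patchwork, assuming ggplot2, palmerpenguins, and patchwork are installed; the three plots are redefined here so the block runs on its own.

```r
library(ggplot2)
library(palmerpenguins)
library(patchwork)

p1 <- ggplot(penguins, aes(bill_length_mm, bill_depth_mm, colour = species)) + geom_point()
p2 <- ggplot(penguins, aes(species, fill = island)) + geom_bar()
p3 <- ggplot(penguins, aes(species, bill_depth_mm, fill = species)) + geom_boxplot()

# "|" places plots side by side and "/" stacks them;
# plot_layout(guides = "collect") gathers the legends, plot_annotation() adds tags A, B, C
combined <- (p1 | p2) / p3 +
  plot_layout(guides = "collect") +
  plot_annotation(tag_levels = "A", title = "Palmer penguins")
combined
```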
This is a blank plot, before we add any geom_* to represent variables in the dataset.
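A minimal sketch of such a blank plot, assuming ggplot2 and palmerpenguins are installed and using the bill measurements as illustrative axes.

```r
library(ggplot2)
library(palmerpenguins)

# Data and aesthetic mappings only: axes and scales are drawn, but there is no data layer
blank <- ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm))
blank

# Adding a geom layer makes the observations visible
blank + geom_point()
```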
A histogram represents the distribution of a numeric variable by dividing its range into bins and counting the observations in each bin. Only one aesthetic is required: the x variable.
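A sketch of a histogram, assuming ggplot2 and palmerpenguins are installed; the bin width of 5 mm and the colours are illustrative.

```r
library(ggplot2)
library(palmerpenguins)

# Only x is required; binwidth (or bins) sets the width of the bars
p <- ggplot(penguins, aes(x = flipper_length_mm)) +
  geom_histogram(binwidth = 5, fill = "steelblue", color = "white")
p
```

Missing values are dropped with a warning, so the bar heights add up to the number of non-missing flipper lengths.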
geom_boxplot() and geom_signif()
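A sketch of boxplots with significance brackets, assuming the ggsignif package (which provides geom_signif()) is installed alongside ggplot2 and palmerpenguins; the compared pairs are illustrative, and geom_signif() uses a Wilcoxon test by default.

```r
library(ggplot2)
library(palmerpenguins)
library(ggsignif)  # provides geom_signif()

p <- ggplot(penguins, aes(x = species, y = flipper_length_mm, fill = species)) +
  geom_boxplot(show.legend = FALSE) +
  # pairwise tests between the listed groups; p-values are shown as stars
  geom_signif(comparisons = list(c("Adelie", "Chinstrap"), c("Adelie", "Gentoo")),
              map_signif_level = TRUE, step_increase = 0.1)
p
```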
You can use geom_line() for line plots or for time series plots that display values over time.
library(ggplot2)

ggplot(economics, aes(x = date, y = unemploy)) +
geom_line(color = "#1f77b4", linewidth = 1) + # Classic blue color
labs(
title = "US Unemployment Over Time",
subtitle = "Number of unemployed (in thousands)",
x = "Year",
y = "Unemployed (thousands)",
caption = "Source: US Economic Time Series Data"
) +
scale_x_date(date_breaks = "5 years", date_labels = "%Y") +
scale_y_continuous(labels = scales::comma) +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold"),
plot.subtitle = element_text(size = 12, color = "gray50"),
panel.grid.major = element_line(color = "gray90", linewidth = 0.2),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "white", color = NA)
)