1NDMC, EPHI and 2DMU & C4ED
April 28 - May 1, 2026

Take a few minutes to introduce ourselves.
Please share …
01. Module 1
02. Module 2
03. Module 3
04. Module 4
Set up and utilize R & RStudio
Navigate R & Rmarkdown/Quarto scripts and RStudio projects
Basic operations in R/RStudio
Import and inspect data sets in R
Understand R data structures
Know how to get help
Managing data through filtering, summarizing, transforming, and joining
Visualizing data using the renowned ggplot2 package
Produce descriptive statistics and basic data analysis
Create reproducible reports using Quarto
By the end of this module you will be able to:
It’s free and open-source software
It runs on many operating systems: Windows, Unix/Linux, and macOS.
Reproducible — your analysis code is a full audit trail
R produces high-quality, publication-ready graphics.
It has a large and welcoming community of users.
Rich ecosystem — 23,000+ packages for survival analysis, GIS mapping, disease modelling, and more
It is flexible enough to be used to create interactive web pages and automated reports.
Widely used by WHO, CDC, and academic researchers

“R is not just a tool — it’s a way of thinking clearly about data.”
RStudio is a widely used integrated development environment (IDE) for R.
It is powerful and makes using R easier.
RStudio can:
User-friendly interfaces
R is like a car’s engine
RStudio is like a car’s dashboard (steering wheel, GPS, etc.) that makes the engine easier to use.



Tip
Always install R before RStudio.

| Pane | Location | Purpose |
|---|---|---|
| Source | Top-left | Write and save scripts |
| Console | Bottom-left | Run code interactively |
| Environment / History | Top-right | See objects; browse past commands |
| Files / Plots / Help | Bottom-right | Navigate files; view plots and docs |
Tip
Shortcut to run code: Place cursor on a line → Ctrl + Enter (Windows) or Cmd + Enter (Mac)
Before doing anything else, apply these settings for reproducibility:
Tools > Global Options > General > Basic
This ensures a clean, reproducible workspace every session.

A Project keeps all your files (data, scripts, figures) in one folder and sets the working directory automatically.
Name the project (e.g., Basic-R-Training) and choose a location.
Basic-R-Training/
├── data/ # Raw and cleaned datasets
├── scripts/ # R analysis scripts
├── figures/ # Saved plots
└── documents/ # Notes, reports
Inside a project, refer to files with relative paths (e.g., "data/my_data.csv").
R is an object-oriented language. This means everything you create and manipulate in R (numbers, text, datasets, plots) is considered an object.
During an R session, objects are created and stored by name.
Results of calculations can be stored in objects using the assignment operators:
An arrow (<-), formed by a less-than character and a hyphen with no space between them. In RStudio, the shortcut is Alt + - (Option + - on Mac).
The equal character (=).
Naming rules:
Object names cannot contain special symbols such as !, +, -, or #.
A dot (.) and an underscore (_) are allowed, and a name may start with a dot.
Object names can contain numbers but cannot start with a number.
R is case sensitive: X and x are two different objects.
First create a new object called ‘x’
Objects that you create during an R session are held in memory. The collection of objects that you currently have is called the workspace.
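A minimal sketch of assignment and the workspace, following the naming rules above. The object names (x, y, total_xy, X) are only illustrative.

```r
# Create a new object called x with the arrow operator
x <- 5

# The equal sign also assigns, although <- is the more common convention in R
y = 10

# Use the objects in a calculation and store the result under a new name
total_xy <- x + y
total_xy

# R is case sensitive: X is a different object from x
X <- "a different object"

# List the objects currently in the workspace, then remove one of them
ls()
rm(X)
```

Typing an object's name on its own line, as with total_xy, prints its value in the Console.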
R has a wide variety of data types including:
A vector is a set of scalars arranged in a one-dimensional array.
All values in a vector share the same mode (data type), but that mode can be any type (numeric, character, logical, etc.).
Vectors can be created using the following functions:
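As a sketch, assuming the functions meant here are the usual base R ones, c(), seq(), and rep(), vectors could be created and inspected like this. The vector names and values are made up for illustration.

```r
# c() combines individual values into a vector
ages <- c(23, 35, 41, 35, 58)

# seq() builds a regular sequence; rep() repeats values
ids <- seq(from = 1, to = 5, by = 1)
groups <- rep(c("case", "control"), times = c(2, 3))

# a:b is shorthand for an integer sequence
first_five <- 1:5

class(ages)      # "numeric"
class(groups)    # "character"
length(ages)     # 5
sort(ages)       # 23 35 35 41 58
unique(ages)     # 23 35 41 58
which.max(ages)  # 5 (position of the largest value)
```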
class(x): returns the class/type of vector x
length(x): returns the total number of elements
x[length(x)]: returns the last value of vector x
rev(x): returns the reversed vector
sort(x): returns the sorted vector
unique(x): returns the vector without duplicate elements
range(x): range of x
quantile(x): quantiles of x for the given probabilities
which.max(x): index of the maximum
which.min(x): index of the minimum
Factors in R are used to represent categorical data.
Factors can be ordered or unordered and are an important class for statistical analysis and for plotting.
Factors are stored as integers, and have labels associated with these unique integers.
Once created, a factor can only contain values from a pre-defined set, known as levels. By default, R sorts levels in alphabetical order.
Factors can be created using factor()
[1] Mild Severe None Moderate Moderate None
Levels: None < Mild < Moderate < Severe
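The output above can be reproduced with a sketch along these lines. The object names severity and severity_f are hypothetical; the values and level order are taken from the printed output.

```r
# Raw character values, as they might appear in a data set
severity <- c("Mild", "Severe", "None", "Moderate", "Moderate", "None")

# Without explicit levels, R sorts them alphabetically
levels(factor(severity))  # "Mild" "Moderate" "None" "Severe"

# An ordered factor with the levels given in a meaningful order
severity_f <- factor(
  severity,
  levels  = c("None", "Mild", "Moderate", "Severe"),
  ordered = TRUE
)
severity_f

# Factors are stored as integers that point to the levels
as.integer(severity_f)  # 2 4 1 3 3 1

# Count observations per level, in level order
table(severity_f)
```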
The levels of a factor can be viewed with levels().
A matrix is a rectangular array arranged in rows and columns.
All columns in a matrix must have the same mode (numeric, character, etc.) and the same length.
Matrices can be created by:
e.g.
Assign names to rows and columns of a matrix
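A short sketch of creating a matrix and naming its rows and columns, assuming the creation functions meant are matrix(), cbind(), and rbind(). The object names and values are illustrative only.

```r
# matrix() fills column by column by default
m <- matrix(1:6, nrow = 2, ncol = 3)

# byrow = TRUE fills row by row instead
m_byrow <- matrix(1:6, nrow = 2, byrow = TRUE)

# cbind() binds vectors as columns; rbind() binds them as rows
by_cols <- cbind(c(1, 2), c(3, 4), c(5, 6))
by_rows <- rbind(c(1, 3, 5), c(2, 4, 6))

# Assign names to the rows and columns of m
rownames(m) <- c("row1", "row2")
colnames(m) <- c("A", "B", "C")

dim(m)          # 2 3
m["row2", "B"]  # 4
```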
A data set in R is stored as a data frame.
A data frame is two-dimensional, arranged in rows and columns, and is created with the function data.frame().
A data frame is more general than a matrix, in that different columns can have different modes (numeric, character, factor, etc.).
Example
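A minimal illustration of data.frame() with columns of different modes. The data frame patients and its values are invented for this sketch.

```r
# Each argument becomes a column; all columns must have the same length
patients <- data.frame(
  id     = 1:4,
  sex    = factor(c("F", "M", "F", "M")),
  age    = c(34, 51, 28, 45),
  smoker = c(TRUE, FALSE, FALSE, TRUE)
)

patients

# Unlike a matrix, each column keeps its own type
sapply(patients, class)
```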
Use head() and tail() to view the first (and last) six rows
Use View() to view an entire data frame object
Use str() to view the structure of a data frame object
Use colnames() or names() to look at variable names
Use colSums(is.na(x)) to count missing values in each column of a data frame x
Use subset() to subset data
Use dim(), or ncol() and nrow(), to see the dimensions of the data frame
Use summary() to see basic statistics for each variable
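The functions listed above can be tried on the built-in iris data, as in this sketch. The subset condition and the name setosa_long are arbitrary choices for illustration; View() is left out because it opens an interactive viewer.

```r
head(iris)         # first six rows (the default)
tail(iris, n = 3)  # last three rows
str(iris)          # structure: column types and a preview of values
names(iris)        # variable names
dim(iris)          # number of rows and columns

# Number of missing values in each column
colSums(is.na(iris))

# Basic statistics for each variable
summary(iris)

# Keep only the rows that meet a condition
setosa_long <- subset(iris, Species == "setosa" & Sepal.Length > 5.5)
nrow(setosa_long)
```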
The examples below use the iris data, a built-in data frame with 150 rows and 5 columns.
iris # the whole data frame
iris[1, 1] # 1st element in the 1st column
iris[1, 5] # 1st element in the 5th column (iris has no 6th column, so iris[1, 6] is an error)
iris[, 1] # first column, returned as a vector
iris[1] # first column, returned as a one-column data frame
iris[1:3, 3] # rows 1 to 3 of the 3rd column
iris[3, ] # the 3rd row
iris[1:6, ] # the 1st to 6th rows
iris[c(1, 4), ] # rows 1 and 4 only
iris[c(1, 4), c(1, 3)] # rows 1 and 4 of columns 1 and 3
iris[, -1] # everything except the first column
iris$Sepal.Length # also extracts the column 'Sepal.Length'
iris[, c("Sepal.Width", "Petal.Width")] # extract columns by name
Packages are collections of R functions, data, and compiled code in a well-defined format.
There are three categories of packages.
1. Base packages: These provide the basic functionality and are maintained by the R Core Team. Currently, there are 14 of them.
2. Recommended packages: These are also installed by default and mainly provide additional, more complex statistical procedures. There are 15 of them:
[1] "boot" "class" "cluster" "codetools" "foreign"
[6] "KernSmooth" "lattice" "MASS" "Matrix" "mgcv"
[11] "nlme" "nnet" "rpart" "spatial" "survival"
3. Contributed packages: This is where the real power lies! The CRAN repository features thousands of packages for every imaginable task.
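One way to see these categories on your own machine is sketched below. It also shows a cautious pattern for contributed packages: check whether the package is installed before loading it. The package name dplyr is only an example, and the sketch deliberately does not install anything.

```r
# Names of the base packages in this R installation
base_pkgs <- rownames(installed.packages(priority = "base"))
base_pkgs

# Names of the recommended packages that are installed
recommended_pkgs <- rownames(installed.packages(priority = "recommended"))
recommended_pkgs

# Contributed package: load it only if it is already installed
pkg <- "dplyr"
if (requireNamespace(pkg, quietly = TRUE)) {
  library(pkg, character.only = TRUE)
} else {
  message("Package '", pkg, "' is not installed. ",
          "Run install.packages(\"", pkg, "\") in the Console first.")
}
```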
Install a package once by typing install.packages("package_name") directly into the Console.
Loading a package with library() is required in each new R session to use its functions.
Note
If you see the error "there is no package called 'dplyr'", you need to install it first.
Importing data is rather easy in R, although this depends on the nature of the data to be imported and its format.
Most data are in tabular form such as a spreadsheet or a comma-separated file (.csv).
Base R has a series of read functions to import tabular data from plain text files with columns delimited by: space, tab, and comma, with or without a header containing the column names.
With an added package it is also possible to import directly from a Microsoft Excel spreadsheet format or other foreign formats from various sources.
In base R, the standard commands to read text files are based on the read.table() function.
The following table lists the base R read functions.
For more details, use help(read.table), which displays a single help page covering all of them.
| Function name | Assumes header | Separator | Decimal | File type |
|---|---|---|---|---|
| read.table() | No | white space | . | .txt |
| read.csv() | Yes | , | . | .csv |
| read.csv2() | Yes | ; | , | .csv |
| read.delim() | Yes | tab | . | .txt |
| read.delim2() | Yes | tab | , | .txt |
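To make the differences in the table concrete, here is a small self-contained sketch. It writes two tiny files to a temporary folder and reads them back; the file contents and object names are invented for illustration.

```r
# A comma-separated file with "." as the decimal mark
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,weight_kg,sex",
             "1,3.2,female",
             "2,2.9,male",
             "3,3.6,female"), csv_path)

births <- read.csv(csv_path)  # header = TRUE, sep = "," by default
str(births)

# The same file with read.table(): header and separator must be given
births_tab <- read.table(csv_path, header = TRUE, sep = ",")

# A semicolon-separated file with "," as the decimal mark
csv2_path <- tempfile(fileext = ".csv")
writeLines(c("id;weight_kg",
             "1;3,2",
             "2;2,9"), csv2_path)

births2 <- read.csv2(csv2_path)  # sep = ";", dec = "," by default
births2$weight_kg                # read as the numbers 3.2 and 2.9
```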
From a Comma-Delimited Text File (CSV) to R
Using read_csv() from the readr package
From Excel to R: Using read_xlsx() from the readxl package
The haven package handles all major statistical software formats:
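The sketch below tries these import functions on example files that ship with the packages themselves, so no course data is needed. It assumes readr, readxl, and haven are already installed; the choice of example files (mtcars.csv, datasets.xlsx, iris.dta, iris.sav) is an assumption made for illustration, not part of the course material.

```r
library(readr)
library(readxl)
library(haven)

# Paths to example files bundled with each package
csv_file  <- readr_example("mtcars.csv")
xlsx_file <- readxl_example("datasets.xlsx")
dta_file  <- system.file("examples", "iris.dta", package = "haven")
sav_file  <- system.file("examples", "iris.sav", package = "haven")

# CSV: read_csv() returns a tibble (a data frame with nicer printing)
csv_data <- read_csv(csv_file, show_col_types = FALSE)

# Excel: choose the sheet by position or by name
xlsx_data <- read_xlsx(xlsx_file, sheet = 1)

# Stata and SPSS files
dta_data <- read_dta(dta_file)
sav_data <- read_sav(sav_file)

# Quick check of what was imported
dim(csv_data)
names(xlsx_data)
```

With your own data, replace the example paths with a relative path inside the project, such as "data/my_data.csv".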
The data import wizard is a quick and easy way to import your data
Composing the data import code…
Writing the import function by hand can be tricky. Try the import wizard first, then paste the code from the Code Preview section into your script.

R to CSV: Use the readr package
R to a text file:
R to Excel: The readxl package is for reading Excel files only. For writing to Excel, the writexl package is a great modern and simple option.
R to Stata, SPSS, and SAS: Use the haven package
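For plain text and CSV output, base R alone is enough. The sketch below is one possible approach: it writes the built-in iris data to a temporary folder, so the file names and location are assumptions for illustration rather than the course's own files.

```r
# Temporary output folder; in a project this would usually be "data/"
out_dir  <- tempdir()
txt_path <- file.path(out_dir, "iris_export.txt")
csv_path <- file.path(out_dir, "iris_export.csv")

# Tab-delimited text file, without row names or quotes around text
write.table(iris, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)

# Comma-separated file
write.csv(iris, csv_path, row.names = FALSE)

# Read the text file back to confirm the export worked
iris_back <- read.delim(txt_path)
dim(iris_back)  # 150 5
```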
Export the imported yrb_data to different formats
library(readr) # read_csv(), write_csv()
library(writexl) # write_xlsx()
library(haven) # write_dta(), write_sav(), write_xpt()
yrb_data <- read_csv("data/yrbss.csv") # import the data
write_csv(yrb_data, "data/yrb_data.csv") # CSV
write_xlsx(yrb_data, "data/yrb_data.xlsx") # Excel
write_dta(yrb_data, "data/yrb_data.dta") # Stata
write_sav(yrb_data, "data/yrb_data.sav") # SPSS
# write_sas(yrb_data, "data/yrb_data.sas7bdat") # deprecated in haven
write_xpt(yrb_data, "data/yrb_data.xpt") # SAS transport file
Export the birthwt data (from R package MASS) as a CSV file named "birthwt.csv" and an SPSS file named "birthwt.sav" in your data folder.
Import the CSV file (birthwt.csv) as infant_birthwt.
Import the SPSS file (birthwt.sav) as infant_birthwt_sav.