¹NDMC, EPHI; ²SPH, AAU; ³DMU & C4ED
October 14 - 17, 2025
Let's take a few minutes to introduce ourselves.
Please share …
01. Module 1
02. Module 2
03. Module 3
04. Module 4
Set up and utilize R & RStudio
Navigate R and R Markdown scripts and RStudio projects
Basic operations in R/RStudio
Import and inspect data sets in R
Understand data structures
Know how to get help
Manipulating data through filtering, summarizing, transforming, and joining
Visualizing data using the renowned ggplot2 package
It’s free and open-source software
It runs on many operating systems: Windows, Unix/Linux, and macOS.
R does not involve lots of pointing and clicking, and that’s a good thing
R produces high-quality, publication-ready graphics.
It has a large and welcoming community of users.
R is interdisciplinary and extensible, with over 22,000 user-contributed packages that can be installed to extend its capabilities
R code is great for reproducibility
It is supported by comprehensive technical documentation and user contributed tutorials.
It is flexible enough to be used to create interactive web pages and automated reports.
It is currently one of the best and most popular tools for data analysis.
It stimulates critical thinking about problem-solving rather than a “push the button” mentality.
Best Integrated Development Environment (IDE) for R.
Powerful and makes using R easier
RStudio can:
User-friendly interfaces
R is like a car’s engine
RStudio is like a car’s dashboard (steering wheel, GPS, etc.) that makes the engine easier to use.
Go to cran.r-project.org to access the R installation page. Then click the download link for Windows:
Choose the base sub-directory.
Then click on the download link at the top of the page to download the latest version of R:
To open a new script, go to File > New File > R Script, or click on the icon with the + sign and select R Script, or simply press Ctrl+Shift+N.
By default, RStudio is arranged into four window panes.
If you only see three panes, open a new script with File > New File > R Script. This should reveal one more pane.
First, open a new script under the File menu if one is not yet open: File > New File > R Script. In the script, type the following:
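The original snippet for this step is not shown here. Any short lines of R will do; as a stand-in (both lines are illustrative assumptions, not the workshop's own code):

```r
# Two illustrative lines to type into the script (stand-ins for the original code)
print("Hello, R!")   # prints a piece of text to the Console
2 + 2                # a simple calculation
```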
To run code, place your cursor anywhere in the line of code, then press Ctrl + Enter (on Windows/Linux) or Cmd + Enter (on Mac).
This should send the code to the Console and run it.
You can also run multiple lines at once.
Now drag your cursor to highlight both lines and press Ctrl + Enter.
To run the entire script, you can use Ctrl + A to select all code, then press Ctrl + Enter.
Next, save the script. Hit Ctrl + S to bring up the Save dialog box.
The console, at the bottom left, is where code is executed. You can type code directly here, but it will not be saved.
Type a random piece of code (maybe a calculation like 3 + 3) and press ‘Enter’.
If you place your cursor on the last line of the console and press the up arrow, you can go back to the last code that was run. Keep pressing it to scroll through the previous commands.
To run any of these previous commands, press Enter.
At the top right of the RStudio Window, you should see the Environment tab.
The Environment tab shows datasets and other objects that are loaded into R’s working memory, or “workspace”.
To explore this tab, let’s import a dataset into your environment from R.
Type the code below into your script and run it:
You have now imported the dataset and stored it in an object named data. (You could have named the object anything you want.)
Click on the data dataset from the Environment tab. This opens it in a ‘View’ tab.
You can remove an object from the environment with the rm() function.
Notice that the data object no longer shows up in your environment after having run that code.
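The steps above can be sketched as follows. The original import code is not shown, so this sketch assumes the built-in `iris` data frame as a stand-in dataset:

```r
data <- iris        # stand-in: copy a built-in dataset into an object named `data`
"data" %in% ls()    # TRUE: the object is now listed in the workspace

# View(data)        # in RStudio, opens the same 'View' tab as clicking the object

rm(data)            # remove the object from the environment
"data" %in% ls()    # FALSE: it no longer shows up
```

The check uses `ls()` rather than `exists("data")` because R also ships a function called `data()`, so `exists()` would still find that name after the object is removed.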
You can click a line to highlight it, then send it to the console or to your script with the “To Console” and “To Source” icons at the top of this tab.
To select multiple lines, use the “Shift-click” method: click the first item you want to select, then hold down the “Shift” key and click the last item you want to select.
Finally, notice that there is a search bar at the top right of the History pane where you can search for past commands that you have run.
The Files tab allows you to interact with your computer’s file system.
Try playing with some of the buttons here, to see what they do. You should try at least the following:
Make a new folder
Delete that folder
Make a new R Script
Rename that script
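To try out the Plots tab, run a line that draws a figure. A minimal sketch, assuming base R's `plot()` (the original snippet is not shown, and the title and axis labels are illustrative):

```r
# AirPassengers is a built-in monthly time series; plot() draws it as a line chart
plot(AirPassengers,
     main = "Monthly airline passengers",
     xlab = "Year",
     ylab = "Passengers (thousands)")
```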
That code creates a time series plot of airline passengers in the AirPassengers dataset.
You should see this figure in the Plots tab.
Now, test out the buttons at the top of this tab to explore what they do.
In particular, try to export a plot to your computer.
Packages are collections of R functions, data, and code that extend R’s capabilities.
Think of them as apps for your phone: you first install them, then you open (load) them to use them.
Packages need to be installed only once, but must be loaded in each new R session.
All the package names you see (in blue font) are packages that are installed on your system. And packages with a check mark are packages which are loaded in the current session.
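As a minimal sketch of the install-once, load-every-session pattern (`dplyr` is used purely as an example package name):

```r
pkg <- "dplyr"                                        # example package name
installed <- requireNamespace(pkg, quietly = TRUE)    # TRUE if already installed

if (!installed) {
  # Installing needs an internet connection and is done only once
  message(pkg, " is not installed; run install.packages(\"", pkg, "\") once")
} else {
  library(pkg, character.only = TRUE)                 # load it for this session
}
```

In everyday use you would simply write `library(dplyr)` at the top of a script; the check above only avoids an error when the package is missing.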
To update R: go to CRAN and download the new version.
On Windows: a more efficient way is to use the installr package. Install and load it, then run the updateR() function.
Run updateR() from the basic Rgui rather than from RStudio.
Version should update automatically in RStudio
On Mac: It’s generally recommended to download the new version directly from the CRAN website.
Then update the R packages with the code:
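The code for this step is not shown. A minimal sketch, assuming base R's `update.packages()` and, on Windows only, the `installr` approach described above:

```r
# Update all installed packages without prompting for each one.
# Guarded so it only runs in an interactive session (it needs internet access).
if (interactive()) {
  update.packages(ask = FALSE, checkBuilt = TRUE)
}

# Windows only (assumes the installr package has been installed):
# library(installr)
# updateR()
```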
To update RStudio: go to the RStudio website and download the new version.
Or, inside RStudio, click Help > Check for Updates and follow the menu prompts.
The Help tab shows the documentation for different R objects. Try typing out and running each line below to see what this documentation looks like.
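The lines for this step are not shown here; as a sketch, these are some standard ways of opening documentation in base R:

```r
?mean                     # help page for a function
help("read.table")        # the same thing, written as a function call
help(package = "stats")   # index of a package's documentation
??"linear model"          # search all installed help pages for a phrase
example(mean)             # run the examples from a help page
```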
The Viewer tab allows you to preview HTML files and interactive objects.
Go to Tools > Global Options to customize RStudio.
Under the Appearance tab, you can choose a different Editor theme to change the look of your code. Many people find a dark theme easier on the eyes.
Under Code > Display, check these two boxes:
Finally, under General > Basic, apply these critical settings for reproducibility:
Uncheck the box for “Restore .RData into workspace at startup”.
Set “Save workspace to .RData on exit” to Never.
This ensures you start with a clean slate every time, which prevents many common errors.
Organize your projects into self-contained folders from the start.
A well-organized project is easier to navigate, reproducible, and shareable.
Use sub-folders to keep things tidy.
data/ (raw/cleaned data)
scripts/ (R code)
figures/ (output visuals)
documents/ (notes)
Workshop Folder Structure
We will use a basic structure that is a good starting point and can be extended as needed:
Basic-R-training/
├── scripts/ # All R analysis scripts
├── data/ # Raw and cleaned datasets
├── figures/ # Figures, plots, and visualizations
└── documents/ # Notes, manuscripts, metadata
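The same structure can also be created from R. A minimal sketch, assuming base R's `dir.create()` and a temporary location so that nothing is written to your real folders (replace `root` with your own path):

```r
root <- file.path(tempdir(), "Basic-R-training")   # assumption: demo location only
subfolders <- c("scripts", "data", "figures", "documents")

# Create each sub-folder; recursive = TRUE also creates the project folder itself
for (folder in subfolders) {
  dir.create(file.path(root, folder), recursive = TRUE, showWarnings = FALSE)
}

list.files(root)   # lists the sub-folders that now exist
```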
"data/my_data.csv"
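is an example of a file path written relative to the project folder.
The workshop project folder is named Basic-R-training.
To open the project, double-click the .Rproj file in your project folder, or use the 3D cube icon in RStudio.
R is an object-oriented language. This means everything you create and manipulate in R (numbers, text, datasets, plots) is considered an object.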
During an R session, objects are created and stored by name.
Results of calculations can be stored in objects using the assignment operators:
An arrow (<-), formed by a less-than character and a hyphen with no space between them. In RStudio, the shortcut is Alt + -.
The equal character (=).
There are some restrictions when giving an object a name:
Object names cannot contain ‘strange’ symbols like !, +, -, #.
A dot (.) and an underscore ( _ ) are allowed, also a name starting with a dot.
Object names can contain a number but cannot start with a number.
R is case sensitive, X and x are two different objects
First create a new object called ‘x’
Objects that you create during an R session are held in memory; the collection of objects that you currently have is called the workspace.
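A minimal sketch of assignment, naming rules, and the workspace (all names and values are illustrative):

```r
x <- 5                # assign with the arrow
y = 10                # the equal sign also works
X <- "five"           # R is case sensitive: X and x are different objects

my.value_2 <- x + y   # dots, underscores, and non-leading digits are allowed
# 2value <- 1         # not allowed: a name cannot start with a number

ls()                  # list the objects currently in the workspace
rm(X)                 # remove an object from the workspace
```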
R has a wide variety of data types including:
A set of scalars arranged in a one-dimensional array.
All values in a vector have the same mode (data type), but a vector can be of any mode.
Vectors can be created using the following functions:
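The list of functions is not reproduced here. As a sketch, these base R functions are commonly used to create vectors (values are illustrative):

```r
a <- c(2, 4, 6, 8)            # c(): combine values into a vector
b <- 1:5                      # colon operator: integer sequence
s <- seq(0, 1, by = 0.25)     # seq(): sequence with a chosen step
r <- rep("a", times = 3)      # rep(): repeat a value
v <- vector("numeric", 3)     # vector(): empty numeric vector of length 3

class(r)                      # "character"
length(s)                     # 5
mixed <- c(1, "a", TRUE)      # mixed inputs are coerced to one mode
class(mixed)                  # "character"
```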
class(x): returns the class/type of vector x
length(x): returns the total number of elements
x[length(x)]: returns the last value of vector x
rev(x): returns the reversed vector
sort(x): returns the sorted vector
unique(x): returns the vector without duplicate elements
range(x): returns the range (minimum and maximum) of x
quantile(x): returns the quantiles of x for the given probabilities
which.max(x): returns the index of the maximum
which.min(x): returns the index of the minimum
Factors in R are used to represent categorical data.
Factors can be ordered or unordered and are an important class for statistical analysis and for plotting.
Factors are stored as integers, and have labels associated with these unique integers.
Once created, factors can only contain pre-defined set values, known as levels. By default, R always sorts levels in alphabetical order.
Factors can be created using factor(), and their levels can be inspected or set with levels().
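A minimal sketch of creating and inspecting factors (the values are made up for illustration):

```r
sex <- c("female", "male", "male", "female")
sex_f <- factor(sex)           # unordered factor
levels(sex_f)                  # "female" "male" (alphabetical by default)
as.integer(sex_f)              # 1 2 2 1: the integer codes behind the labels

# An ordered factor with the level order set explicitly
severity <- factor(c("mild", "severe", "moderate", "mild"),
                   levels = c("mild", "moderate", "severe"),
                   ordered = TRUE)
severity < "severe"            # TRUE FALSE TRUE TRUE
```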
A matrix is a rectangular array arranged in rows and columns.
All columns in a matrix must have the same mode (numeric, character, etc.) and the same length.
Matrices can be created by:
e.g.
Assign names to rows and columns of a matrix
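The original examples are not shown. A minimal sketch, assuming the base R functions `matrix()`, `cbind()`, and `rbind()` for creation and `rownames()`/`colnames()` for naming:

```r
m  <- matrix(1:6, nrow = 2, ncol = 3)       # filled column by column
m2 <- matrix(1:6, nrow = 2, byrow = TRUE)   # filled row by row
cb <- cbind(1:3, 4:6)                       # bind vectors as columns (3 x 2)
rb <- rbind(1:3, 4:6)                       # bind vectors as rows (2 x 3)

# Assign names to rows and columns
rownames(m) <- c("r1", "r2")
colnames(m) <- c("a", "b", "c")

m["r2", "b"]   # 4: index by name
dim(m)         # 2 3
```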
A data set in R is stored as a data frame.
It is two-dimensional, arranged in rows and columns, and is created using the function data.frame()
A data frame is more general than a matrix, in that different columns can have different modes (numeric, character, factor, etc.).
Example
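The original example is not reproduced here. A minimal sketch with made-up values, showing columns of different modes in one data frame:

```r
patients <- data.frame(
  id     = 1:4,                               # integer
  sex    = factor(c("F", "M", "M", "F")),     # factor
  age    = c(34, 51, 27, 45),                 # numeric
  smoker = c(FALSE, TRUE, FALSE, NA)          # logical, with one missing value
)

str(patients)    # one line per column, showing its type
patients$age     # extract a single column as a vector
```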
Use head() and tail() to view the first (and last) six rows
Use View() to view an entire data frame object
Use str() to view the structure of a data frame object
Use colnames() or names() to look at variable names
Use colSums(is.na()) to count missing values in each column
Use subset() to subset data
Use dim(), or ncol() and nrow(), to see the dimensions of the data frame
Use summary() to see basic statistics for each variable
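These functions can be tried on any data frame. A minimal sketch using the built-in `iris` data (the filter condition in `subset()` is only an illustration):

```r
head(iris)                # first six rows (default n = 6)
tail(iris, n = 3)         # last three rows
str(iris)                 # structure: column types and first values
names(iris)               # variable names
dim(iris)                 # 150 5
colSums(is.na(iris))      # number of missing values in each column
summary(iris)             # basic statistics for each variable

# Keep setosa flowers with long sepals, and only two of the columns
long_setosa <- subset(iris,
                      Species == "setosa" & Sepal.Length > 5.5,
                      select = c(Sepal.Length, Species))

# View(iris)              # opens the full data frame in RStudio's viewer
```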
The examples below use the iris data, a built-in data frame with 150 rows and 5 columns.
iris # the whole data frame
iris[1, 1] # 1st element in 1st column
iris[1, 5] # 1st element in the 5th column (iris has only 5 columns, so iris[1, 6] would fail)
iris[, 1] # first column, returned as a vector
iris[1] # first column, returned as a one-column data frame
iris[1:3, 3] # rows 1 to 3 of the 3rd column
iris[3, ] # the 3rd row
iris[1:6, ] # the 1st to 6th rows
iris[c(1,4), ] # rows 1 and 4 only
iris[c(1,4), c(1,3)] # rows 1 and 4, columns 1 and 3
iris[, -1] # the whole data frame except the first column
iris$Sepal.Length # also extracts the column 'Sepal.Length'
iris[, c("Sepal.Width", "Petal.Width")] # extract columns by name
Packages are collections of R functions, data, and compiled code in a well-defined format.
There are three categories of packages.
1. Base packages: provide the basic functionality and are maintained by the R Core Development group. Currently there are 14 base packages.
2. Recommended packages: also installed by default, mainly providing additional, more complex statistical procedures. There are 15 recommended packages:
[1] "boot" "class" "cluster" "codetools" "foreign"
[6] "KernSmooth" "lattice" "MASS" "Matrix" "mgcv"
[11] "nlme" "nnet" "rpart" "spatial" "survival"
3. Contributed packages: This is where the real power lies! The CRAN repository features thousands of packages for every imaginable task.
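To see which packages fall into the first two categories on your own machine, a minimal sketch using base R's `installed.packages()` and its `priority` argument:

```r
# Names of installed packages by priority class
base_pkgs <- unique(rownames(installed.packages(priority = "base")))
rec_pkgs  <- unique(rownames(installed.packages(priority = "recommended")))

base_pkgs           # e.g. "base", "stats", "utils", ...
rec_pkgs            # e.g. "MASS", "survival", ... (if installed)
length(base_pkgs)   # number of base packages found
```

The counts can differ slightly between installations, because recommended packages are optional in some R distributions.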
To install a package, type install.packages("package_name") directly into the Console.
Loading a package with library() is required in each new R session to use its functions.
Note: if you see the error "there is no package called 'dplyr'", you need to install it first.
Importing data is fairly easy in R, although the details depend on the nature and format of the data being imported.
Most data are in tabular form such as a spreadsheet or a comma-separated file (.csv).
Base R has a series of read functions to import tabular data from plain text files with columns delimited by: space, tab, and comma, with or without a header containing the column names.
With an added package it is also possible to import directly from a Microsoft Excel spreadsheet format or other foreign formats from various sources.
In base R the standard commands to read text files are based on the read.table() function.
The following table lists the collection of the base R read functions.
For more details, run help(read.table), which displays a single help page covering all of these functions.
Function name | Assumes header | Separator | Decimal | File type |
---|---|---|---|---|
read.table() | No | "" (white space) | . | .txt |
read.csv() | Yes | "," | . | .csv |
read.csv2() | Yes | ";" | , | .csv |
read.delim() | Yes | "\t" (tab) | . | .txt |
read.delim2() | Yes | "\t" (tab) | , | .txt |
Text files can be read using the read.table() function and comma-separated files using the read.csv() function.
The file.choose() function can be used to select the file interactively.
From a comma-delimited text file (CSV file) to R: using read_csv() from the readr package
From Excel to R: using read_xlsx() from the readxl package
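A minimal, self-contained sketch of a CSV import with base R. It first writes a small temporary file so that it runs anywhere; the `readr` and `readxl` lines are shown as comments because they assume those packages are installed, and `"data/my_data.xlsx"` is a hypothetical file name:

```r
# Create a small CSV file to read back (stand-in for your own data file)
csv_path <- tempfile(fileext = ".csv")
write.csv(head(iris), csv_path, row.names = FALSE)

dat <- read.csv(csv_path)     # header = TRUE and sep = "," are the defaults
str(dat)                      # inspect what was imported

# Equivalent calls with add-on packages (assumed installed):
# dat <- readr::read_csv("data/my_data.csv")
# dat <- readxl::read_xlsx("data/my_data.xlsx", sheet = 1)

# Pick a file interactively instead of typing its path:
# dat <- read.csv(file.choose())
```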
The haven package is the modern standard for importing data from other statistical software. It handles all recent versions of Stata (.dta), SPSS (.sav), and SAS (.sas7bdat).
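As a rough sketch of how haven is used, the code below round-trips a small made-up data frame through a Stata file. It assumes haven is installed and skips itself otherwise; the file names in the commented lines are hypothetical:

```r
if (requireNamespace("haven", quietly = TRUE)) {
  dta_path <- tempfile(fileext = ".dta")

  # Write a tiny made-up dataset to Stata format, then read it back
  haven::write_dta(data.frame(id = 1:3, age = c(25, 31, 40)), dta_path)
  stata_data <- haven::read_dta(dta_path)
  print(stata_data)
}

# The other readers follow the same pattern (hypothetical file names):
# spss_data <- haven::read_sav("data/my_file.sav")
# sas_data  <- haven::read_sas("data/my_file.sas7bdat")
```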
Example: Import the Facility Characteristics data from the Ethiopia Service Provision Assessment (ESPA) 2021-22
The data import wizard is a quick and easy way to import your data
Composing the data import code…
Writing the import data function can be tricky. Try the import wizard pictured above. THEN, paste the code from the Code Preview section into your script.
The data import wizard will help you find the proper package for importing your data. For example, use…
library(readxl) for Excel data
library(haven) for SPSS, SAS, Stata
library(readr) for CSV or other delimiters
Just start with File > Import Dataset to begin composing that code, then paste your code into a script.
R to CSV: use the readr package
R to a text file:
R to Excel: the readxl package is for reading Excel files only. For writing to Excel, the writexl package is a great modern and simple option.
R to Stata, SPSS, and SAS: use the haven package
Export the imported yrb_data data to different formats
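library(readr)    # read_csv(), write_csv()
library(writexl)  # write_xlsx()
library(haven)    # write_dta(), write_sav(), write_xpt()
# Import the data
yrb_data <- read_csv("data/yrbss.csv")
# Export to CSV, Excel, Stata, and SPSS
write_csv(yrb_data, "data/yrb_data.csv")
write_xlsx(yrb_data, "data/yrb_data.xlsx")
write_dta(yrb_data, "data/yrb_data.dta")
write_sav(yrb_data, "data/yrb_data.sav")
# Export for SAS: write_xpt() creates a SAS transport file, so use the .xpt extension
# (write_sas() is deprecated in recent haven versions)
# write_sas(yrb_data, "data/yrb_data.sas7bdat")
write_xpt(yrb_data, "data/yrb_data.xpt")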
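For the plain-text export, which has no code above, a minimal sketch with base R's `write.table()`. The built-in `iris` data and a temporary file are stand-ins for your own data and path:

```r
txt_path <- tempfile(fileext = ".txt")

# Tab-separated text file, without row names or quotes around text values
write.table(iris, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read it back to check that the export worked
back <- read.delim(txt_path)
dim(back)   # same dimensions as iris
```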
"anc_data"
"fp_service"
Export the first dataset (anc_data) as a CSV file named "antenatal.csv" in your data folder
Export the second dataset (fp_service) as an SPSS file (.sav) named "fp_services.sav" in your data folder