Introduction to R and RStudio

Leykun (MSc)1, Tesfamichael (MSc)2 & Yebelay (MSc)3

1NDMC, EPHI; 2SPH, AAU; 3DMU & C4ED


October 14 - 17, 2025

Introductions

Take a few minutes to introduce yourselves.

Please share …

  1. Your name
  2. Your experience in R
  3. What you expect by the end of the training

Outline

01. Module 1

  • Overview of R and R Studio
  • Workspace and R Objects
  • Reading and Writing data

02. Module 2

  • Data Management (dplyr, tidyverse)
    • Data Manipulation
  • Recoding Variables
  • Data merging
  • Data cleaning

03. Module 3

  • Data visualization (ggplot2)
  • EDA and Summary statistics

04. Module 4

  • Statistical Analysis
  • Creating reproducible reports (Quarto)

Getting started

Learning objectives

  • Set up and use R & RStudio

  • Navigate R & R Markdown scripts and RStudio projects

  • Perform basic operations in R/RStudio

  • Import and inspect data sets in R

  • Understand data structures

  • Know how to get help

  • Manipulate data through filtering, summarizing, transforming, and joining

  • Visualize data using the ggplot2 package

1. Why R?

  • It’s free and open-source software

  • It runs on all major operating systems: Windows, macOS, and Linux/Unix.

  • R does not involve lots of pointing and clicking, and that’s a good thing

  • R produces high-quality, publication-ready graphics.

  • It has a large and welcoming community of users.

  • R is interdisciplinary and extensible, with over 22,000 user-contributed packages that can be installed to extend its capabilities

Cont.

  • R code is great for reproducibility

  • It is supported by comprehensive technical documentation and user-contributed tutorials.

  • It is flexible enough to be used to create interactive web pages and automated reports.

  • It is currently one of the best and most popular tools for data analysis.

  • It stimulates critical thinking about problem-solving rather than a “push the button” mentality.

2. RStudio

What is RStudio? Why use it?

  • The most widely used Integrated Development Environment (IDE) for R.

  • Powerful and makes using R easier

  • RStudio can:

    • Organize your code, output, and plots.
    • Auto-complete code and highlight syntax.
    • Help view data and objects.
    • Enable easy integration of R code into documents.
  • User-friendly interface

  • R is like a car’s engine

  • RStudio is like a car’s dashboard (steering wheel, GPS, etc.) that makes the engine easier to use.

Set up on Windows

  • Follow the steps below to download and install R:
  1. Go to cran.r-project.org to access the R installation page. Then click the download link for Windows.

  2. Choose the base sub-directory.

  3. Then click the download link at the top of the page to download the latest version of R.

  • Or download R directly from the CRAN website
  • During installation, accept the defaults; just keep clicking Next until the installation is done.

Download, install & run RStudio

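  • To download RStudio, go to the Posit website (posit.co) and download the free RStudio Desktop installer for your operating system.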
  • Then click on the downloaded file and follow the installation instructions.

  • After installation, open the app from the Start menu.

RStudio Overview

Getting Started

  • RStudio will open with 4 sections (called panes):

1. Source editor pane

  • This is where you write and edit your R scripts; code you run from here is sent to the console.
  • To create a new R script you can either go to File -> New -> R Script, or click on the icon with the + sign and select R Script, or simply press Ctrl+Shift+N.
  • Make sure to save the script.

2. Console pane

  • Interactively run R commands

3. Environment/history pane

  • Environment: view objects in the global environment
  • History: search and view command history

4. Files/Plots/Packages/Help pane

  • Files: navigate directories and manage files
  • Plots: view generated plots.
  • Packages: manage packages (install or update)
  • Help: view help documentation for any package or function

The RStudio panes

  • By default, RStudio is arranged into four window panes.

  • If you only see three panes, open a new script with File > New File > R Script. This should reveal the fourth pane.

The RStudio panes

First, open a new script under the File menu if one is not yet open: File > New File > R Script. In the script, type the following:

Code
print("excited for R!")
  • To run code, place your cursor anywhere in the line of code, then press Ctrl + Enter (on Windows/Linux) or Cmd + Enter (on Mac).

  • This should send the code to the Console and run it.

You can also run multiple lines at once.

Code
print("excited for R!")
print("and RStudio!")
  • Now drag your cursor to highlight both lines and press Ctrl + Enter.

  • To run the entire script, use Ctrl + A to select all code, then press Ctrl + Enter.

  • Next, save the script. Press Ctrl + S to bring up the Save dialog box.

Console

  • The console, at the bottom left, is where code is executed. You can type code directly here, but it will not be saved.

  • Type a simple piece of code (for example, a calculation like 3 + 3) and press Enter.

  • If you place your cursor on the last line of the console and press the up arrow, you can go back to the last code that was run. Keep pressing it to scroll through previous commands.

  • To run any of these previous commands, press Enter.

Environment

  • At the top right of the RStudio Window, you should see the Environment tab.

  • The Environment tab shows datasets and other objects that are loaded into R’s working memory, or “workspace”.

  • To explore this tab, let’s load one of R’s built-in datasets into your environment.

  • Type the code below into your script and run it:

Code
data <- iris

You have now copied the built-in iris dataset into an object named data. (You could have named the object anything you want.)

Environment

  • Now that the dataset is stored by R, you should be able to see it in the Environment pane.
  • Click the blue arrow icon beside the object’s name in the Environment tab to reveal a summary.
  • Try clicking directly on the data dataset from the Environment tab. This opens it in a ‘View’ tab.

Environment

  • The broom icon at the top of the Environment pane clears your workspace (removes all objects).
  • You can also remove an object from the workspace with the rm() function.
  • Type and run the following on a new line of your R script.
Code
rm(data)

Notice that the data object no longer shows up in your environment after having run that code.

History

  • Next, the History tab shows previous commands you have run.
  • You can click a line to highlight it, then send it to the console or to your script with the “To Console” and “To Source” icons at the top of this tab.

  • To select multiple lines, use the “Shift-click” method: click the first item you want to select, then hold down the “Shift” key and click the last item you want to select.

  • Finally, notice that there is a search bar at the top right of the History pane where you can search for past commands that you have run.

Files

  • Next, the Files tab. This shows the files and folders in the folder you are working in.
  • The tab allows you to interact with your computer’s file system.

  • Try playing with some of the buttons here, to see what they do. You should try at least the following:

    • Make a new folder

    • Delete that folder

    • Make a new R Script

    • Rename that script

Plots

  • Next, the Plots tab. This is where figures that are generated by R will show up.
  • Try creating a simple plot with the following code:
Code
plot(AirPassengers)

  • That code creates a time series plot of the AirPassengers dataset: monthly totals of international airline passengers from 1949 to 1960.

  • You should see this figure in the Plots tab.

  • Now, test out the buttons at the top of this tab to explore what they do.

  • In particular, try to export a plot to your computer.

Packages

  • Next, let’s look at the Packages tab.

  • Packages are collections of R functions, data, and code that extend R’s capabilities.

  • Think of them as apps for your phone: you first install them, then you open (load) them to use them.

  • Packages need to be installed only once, but must be loaded in each new R session.

  • All the packages listed (names in blue) are installed on your system. Packages with a checked box are loaded in the current session.

Updating R and RStudio

Updating R

  • Go to CRAN and download the new version

  • On Windows: a more efficient way is to use the installr package. Install and load it, then run the updateR() function.

    • Updates R and optionally updates all packages
    • It may work better to run this from the basic RGui rather than from RStudio
  • RStudio should pick up the new R version automatically

    • Check or change the R version under Tools > Global Options > General (R version)
  • On Mac: It’s generally recommended to download the new version directly from the CRAN website.

  • Then update the R packages with the code:

Code
update.packages(ask = FALSE, checkBuilt = TRUE)
  • To update RStudio: download the new version from the Posit website, or

  • Click Help > Check for Updates and follow the menu prompts

Help

The Help tab shows the documentation for different R objects. Try typing out and running each line below to see what this documentation looks like.

Code
?t.test
?AirPassengers
?read.csv

  • Help files are not always very easy to understand for beginners, but with time they will become more useful.

Viewer

This tab allows you to preview HTML files and interactive objects.

RStudio Options for a Better Workflow

  • Go to Tools > Global Options to customize RStudio.

Appearance

  • Under the Appearance tab, you can choose a different Editor theme to change the look of your code. Many people find a dark theme easier on the eyes.

RStudio options

Code Settings

  • Under Code > Display, check these two boxes:

    • Highlight R function calls: makes functions a different color so they stand out.
    • Rainbow parentheses: assigns matching colors to nested parentheses, making them easier to track.

RStudio options

Workspace (Very Important!)

  • Finally, under General > Basic, apply these critical settings for reproducibility:

    • Uncheck the box for “Restore .RData into workspace at startup”.

    • Set “Save workspace to .RData on exit” to Never.

  • This ensures you start with a clean slate every time, which prevents many common errors.


Wrapping up

  • You have only scratched the surface of RStudio’s functionality; you can find much more on the RStudio IDE cheatsheet.

Getting Set Up: RStudio Projects

  • Organize your projects into self-contained folders from the start.

  • A well-organized project is easier to navigate, reproducible, and shareable.

  • Use sub-folders to keep things tidy.

    • data/ (raw/cleaned data)
    • scripts/ (R code)
    • figures/ (output visuals)
    • documents/ (notes)

Workshop Folder Structure
We will use a basic structure that is a good starting point and can be extended as needed:

Basic-R-training/
├── scripts/          # All R analysis scripts
├── data/             # Raw and cleaned datasets
├── figures/          # Figures, plots, and visualizations
└── documents/        # Notes, manuscripts, metadata

Key Benefits of RStudio Projects:

  • Automatically sets your working directory to your project folder
  • Makes file paths simple and relative (e.g., "data/my_data.csv").
  • Enhances reproducibility and collaboration.

Creating a New RStudio Project

  1. Open RStudio.
  2. In the top right, click the blue 3D cube labeled “Project: (None)”.
  3. Select New Project > New Directory > New Project.
  4. Enter a project name (e.g., Basic-R-training).
  5. Choose a location (e.g., Desktop).
  6. Click Create Project — your new project opens!
  7. To open the project later, double-click the .Rproj file in your project folder, or use the 3D cube (Project) menu in RStudio.

Objects in R

  • R is an object-oriented language. This means everything you create and manipulate in R—like numbers, text, datasets, and plots—is considered an object.

  • During an R session, objects are created and stored by name.

Results of calculations can be stored in objects using the assignment operators:

  • An arrow (<-), formed by a less-than character and a hyphen with no space between them. The RStudio shortcut is Alt + - (Option + - on Mac).

  • The equal character (=).

Objects in R

There are some restrictions when giving an object a name:

  • Object names cannot contain special symbols such as !, +, -, #.

  • Dots (.) and underscores (_) are allowed. A name may start with a dot, but not with an underscore or with a dot followed by a number.

  • Object names can contain numbers but cannot start with a number.

  • R is case-sensitive: X and x are two different objects.

Examples:

First create a new object called ‘x’

Code
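x <- 5    # the arrow is the preferred assignment operator
x = 5     # the equal sign also works
5 -> x    # rightward assignment is valid but rarely used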
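A minimal sketch of the assignment and naming rules above, using only base R. The object X and the call to make.names() are additions for illustration: make.names() is a base R function that converts arbitrary strings into syntactically valid names, which makes the naming rules visible.

```r
# Three equivalent ways to store the value 5 in an object named x
x <- 5
x = 5
5 -> x

# R is case-sensitive: X is a different object from x
X <- 10
print(x)   # prints [1] 5
print(X)   # prints [1] 10

# make.names() turns invalid names into valid ones:
# "-" becomes "." and "X" is prepended to a name starting with a number
make.names(c("my_var", "my-var", "2x"))   # "my_var" "my.var" "X2x"
```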

R Workspace

Objects that you create during an R session are held in memory; the collection of objects that you currently have is called the workspace.

Code
ls()             # list the objects in the current workspace
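To see ls() and rm() working together, here is a short sketch; the object names my_number and my_text are hypothetical. Note that rm(list = ls()) removes every visible object in the workspace, much like the broom icon, so only run it when you really want a clean slate.

```r
# Create two objects in the workspace
my_number <- 42
my_text <- "hello"

ls()                    # object names now include "my_number" and "my_text"
exists("my_number")     # TRUE

rm(my_number)           # remove a single object
exists("my_number")     # FALSE

rm(list = ls())         # remove all (non-hidden) objects from the workspace
```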

Data Types

R has a wide variety of data types and structures, including:

  • Scalars,
  • vectors (numerical, character, logical),
  • matrices and arrays,
  • data frames, and
  • lists.

Vector

  • A set of scalars arranged in a one-dimensional array.

  • All elements of a vector have the same mode (data type), but a vector can be of any mode.

    • e.g., (-2, 3.4, 3), (TRUE, FALSE, TRUE), ("blue", "gray", "red")
  • Vectors can be created using the following functions:

    • c() function to combine individual values

      • x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
    • seq() to create more complex sequences

      • seq(from=1, to=10, by=2) or seq(1, 10)
    • rep() to create replicates of values: rep(1:4, times=2, each=2)
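The three constructors above can be tried directly in the console. The last lines go slightly beyond the slide to illustrate the single-mode rule: when values of different types are combined with c(), R coerces them all to one common mode (here, character).

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)       # combine individual values
odds <- seq(from = 1, to = 10, by = 2)  # 1 3 5 7 9
ones_to_ten <- seq(1, 10)               # 1 2 3 ... 10, same as 1:10
reps <- rep(1:4, times = 2, each = 2)   # 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4

# Mixing modes: everything is coerced to character
mixed <- c(1, "a", TRUE)                # "1" "a" "TRUE"
class(mixed)                            # "character"
```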

Some useful functions in vector

  • class(x): returns class/type of vector x
  • length(x): returns the total number of elements
  • x[length(x)]: returns last value of vector x
  • rev(x): returns reversed vector
  • sort(x): returns sorted vector
  • unique(x): returns the vector without duplicate elements
  • range(x): returns the minimum and maximum of x
  • quantile(x): returns quantiles of x (by default the 0%, 25%, 50%, 75%, and 100% quantiles)
  • which.max(x): index of maximum
  • which.min(x): index of minimum
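Applying these functions to the vector x from the previous slide gives the following; the values in the comments were worked out for this specific x, and the unique() line uses a separate small vector because x has no duplicates.

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)

class(x)                  # "numeric"
length(x)                 # 5
x[length(x)]              # 21.7 (the last element)
rev(x)                    # 21.7  6.4  3.1  5.6 10.4
sort(x)                   #  3.1  5.6  6.4 10.4 21.7
unique(c(1, 2, 2, 3))     # 1 2 3
range(x)                  #  3.1 21.7
quantile(x)               # named vector: 0%, 25%, 50%, 75%, 100%
which.max(x)              # 5 (position of 21.7)
which.min(x)              # 3 (position of 3.1)
```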

Factors

  • Factors in R are used to represent categorical data.

  • Factors can be ordered or unordered and are an important class for statistical analysis and for plotting.

  • Factors are stored as integers, and have labels associated with these unique integers.

  • Once created, factors can only contain a pre-defined set of values, known as levels. By default, factor() sorts the levels in alphabetical order.

  • Factors can be created using factor()

Code
size <- factor(c("small", "large", "small", "medium"))
  • The levels of a factor can be displayed using levels().
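Continuing the size example, the sketch below shows the default alphabetical levels, the integer codes stored underneath, and how to set your own level order. The object name size_ord is hypothetical; levels = and ordered = are standard arguments of factor().

```r
size <- factor(c("small", "large", "small", "medium"))
levels(size)        # "large" "medium" "small" (alphabetical by default)
as.integer(size)    # 3 1 3 2 (the underlying integer codes)
table(size)         # counts per level

# Supply the levels yourself to control their order;
# ordered = TRUE makes comparisons such as < meaningful
size_ord <- factor(c("small", "large", "small", "medium"),
                   levels = c("small", "medium", "large"),
                   ordered = TRUE)
levels(size_ord)    # "small" "medium" "large"
size_ord < "large"  # TRUE FALSE TRUE TRUE
```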

Matrix

  • A matrix is a two-dimensional array arranged in rows and columns.

  • All elements of a matrix must have the same mode (numeric, character, etc.), and all columns must have the same length.

  • Matrices can be created by:

  1. matrix()
Code
mymatrix <- matrix(1:6, nrow = 2, ncol = 3, byrow = FALSE)  # general form: matrix(data, nrow, ncol, byrow = FALSE)
  • byrow=TRUE indicates that the matrix should be filled by rows.
  • byrow=FALSE indicates that the matrix should be filled by columns (the default).
  2. binding together vectors (with cbind() or rbind())

Matrix

e.g.

Code
A <- matrix(data = 1:6, nrow = 3, ncol = 2)
B <- cbind(1:3,5:7,10:12)

Assign names to rows and columns of a matrix

Code
rownames(A) <- c("A", "B", "C")
colnames(B) <- c("a", "b", "c")
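Once rows and columns have names, you can index a matrix by position or by name. This sketch rebuilds A and B from the example above and adds a few common operations (dim(), t(), and byrow = TRUE) for illustration.

```r
A <- matrix(data = 1:6, nrow = 3, ncol = 2)   # filled column by column
B <- cbind(1:3, 5:7, 10:12)                   # binds three vectors as columns
rownames(A) <- c("A", "B", "C")
colnames(B) <- c("a", "b", "c")

dim(A)          # 3 2
A["B", 2]       # 5: row "B", column 2
B[, "b"]        # 5 6 7: the column named "b"
t(A)            # transpose: 2 rows, 3 columns

# byrow = TRUE fills the matrix row by row, so the first row is 1 2 3
matrix(1:6, nrow = 2, byrow = TRUE)
```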

Data frames

  • Data sets in R are usually stored as data frames.

  • A data frame is two-dimensional (rows and columns) and can be created with the function data.frame().

  • A data frame is more general than a matrix, in that different columns can have different modes (numeric, character, factor, etc.).

Example

Code
age <- c(25, 30, 56)
gender <- c("male", "female", "male")
weight <- c(160, 110, 220) 
mydata <- data.frame(age, gender, weight) 
  • You can also enter data directly in a spreadsheet-like editor using either edit() or fix()
Code
new_data <- data.frame()     # creates an "empty" data frame
new_data <- edit(new_data)   # opens the data editor; alternatively, fix(new_data)
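Working with the mydata example, here is a short sketch of common first steps with a data frame: checking column types, computing on a column with $, converting text to a factor, and adding a column. The column name age_group and its cut-off at 30 are hypothetical, chosen only for illustration.

```r
age <- c(25, 30, 56)
gender <- c("male", "female", "male")
weight <- c(160, 110, 220)
mydata <- data.frame(age, gender, weight)

str(mydata)                              # shows each column and its type
mean(mydata$age)                         # 37
mydata$gender <- factor(mydata$gender)   # convert text to a categorical variable
levels(mydata$gender)                    # "female" "male"

# Add a new (hypothetical) column derived from age
mydata$age_group <- ifelse(mydata$age >= 30, "30+", "under 30")
mydata
```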

Some functions for inspecting the data

  • Use head() and tail() to view the first (and last) six rows

  • Use View() to view an entire data frame object

  • Use str() to view the structure of data frame object

  • Use colnames() or names() to list variable names

  • Use colSums(is.na(data)) to count missing values in each column

  • Use subset() to subset data.

  • Use dim(), or ncol() and nrow(), to see the dimensions of a data frame

  • Use summary() to see basic statistics for each variable
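Here is a quick tour of these inspection functions on the built-in iris data frame; the object name setosa is hypothetical.

```r
head(iris)                  # first 6 rows
tail(iris, n = 3)           # last 3 rows
str(iris)                   # 150 obs. of 5 variables, with their types
names(iris)                 # the 5 column names
dim(iris)                   # 150 5
colSums(is.na(iris))        # number of missing values per column (all 0 here)

# subset() keeps the rows that meet a condition
setosa <- subset(iris, Species == "setosa")
nrow(setosa)                # 50

summary(iris$Sepal.Length)  # min, quartiles, mean, and max
```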

Subsetting

  • Using iris data, a built-in data frame with 150 rows and 5 columns.
Code
iris                    # the whole data frame
iris[1, 1]              # 1st element in the 1st column
iris[1, 5]              # 1st element in the 5th (last) column; iris has only 5 columns
iris[, 1]               # first column, returned as a vector
iris[1]                 # first column, returned as a data frame
iris[1:3, 3]            # rows 1 to 3 of the 3rd column
iris[3, ]               # the 3rd row
iris[1:6, ]             # the 1st to 6th rows
iris[c(1, 4), ]         # rows 1 and 4 only
iris[c(1, 4), c(1, 3)]  # rows 1 and 4, columns 1 and 3
iris[, -1]              # all columns except the first
iris$Sepal.Length       # also extracts the column 'Sepal.Length'
iris[, c("Sepal.Width", "Petal.Width")]  # extract columns by name
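Rows can also be selected with a logical condition instead of row numbers. This sketch uses base R only; the object names long_sepals and big_setosa are hypothetical.

```r
# Rows where Sepal.Length is greater than 7, keeping two columns
long_sepals <- iris[iris$Sepal.Length > 7, c("Sepal.Length", "Species")]
head(long_sepals)

# Combine conditions with & (and) or | (or)
big_setosa <- iris[iris$Species == "setosa" & iris$Petal.Length > 1.5, ]
nrow(big_setosa)
```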

Installing and Loading Packages

  • Packages are collections of R functions, data, and compiled code in a well-defined format.

  • There are three categories of packages.

1. Base Packages: provide R’s basic functionality and are maintained by the R Core Team. There are currently 14 base packages:

Code
rownames(installed.packages(priority="base"))
 [1] "base"      "compiler"  "datasets"  "graphics"  "grDevices" "grid"     
 [7] "methods"   "parallel"  "splines"   "stats"     "stats4"    "tcltk"    
[13] "tools"     "utils"    

2. Recommended Packages: installed by default along with R; they mainly provide additional, more complex statistical procedures. There are 15 of them:

Code
rownames(installed.packages(priority="recommended"))
 [1] "boot"       "class"      "cluster"    "codetools"  "foreign"   
 [6] "KernSmooth" "lattice"    "MASS"       "Matrix"     "mgcv"      
[11] "nlme"       "nnet"       "rpart"      "spatial"    "survival"  

3. Contributed packages: This is where the real power lies! The CRAN repository features thousands of packages for every imaginable task.

  • To see how many packages are currently available, you can run:
Code
nrow(available.packages())

Installing Packages

  • Option 1: Code (Recommended)
    • Type install.packages("package_name") directly into the Console.
Code
install.packages("tidyverse")
install.packages("readxl") 
install.packages("writexl")
install.packages("labelled")
  • Option 2: Menu (Tools > Install Packages…)
  • Option 3: Packages tab (click the Install button)

Loading Packages

  • Installing a package is a one-time setup to download it onto your computer.
  • Loading a package with library() is required in each new R session to use its functions.
Code
# LOAD (do this in every new R session)
library(tidyverse)
library(readxl)
library(writexl)
library(labelled)

Note

  • If you get an error like "there is no package called 'dplyr'", you need to install it first
  • Installation downloads the package; loading makes it available for use
  • You only need to install a package once, but you must load it in every new R session
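A common pattern in shared scripts is to install only the packages that are missing, so re-running the script does not reinstall everything. The helper below, install_if_missing(), is hypothetical (it is not part of any package); it relies on base R’s requireNamespace(), which returns TRUE when a package is installed and can be loaded.

```r
# Hypothetical helper: install only the packages that are not yet available
install_if_missing <- function(pkgs) {
  # TRUE for each package that is already installed
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!available]
  if (length(missing) > 0) {
    install.packages(missing)
  }
  invisible(missing)   # names of the packages that had to be installed
}

# Example usage for this workshop (uncomment to run):
# install_if_missing(c("tidyverse", "readxl", "writexl", "labelled", "haven"))
# library(tidyverse)
```

Installing is still separate from loading: after the helper runs, you call library() as usual.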

Reading and Writing data

  • Importing data into R is usually straightforward, although the approach depends on the nature of the data and its file format.

  • Most data are in tabular form such as a spreadsheet or a comma-separated file (.csv).

  • Base R has a series of read functions to import tabular data from plain text files with columns delimited by: space, tab, and comma, with or without a header containing the column names.

  • With add-on packages, you can also import Excel spreadsheets and files from other software, such as Stata, SPSS, and SAS.

Importing from local files

  • In base R, the standard commands for reading text files are based on the read.table() function.

  • The following table lists the collection of the base R read functions.

  • For more details, run help(read.table), which documents all of these functions on one page.

Base R read functions

Function        Header assumed  Separator             Decimal  Typical file
read.table()    No              whitespace (sep="")   .        .txt
read.csv()      Yes             ","                   .        .csv
read.csv2()     Yes             ";"                   ,        .csv
read.delim()    Yes             tab ("\t")            .        .txt
read.delim2()   Yes             tab ("\t")            ,        .txt
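To see the separator and decimal conventions from this table in practice, the sketch below writes the built-in iris data to two temporary files with write.csv() and write.csv2(), then reads each back with the matching reader. The paths come from tempfile(), so nothing is written to your project folder.

```r
# Two temporary file paths
csv_path  <- tempfile(fileext = ".csv")
csv2_path <- tempfile(fileext = ".csv")

write.csv(iris, csv_path, row.names = FALSE)    # "," separator, "." decimal
write.csv2(iris, csv2_path, row.names = FALSE)  # ";" separator, "," decimal

readLines(csv2_path, n = 2)   # raw text: semicolons and decimal commas

iris_csv  <- read.csv(csv_path)    # matching reader for write.csv()
iris_csv2 <- read.csv2(csv2_path)  # matching reader for write.csv2()

all.equal(iris_csv, iris_csv2)     # TRUE: the same data either way
```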

Reading raw data from other sources

Import data

  • There are many ways to get data into R and out of R.
  • Import text files using read.table() and comma-separated files using read.csv().
Code
# syntax ('file' can be a full path or a path relative to the working directory):
# read.table(file = "path/to/file.txt", header = TRUE, sep = "")
Code
# Examples: each line creates a data frame named mydata
mydata <- read.table(file = "datafile.txt", sep = " ", header = TRUE)
mydata <- read.csv(file = "datafile.csv")
  • Instead of typing the file path, you can use the file.choose() function to select the file interactively:
Code
mydata <- read.csv(file.choose())
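To experiment with read.table() without a file on disk, you can pass the data as a string through its text = argument. This sketch, with a hypothetical three-row table, also shows why header = TRUE matters.

```r
# A small whitespace-separated table, supplied as text instead of a file
txt <- "age gender weight
25 male 160
30 female 110
56 male 220"

# header = TRUE: the first line holds the column names
df <- read.table(text = txt, header = TRUE)
str(df)

# header = FALSE (the default): the first line is read as data,
# and the columns get automatic names V1, V2, V3
df_nohead <- read.table(text = txt)
names(df_nohead)
```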

Read data

From A Comma Delimited Text File (csv files) to R

  • Using read_csv() from readr package
Code
library(readr)
mydata <- read_csv(file="mydata.csv")

Example:

Read data from the CDC’s Youth Risk Behavior Surveillance System (YRBSS)

Code
library(readr)
yrb_data <- read_csv("data/yrbss.csv")

From Excel to R: Using read_xlsx() from readxl package

Code
library(readxl)
mydata <-  read_xlsx(path="mydata.xlsx", sheet = 1, col_names = TRUE)

Read data

Stata, SPSS, and SAS to R:

The haven package is the modern standard for importing data from other statistical software. It handles all recent versions of Stata, SPSS (.sav), and SAS (.sas7bdat).

Code
# Recommended for Stata, SPSS, and SAS files
library(haven)
stata_data <- read_dta("filename.dta") 
spss_data <- read_sav("filename.sav")
sas_data <- read_sas("filename.sas7bdat")

Example: Import the Facility Characteristics data from the Ethiopia Service Provision Assessment (ESPA) 2021-22

Code
# Import the Facility Characteristics data from an ESPA Stata file
library(haven)
facility_data <- read_dta("data/ESPA_2021/ETFC81FLSP.DTA")
library(labelled)
# Convert all labelled variables (like categoricals) to R factors with value labels
facility_data <- to_factor(facility_data)

Data import wizard

The data import wizard (File > Import Dataset) is a quick and easy way to import your data.

  • Inside the wizard, you can copy the code from the Code Preview window, then paste it into your R script or into a code chunk of your Quarto document.

Composing the data import code…

Writing import code by hand can be tricky. Try the import wizard first, then paste the code from its Code Preview section into your script.

Easily write import data function

Excel, SPSS, SAS, etc.

The data import wizard will help you find the proper package for importing your data. For example, use…

Start with File > Import Dataset to compose the import code, then paste that code into a script.

Exporting Data

R to csv: Use readr package

Code
library(readr)
write_csv(data, "data/my_data.csv")

R to a text file:

Code
write.table(data, "data.txt", sep="\t")
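By default, write.table() also writes row names as an extra, unnamed first column. This sketch writes the first five rows of iris to a temporary tab-separated file with row.names = FALSE and reads it back with read.delim(); the object names small, out_path, and back are hypothetical.

```r
small <- head(iris, 5)
out_path <- tempfile(fileext = ".txt")

# row.names = FALSE avoids writing an extra column of row numbers
write.table(small, out_path, sep = "\t", row.names = FALSE)

readLines(out_path, n = 2)     # header line and first data line, tab-separated
back <- read.delim(out_path)   # read.delim() expects a header and tab separators
str(back)
```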

R to Excel: The readxl package is for reading Excel files only. For writing to Excel, the writexl package is a great modern and simple option.

Code
library(writexl)
write_xlsx(data, "data/my_data.xlsx")

R to Stata, SPSS, and SAS: Use the haven package

Code
library(haven)
write_dta(data, "data/mydata.dta")
write_sav(data, "data/mydata.sav")
write_xpt(data, "data/mydata.xpt")  # SAS transport file; write_sas() is deprecated in recent haven versions

Example

Export the imported yrb_data to different formats

Code
library(readr)
library(writexl)
library(haven)
yrb_data <- read_csv("data/yrbss.csv")
write_csv(yrb_data, "data/yrb_data.csv")
write_xlsx(yrb_data, "data/yrb_data.xlsx")
write_dta(yrb_data, "data/yrb_data.dta")
write_sav(yrb_data, "data/yrb_data.sav")
# write_sas() is deprecated in recent haven versions; write_xpt() writes a SAS transport (.xpt) file
write_xpt(yrb_data, "data/yrb_data.xpt")

Exercise: Importing and Exporting Data

  1. Import the antenatal care service Stata data from ESPA datasets as "anc_data"
  2. Import the family planning service Stata data from ESPA datasets as "fp_service"
  3. Export the imported antenatal care data (anc_data) as a CSV file named "antenatal.csv" in your data folder
  4. Export the imported family planning data (fp_service) as an SPSS file (.sav) named "fp_services.sav" in your data folder