Univariate Time Series Analysis with R


Leykun Getaneh (MSc)

NDMC, EPHI

February 17 - 20, 2026

🎯 Session Objectives

By the end of this session, you will be able to:

  • Understand core concepts of univariate time series: trend, seasonality, stationarity
  • Load, explore, and visualize time series data in R
  • Perform decomposition and stationarity diagnostics
  • Build, tune, and evaluate ARIMA/SARIMA models
  • Generate and evaluate forecasts with confidence intervals

Notes:

  • This session assumes you are comfortable with basic R and tidyverse.
  • We’ll use a monthly measles cases dataset for hands-on practice.

📈 Time Series Foundations

What is a Time Series?

  • A time series is a sequence of data points collected at successive, equally spaced points in time.

  • Our goal is usually twofold: to understand past patterns (description) and to use those patterns to predict future values (forecasting).

  • Application areas:

    • Public health (disease forecasting, demand for vaccines)
    • Economics/finance (inflation, sales, stock indices)
    • Energy (load forecasting), climate (temperature, rainfall), operations (inventory)

Key Components

  • Trend (T): The long-term direction (up, down, or flat).
  • Seasonal (S): A repeating, periodic pattern (e.g., monthly, quarterly).
  • Cyclical (C): Long-term patterns not tied to a specific calendar period (e.g., business cycles). We often group this with Trend.
  • Residual/Noise (R): The random, irregular component left over.
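These components can be made concrete with a small simulated series (a base-R sketch; the numbers are purely illustrative, not the measles data):

```r
# Build a toy monthly series as Trend + Seasonal + Noise (additive)
set.seed(123)
n <- 120                                    # 10 years of monthly data
trend    <- 0.5 * (1:n)                     # T: slow upward drift
seasonal <- 10 * sin(2 * pi * (1:n) / 12)   # S: repeating yearly cycle
noise    <- rnorm(n, sd = 2)                # R: irregular component
y <- ts(trend + seasonal + noise, start = c(2012, 1), frequency = 12)
plot(y, main = "Toy series: trend + seasonality + noise")
```

Decomposition, covered later in this session, tries to recover exactly these pieces from an observed series.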

Setup and Data

Code
# Core packages
library(tidyverse)
library(lubridate)

# Time series packages
library(forecast)   # ARIMA, auto.arima, forecast, ggtsdisplay
library(tseries)    # adf.test
library(urca)       # KPSS (ur.kpss)
library(zoo)        # zoo objects
library(xts)        # xts objects

# Visualization
library(scales)

Read and inspect the data

Code
measles_data_final <- read_csv("data/measles_data_final.csv", show_col_types = FALSE)

glimpse(measles_data_final)
Rows: 156
Columns: 5
$ year          <dbl> 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 20…
$ month         <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6,…
$ date          <date> 2012-01-01, 2012-02-01, 2012-03-01, 2012-04-01, 2012-05…
$ yearmonth     <chr> "2012 Jan", "2012 Feb", "2012 Mar", "2012 Apr", "2012 Ma…
$ measles_total <dbl> 2082, 410, 309, 386, 379, 149, 172, 93, 116, 145, 238, 1…
Code
# Basic checks and ordering by date
measles <- measles_data_final |>
  mutate(date = as.Date(date)) |>
  arrange(date)
# Quick head
head(measles)
# A tibble: 6 × 5
   year month date       yearmonth measles_total
  <dbl> <dbl> <date>     <chr>             <dbl>
1  2012     1 2012-01-01 2012 Jan           2082
2  2012     2 2012-02-01 2012 Feb            410
3  2012     3 2012-03-01 2012 Mar            309
4  2012     4 2012-04-01 2012 Apr            386
5  2012     5 2012-05-01 2012 May            379
6  2012     6 2012-06-01 2012 Jun            149

Notes:

  • Columns: year, month, date, yearmonth, measles_total.

Visual Inspection

Before modeling, always visualize the raw data to identify obvious patterns.

Code
measles |>
  ggplot(aes(x = date, y = measles_total)) +
  geom_line(color = "steelblue", linewidth = 0.9) +
  geom_point(color = "steelblue", size = 0.6) +
  labs(title = "Monthly Measles Cases (2012-2024)",
       x = "Date", y = "Total cases (monthly)") +
  scale_x_date(date_breaks = "2 year", date_labels = "%Y") +    
  theme_minimal(base_size = 14)

Exercise 1: Visual Inspection

  • Look at the plot on the previous slide.
  • Discuss with a partner:
    1. Do you see a trend? (Is it generally increasing or decreasing?)
    2. Do you see seasonality? (Is there a repeating pattern within each year?)

📦 Handling Time Series Data

Time Series Objects in R

  • ts: The classic base R time series object. Required for the forecast package.
  • zoo / xts: Robust options for irregular time series or finance.
  • tsibble: Modern “tidy” time series frames (used with fable).

For this module, we focus on ts objects for ARIMA modeling.

Creating a ts Object

  • Use ts() for the forecast package functions like auto.arima.
Code
# We need:
# 1. The data vector ($measles_total)
# 2. The start date (Year, Month)
# 3. The frequency (12 for monthly)
measles_ts <- ts(measles$measles_total, 
                 start = c(2012, 1), 
                 frequency = 12)
Code
# Plot it with the generic `autoplot`
autoplot(measles_ts) +
  geom_line(color = "steelblue", linewidth = 0.7) +
  labs(title = "Measles Cases as a 'ts' Object") +
  theme_minimal()

Seasonal plot to inspect monthly patterns

Code
# Seasonal plot to inspect monthly patterns
library(forecast)
ggseasonplot(measles_ts, linewidth = 0.7) + 
    scale_y_continuous(labels = label_number(scale_cut = cut_short_scale())) +
  labs(title = "Seasonal Plot of Measles Cases") +
  theme_minimal()

🔍 Diagnostics & Decomposition

What is Decomposition?

  • Decomposition means splitting the time series into its components: \(Y(t) = Trend(t) + Seasonal(t) + Residual(t)\) (Additive)

    \(Y(t) = Trend(t) \times Seasonal(t) \times Residual(t)\) (Multiplicative)

  • Additive: Use when the seasonal variation is roughly constant.

  • Multiplicative: Use when seasonal variation increases as the trend increases.
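One practical consequence: taking logs turns a multiplicative structure into an additive one, so a log transform lets us apply additive tools. A base-R sketch with made-up components:

```r
# If Y = T * S * R, then log(Y) = log(T) + log(S) + log(R)
set.seed(42)
n <- 120
trend    <- exp(0.01 * (1:n))                   # multiplicative trend
seasonal <- 1 + 0.3 * sin(2 * pi * (1:n) / 12)  # seasonal factor around 1
noise    <- exp(rnorm(n, sd = 0.05))            # positive noise factor
y <- ts(trend * seasonal * noise, frequency = 12)

log_y <- log(y)  # now additive: log-trend + log-seasonal + log-noise
```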

Decomposition with stl()

We’ll use stl() (Seasonal and Trend decomposition using Loess), which is robust. We apply it to our ts object.

Code
# STL Decomposition
decomp <- stl(measles_ts, s.window = "periodic")
autoplot(decomp) + theme_minimal()

What Is Stationarity?

  • A stationary series is one whose statistical properties (mean, variance, covariance) do not change over time.
  • It has no trend and no changing seasonality.
  • Why care? Most time series models (like ARIMA) assume the data is stationary.
  • How? We often difference the data (e.g., \(Y_t - Y_{t-1}\)) to make it stationary. This is the “I” (for Integrated) in ARIMA.
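The differencing operation is one line in R: diff() subtracts each value from the one before it (toy numbers for illustration):

```r
# First difference: Y_t - Y_(t-1)
y <- c(5, 8, 6, 9)
diff(y)            # 3 -2  3

# Lagged difference: Y_t - Y_(t-2); the same idea with lag = 12
# gives the seasonal difference used later for monthly data
diff(y, lag = 2)   # 1  1
```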

Types of Stationarity: Strict vs. Weak

  • Strict Stationarity:
    • The entire joint probability distribution of the data is constant over time.
  • Weak (Second-Order) Stationarity:
    1. Mean \(E[Y_t]\) is constant (no trend).
    2. Variance \(Var(Y_t)\) is constant (no changing volatility).
    3. Autocovariance \(Cov(Y_t, Y_{t-k})\) depends only on the lag \(k\), not on time \(t\).

Exercise:

  • Inspect the measles plot: does level or variance change over time? What transformation might help?
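One quick, informal check for this exercise: split a series in half and compare the mean and variance of each half; for a stationary series they should be similar. A sketch on simulated data (not the measles series):

```r
set.seed(1)
y <- cumsum(rnorm(200))        # a random walk: non-stationary by construction
half <- length(y) %/% 2
first  <- y[1:half]
second <- y[(half + 1):length(y)]
c(mean(first), mean(second))   # for a random walk these typically differ
c(var(first),  var(second))
```

Formal tests (ADF, KPSS) come next; this is only a first-pass diagnostic.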

Checking for Stationarity: ACF/PACF

  • ACF (Autocorrelation Function): Measures the correlation between \(Y_t\) and \(Y_{t-k}\) (past values).
  • PACF (Partial Autocorrelation Function): Measures the correlation between \(Y_t\) and \(Y_{t-k}\) after removing the effects of the lags in between.

. . .

ACF/PACF Rules of Thumb:

  • Non-stationary (Trend): ACF plot shows a very slow, linear decay.
  • Seasonal: ACF plot shows strong spikes at the seasonal lags (12, 24, 36…).
  • Stationary AR(p): ACF decays, PACF cuts off after lag p.
  • Stationary MA(q): ACF cuts off after lag q, PACF decays.

ACF/PACF Plots in R

Code
# Create training and test sets
# Training: 2012-2023 (12 years, 144 obs)
# Test: 2024 (1 year, 12 obs)
train_ts <- window(measles_ts, end = c(2023, 12))
test_ts <- window(measles_ts, start = c(2024, 1))

Let’s look at the ACF for our training data.

Code
# We use ggAcf and ggPacf from the 'forecast' package
ggAcf(train_ts, lag.max = 36) +
  labs(title = "ACF for Measles Training Cases") + theme_bw()

  • Observation: Examine the decay pattern and the spikes at the seasonal lags (12, 24, 36) to assess stationarity and seasonality.

Stationarity Tests

  • We use statistical tests to be more formal.
  • ADF (Augmented Dickey-Fuller) Test:
    • \(H_0\) (Null): The series is non-stationary.
    • \(H_a\) (Alternative): The series is stationary.
    • Goal: We want a small p-value (< 0.05) to reject \(H_0\) and conclude the series is stationary.

. . .

Code
library(tseries) # For adf.test

# Run ADF test on the training data
adf.test(train_ts)

    Augmented Dickey-Fuller Test

data:  train_ts
Dickey-Fuller = -2.6755, Lag order = 5, p-value = 0.2952
alternative hypothesis: stationary

  • Result: The p-value (0.2952) is greater than 0.05, so we fail to reject \(H_0\): the training series is non-stationary and needs differencing.

Differencing

Code
# Create a first-differenced series
train_ts_diff1 <- diff(train_ts, lag = 1)

# Create a seasonally differenced series
train_ts_seas_diff <- diff(train_ts, lag = 12)

# Combine non-seasonal and seasonal differencing
train_ts_comb_diff <- diff(diff(train_ts, lag = 12))
Code
# Plot the first-differenced series
autoplot(train_ts_diff1) + 
    labs(title = "First Differenced Series") + 
  theme_minimal()
Code
# Run the adf.test() on the first-differenced series (train_ts_diff1)
adf.test(train_ts_diff1)

    Augmented Dickey-Fuller Test

data:  train_ts_diff1
Dickey-Fuller = -6.7824, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary
Code
# Plot the combined-differenced series
autoplot(train_ts_comb_diff) +
  labs(title = "Seasonally and First-Differenced Series") +
  theme_minimal()
Code
# Run the adf.test() on train_ts_comb_diff
adf.test(train_ts_comb_diff)

    Augmented Dickey-Fuller Test

data:  train_ts_comb_diff
Dickey-Fuller = -6.2147, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary

  • Result: both differenced series give p-value = 0.01 (the smallest value adf.test reports), so we reject \(H_0\) and treat them as stationary.

Visualizing ACF/PACF

Code
# Correlation plots for the differenced (stationary) data
ggAcf(train_ts_comb_diff, lag.max = 48) +
  ggtitle("ACF (Non-Seasonal & Seasonal Differenced)") +
  theme_minimal()
ggPacf(train_ts_comb_diff, lag.max = 48) +
  ggtitle("PACF (Non-Seasonal & Seasonal Differenced)") +
  theme_minimal()

ARIMA Modeling

What is ARIMA?

ARIMA (AutoRegressive Integrated Moving Average) is a popular statistical method for analyzing and forecasting time series data. It predicts future points from the series’ own past values and past forecast errors, with differencing used to remove trend.

ARIMA(p, d, q) combines three key components to model:

  • AR (p): Autoregressive part

    • \(X_t\) is a weighted sum of p past values.
    • “The value today is a function of yesterday’s value.”
  • I (d): Integrated part

    • d is the number of differences needed to make the series stationary.
    • “We model the change from yesterday, not the raw value.”
  • MA (q): Moving Average part

    • \(X_t\) is a weighted sum of q past forecast errors.
    • “The value today is a function of yesterday’s error.”

For a stationary series (after appropriate differencing), we typically model it with ARMA/ARIMA.

What is SARIMA?

SARIMA = Seasonal ARIMA. It adds seasonal components.

SARIMA(p, d, q)(P, D, Q)[s]

  • (p, d, q): The non-seasonal parts (from the previous slide).
  • (P, D, Q): The seasonal AR, I, and MA parts.
  • [s]: The seasonal period (s=12 for monthly data).

Manually finding these 6 numbers is not easy!

ARMA and SARMA Equations

For a stationary series \(X_t\) (after differencing), an ARMA(p, q) model is

\[ X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}, \]

or more compactly with the backshift operator (B):

\[ \phi(B) X_t = \theta(B)\varepsilon_t. \]

Where:

  • \(X_t\): observed time series at time \(t\).
  • \(\varepsilon_t\): white noise error term at time \(t\), usually \(\varepsilon_t \sim \text{i.i.d. }(0,\sigma^2)\).
  • \(\phi_1, \dots, \phi_p\): nonseasonal AR parameters.
  • \(\theta_1, \dots, \theta_q\): nonseasonal MA parameters.
  • \(\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p\): non-seasonal AR operator
  • \(\theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q\): non-seasonal MA operator
  • \(B\): backshift (lag) operator, \(B X_t = X_{t-1}\).
  • \(p\) and \(q\): order of nonseasonal AR and MA components, respectively.
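To make the notation concrete, we can simulate an ARMA(1,1) process and recover \(\phi_1\) and \(\theta_1\) with base R's arima() (the parameter values 0.7 and 0.4 are illustrative; no external packages assumed):

```r
# Simulate X_t = 0.7 X_(t-1) + e_t + 0.4 e_(t-1), then fit ARMA(1,1)
set.seed(123)
sim <- arima.sim(model = list(ar = 0.7, ma = 0.4), n = 500)
fit <- arima(sim, order = c(1, 0, 1))
coef(fit)  # ar1 and ma1 estimates should land near 0.7 and 0.4
```

With 500 observations, the estimates are typically within a few standard errors of the true values.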

SARIMA Model Equation

SARIMA(p, d, q)(P, D, Q)[s] model:

\[ \Phi(B^s)\phi(B)(1 - B)^d (1 - B^s)^D X_t = \Theta(B^s)\theta(B)\varepsilon_t \]

Where:

  • \(\phi(B)\) and \(\theta(B)\): the non-seasonal AR and MA operators defined on the previous slide
  • \(\Phi(B^s) = 1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps}\): seasonal AR operator
  • \(\Theta(B^s) = 1 + \Theta_1 B^s + \cdots + \Theta_Q B^{Qs}\): seasonal MA operator
  • \((1-B)^d\): non-seasonal differencing of order \(d\)
  • \((1-B^s)^D\): seasonal differencing of order \(D\)
  • \(s\): seasonal period (e.g., \(s = 12\) for monthly data)
  • \(P\) and \(Q\): orders of the seasonal AR and MA components
  • \(\Phi_1, \dots, \Phi_P\) and \(\Theta_1, \dots, \Theta_Q\): seasonal AR and MA parameters
  • \(X_t\), \(\varepsilon_t\), and \(B\): the observed series, the white-noise error, and the backshift operator, as before
  • A constant or drift term is sometimes included (handled via include.constant or include.drift in forecast::Arima).

ACF/PACF Patterns for ARMA Models

(for stationary series, after appropriate differencing)

Model        ACF behavior           PACF behavior
AR(p)        Tails off              Cuts off after lag p
MA(q)        Cuts off after lag q   Tails off
ARMA(p,q)    Tails off              Tails off

Notes:

  • “Cuts off” = near-zero after that lag.
  • “Tails off” = decays gradually, often geometrically or in a damped pattern.
  • Use these patterns on the differenced (stationary) series to guide the choice of \(p\) and \(q\).
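The patterns in the table can be checked on simulated data. For an MA(1) process, the sample ACF should be sizeable at lag 1 and near zero afterwards (a base-R sketch; the MA coefficient 0.8 is illustrative):

```r
set.seed(7)
ma1 <- arima.sim(model = list(ma = 0.8), n = 1000)
a <- acf(ma1, lag.max = 5, plot = FALSE)$acf
round(as.numeric(a), 2)  # first entry is lag 0 (= 1); lag 1 large, lags 2+ near zero
```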

Building an ARIMA/SARIMA Model

We’ll fit the model to our training data (train_ts).

  • Use auto.arima() for a strong baseline; validate via residual diagnostics and Ljung–Box test

  • Forecast with intervals; compare models by AIC, BIC, and AICc and accuracy

Code
# Create training and test sets: last 12 months as test
# Training: 2012-2023 (12 years, 144 obs)
# Test: 2024 (1 year, 12 obs)
train_ts <- window(measles_ts, end = c(2023, 12))
test_ts <- window(measles_ts, start = c(2024, 1))
Code
# Find the best ARIMA model automatically
library(forecast)
fit_auto <- auto.arima(
  train_ts,
  seasonal = TRUE,
  stepwise = FALSE, approximation = FALSE
)
fit_auto
Series: train_ts 
ARIMA(1,0,0)(1,0,0)[12] with non-zero mean 

Coefficients:
         ar1    sar1      mean
      0.7348  0.2745  741.1147
s.e.  0.0645  0.0917  234.1055

sigma^2 = 319695:  log likelihood = -1116.29
AIC=2240.57   AICc=2240.86   BIC=2252.45

Manual SARIMA modeling

Code
# Example manual model: seasonal difference often D=1 for monthly
# Use ACF/PACF patterns to pick p,q,P,Q; here we try SARIMA(1,1,1)(0,1,1)[12]
fit_manual <- forecast::Arima(train_ts,
                              order = c(1,1,1),
                              seasonal = c(0,1,1),
                              include.constant = FALSE)
fit_manual
Series: train_ts 
ARIMA(1,1,1)(0,1,1)[12] 

Coefficients:
         ar1      ma1     sma1
      0.5534  -0.8516  -0.7639
s.e.  0.1222   0.0774   0.1210

sigma^2 = 368540:  log likelihood = -1029.45
AIC=2066.9   AICc=2067.22   BIC=2078.4
Code
tibble(
  model = c("auto.arima", "manual SARIMA(1,1,1)(0,1,1)[12]"),
  AIC  = c(fit_auto$aic, fit_manual$aic),
  BIC = c(fit_auto$bic, fit_manual$bic),
  AICc = c(fit_auto$aicc, fit_manual$aicc)
) |> 
  arrange(AIC)
# A tibble: 2 × 4
  model                             AIC   BIC  AICc
  <chr>                           <dbl> <dbl> <dbl>
1 manual SARIMA(1,1,1)(0,1,1)[12] 2067. 2078. 2067.
2 auto.arima                      2241. 2252. 2241.

Caution: AIC/BIC/AICc are only comparable between models fitted to the same (identically differenced) data; here the two models use different orders of differencing, so the criteria above are not directly comparable. Among candidates with the same differencing, select the model with the smallest criterion; otherwise, rely on out-of-sample accuracy (evaluated below).

🔍 Check Residuals

  • After fitting, we must check the residuals.
  • Good residuals should look like white noise:
    1. Mean of zero.
    2. No significant autocorrelation (all ACF spikes are inside the blue lines).
    3. Normally distributed.
  • checkresiduals() runs all these diagnostics for us!
Code
# Plot diagnostics for model residuals
checkresiduals(fit_auto) 

    Ljung-Box test

data:  Residuals from ARIMA(1,0,0)(1,0,0)[12] with non-zero mean
Q* = 15.795, df = 22, p-value = 0.8259

Model df: 2.   Total lags used: 24

  • Ljung-Box Test:
    • \(H_0\): The residuals are independently distributed (no autocorrelation).
    • Goal: We want a large p-value (> 0.05) so we fail to reject \(H_0\).
  • Observation: Here p = 0.826 and the ACF spikes stay within the bounds, so the residuals resemble white noise and the model is adequate.
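checkresiduals() wraps the Ljung–Box test; the same test is available directly as Box.test() in base R. A sketch on pure white noise, where \(H_0\) is true by construction:

```r
set.seed(99)
wn <- rnorm(200)                        # white noise: no autocorrelation
Box.test(wn, lag = 24, type = "Ljung-Box")
# A small p-value here would be a false alarm; on white noise it is usually large.
```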

Forecasting

Forecasting and Forecast comparison

Code
fc_auto <- forecast::forecast(fit_auto, h = 12)
fc_manual <- forecast::forecast(fit_manual, h = 12)

autoplot(fc_auto) +
  autolayer(test_ts, series = "Actual") +
  labs(title = "Forecasts: auto.arima vs. Actual (Measles 2024)",
       y = "Total cases") +
  scale_y_continuous(labels = label_number(scale_cut = cut_short_scale())) +
  theme_minimal(base_size = 14)

Forecasting and Forecast comparison

Code
fc_auto <- forecast::forecast(fit_auto, h = 12)
fc_manual <- forecast::forecast(fit_manual, h = 12)

autoplot(fc_manual) +
  autolayer(test_ts, series = "Actual") +
  labs(title = "Forecasts: SARIMA(1,1,1)(0,1,1)[12] vs. Actual (Measles 2024)",
       y = "Total cases") +
  scale_y_continuous(labels = label_number(scale_cut = cut_short_scale())) +
  theme_minimal(base_size = 14)

Forecast Evaluation and Confidence Intervals

  • How good was our forecast? We can compare our forecast to the test_ts data we held back.

  • Evaluate on holdout: MAE, RMSE, MAPE

  • accuracy() gives us common error metrics

Code
acc_auto <- accuracy(fc_auto, test_ts) |>
  as_tibble(rownames = "set") |>
  mutate(model = "auto.arima")
acc_manual <- accuracy(fc_manual, test_ts) |>
  as_tibble(rownames = "set") |>
  mutate(model = "manual")

bind_rows(acc_auto, acc_manual) |>
  select(model, set, ME, MAE, RMSE, MAPE, MASE, ACF1) |>
  arrange(RMSE)
# A tibble: 4 × 8
  model      set              ME   MAE  RMSE  MAPE  MASE     ACF1
  <chr>      <chr>         <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
1 auto.arima Training set  -9.33  320.  559. 191.  0.621 -0.0499 
2 manual     Training set  47.1   341.  572. 140.  0.662  0.00261
3 auto.arima Test set     842.   1414. 2023.  71.4 2.74   0.628  
4 manual     Test set    -209.   1775. 2081. 153.  3.44   0.674  

  • On the held-out 2024 test set, auto.arima has the lower MAE and RMSE, even though the manual SARIMA had the better AIC; out-of-sample accuracy is the more trustworthy guide.
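The headline metrics are simple to compute by hand, which makes their units clear (toy actual/forecast numbers, not the measles results):

```r
actual <- c(100, 120, 90)
fcst   <- c(110, 100, 95)
err <- actual - fcst
mae  <- mean(abs(err))                 # mean absolute error (in cases)
rmse <- sqrt(mean(err^2))              # root mean squared error (in cases)
mape <- mean(abs(err / actual)) * 100  # mean absolute percentage error (%)
round(c(MAE = mae, RMSE = rmse, MAPE = mape), 2)
# MAE 11.67, RMSE 13.23, MAPE 10.74
```

Note that MAE and RMSE are in the units of the data, while MAPE is scale-free, which is why the training and test rows above can look so different.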

Final Forecast

Forecasting the Actual Future

  1. Our model (fit_auto) was built only on the training set.
  2. We validated it on the test set.
  3. Now, to forecast the real future, we should re-fit the model on ALL available data (measles_ts).
Code
# Example solution for the wrap-up exercise:
fit_best <- forecast::auto.arima(measles_ts, stepwise = FALSE, approximation = FALSE)
fc_best <- forecast::forecast(fit_best, h = 12)

autoplot(fc_best) +
  labs(title = "12-Month Measles Forecast on Full Series",
       y = "Total cases") +
  scale_y_continuous(labels = label_number(scale_cut = cut_short_scale())) +
  theme_minimal(base_size = 14)

Final Forecast

Code
fit_best
Series: measles_ts 
ARIMA(1,1,3) 

Coefficients:
         ar1      ma1      ma2      ma3
      0.6355  -0.7506  -0.0372  -0.1572
s.e.  0.1745   0.1824   0.1062   0.1285

sigma^2 = 483000:  log likelihood = -1232.73
AIC=2475.45   AICc=2475.85   BIC=2490.67

Session Summary

  • We learned to plot time series with ggplot2 and forecast::autoplot.
  • We used ts objects.
  • We decomposed a series with stl() and checked stationarity with adf.test() and ggAcf().
  • We built a SARIMA model automatically using auto.arima().
  • We checked model quality with checkresiduals().
  • We generated and evaluated forecasts using forecast() and accuracy().

Extensions

  • ETS (Error, Trend, Seasonal): Exponential smoothing models (often comparable to ARIMA).

  • Prophet (Meta): Handles multiple seasonalities and holidays well; robust to outliers.

  • ARIMAX/SARIMAX (ARIMA with exogenous variables)

  • Machine learning / deep learning: gradient boosting (XGBoost), RNNs (LSTM/GRU), Temporal Fusion Transformer (TFT)

Exercise: Measles cases Forecasting

Using your region's measles case data, build a valid ARIMA/SARIMA model to forecast measles cases for the upcoming year, in support of health intervention planning.

References

  • Shumway, R. H., & Stoffer, D. S. (2017). Time Series Analysis and Its Applications: With R Examples (4th ed.). Springer.

  • Montgomery, D. C., Jennings, C. L., & Kulahci, M. (2015). Introduction to Time Series Analysis and Forecasting (2nd ed.). Hoboken, NJ: John Wiley & Sons.

  • Hyndman, R. J., & Athanasopoulos, G. (2021). Forecasting: Principles and Practice (3rd ed.). OTexts. https://otexts.com/fpp3/