Introduction to ci: Confidence Intervals for Education

Vilmantas Gegzna

2025-10-19

library(ci)
library(dplyr)

About the Package

Package ci (v0.0.1) is an educational package providing intuitive functions for calculating confidence intervals (CI) for various statistical parameters. Designed primarily for teaching and learning about statistical inference (particularly confidence intervals), the package offers user-friendly wrappers around established methods for:

All functions integrate seamlessly with Tidyverse workflows, making them ideal for classroom demonstrations and student exercises.

This vignette will introduce you to the concept of CIs and show you how to use the ci package to calculate and visualize them.

Abbreviations

Functions in This Package

Function Purpose Input Data Output Data
ci_binom() Proportion CI Summary statistics (frequencies) Data frame
ci_multinom() Proportion CI Summary statistics (frequencies) Data frame
ci_mean_t() Mean CI Data frame Data frame
ci_mean_t_stat() Mean CI Summary statistics (M, SD, n) Data frame
ci_boot() Any statistic CI Data frame Data frame

Quick Reference

The most common use cases:

# Binomial CI: two categories (e.g., mutant/wild-type)
ci_binom(x = 54, n = 80)

# Multinomial CI: 3+ categories (e.g., blood types)   
ci_multinom(c("A" = 20, "B" = 35, "AB" = 25, "O" = 15))

# Mean CI based on data: average with groups
data |> group_by(group_var) |> ci_mean_t(numeric_var)

# Mean CI based on summary statistics (mean, SD, n): e.g., literature values
ci_mean_t_stat(mean_ = 75, sd_ = 10, n = 25)

# Any statistic (median, correlation, etc.)
data |> ci_boot(var1, var2, FUN = cor, R = 1000)

The Basics of Confidence Intervals

What is a Confidence Interval?

Confidence interval (CI) is a fundamental concept in statistics, providing a range of values that likely contain the true population parameter. They help quantify the uncertainty associated with sample estimates.

Imagine you want to know the average height of all students in your university. You can’t measure everyone, so you measure 100 students and find their average height is 170 cm. But would a different group of 100 students give exactly the same average? Probably not.

A confidence interval gives you a range of plausible values for the true population average. Instead of saying “the average is 170 cm,” you might say “I’m 95% confident the true average is between 168 cm and 172 cm.”

Key Concepts

Population vs Sample

Parameter vs Statistic

Confidence Level

Key Point: The interval either contains the true value or it doesn’t. The “95%” refers to how often the method works, not the probability for any single interval.

What Affects Confidence Interval Width?

  1. Sample size: Larger samples → narrower intervals (more precise)
  2. Confidence level: Higher confidence → wider intervals
  3. Data variability: More spread in data → wider intervals

Terms You Should Know

Common Mistakes to Avoid

Wrong: “95% probability the true value is in this interval”
Right: “We’re 95% confident the true value is in this interval”

Wrong: Making intervals narrower by changing confidence level
Right: Choose confidence level before calculating (usually 95%)

Practical Tips

Choosing Confidence Level:

Interpreting Results:

Sample Size Planning:

CIs of Proportion

Binomial Proportions: Two Categories

Binomial proportion CIs measure the uncertainty around a proportion in binary outcomes (i.e., two categories like mutant/wild-type, diseased/healthy, success/failure, yes/no).

To calculate confidence intervals in this case:

  1. Create a frequency table of values.
  2. Use frequency of value of interest as x and total n as input to ci_binom().

Scenario: You investigate 80 Drosophila specimens (n = 80) and 54 have no mutation (x = 54).

# 54 out of 80 Drosophila are mutation-free
ci_binom(x = 54, n = 80)
#> # A tibble: 1 × 5
#>     est lwr.ci upr.ci     x     n
#>   <dbl>  <dbl>  <dbl> <int> <int>
#> 1 0.675  0.566  0.768    54    80

Interpretation: We are 95% confident that between 56.6% and 76.8% of the population are mutation-free.

Effect of Sample Size

Let’s simulate results with same proportion (67.5%) but different sample sizes:

ci_n1 <- ci_binom(x = 54, n = 80)        # Small sample
ci_n2 <- ci_binom(x = 135, n = 200)      # Larger sample  
ci_n3 <- ci_binom(x = 675, n = 1000)     # Even larger sample

Now let’s merge all the results and calculate difference between upper and lower CI bounds:

bind_rows(ci_n1, ci_n2, ci_n3) |>
  mutate(diff_upr_lwr = round(upr.ci - lwr.ci, digits = 3))
#> # A tibble: 3 × 6
#>     est lwr.ci upr.ci     x     n diff_upr_lwr
#>   <dbl>  <dbl>  <dbl> <int> <int>        <dbl>
#> 1 0.675  0.566  0.768    54    80        0.201
#> 2 0.675  0.607  0.736   135   200        0.129
#> 3 0.675  0.645  0.703   675  1000        0.058

💡 Notice: Larger samples give narrower (more precise) intervals. This is true not only for proportions but also for means and other statistics.

Effect of Confidence Levels

Let’s see how confidence level affects interval width:

ci_q1 <- ci_binom(x = 54, n = 80, conf.level = 0.90)  # Less confident, narrower
ci_q2 <- ci_binom(x = 54, n = 80, conf.level = 0.95)  # Standard
ci_q3 <- ci_binom(x = 54, n = 80, conf.level = 0.99)  # More confident, wider
bind_rows(ci_q1, ci_q2, ci_q3) |>
  mutate(diff_upr_lwr = round(upr.ci - lwr.ci, digits = 3))
#> # A tibble: 3 × 6
#>     est lwr.ci upr.ci     x     n diff_upr_lwr
#>   <dbl>  <dbl>  <dbl> <int> <int>        <dbl>
#> 1 0.675  0.584  0.754    54    80        0.17 
#> 2 0.675  0.566  0.768    54    80        0.201
#> 3 0.675  0.531  0.792    54    80        0.261

💡 Notice: Higher confidence levels give wider intervals.

⚠️ This is just an educational example. Always choose the confidence level based on your study design before calculating CIs.

Multinomial Proportions: Several Categories

Multinomial proportion CIs measure uncertainty around proportions in categorical outcomes with 3 or more categories (e.g., A/B/AB/O blood types, eye colors, genotypes).

In this case:

  1. Create a frequency table of values.
  2. Use frequencies as input to ci_multinom().

Scenario: Blood type distribution in patients with a certain medical condition.

# Blood type distribution
blood_types <- c("A" = 20, "B" = 35, "AB" = 25, "O" = 15)
ci_multinom(blood_types)
#> # A tibble: 4 × 6
#>   group   est lwr.ci upr.ci     x     n
#>   <fct> <dbl>  <dbl>  <dbl> <int> <int>
#> 1 A     0.211 0.126   0.331    20    95
#> 2 B     0.368 0.257   0.497    35    95
#> 3 AB    0.263 0.167   0.388    25    95
#> 4 O     0.158 0.0860  0.272    15    95

Scenario: You survey lab members about their preferred method of commuting to the university.

# Transportation preferences
c("Car" = 20, "Bus" = 45, "Bike" = 15, "Walk" = 20) |>
  ci_multinom(method = "goodman")
#> # A tibble: 4 × 6
#>   group   est lwr.ci upr.ci     x     n
#>   <fct> <dbl>  <dbl>  <dbl> <int> <int>
#> 1 Car    0.2  0.119   0.316    20   100
#> 2 Bus    0.45 0.332   0.574    45   100
#> 3 Bike   0.15 0.0816  0.259    15   100
#> 4 Walk   0.2  0.119   0.316    20   100

Interpretation: We’re 95% confident that the true proportion of people who prefer taking the bus is between 32.7% and 57.7%.

In this case, simultaneous CIs are calculated. This means that CIs are calculated for each category at once, ensuring that the overall confidence level (e.g., 95%) is maintained across all categories.

CIs of Mean (Average)

Mean CIs: From Data

To calculate confidence intervals for means directly from data frames, use ci_mean_t().

Overall Mean (No Groups)

Scenario: You are investigating crop yield.

# Using built-in dataset as example
data(npk, package = "datasets")
head(npk, n = 3)
#>   block N P K yield
#> 1     1 0 1 1  49.5
#> 2     1 1 1 0  62.8
#> 3     1 0 0 0  46.8
# Average crop yield (overall)
npk |>
  ci_mean_t(yield)
#> # A tibble: 1 × 3
#>    mean lwr.ci upr.ci
#>   <dbl>  <dbl>  <dbl>
#> 1  54.9   52.3   57.5

Interpretation: We’re 95% confident the true average yield is between 51.9 and 56.9.

Comparing Groups

Scenario: You are investigating crop yield in different conditions (with/without fertilizers).

# Compare yields with and without nitrogen fertilizer
npk |>
  group_by(N) |>
  ci_mean_t(yield)
#> # A tibble: 2 × 4
#>   N      mean lwr.ci upr.ci
#>   <fct> <dbl>  <dbl>  <dbl>
#> 1 0      52.1   48.6   55.5
#> 2 1      57.7   54.0   61.4

Look at the confidence intervals: if they don’t overlap, it suggests a potential difference between groups.

# Multiple grouping factors
npk |>
  group_by(N, P, K) |>
  ci_mean_t(yield)
#> # A tibble: 8 × 6
#>   N     P     K      mean lwr.ci upr.ci
#>   <fct> <fct> <fct> <dbl>  <dbl>  <dbl>
#> 1 0     1     1      50.5   44.6   56.4
#> 2 1     1     0      57.9   44.3   71.5
#> 3 0     0     0      51.4   40.0   62.9
#> 4 1     0     1      54.7   44.2   65.1
#> 5 1     0     0      63.8   51.1   76.4
#> 6 1     1     1      54.4   41.9   66.8
#> 7 0     0     1      52     38.0   66.0
#> 8 0     1     0      54.3   31.0   77.7


Scenario: Iris flower measurements.

data(iris, package = "datasets")
head(iris, n = 3)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
# Petal length by flower species
iris |>
  group_by(Species) |>
  ci_mean_t(Petal.Length)
#> # A tibble: 3 × 4
#>   Species     mean lwr.ci upr.ci
#>   <fct>      <dbl>  <dbl>  <dbl>
#> 1 setosa      1.46   1.41   1.51
#> 2 versicolor  4.26   4.13   4.39
#> 3 virginica   5.55   5.40   5.71

Interpretation: The results suggest that the three species have clearly different petal lengths (no overlapping intervals).

Mean CI: From Summary Statistics

When you only have descriptive statistics (from literature, reports, etc.), you may use them directly.

Single Group

Scenario: You read a study that reports “mean = 75, SD = 10, n = 25” but doesn’t give confidence intervals.

# Calculate CI from reported statistics
ci_mean_t_stat(mean_ = 75, sd_ = 10, n = 25)
#> # A tibble: 1 × 6
#>   group  mean lwr.ci upr.ci    sd     n
#>   <fct> <dbl>  <dbl>  <dbl> <dbl> <int>
#> 1 ""       75   70.9   79.1    10    25

Comparing Multiple Groups

Scenario: You compare 3 studies.

You may enter data as vectors:

# Three enzyme activity measurements
mean_val <- c(78, 82, 75)
std_dev  <- c(8,   7,  9)
n        <- c(30, 28, 32)
group    <- c("Enzyme A", "Enzyme B", "Enzyme C")

ci_mean_t_stat(mean_val, std_dev, n, group)
#> # A tibble: 3 × 6
#>   group     mean lwr.ci upr.ci    sd     n
#>   <fct>    <dbl>  <dbl>  <dbl> <dbl> <int>
#> 1 Enzyme A    78   75.0   81.0     8    30
#> 2 Enzyme B    82   79.3   84.7     7    28
#> 3 Enzyme C    75   71.8   78.2     9    32

You may enter data as a data frame too. Then use either $ or with() based syntax:

# Three different treatment conditions from literature
treatments <- 
  tibble(
    treatment = c("Control", "Low dose", "High dose"),
    mean_response = c(78, 82, 75),
    sd = c(8, 7, 9),
    n = c(30, 28, 32)
  )

treatments |>
  with(ci_mean_t_stat(mean_response, sd, n, treatment))
#> # A tibble: 3 × 6
#>   group      mean lwr.ci upr.ci    sd     n
#>   <fct>     <dbl>  <dbl>  <dbl> <dbl> <int>
#> 1 Control      78   75.0   81.0     8    30
#> 2 Low dose     82   79.3   84.7     7    28
#> 3 High dose    75   71.8   78.2     9    32

Exploring Sample Size Effects

# Same mean and SD but four different sample sizes
ci_mean_t_stat(mean_ = 75, sd_ = 10, n = c(10, 25, 50, 100))
#> # A tibble: 4 × 6
#>   group  mean lwr.ci upr.ci    sd     n
#>   <fct> <dbl>  <dbl>  <dbl> <dbl> <int>
#> 1 ""       75   67.8   82.2    10    10
#> 2 ""       75   70.9   79.1    10    25
#> 3 ""       75   72.2   77.8    10    50
#> 4 ""       75   73.0   77.0    10   100

Larger samples yield narrower (more precise) confidence intervals. Doubling sample size doesn’t halve interval width: the relationship is not linear.

Bootstrap Confidence Intervals

Bootstrap is useful when:

⚠️ Warning: Bootstrap needs adequate sample sizes (not fewer than 20 per group, preferably 30+ per group).

CI for Single-Variable Statistics

Single-variable statistics need only one variable to be computed. Examples are mean, median, SD, etc.

Median CI

set.seed(123)  # For reproducible results

# Median petal length for all flowers
iris |>
  ci_boot(Petal.Length, FUN = median, R = 1000, bci.method = "bca")
#> # A tibble: 1 × 3
#>   median lwr.ci upr.ci
#>    <dbl>  <dbl>  <dbl>
#> 1   4.35      4   4.54

Median CI by Group

set.seed(456)

# Median by species
iris |>
  group_by(Species) |>
  ci_boot(Petal.Length, FUN = median, R = 1000, bci.method = "bca")
#> # A tibble: 3 × 4
#>   Species    median lwr.ci upr.ci
#>   <fct>       <dbl>  <dbl>  <dbl>
#> 1 setosa       1.5     1.4    1.5
#> 2 versicolor   4.35    4.1    4.5
#> 3 virginica    5.55    5.2    5.6

Standard Deviation CI

When to use: CIs of variability measures (SD, IQR, etc.) should be used when you want to know how consistent or variable your data is.

set.seed(789)

# How variable is petal length?
iris |>
  ci_boot(Petal.Length, FUN = sd, R = 1000, bci.method = "bca")
#> # A tibble: 1 × 3
#>      sd lwr.ci upr.ci
#>   <dbl>  <dbl>  <dbl>
#> 1  1.77   1.66   1.88

CIs for Variable-Pair Statistics

Variable-pair statistics need two variables to be computed. Examples are correlation coefficients (Pearson, Spearman, Kendall), Cramér’s V, Goodman-Kruskal Tau and Lambda, etc.

Pearson correlation measures the strength of linear relationship between two continuous variables:

set.seed(101)

# Relationship between petal length and width
iris |>
  group_by(Species) |>
  ci_boot(
    Petal.Length, Petal.Width,
    FUN = cor, method = "pearson",
    R = 1000, bci.method = "bca"
  )
#> # A tibble: 3 × 4
#>   Species      cor lwr.ci upr.ci
#>   <fct>      <dbl>  <dbl>  <dbl>
#> 1 setosa     0.332 0.0649  0.517
#> 2 versicolor 0.787 0.665   0.868
#> 3 virginica  0.322 0.111   0.514

Note 1: method is an argument of function cor().
Note 2: If the CI does not include 0, there is likely a real correlation.

Spearman correlation measures rank (monotonic non-linear) relationship between two continuous variables:

set.seed(101)

# Correlation between petal length and width
iris |>
  group_by(Species) |>
  ci_boot(
    Petal.Length, Petal.Width,
    FUN = cor, method = "spearman",
    R = 1000, bci.method = "bca"
  )
#> # A tibble: 3 × 4
#>   Species      cor   lwr.ci upr.ci
#>   <fct>      <dbl>    <dbl>  <dbl>
#> 1 setosa     0.271 -0.00466  0.510
#> 2 versicolor 0.787  0.615    0.875
#> 3 virginica  0.363  0.111    0.591

Results: All three species show strong positive correlations.

Bootstrap Methods: When to Use Which

The package offers several bootstrap methods. They are set using the bci.method argument. Different methods can give slightly different results. Let’s look at two examples:

set.seed(222)

# Percentile method (simpler, more robust)
iris |> ci_boot(Petal.Length, FUN = median, R = 1000, bci.method = "perc")
#> # A tibble: 1 × 3
#>   median lwr.ci upr.ci
#>    <dbl>  <dbl>  <dbl>
#> 1   4.35      4   4.55

# BCa method (often more accurate)
iris |> ci_boot(Petal.Length, FUN = median, R = 1000, bci.method = "bca")
#> # A tibble: 1 × 3
#>   median lwr.ci upr.ci
#>    <dbl>  <dbl>  <dbl>
#> 1   4.35      4    4.5

Use BCa (default) when:

Use percentile when:

Learning Exercises

Exercise 1: Understanding Confidence Levels

# Same data, different confidence levels
ci_mean_t_stat(mean_ = 75, sd_ = 10, n = 25, conf.level = 0.90)
#> # A tibble: 1 × 6
#>   group  mean lwr.ci upr.ci    sd     n
#>   <fct> <dbl>  <dbl>  <dbl> <dbl> <int>
#> 1 ""       75   71.6   78.4    10    25
ci_mean_t_stat(mean_ = 75, sd_ = 10, n = 25, conf.level = 0.95)
#> # A tibble: 1 × 6
#>   group  mean lwr.ci upr.ci    sd     n
#>   <fct> <dbl>  <dbl>  <dbl> <dbl> <int>
#> 1 ""       75   70.9   79.1    10    25
ci_mean_t_stat(mean_ = 75, sd_ = 10, n = 25, conf.level = 0.99)
#> # A tibble: 1 × 6
#>   group  mean lwr.ci upr.ci    sd     n
#>   <fct> <dbl>  <dbl>  <dbl> <dbl> <int>
#> 1 ""       75   69.4   80.6    10    25

Questions:

  1. What happens to interval width as confidence level increases? Why?

Exercise 2: Sample Size Effect

# Effect of sample size on precision
ci_mean_t_stat(mean_ = 85, sd_ = 15, n = c(10, 25, 50, 100, 400))
#> # A tibble: 5 × 6
#>   group  mean lwr.ci upr.ci    sd     n
#>   <fct> <dbl>  <dbl>  <dbl> <dbl> <int>
#> 1 ""       85   74.3   95.7    15    10
#> 2 ""       85   78.8   91.2    15    25
#> 3 ""       85   80.7   89.3    15    50
#> 4 ""       85   82.0   88.0    15   100
#> 5 ""       85   83.5   86.5    15   400

Questions:

  1. Which sample size gives the most precise estimate?
  2. Is the improvement from n = 25 to n = 50 the same as from n = 50 to n = 100?

Exercise 3: Overlapping vs Non-overlapping Intervals

# Two treatment methods
ci_mean_t_stat(
  mean_ = c(78, 82), 
  sd_ = c(8, 7), 
  n = c(30, 28), 
  group = c("Treatment A", "Treatment B")
)
#> # A tibble: 2 × 6
#>   group        mean lwr.ci upr.ci    sd     n
#>   <fct>       <dbl>  <dbl>  <dbl> <dbl> <int>
#> 1 Treatment A    78   75.0   81.0     8    30
#> 2 Treatment B    82   79.3   84.7     7    28

Questions:

  1. Do the confidence intervals overlap? What might this suggest about the difference between treatments?

Getting Help