Overview

MultiLevelOptimalBayes (MLOB) is designed for estimating two-level latent variable models, particularly in small sample settings. This is especially useful in psychology, education, and other fields with hierarchical or nested data structures. We present the R package MultiLevelOptimalBayes (MLOB) for estimating between-group effects in multilevel latent variable models. MLOB employs a regularised Bayesian estimator devised by Dashuk, Hecht, Luedtke, Robitzsch, and Zitzmann (2025a), which was subsequently enhanced for additional covariates by the same authors. This estimator chooses prior parameters to minimise the mean squared error (MSE) of the between-group effect by effectively balancing bias and variance. The regularised Bayesian estimator provides MSE-optimal estimations due to the mean-variance tradeoff, especially in scenarios of small sample sizes and poor intraclass correlation (ICC). The MLOB software supports imbalanced group sizes through integrated data-balancing methods and offers comprehensive inference, including point estimates, standard errors, p-values, and confidence intervals for both primary regressors and covariates. To gain comprehensive understanding, we initially examine the theoretical underpinnings of the regularised Bayesian estimator (Dashuk et al. 2025a, 2025b), followed by a discussion of its implementation in MLOB, namely the core function mlob(). We illustrate the application of mlob() using real datasets. Consequently, we provide researchers in psychology, education, and related disciplines a robust, user-friendly instrument for dependable multilevel latent variable estimation, particularly in contexts characterised by small sample sizes and low ICCs.

The core function mlob() estimates the between-group coefficient (beta_b) using a regularized Bayesian approach, and applies a data balancing procedure if the groups are unbalanced.

Key Features

Suitable for small-sample, multilevel data
Performs regularized Bayesian estimation
Automatically balances data if necessary
Optional jackknife resampling for standard errors
Provides comparison with unregularized ML estimates

Function Usage

Below is the signature for the mlob() function. This shows the arguments an theit default values you can pass, but note that this chunk is not meant to be executed.

mlob(
  formula,
  data,
  group,
  balancing.limit = 0.2,
  conf.level = 0.95,
  jackknife = FALSE,
  punish.coeff = 2,
  ...
)

Arguments:

formula: A formula (e.g., Y ~ X + C) where Y is the outcome, X is the context variable of interest, and C represents covariates.
data: A data.frame containing all variables in the formula and the grouping variable.
group: A string naming the grouping variable.
balancing.limit: Proportion (0-1) of the dataset that can be removed to balance group sizes. Default is 0.2.
conf.level: Confidence level for confidence intervals. Default is 0.95 (95% CI).
jackknife: Logical. If TRUE, standard errors and CIs are computed via jackknife resampling. Default is FALSE.
punish.coeff: Multiplier for penalizing removal of entire groups during balancing. Higher values discourage full-group deletions.

Balancing Procedure: The mlob() function also verifies whether the data is balanced, that is each group consist of exactly the same number of individuals. If the data is unbalanced, the balancing procedure comes into effect and identifies the optimal number of individuals and groups to delete based on the punishment coefficient. If the amount of data to be deleted is more than the threshold (balancing.limit), the regularized Bayesian estimator is not calculated and mlob() produces an error. This forces the user to increase the balancing limit manually and warns that the results should be interpreted with caution. # Examples

Example 1: Iris Dataset

result_iris <- mlob(
  Sepal.Length ~ Sepal.Width + Petal.Length,
  data = iris,
  group = "Species",
  conf.level = 0.99,
  jackknife = FALSE
)

summary(result_iris)
#> Call:
#>  mlob(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris, group = Species, conf.level = 0.99, jackknife = FALSE) 
#> 
#> Summary of Coefficients:
#>                     Estimate Std. Error Lower CI (99%) Upper CI (99%)   Z value
#> beta_b             0.8308711  1.4655556     -2.9441499       4.605892 0.5669325
#> gamma_Petal.Length 0.4679522  0.2582579     -0.1972762       1.133181 1.8119567
#>                      Pr(>|z|) Significance
#> beta_b             0.57076004             
#> gamma_Petal.Length 0.06999289            .
#> 
#> 
#> For comparison, summary of coefficients from unoptimized analysis (ML):
#>                     Estimate   Std. Error Lower CI (99%) Upper CI (99%)
#> beta_b             0.6027440 5.424780e+15  -1.397331e+16   1.397331e+16
#> gamma_Petal.Length 0.4679522 2.582579e-01  -1.972762e-01   1.133181e+00
#>                         Z value   Pr(>|z|) Significance
#> beta_b             1.111094e-16 1.00000000             
#> gamma_Petal.Length 1.811957e+00 0.06999289            .
#> 
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Note:
#>   The standard error from unoptimized ML estimation is about 3.701518e+17% larger than the standard error obtained through our optimization procedure,
#>   meaning that the optimized estimates are more accurate.
#>   Concerning the estimates themselves, the unoptimized ML estimates may
#>   differ greatly from the optimized estimates and should not be reported.
#>   As the optimized estimates are always at least as accurate as the
#>   unoptimized ML estimates,
#>   please use them and their corresponding standard errors (first table of
#>   output) for interpretation and reporting.
#>   For more information, see Dashuk et al. (2025a).

Example 2: Slightly Unbalanced ChickWeight Dataset

result_chick <- mlob(
  weight ~ Time,
  data = ChickWeight,
  group = "Diet",
  punish.coeff = 1.5,
  jackknife = FALSE
)

print(result_chick)
#> Call:
#>  mlob(weight ~ Time, data = ChickWeight, group = Diet, conf.level = 0.95, jackknife = FALSE) 
#> 
#> Coefficients
#>    beta_b
#>  4.404382
#> 
#> Standard_Error
#>    beta_b
#>  334.2915
#> 
#> Confidence_Interval (95%)
#>            Lower    Upper
#> beta_b -650.7949 659.6036
#> 
#> Z value
#>      beta_b
#>  0.01317528
#> 
#> p value
#>    beta_b
#>  0.989488
summary(result_chick)
#> Call:
#>  mlob(weight ~ Time, data = ChickWeight, group = Diet, conf.level = 0.95, jackknife = FALSE) 
#> 
#> Summary of Coefficients:
#>        Estimate Std. Error Lower CI (95%) Upper CI (95%)    Z value Pr(>|z|)
#> beta_b 4.404382   334.2915      -650.7949       659.6036 0.01317528 0.989488
#>        Significance
#> beta_b             
#> 
#> 
#> For comparison, summary of coefficients from unoptimized analysis (ML):
#>        Estimate Std. Error Lower CI (95%) Upper CI (95%)     Z value  Pr(>|z|)
#> beta_b 3.137082     479.57      -936.8029        943.077 0.006541447 0.9947807
#>        Significance
#> beta_b             
#> 
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Note:
#>   The standard error from unoptimized ML estimation is about 43.46% larger than the standard error obtained through our optimization procedure,
#>   meaning that the optimized estimates are more accurate.
#>   Concerning the estimates themselves, the unoptimized ML estimates may
#>   differ greatly from the optimized estimates and should not be reported.
#>   As the optimized estimates are always at least as accurate as the
#>   unoptimized ML estimates,
#>   please use them and their corresponding standard errors (first table of
#>   output) for interpretation and reporting.
#>   For more information, see Dashuk et al. (2025a).

** Interperation of the results for Chickenweight Dataset **

Estimated beta_b and additional covariates effects, denoted as gamma_Diet3 and gamma_Diet4, are included in the end result table. We did not incorporate any additional covariates into the mlob() formula. However, due to the fact that Diet is a four-level factor, mlob() considers Diet 1 to be the primary regressor (X), excludes Diet 2 from the design matrix to prevent multicollinearity, and reports the remaining levels as gamma_Diet3 and gamma_Diet4, using Diet 2 as the reference. The results of the estimation indicate a substantial distinction between the regularised Bayesian estimator (first table) and the ML estimator (second table). Beta_b’s regularised Bayesian estimator is 5.108, with a statistically significant p-value of 0.0139. This implies that the average weight of chicks on Diet 1 is approximately 5g more than that of chicks on Diet 2 at each time point. In contrast, the ML estimate of the between-group effect beta_b is significantly larger (13.993) but not statistically significant (p value of 0.3910), demonstrating how standard ML can overstate the group-level effect when data is scarce. The positive estimate of gamma_Diet3 is 41.69 and is highly significant for both estimators (p = 0.0048 in MultiLevelOptimalBayes (MLOB)). This suggests that chicks on Diet 3 have a significant increase in weight in comparison to the baseline group. Gamma_Diet4 has an estimate of 29.64, but its p-value of 0.0504 is marginally significant at the 5% level. This implies that Diet 4 has a positive impact on weight; however, the magnitude of this effect is less significant than that of Diet 3.

Example 3: Highly Unbalanced mtcars Dataset

result_mtcars <- mlob(
  mpg ~ hp + wt + am + hp:wt + hp:am,
  data = mtcars,
  group = "cyl",
  balancing.limit = 0.35
)

summary(result_mtcars)
#> Call:
#>  mlob(mpg ~ hp + wt + am + hp:wt + hp:am, data = mtcars, group = cyl, balancing.limit = 0.35, conf.level = 0.95) 
#> 
#> Summary of Coefficients:
#>                 Estimate  Std. Error Lower CI (95%) Upper CI (95%)     Z value
#> beta_b      -0.094774009         Inf           -Inf            Inf  0.00000000
#> gamma_wt     0.129605570 10.03113447   -19.53105671    19.79026785  0.01292033
#> gamma_am     8.243531222  5.02593394    -1.60711830    18.09418074  1.64019888
#> gamma_hp:wt -0.003592393  0.04892602    -0.09948564     0.09230085 -0.07342499
#> gamma_hp:am -0.063955624  0.16654503    -0.39037788     0.26246663 -0.38401401
#>              Pr(>|z|) Significance
#> beta_b      1.0000000             
#> gamma_wt    0.9896914             
#> gamma_am    0.1009638             
#> gamma_hp:wt 0.9414679             
#> gamma_hp:am 0.7009681             
#> 
#> 
#> For comparison, summary of coefficients from unoptimized analysis (ML):
#>                 Estimate   Std. Error Lower CI (95%) Upper CI (95%)
#> beta_b      -0.094774009 3.755040e+14  -7.359744e+14   7.359744e+14
#> gamma_wt     0.129605570 1.003113e+01  -1.953106e+01   1.979027e+01
#> gamma_am     8.243531222 5.025934e+00  -1.607118e+00   1.809418e+01
#> gamma_hp:wt -0.003592393 4.892602e-02  -9.948564e-02   9.230085e-02
#> gamma_hp:am -0.063955624 1.665450e-01  -3.903779e-01   2.624666e-01
#>                   Z value  Pr(>|z|) Significance
#> beta_b      -2.523915e-16 1.0000000             
#> gamma_wt     1.292033e-02 0.9896914             
#> gamma_am     1.640199e+00 0.1009638             
#> gamma_hp:wt -7.342499e-02 0.9414679             
#> gamma_hp:am -3.840140e-01 0.7009681             
#> 
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Note:

Output

The output is an object of class mlob_result, which contains:

Estimated coefficients (beta_b and gamma values)
Standard errors and confidence intervals
Unregularized ML estimates for comparison
Information about the data balancing procedure (if used)

Available S3 Methods

The mlob_result object supports a comprehensive set of S3 methods that follow standard R conventions, making it easy to work with results in familiar ways. Here are all available methods:

Display Methods

# Get a basic result for demonstration
result <- mlob(weight ~ Time, data = ChickWeight, group = 'Diet', jackknife = FALSE)

# Print method - displays coefficients, standard errors, confidence intervals, Z-values, and p-values
print(result)
#> Call:
#>  mlob(weight ~ Time, data = ChickWeight, group = Diet, conf.level = 0.95, jackknife = FALSE) 
#> 
#> Coefficients
#>    beta_b
#>  35.38337
#> 
#> Standard_Error
#>    beta_b
#>  255.9379
#> 
#> Confidence_Interval (95%)
#>            Lower    Upper
#> beta_b -466.2456 537.0124
#> 
#> Z value
#>     beta_b
#>  0.1382498
#> 
#> p value
#>    beta_b
#>  0.890043

# Summary method - comprehensive summary with significance stars and comparison to unoptimized ML
summary(result)
#> Call:
#>  mlob(weight ~ Time, data = ChickWeight, group = Diet, conf.level = 0.95, jackknife = FALSE) 
#> 
#> Summary of Coefficients:
#>        Estimate Std. Error Lower CI (95%) Upper CI (95%)   Z value Pr(>|z|)
#> beta_b 35.38337   255.9379      -466.2456       537.0124 0.1382498 0.890043
#>        Significance
#> beta_b             
#> 
#> 
#> For comparison, summary of coefficients from unoptimized analysis (ML):
#>        Estimate Std. Error Lower CI (95%) Upper CI (95%)    Z value Pr(>|z|)
#> beta_b 19.35236   282.5233      -534.3832       573.0879 0.06849825 0.945389
#>        Significance
#> beta_b             
#> 
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Note:
#>   The standard error from unoptimized ML estimation is about 10.39% larger than the standard error obtained through our optimization procedure,
#>   meaning that the optimized estimates are more accurate.
#>   Concerning the estimates themselves, the unoptimized ML estimates may
#>   differ greatly from the optimized estimates and should not be reported.
#>   As the optimized estimates are always at least as accurate as the
#>   unoptimized ML estimates,
#>   please use them and their corresponding standard errors (first table of
#>   output) for interpretation and reporting.
#>   For more information, see Dashuk et al. (2025a).

Statistical Methods

# Extract coefficients as a data frame
coef(result)
#>     beta_b
#> 1 35.38337

# Extract standard errors
se(result)
#>   beta_b 
#> 255.9379

# Extract variance-covariance matrix (diagonal only)
vcov(result)
#>   beta_b 
#> 65504.19

# Extract confidence intervals
confint(result)
#>             2.5%    97.5%
#> beta_b -466.2456 537.0124

# Extract confidence intervals for specific parameters
confint(result, "beta_b")
#>             2.5%    97.5%
#> beta_b -466.2456 537.0124

# Extract confidence intervals with different confidence level
confint(result, level = 0.99)
#>             0.5%    99.5%
#> beta_b -623.8689 694.6356

Utility Methods

# Convert results to a data frame format
as.data.frame(result)
#>        Estimate Std. Error Lower CI (95%) Upper CI (95%)   Z value Pr(>|z|)
#> beta_b 35.38337   255.9379      -466.2456       537.0124 0.1382498 0.890043

# Get dimensions (number of parameters)
dim(result)
#> [1] 1 1

# Get number of parameters
length(result)
#> [1] 1

# Get parameter names
names(result)
#> [1] "beta_b"

Update Method

# Update model with new parameters (e.g., different confidence level)
updated_result <- update(result, conf.level = 0.99)
summary(updated_result)
#> Call:
#>  mlob(weight ~ Time, data = data, group = Diet, conf.level = 0.99, jackknife = FALSE) 
#> 
#> Summary of Coefficients:
#>        Estimate Std. Error Lower CI (99%) Upper CI (99%)   Z value Pr(>|z|)
#> beta_b 35.38337   255.9379      -623.8689       694.6356 0.1382498 0.890043
#>        Significance
#> beta_b             
#> 
#> 
#> For comparison, summary of coefficients from unoptimized analysis (ML):
#>        Estimate Std. Error Lower CI (99%) Upper CI (99%)    Z value Pr(>|z|)
#> beta_b 19.35236   282.5233      -708.3796       747.0843 0.06849825 0.945389
#>        Significance
#> beta_b             
#> 
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Note:
#>   The standard error from unoptimized ML estimation is about 10.39% larger than the standard error obtained through our optimization procedure,
#>   meaning that the optimized estimates are more accurate.
#>   Concerning the estimates themselves, the unoptimized ML estimates may
#>   differ greatly from the optimized estimates and should not be reported.
#>   As the optimized estimates are always at least as accurate as the
#>   unoptimized ML estimates,
#>   please use them and their corresponding standard errors (first table of
#>   output) for interpretation and reporting.
#>   For more information, see Dashuk et al. (2025a).

Discovering Available Methods

You can discover all available methods for mlob_result objects using:

methods(class = "mlob_result")
#>  [1] as.data.frame coef          confint       dim           length       
#>  [6] names         print         se            summary       update       
#> [11] vcov         
#> see '?methods' for accessing help and source code

Practical Example: Working with Results

Here’s a practical example showing how to use multiple methods together:

# Run analysis
result <- mlob(weight ~ Time, data = ChickWeight, group = 'Diet', jackknife = FALSE)

# Get basic information
cat("Number of parameters:", length(result), "\n")
#> Number of parameters: 1
cat("Parameter names:", paste(names(result), collapse = ", "), "\n")
#> Parameter names: beta_b

# Extract key statistics
coefficients <- coef(result)
standard_errors <- se(result)
confidence_intervals <- confint(result, level = 0.99)

# Create a custom summary table
custom_summary <- data.frame(
  Parameter = names(result),
  Estimate = as.numeric(coefficients),
  SE = as.numeric(standard_errors),
  CI_Lower = confidence_intervals[, 1],
  CI_Upper = confidence_intervals[, 2]
)

print(custom_summary)
#>   Parameter Estimate      SE  CI_Lower CI_Upper
#> 1    beta_b 6.706054 525.358 -1346.527 1359.939

All these methods follow standard R conventions, making your mlob_result objects compatible with existing R workflows and familiar to users of other statistical packages.

Limitations

While MultiLevelOptimalBayes provides a robust solution for regularized estimation in two-level models, users should be aware of the following limitations:

Local Grid Search:
The optimization is restricted to a local 5*sigma region around the ML estimate. Although this region captures the true MSE minimum with high probability (>= 0.999 for J = 5), it does not guarantee identification of the global minimum.
Assumption of Equal Group Sizes:
The estimator assumes equal group sizes to simplify the model. While averaging group sizes is a proposed solution, the method does not yet handle unbalanced group sizes natively, but find the optimal reduction to the balanced size.
Jackknife Computational Cost:
Jackknife resampling improves interval coverage in small samples. Note that it may be computationally demanding for larger samples.
Limited Model Scope:
Currently, MLOB only handles two-level models with continuous outcomes. Extensions to support generalized linear mixed models, three- and more-level structures, or multivariate outcomes are not yet available.

References

Dashuk, V., Hecht, M., Luedtke, O., Robitzsch, A., & Zitzmann, S. (2025a). An Optimally Regularized Estimator of Multilevel Latent Variable Models, with Improved MSE Performance. https://doi.org/10.1017/psy.2025.10045

Dashuk, V., Hecht, M., Lüdtke, O., Robitzsch, A., & Zitzmann, S. (2025b). Estimating context effects in small samples while controlling for covariates: an optimally regularized Bayesian estimator for multilevel latent variable models. https://doi.org/10.1007/s41237-025-00264-7

Luedtke, O., Marsh, H. W., Robitzsch, A., et al. (2008).
The multilevel latent covariate model: A new, more reliable approach to group-level effects in contextual studies.
https://doi.org/10.1037/a0012869

Authors

Valerii Dashuk
Binayak Timilsina
Martin Hecht
Steffen Zitzmann

Contact:
vadashuk@gmail.com
binayak.timilsina001@gmail.com
martin.hecht@hsu-hh.de
steffen.zitzmann@medicalschool-hamburg.de

MultiLevelOptimalBayes-Intro