Note: This vignette requires suggested packages. Install them with:
install.packages(c("ggplot2", "gridExtra", "zoo", "forecast", "scales", "moments"))
This vignette illustrates a complete \(\beta\)ARMA modeling workflow applied to
monthly relative humidity data from Brasília, Brazil, using the
betaARMA package. The analysis has two complementary
objectives.
The first objective concerns numerical stability. Conditional maximum likelihood estimation (CMLE) of \(\beta\text{ARMA}\) models can fail in practice: Cribari-Neto, Costa, and Fonseca (2025) show, via Monte Carlo simulation, that convergence failure rates can reach 18.68% in challenging scenarios (e.g., \(\beta\text{ARMA}(1,4)\) with \(n = 125\)). Their paper also demonstrates, using two empirical datasets, that certain model specifications only converge when ridge penalization is applied. A natural question then arises: once the penalized estimator (PCMLE) achieves convergence, how reliable is the resulting model? Does it produce a well-specified fit that passes residual diagnostics? This vignette addresses that question directly by presenting a specification for which CMLE fails and PCMLE succeeds, and by carrying out a full residual analysis on the penalized model.
The second objective is to present a full modeling workflow — from exploratory analysis through feature engineering, model fitting, residual diagnostics, and out-of-sample forecasting — applied to the complete Brasília humidity series. This mirrors the applied methodology of Costa, Cribari-Neto, and Scher (2024), published in the Journal of Hydrology.
Replication code and data are openly available at https://github.com/Everton-da-Costa/BarmaRidgeBJPS2025.
library(betaARMA)
library(forecast)
library(ggplot2)
library(gridExtra)
library(zoo)
library(scales)
library(moments)The dataset consists of 306 monthly observations of average relative humidity in Brasília, Brazil, covering January 1999 to June 2024. The data were obtained from NASA’s POWER project. Relative humidity is naturally bounded in \((0, 1)\), making it well suited for \(\beta\)ARMA modeling.
To motivate the penalized estimator, we focus on a specific subsample of the Brasília dataset. In Cribari-Neto, Costa, and Fonseca (2025), this subsample was used to illustrate convergence failure under standard CMLE, and it was shown that ridge penalization successfully restores convergence.
While that paper established the numerical stability of the penalized estimator, the objective here is different: to verify the statistical adequacy of the penalized fit through residual diagnostics, and to show how this workflow can guide model reduction. To that end, we specify a \(\beta\text{ARMA}\) model with \(\text{AR} = (10, 18)\), \(\text{MA} = (1, 11, 13)\), and harmonic regressors.
y_subsample_ts <- window(
brasilia_ts,
start = c(2006, 02),
end = c(2019, 07)
)
y_train <- window(
y_subsample_ts,
end = c(2018, 07)
)
y_test <- window(
y_subsample_ts,
start = c(2018, 08)
)The figure below shows the full time series with the training period delimited by dashed vertical lines.
Relative humidity in Brasília. Dashed red lines delimit the training period used in the instability analysis.
| Min | Max | Median | Mean | SD | Skewness | Excess_kurtosis |
|---|---|---|---|---|---|---|
| 0.33 | 0.88 | 0.75 | 0.69 | 0.16 | -0.72 | 2.11 |
To capture annual seasonality we include two harmonic regressors, \(s_t = \sin(2\pi t / 12)\) and \(c_t = \cos(2\pi t / 12)\), which together represent a smooth annual wave.
freq <- frequency(y_train)
n_test <- length(y_test)
trend_index_train <- seq_along(y_train)
trend_index_test <- (max(trend_index_train) + 1):
(max(trend_index_train) + n_test)
hs_train <- sin(2 * pi * trend_index_train / freq)
hc_train <- cos(2 * pi * trend_index_train / freq)
hs_test <- sin(2 * pi * trend_index_test / freq)
hc_test <- cos(2 * pi * trend_index_test / freq)
X_train <- cbind(hs = hs_train, hc = hc_train)
X_test <- cbind(hs = hs_test, hc = hc_test)ar_lags <- c(10, 18)
ma_lags <- c(1, 11, 13)
fit_cmle <- barma(
y = y_train,
ar = ar_lags,
ma = ma_lags,
penalty = FALSE,
xreg = X_train
)summary(fit_cmle)
#> Beta Autoregressive Moving Average Model
#>
#> Call:
#> barma(y = y_train, ar = ar_lags, ma = ma_lags, xreg = X_train,
#> penalty = FALSE)
#>
#> Link function: logit
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> alpha 0.72525 0.11752 6.171 6.77e-10 ***
#> varphi10 -0.09790 0.06146 -1.593 0.111167
#> varphi18 0.28234 0.08558 3.299 0.000969 ***
#> theta1 0.92386 0.04349 21.242 < 2e-16 ***
#> theta11 0.16907 0.05591 3.024 0.002496 **
#> theta13 0.15547 0.05554 2.799 0.005120 **
#> hs 0.51653 0.06040 8.552 < 2e-16 ***
#> hc 0.71515 0.05372 13.313 < 2e-16 ***
#> phi 53.56433 6.54492 8.184 2.22e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Warning: Optimization algorithm [BFGS (stats::optim)] did not converge (Code: 1).
#> ---
#> Log-likelihood: 186.2
#> AIC: -354.3 BIC: -327.2 HQ: -343.3The optimizer failed to converge (conv = 1), and the
estimates are unreliable. Without a penalized alternative, an analyst
would have little recourse but to abandon this lag structure and search
for a different specification.
Applying ridge penalization to the same specification restores convergence.
summary(fit_pcmle)
#> Beta Autoregressive Moving Average Model
#>
#> Call:
#> barma(y = y_train, ar = ar_lags, ma = ma_lags, xreg = X_train,
#> penalty = TRUE)
#>
#> Link function: logit
#> Penalization : Ridge (lambda = 0.01234)
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> alpha 0.76929 0.10587 7.266 3.69e-13 ***
#> varphi10 -0.18060 0.07521 -2.401 0.01634 *
#> varphi18 0.23408 0.08074 2.899 0.00374 **
#> theta1 0.47711 0.07387 6.459 1.06e-10 ***
#> theta11 -0.05077 0.07329 -0.693 0.48843
#> theta13 0.31205 0.07846 3.977 6.97e-05 ***
#> hs 0.53171 0.04495 11.828 < 2e-16 ***
#> hc 0.77621 0.04482 17.317 < 2e-16 ***
#> phi 59.57003 7.28618 8.176 2.22e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Optimization: BFGS (stats::optim) converged after 111 objective function evaluations.
#> ---
#> Log-likelihood: 191.8
#> AIC: -365.7 BIC: -338.6 HQ: -354.7The penalized estimator successfully converges. The ridge penalty value \(\lambda_n = 1/(n - a)^{0.9}\) is chosen to balance bias and variance, as established in Cribari-Neto, Costa, and Fonseca (2025).
This illustrates a practical use case for PCMLE. When standard CMLE presents a convergence failure, an analyst might discard the entire lag structure without further investigation. By applying PCMLE, convergence is achieved and the parameter estimates become available for inspection. Furthermore, the diagnostic plots below for the PCMLE fit confirm that the residuals exhibit no serial correlation (Scher, Cribari-Neto, Pumi, and Bayer, 2020), indicating that the lag structure adequately captures the dynamics of the series.
Diagnostic plots for the PCMLE fit: AR=(10,18), MA=(1,11,13).
The observed and fitted values track each other closely. All ACF and PACF spikes lie within the confidence bands, and both the Ljung-Box and Monti p-values remain consistently above 0.05 across all evaluated lags, providing no evidence of residual serial correlation.
Upon reviewing the PCMLE summary, we note that the \(\theta_{11}\) estimate is not statistically significant (p-value = 0.488). This suggests that the MA(11) lag does not contribute meaningfully to the model dynamics. Removing it yields the reduced specification \(\text{AR} = (10, 18)\), \(\text{MA} = (1, 13)\), for which the likelihood surface is no longer flat and standard CMLE converges. Without PCMLE, the non-significance of \(\theta_{11}\) would never have been discovered, and the valid underlying structure of the series would have remained inaccessible.
ar_lags_reduced <- c(10, 18)
ma_lags_reduced <- c(1, 13)
fit_cmle_reduced <- barma(
y = y_train,
ar = ar_lags_reduced,
ma = ma_lags_reduced,
penalty = FALSE,
xreg = X_train
)summary(fit_cmle_reduced)
#> Beta Autoregressive Moving Average Model
#>
#> Call:
#> barma(y = y_train, ar = ar_lags_reduced, ma = ma_lags_reduced,
#> xreg = X_train, penalty = FALSE)
#>
#> Link function: logit
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> alpha 0.79153 0.10931 7.241 4.45e-13 ***
#> varphi10 -0.18959 0.07669 -2.472 0.013435 *
#> varphi18 0.22200 0.08253 2.690 0.007147 **
#> theta1 0.47857 0.07509 6.373 1.85e-10 ***
#> theta13 0.29543 0.07676 3.849 0.000119 ***
#> hs 0.53129 0.04584 11.591 < 2e-16 ***
#> hc 0.77560 0.04571 16.969 < 2e-16 ***
#> phi 59.35648 7.26010 8.176 2.22e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Optimization: BFGS (stats::optim) converged after 97 objective function evaluations.
#> ---
#> Log-likelihood: 193.2
#> AIC: -370.4 BIC: -346.3 HQ: -360.6The reduced CMLE model converges and all coefficients are statistically significant. We proceed with residual diagnostics and forecasting using this parsimonious specification.
The panels below assess whether the fitted model adequately captures the dynamics of the Brasília relative humidity series.
Diagnostic plots for the reduced CMLE model AR=(10,18), MA=(1,13).
The observed and fitted values track each other closely throughout the sample period. All ACF and PACF spikes lie within the confidence bands, and both the Ljung-Box and Monti p-values remain consistently above 0.05 across all evaluated lags, providing no evidence of residual serial correlation.
Normal Q-Q plot of quantile residuals for the reduced CMLE model.
The quantile residuals fall closely along the reference line within the 95% confidence band, supporting the normality assumption. One observation in the upper tail departs slightly from the line but does not indicate a systematic departure from the assumed distribution.
The largest residual corresponds to October 2011 (quantile residual = 4.37), a month that marks the climatological transition from the dry to the rainy season in Brasília — the period of highest interannual humidity variability. This observation coincides with the onset of an extreme drought period documented across Brazil from 2011 onward (Cunha et al., 2019). The second largest residual corresponds to January 2015 (−3.44), coinciding with the strong 2015–2016 El Niño event, which suppressed rainfall across central Brazil. These isolated departures reflect genuine meteorological anomalies rather than systematic model misspecification.
# Define the translation dictionary
month_map <- c(
"Jan" = "Jan", "Fev" = "Feb", "Mar" = "Mar", "Abr" = "Apr",
"Mai" = "May", "Jun" = "Jun", "Jul" = "Jul", "Ago" = "Aug",
"Set" = "Sep", "Out" = "Oct", "Nov" = "Nov", "Dez" = "Dec"
)
dates <- as.Date(zoo::as.yearmon(time(y_test)))
pt_months <- tools::toTitleCase(format(dates, "%b"))
months <- unname(month_map[pt_months])
forecasts_df <- data.frame(
observed = as.numeric(y_test),
forecast = forecast(fit_cmle_reduced, xreg = X_test, h = n_test)$mean,
month = months
)
forecasts_df$time <- seq_len(n_test)Out-of-sample forecasts from the reduced CMLE model against the observed test data (teal dots).
mae <- function(obs, pred) mean(abs(obs - pred))
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
obs <- forecasts_df$observed
accuracy_table <- data.frame(
Model = "betaARMA CMLE (reduced)",
MAE = round(mae(obs, forecasts_df$forecast) * 100, 2),
RMSE = round(rmse(obs, forecasts_df$forecast) * 100, 2)
)
knitr::kable(
accuracy_table,
caption = paste0(
"MAE and RMSE ($\\times 100$) over the ",
n_test,
"-month forecast horizon."
)
)| Model | MAE | RMSE |
|---|---|---|
| betaARMA CMLE (reduced) | 5.27 | 7.57 |
This vignette illustrated the practical value of PCMLE as a diagnostic tool in \(\beta\)ARMA modeling, using monthly relative humidity data from Brasília.
Standard CMLE failed to converge for the \(\beta\text{ARMA}\) model with \(\text{AR} = (10, 18)\), \(\text{MA} = (1, 11, 13)\). Applying ridge penalization (PCMLE) restored convergence and made the parameter estimates available for inspection. Residual diagnostics on the PCMLE fit confirmed that the lag structure adequately captured the series dynamics. Inspection of the PCMLE estimates revealed that \(\theta_{11}\) was not statistically significant, motivating the reduction to the parsimonious \(\beta\text{ARMA}\) model with \(\text{AR} = (10, 18)\), \(\text{MA} = (1, 13)\). The reduced CMLE model converged, passed all residual diagnostics, and produced accurate out-of-sample forecasts, with all predictions guaranteed to lie in \((0, 1)\).
This workflow — CMLE fails, PCMLE reveals the structure, reduction yields a stable CMLE model — demonstrates that PCMLE is not merely a fallback estimator but an active analytical tool that can guide model selection in challenging estimation scenarios. For the complete simulation study, bootstrap procedures, and extended empirical results, we refer the reader to Cribari-Neto, Costa, and Fonseca (2025).
Costa, E., Cribari-Neto, F., & Scher, V. T. (2024). Test inferences and link function selection in dynamic beta modeling of seasonal hydro-environmental time series with temporary abnormal regimes. Journal of Hydrology, 638, 131489. https://doi.org/10.1016/j.jhydrol.2024.131489
Cribari-Neto, F., Costa, E., & Fonseca, R. V. (2025). Numerical stability enhancements in beta autoregressive moving average model estimation. Brazilian Journal of Probability and Statistics, 39(4), 410–437. https://doi.org/10.1214/25-BJPS645
Cunha, A. P. M. A., Zeri, M., Deusdará Leal, K., Costa, L., Cuartas, L. A., Marengo, J. A., … & Cunningham, C. (2019). Extreme drought events over Brazil from 2011 to 2019. Atmosphere, 10(11), 642. https://doi.org/10.3390/atmos10110642
Rocha, A. V., & Cribari-Neto, F. (2009). Beta autoregressive moving average models. TEST, 18(3), 529–545. https://doi.org/10.1007/s11749-008-0112-z
Rocha, A. V., & Cribari-Neto, F. (2017). Erratum to: Beta autoregressive moving average models. TEST, 26, 451–459. https://doi.org/10.1007/s11749-017-0528-4
Scher, V. T., Cribari-Neto, F., Pumi, G., & Bayer, F. M. (2020). Goodness-of-fit tests for \(\beta\)ARMA hydrological time series modeling. Environmetrics, 31(3), e2607. https://doi.org/10.1002/env.2607
The analysis was conducted in the following computational environment.
#> =================================================================
#> Session Information
#> =================================================================
#> This report was generated at: maio 23, 2026 at 02:12
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=pt_BR.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=pt_BR.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=pt_BR.UTF-8 LC_MESSAGES=pt_BR.UTF-8
#> [7] LC_PAPER=pt_BR.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=pt_BR.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/Bahia
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] moments_0.14.1 scales_1.4.0 zoo_1.8-15 gridExtra_2.3 ggplot2_4.0.3
#> [6] forecast_9.0.2 betaARMA_1.2.0
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.6 jsonlite_2.0.0 dplyr_1.1.4 compiler_4.4.2
#> [5] tidyselect_1.2.1 Rcpp_1.1.1-1.1 parallel_4.4.2 jquerylib_0.1.4
#> [9] yaml_2.3.10 fastmap_1.2.0 lattice_0.22-5 R6_2.6.1
#> [13] labeling_0.4.3 generics_0.1.4 knitr_1.50 tibble_3.3.0
#> [17] timeDate_4052.112 bslib_0.9.0 pillar_1.11.1 RColorBrewer_1.1-3
#> [21] rlang_1.2.0 cachem_1.1.0 urca_1.3-4 xfun_0.53
#> [25] sass_0.4.10 S7_0.2.2 cli_3.6.6 withr_3.0.2
#> [29] magrittr_2.0.5 digest_0.6.37 grid_4.4.2 rstudioapi_0.17.1
#> [33] nlme_3.1-165 lifecycle_1.0.5 fracdiff_1.5-4 vctrs_0.7.3
#> [37] evaluate_1.0.5 glue_1.8.1 farver_2.1.2 colorspace_2.1-2
#> [41] rmarkdown_2.31 tools_4.4.2 pkgconfig_2.0.3 htmltools_0.5.8.1