This version includes the possibility of running (and simulating data from) a 1-state multilevel hidden Markov model. This is mainly convenient for benchmarking purposes in model selection (e.g., how much the AIC / AICc decreases when moving from a 1-state to a 2-state model). To accommodate this change, the following main functions are updated: mHMM() and sim_mHMM(). The following S3 methods were updated: print(), summary(), and plot(). In addition, the following post-processing functions were updated: obtain_emiss(), obtain_gamma(), and vit_mHMM().
The estimation vignette is changed from PDF to HTML output due to recurrent issues with the GHA workflow and with rendering PDF vignettes.
A major improvement in this release is the possibility to include count data in mHMM(). Currently, the user can model data composed of either categorical data OR continuous data OR count data (so a mix of different types of emission distributions is not possible within the stable CRAN version). As such, the following changes are implemented:
- The input parameter data_distr of the function mHMM(), used to specify the type of input data, now contains the option data_distr = 'count'.
- sim_mHMM() allows the simulation of count data, which is again facilitated by the input parameter data_distr.
- Functions that use mHMM() output objects as input, such as obtain_emiss(), vit_mHMM(), and the S3 methods print(), summary(), and plot(), automatically detect whether the output object relates to a multilevel HMM fitted to categorical, continuous, or count data, and adjust their processing methods accordingly.
Several new functions are introduced specifically relating to modelling count data:
- prior_emiss_count() enables the specification of hyper-prior parameters when using count input data.
- pd_RW_emiss_count() enables the manual specification of the settings of the proposal distribution of the random walk (RW) Metropolis sampler of the Poisson emission distribution(s). Note that the implemented RW Metropolis sampler is self-tuning, hence manual specification is optional.
- var_to_logvar() aids the user in transforming the between-subject variance on the positive scale to the log variance on the logarithmic scale. Note that specifying hyper-prior parameters when using count input data is obligatory. When not using covariates, the expected means (lambda) and corresponding variances can be specified on the natural (positive real numbers) scale. However, when using covariates, the expected means and corresponding variances have to be specified on the logarithmic scale. Transforming the variances to the logarithmic scale is a nontrivial task, so var_to_logvar() can be used to aid the user with this task.
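As an illustration of why this transformation is nontrivial, the sketch below maps a mean and variance on the natural scale to the mean and variance on the logarithmic scale, assuming the quantity is lognormally distributed. The lognormal assumption and the helper names natural_to_log_scale() and log_to_natural_scale() are illustrative only; var_to_logvar() itself is an R function and may use a different formula.

```python
import math


def natural_to_log_scale(mean, var):
    """Map the mean and variance of a lognormal variable (natural scale)
    to the mean and variance of its logarithm (assumed lognormal model)."""
    if mean <= 0 or var < 0:
        raise ValueError("mean must be positive and var non-negative")
    log_var = math.log(1.0 + var / mean ** 2)
    log_mean = math.log(mean) - log_var / 2.0
    return log_mean, log_var


def log_to_natural_scale(log_mean, log_var):
    """Inverse map: lognormal mean and variance from the log-scale moments."""
    mean = math.exp(log_mean + log_var / 2.0)
    var = (math.exp(log_var) - 1.0) * math.exp(2.0 * log_mean + log_var)
    return mean, var
```

For example, an expected mean of lambda = 10 with a between-subject variance of 4 gives a log-scale variance of log(1 + 4 / 100) ≈ 0.039, which is far from simply taking log(4).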
In sim_mHMM(), it is now possible to specify the between-subject variance in gamma and in the emission distribution at the parameter level, instead of fixed over states. When the input parameter var_gamma is a numeric vector of length 1, or var_emiss is a numeric vector of length n_dep, the variance is still assumed fixed across the switching probabilities of the transition probability matrix gamma, or fixed across states (and, for the categorical distribution, across categories within a state) within an emission distribution, respectively.
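A minimal sketch of the idea for a single dependent variable with a normal emission distribution: a single variance is applied to every state, whereas one variance per state lets the between-subject variability differ per parameter. The helpers expand_variance() and draw_subject_means() are hypothetical and do not reproduce the internals of sim_mHMM(), which also covers gamma and other emission distributions.

```python
import random


def expand_variance(var, n_states):
    """Return one between-subject variance per state (hypothetical helper).

    A single number is treated as fixed across states; a sequence of
    length n_states gives each state its own variance.
    """
    if isinstance(var, (int, float)):
        return [float(var)] * n_states
    if len(var) != n_states:
        raise ValueError("need exactly one variance per state")
    return [float(v) for v in var]


def draw_subject_means(group_means, var, n_subj, seed=None):
    """Draw subject-specific emission means around the group-level means."""
    rng = random.Random(seed)
    variances = expand_variance(var, len(group_means))
    return [
        # normal deviation per state, with that state's variance
        [rng.gauss(mu, v ** 0.5) for mu, v in zip(group_means, variances)]
        for _ in range(n_subj)
    ]
```

Passing var=2.0 mimics the "fixed across states" behaviour, while var=[0.1, 1.0, 4.0] gives each state its own between-subject variance.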
The S3 print() method now returns the corrected Akaike information criterion (AICc) in addition to the conventional AIC model selection criterion. The AICc is a modification of the original AIC that corrects for small sample sizes. One rule of thumb is to use the AICc when the number of observations divided by the number of model parameters is < 40. In the implementation of the function mHMM(), the number of observations relates to the (average) number of observations per subject, and the model parameters to the subject-level freely estimated transition probabilities and emission distribution parameters. For example, in a model with m = 3 states and univariate data (i.e., only one dependent variable) with a normal emission distribution, the number of parameters equals m x (m-1) = 6 freely estimated transition probabilities, plus 3 normal emission means and 3 normal emission variances, totaling 12 parameters. Any subject-level sequence length below 12 * 40 = 480 observations would preferably use the AICc instead of the AIC.
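The arithmetic above can be sketched as follows. The sketch assumes the standard small-sample correction AICc = AIC + 2k(k + 1) / (n - k - 1), with k the number of subject-level parameters and n the (average) number of observations per subject; the helper names are hypothetical and not part of the package.

```python
def n_params_normal_hmm(m, n_dep=1):
    """Subject-level parameter count for an m-state HMM with normal emissions:
    m * (m - 1) free transition probabilities plus, per dependent variable,
    m emission means and m emission variances."""
    return m * (m - 1) + n_dep * 2 * m


def aic(log_lik, k):
    """Conventional Akaike information criterion."""
    return -2.0 * log_lik + 2.0 * k


def aicc(log_lik, k, n):
    """AIC with the standard small-sample correction (requires n > k + 1)."""
    if n <= k + 1:
        raise ValueError("AICc is undefined for n <= k + 1")
    return aic(log_lik, k) + 2.0 * k * (k + 1) / (n - k - 1)


def prefer_aicc(n, k, ratio=40):
    """Rule of thumb: use the AICc when n / k is below the ratio (40)."""
    return n / k < ratio
```

Here n_params_normal_hmm(3) gives 12 and n_params_normal_hmm(1) gives 2; the latter is the 1-state benchmark model, which has no free transition probabilities.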
mHMM() output now also includes gamma_V_int_bar, which is a matrix containing the variance components for the subject-level intercepts (between-subject variances) of the multinomial logistic regression modeling the transition probabilities over the iterations of the hybrid Metropolis within Gibbs sampler.
mHMM() output now also includes emiss_V_int_bar, which is a list containing one matrix per dependent variable, denoting the variance components for the subject-level intercepts (between-subject variances) of the multinomial logistic regression modeling the categorical emission probabilities over the iterations of the hybrid Metropolis within Gibbs sampler.
A bug in the output of vit_mHMM() is now fixed. The number of rows in the output object of vit_mHMM() should be the sum of the sequence lengths over the subjects. However, with varying sequence lengths, the maximum sequence length of the sample was used for each subject, inserting NA for non-existing observations at the end of the sequence. This is now corrected: the number of rows in the output object of vit_mHMM() equals the sum of the sequence lengths over the subjects, and no spurious NAs are inserted.
A major improvement in this release is the possibility to include continuous data in mHMM(). Currently, the user can model data composed of either categorical data OR continuous data (so a mix of different types of emission distributions is not possible within the stable CRAN version). As such, the following changes are implemented:
- mHMM() now includes the input parameter data_distr, where the user can specify whether the input data contains categorical or continuous data. Defaults to data_distr = 'categorical'.
- sim_mHMM() allows the simulation of continuous data, which is again facilitated by the input parameter data_distr, where the user can specify whether to simulate categorical or continuous data. Defaults to data_distr = 'categorical'.
- Functions that use mHMM() output objects as input, such as obtain_emiss(), vit_mHMM(), and the S3 methods print(), summary(), and plot(), automatically detect whether the output object relates to a multilevel HMM fitted to categorical or continuous data, and adjust their processing methods accordingly.
Also, a new function, prior_emiss_cont(), is introduced, which enables the specification of hyper-prior parameters when using continuous input data.
Also new is the accommodation of missing values (NA) in the dependent input variable(s). Missingness is assumed to be Missing at Random (MAR), so that the missingness mechanism is independent of the missing data and the hidden states given the observed data and model parameters. This means that missing observations are assumed equally likely in each of the states. In our approach, hidden state probabilities are inferred for missing observations (thus based only on the transition probability matrix gamma), but the missing observations themselves are not directly imputed.
To accommodate missing values, the forward algorithm implemented in C++ was slightly adjusted, and state dependent observations (on which the parameter estimates of the emission distribution(s) are based) are selected such that missing values are omitted.
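The idea can be sketched with a scaled forward algorithm for a single subject, written in Python rather than the package's C++: for a missing observation the emission term is skipped (equivalently, set to 1 for every state), so the state probabilities at that time point follow from gamma alone and nothing is imputed. The function forward_loglik() and its arguments are hypothetical, not the package's internal API.

```python
import math


def forward_loglik(obs, init, gamma, emiss_prob):
    """Scaled forward algorithm returning the log-likelihood of one sequence.

    obs        : list of observations; None marks a missing value
    init       : initial state probabilities (length m)
    gamma      : m x m transition probability matrix (rows sum to 1)
    emiss_prob : function (state, observation) -> emission probability/density
    """
    m = len(init)
    alpha = list(init)
    log_lik = 0.0
    for t, y in enumerate(obs):
        if t > 0:
            # propagate the state probabilities through gamma
            alpha = [sum(alpha[i] * gamma[i][j] for i in range(m))
                     for j in range(m)]
        if y is not None:
            # observed: weight each state by its emission probability
            alpha = [alpha[j] * emiss_prob(j, y) for j in range(m)]
        # missing: no emission term, so alpha is driven by gamma alone
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik
```

With every observation missing, the returned log-likelihood is 0 up to rounding, since each time point then contributes a factor of 1.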
The mHMM() output component PD_subj was modified to facilitate the inclusion of both categorical and continuous input data. Before, PD_subj was a list containing one matrix per subject with all subject-level output parameters over the iterations of the MCMC sampler. Now, PD_subj is a list containing one list per subject with the elements trans_prob, cat_emiss or cont_emiss (for categorical or continuous observations, respectively), and log_likl, providing the subject parameter estimates over the iterations of the MCMC sampler. trans_prob relates to the transition probabilities gamma, cat_emiss to the categorical emission distribution (emission probabilities), cont_emiss to the continuous emission distributions (subsequently the emission means and the (fixed over subjects) emission standard deviation), and log_likl to the log likelihood over the MCMC iterations.
A detailed error message is displayed when trying to post-process mHMM objects created with an earlier version of the package.
Several extra checks have been implemented in mHMM(). Specifically, checking for: starting_val.
A major improvement in this release is the increased speed of the mHMM() algorithm.
Two new functions to manually specify hyper-prior distribution parameter values for the multilevel hidden Markov model are introduced:
Using manually specified hyper-prior distribution parameter values in the function mHMM() is thus now done by inputting an object of the class ‘mHMM_prior_emiss’ and/or ‘mHMM_prior_gamma’ for the input parameters emiss_hyp_prior and gamma_hyp_prior, respectively, created by the above functions. Note that manually specifying hyper-prior distribution parameter values is optional; default values are available for all parameters.
Manually specifying hyper-prior distribution parameter values is done on the logit domain. That is, the hyper-priors are on the intercepts (and, if subject-level covariates are used, regression coefficients) of the multinomial logit model used to accommodate the multilevel framework of the data, instead of on the probabilities directly. As the logit domain may be less familiar to the user than the probability domain, two functions are introduced to aid the user:
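To illustrate what the logit domain means (this is not one of the package's helper functions), the sketch below converts a probability vector into multinomial logit intercepts and back. It assumes the first category is the reference category with its intercept fixed at 0; the reference category used by mHMM() may differ, and the helper names are hypothetical.

```python
import math


def probs_to_logit_intercepts(probs):
    """Map q probabilities to q - 1 multinomial logit intercepts,
    using the first category as the (assumed) reference category."""
    if any(p <= 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must be positive and sum to 1")
    return [math.log(p / probs[0]) for p in probs[1:]]


def logit_intercepts_to_probs(intercepts):
    """Inverse map: softmax over (0, intercept_2, ..., intercept_q)."""
    linear = [0.0] + list(intercepts)
    top = max(linear)  # subtract the maximum for numerical stability
    expd = [math.exp(v - top) for v in linear]
    total = sum(expd)
    return [v / total for v in expd]
```

For example, a row of gamma equal to (0.8, 0.1, 0.1) maps to two intercepts of log(0.1 / 0.8) ≈ -2.08, and the softmax maps them back to the original probabilities.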
Two new functions to manually specify settings of the proposal distribution of the Random Walk (RW) Metropolis sampler for the multilevel hidden Markov model are introduced:
Using manually specified settings of the proposal distribution of the Random Walk (RW) Metropolis sampler in the function mHMM() is thus now done by inputting an object of the class ‘mHMM_pdRW_emiss’ and/or ‘mHMM_pdRW_gamma’ for the input parameters emiss_sampler and gamma_sampler, respectively, created by the above functions. Note that manually specifying the settings of the RW proposal distribution is optional; default values are available for all parameters.
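To show what the proposal distribution settings control, here is a plain (non-adaptive) random walk Metropolis sketch for the log of a single Poisson mean under a flat prior. It is not the package's sampler, which is tuned differently and operates within the full multilevel model; the function names and the step argument are hypothetical.

```python
import math
import random


def poisson_loglik(log_lambda, counts):
    """Poisson log-likelihood as a function of log(lambda), up to a constant."""
    lam = math.exp(log_lambda)
    return sum(y * log_lambda - lam for y in counts)


def rw_metropolis(counts, n_iter=5000, step=0.1, start=0.0, seed=1):
    """Random walk Metropolis for log(lambda) with a flat prior.

    `step` is the standard deviation of the normal proposal distribution.
    Returns all draws and the acceptance rate.
    """
    rng = random.Random(seed)
    current = start
    current_ll = poisson_loglik(current, counts)
    draws, accepted = [], 0
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)  # symmetric proposal
        proposal_ll = poisson_loglik(proposal, counts)
        # accept with probability min(1, likelihood ratio)
        if rng.random() < math.exp(min(0.0, proposal_ll - current_ll)):
            current, current_ll = proposal, proposal_ll
            accepted += 1
        draws.append(current)
    return draws, accepted / n_iter
```

A step much smaller than the posterior standard deviation gives a very high acceptance rate but slow movement, while a much larger step gives frequent rejections; both slow down mixing, which is what tuning the proposal distribution addresses.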
In the function sim_mHMM(), used to simulate data for multiple subjects whose observations follow a hidden Markov model (HMM) with a multilevel structure
Patch release to solve noLD issues (tests without long double on x86_64 Linux system) uncovered by CRAN Package Check Results.
First (official) version of the package!