% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/CERFIT.R
\name{CERFIT}
\alias{CERFIT}
\title{Fits a Random Forest of Interaction Trees}
\usage{
CERFIT(
  formula,
  data,
  ntrees,
  subset = NULL,
  search = c("exhaustive", "sss"),
  method = c("RCT", "observational"),
  PropForm = c("randomForest", "CBPS", "GBM", "HI"),
  split = c("t.test"),
  mtry = NULL,
  nsplit = NULL,
  nsplit.random = FALSE,
  minsplit = 20,
  minbucket = round(minsplit/3),
  maxdepth = 30,
  oob = FALSE,
  a = 50,
  sampleMethod = c("bootstrap", "subsample", "subsampleByID", "allData"),
  useRes = TRUE,
  scale.y = FALSE
)
}
\arguments{
\item{formula}{Formula used to build the CERFIT model, with the treatment variable placed after the vertical bar, e.g., Y ~ x1 + x2 | treatment. Categorical predictors must be coded as factors.}

\item{data}{Data used to grow the forest.}

\item{ntrees}{Number of trees to grow}

\item{subset}{A logical vector that controls which observations are used to grow the forest.
The default value uses the entire data frame.}

\item{search}{Method to search through candidate splits}

\item{method}{For observational study data, method = "observational"; for randomized study data, method = "RCT".}

\item{PropForm}{Method to estimate propensity score}

\item{split}{Splitting statistic used as the impurity measure}

\item{mtry}{Number of variables to consider at each split}

\item{nsplit}{Number of cut points selected}

\item{nsplit.random}{Logical: indicates whether the cut points are selected at random}

\item{minsplit}{Minimum number of observations required in a node to continue growing the tree}

\item{minbucket}{Minimum number of observations required in each child node}

\item{maxdepth}{Maximum depth of each tree}

\item{oob}{Logical: whether to use the out-of-bag sample for predictions.}

\item{a}{Sigmoid approximation parameter (for search = "sss", which is still under development)}

\item{sampleMethod}{Method used to draw the learning sample for each tree. The default is
bootstrap. subsample takes a subsample of the original data. subsampleByID samples by an
ID column and uses all observations that share a sampled ID. allData uses the entire
data set for every tree.}

\item{useRes}{Logical: whether to fit the CERFIT model to
the residuals from a linear model}

\item{scale.y}{Logical: whether to standardize y when creating splits (for search = "sss", to increase stability)}
}
\value{
Returns a fitted CERFIT object, which is a list with the following elements:
\itemize{
\item RandFor: The random forest of interaction trees
\item trt.type: A string containing the treatment type of the data used to fit the model.
Can be binary, multiple, ordered or continuous.
\item response.type: A string representing the response type of the data. Can be
binary or continuous.
\item useRes: A logical indicator that is TRUE if the model was fit on the
residuals of a linear model
\item data: The data used to fit the model. It also contains the propensity score if
method was set to observational.}
}
\description{
Estimates each observation's individualized treatment effect for RCT
and observational data. The treatment can be a binary, categorical, ordered, or continuous
variable. Currently, if the response is binary, useRes must be set to TRUE.
}
\details{
This function implements the Random Forest of Interaction Trees proposed
in Su et al. (2018), a modification of the random forest algorithm in which
each split is chosen to maximize subgroup treatment heterogeneity rather than
prediction accuracy. It chooses the best
split by maximizing the test statistic for \eqn{H_0: \beta_3=0} in the
following linear model:

\eqn{Y_i = \beta_0 + \beta_1I(X_{ij} < c) + \beta_2I(Z = 1) + \beta_3I(X_{ij} < c)I(Z = 1) + \varepsilon_i}

Here \eqn{X_{ij}} is the splitting variable, \eqn{c} is the candidate cut point,
and \eqn{Z = 1} indicates treatment. Maximizing the test statistic for \eqn{\beta_3}
therefore maximizes the difference in treatment effect between the two child nodes.
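
As a short worked step, assuming the linear model above holds within a node,
the cell means it implies are \eqn{\beta_0} (control, \eqn{X_{ij} \ge c}),
\eqn{\beta_0 + \beta_2} (treated, \eqn{X_{ij} \ge c}), \eqn{\beta_0 + \beta_1}
(control, \eqn{X_{ij} < c}), and \eqn{\beta_0 + \beta_1 + \beta_2 + \beta_3}
(treated, \eqn{X_{ij} < c}). The treatment effect is then \eqn{\beta_2 + \beta_3}
in the child node with \eqn{X_{ij} < c} and \eqn{\beta_2} in the other child node, so

\deqn{(\beta_2 + \beta_3) - \beta_2 = \beta_3}

is the difference in treatment effect between the two child nodes.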

The above equation is valid only when the data come from a randomized controlled
trial. It can be modified to give unbiased estimates of the treatment
effect in observational studies (Li et al., 2022) by adding the propensity score
to the linear model:

\eqn{Y_i = \beta_0 + \beta_1I(X_{ij} < c) + \beta_2I(Z = 1) + \beta_3I(X_{ij} < c)I(Z = 1) + \beta_4e_i + \varepsilon_i}

Here \eqn{e_i} represents the propensity score. The CERFIT function estimates
the propensity score automatically when the method argument is set to observational.
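
For orientation, and as the standard textbook definition rather than a
description of this package's internals: for a binary treatment the propensity
score is the conditional probability of receiving treatment given the
covariates \eqn{X_i},

\deqn{e_i = P(Z_i = 1 \mid X_i)}

For treatments with more than two levels or continuous treatments, a
generalized propensity score plays the same role (Imbens, 2000).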

To control how the propensity score is estimated, use the
PropForm argument, which takes one of four values: randomForest, CBPS,
GBM, or HI. randomForest estimates the propensity score with a random forest
from the randomForest package, CBPS uses the covariate balancing propensity score,
GBM uses generalized boosted regression models,
and HI is for continuous treatments and
estimates the generalized propensity score. Not every option works
for every treatment type. The supported combinations are:
\itemize{
\item binary: GBM, CBPS, randomForest
\item categorical: GBM, CBPS
\item ordered: GBM, CBPS
\item continuous: CBPS, HI
}
}
\examples{
fit <- CERFIT(
  Result_of_Treatment ~ sex + age + Number_of_Warts + Area + Time + Type | treatment,
  data = warts,
  ntrees = 30,
  method = "RCT",
  mtry = 2
)
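
# A hedged sketch of the observational setting (not run). It reuses the warts
# data purely for illustration, treats its treatment variable as binary, and
# assumes the CBPS package is installed. fit_obs is a hypothetical object name.
\dontrun{
fit_obs <- CERFIT(
  Result_of_Treatment ~ sex + age + Number_of_Warts + Area + Time + Type | treatment,
  data = warts,
  ntrees = 30,
  method = "observational",  # the propensity score is estimated automatically
  PropForm = "CBPS",         # listed in the details as supported for binary treatment
  mtry = 2
)
}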

}
\references{
\itemize{
\item Li, Luo, et al. Causal Effect Random Forest of
Interaction Trees for Learning Individualized Treatment Regimes with
Multiple Treatments in Observational Studies. Stat, 2022,
https://doi.org/10.1002/sta4.457.
\item Su, X., Peña, A., Liu, L., & Levine, R. (2018). Random forests of interaction trees for estimating individualized treatment effects in randomized trials.
Statistics in Medicine, 37(17), 2547-2560.
\item G. W. Imbens, The role of the propensity score in estimating dose-response
functions, Biometrika, 87 (2000), pp. 706–710.
\item G. Ridgeway, D. McCaffrey, and A. Morral, The twang package: Toolkit for
weighting and analysis of nonequivalent groups, (2006).
\item A. Liaw and M. Wiener, Classification and regression by randomForest, R
News, 2 (2002), pp. 18–22}
}
