% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/feature_selection.R
\name{select_features}
\alias{select_features}
\alias{select_features,MetaNLP-method}
\title{Select features via elastic net regularization}
\usage{
select_features(object, ...)

\S4method{select_features}{MetaNLP}(object, alpha = 0.8, lambda = "avg", seed = NULL, ...)
}
\arguments{
\item{object}{An object of class \code{MetaNLP}}

\item{...}{Additional arguments passed to \link[glmnet]{cv.glmnet}. A useful
option is \code{type.measure}, which specifies the loss used during
cross validation.}

\item{alpha}{The elastic net mixing parameter, with \eqn{0\leq \alpha \leq 1}.
\code{alpha = 1} corresponds to the lasso penalty, \code{alpha = 0} to the
ridge penalty.}

\item{lambda}{The weight parameter of the penalty. The possible values are
\code{"avg"}, \code{"min"}, \code{"1se"} or a numeric value which directly
determines \eqn{\lambda}. When choosing \code{"avg"}, \code{"min"} or
\code{"1se"}, cross validation is executed to determine \eqn{\lambda}.
Note that cross validation uses random folds, so the results are not
necessarily replicable unless a seed is set.
\code{"avg"} calls \code{select_features} 10 times, computes the
\eqn{\lambda} which minimizes the loss in each iteration and then uses the
median of these values as the final \eqn{\lambda}, for which the objective
function is then minimized. \code{"min"} and \code{"1se"} carry out the cross
validation just once, and \eqn{\lambda} is either the value for which the
cross-validated error is minimized (option \code{"min"}) or the value that
gives the most regularized model such that the cross-validated error is within
one standard error of the minimum (option \code{"1se"}).}

\item{seed}{A numeric value which is used as a local seed for this function.
Default is \code{seed = NULL}, so no seed is set.
Setting a seed leads to replicable results of
the cross validation, such that each call of \code{select_features} selects
the same columns. If a seed is set, the option \code{lambda = "avg"}
yields the same results as \code{lambda = "min"}.}
}
\value{
An object of class \code{MetaNLP} whose columns were selected
via elastic net regularization.
}
\description{
As the document-term matrix quickly grows with an increasing number of abstracts,
it can easily reach several thousand columns. Thus, it can be important to
extract the columns that carry most of the information in the decision making
process. This function fits a generalized linear model with elastic net
regularization to extract these features. In contrast to an unpenalized
regression model or an L2 penalty (ridge regression), the elastic net (like the
lasso) sets some regression coefficients exactly to 0. Thus, the selected
features are exactly the features with a non-zero coefficient.
}
\details{
The computations are carried out by the \code{glmnet}
package. First, a model is fitted via \link[glmnet]{glmnet}. The
elastic net parameter \eqn{\alpha} can be specified by the user. The
parameter \eqn{\lambda}, which determines the weight of the penalty, can
either be chosen via cross validation (using \link[glmnet]{cv.glmnet}) or by
giving a numeric value.
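
For orientation, the roles of \eqn{\alpha} and \eqn{\lambda} can be read off
the penalized objective as it is stated in the \code{glmnet} documentation.
The symbols \eqn{N} (number of observations), \eqn{w_i} (observation weights),
\eqn{l} (negative log-likelihood contribution of observation \eqn{i}) and
\eqn{\beta_0, \beta} (intercept and coefficients) are taken from that
documentation and are not used elsewhere on this page:
\deqn{\min_{\beta_0, \beta} \frac{1}{N} \sum_{i=1}^{N} w_i \, l(y_i, \beta_0 + \beta^T x_i) + \lambda \left[ \frac{1 - \alpha}{2} \|\beta\|_2^2 + \alpha \|\beta\|_1 \right]}{min_(beta_0, beta) 1/N sum_i w_i l(y_i, beta_0 + beta^T x_i) + lambda [(1 - alpha)/2 ||beta||_2^2 + alpha ||beta||_1]}
The \eqn{\|\beta\|_1} term is the part of the penalty that can shrink
coefficients exactly to zero, and its share of the penalty grows with
\eqn{\alpha}.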
}
\note{
When a fixed value is used for \code{lambda}, the number of features to be
selected can easily be adjusted via the parameter \code{alpha}. The smaller
\code{alpha} is chosen, the more columns remain in the resulting data frame;
the larger \code{alpha} is chosen, the fewer columns are selected.
}
\examples{
path <- system.file("extdata", "test_data.csv", package = "MetaNLP", mustWork = TRUE)
obj <- MetaNLP(path)
obj2 <- select_features(obj, alpha = 0.7, lambda = "min")
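
# Two further sketches using only the arguments documented above. The seed
# 42 and the numeric lambda 0.01 are arbitrary illustrative choices, not
# recommended defaults.

# Cross validation with a local seed, so that repeated calls select the
# same columns.
obj3 <- select_features(obj, alpha = 0.7, lambda = "1se", seed = 42)

# A numeric lambda sets the penalty weight directly (no value is chosen by
# cross validation); how many columns remain depends on the data.
obj4 <- select_features(obj, alpha = 0.7, lambda = 0.01)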


}
