% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/DiSSMod.R
\name{DiSSMod}
\alias{DiSSMod}
\title{Fitting Sample Selection Models for Discrete Response Variables}
\usage{
DiSSMod(response, selection, data, resp.dist, select.dist, alpha,
  trunc.num, standard = FALSE, verbose = 1, eps = 1e-07,
  itmax = 1000)
}
\arguments{
\item{response}{a formula for the response equation.}

\item{selection}{a formula for the selection equation.}

\item{data}{a data frame containing the variables used in the two formulas; it must be an object of class \code{data.frame}.}

\item{resp.dist}{a character string specifying the distribution of the response variable:
\code{"bernoulli"} for the Bernoulli distribution, \code{"poisson"} for the Poisson distribution,
and \code{"negbinomial"} for the negative binomial distribution. The string
can be abbreviated and can be given in upper or lower case.}

\item{select.dist}{a character string specifying the distribution of the selection variable:
\code{"gumbel"} for the Gumbel distribution, \code{"normal"} for the normal distribution,
and \code{"logistic"} for the logistic distribution. The string
can be abbreviated and can be given in upper or lower case.}

\item{alpha}{a vector of \eqn{\alpha}{alpha} values at which the profile log-likelihood function is evaluated;
if the argument is missing, a set of values in the interval \code{(-10, 10)} is used for the initial search,
followed by a second search on a revised interval which depends on the outcome of the first search.}

\item{trunc.num}{an integer numeric constant used as the truncation point of an infinite summation of probabilities
involved when \code{resp.dist} equals \code{"Poisson"} or \code{"NegBinomial"};
if the argument is missing, a default choice is made, as described in the \sQuote{Details} section. Notice: this
default choice of \code{trunc.num} may be subject to revision in some future version of the package,
and the argument \code{trunc.num} itself may possibly be replaced by some other ingredient.}

\item{standard}{a logical value indicating whether the explanatory variables are standardized; if \code{TRUE}, two types of values
(standardized and not) are returned.}

\item{verbose}{an integer value controlling the level of printed detail (values: 0|1|2). The default value, 1,
prints brief details. If the value is 2, more details are printed, such as
values of the log-likelihood function and iteration numbers. If the value is 0, nothing is
printed.}

\item{eps}{a numeric convergence tolerance used in the optimization step.
If the sum of absolute differences between the parameter estimates at the current step and those at the
previous step is smaller than \code{eps}, the estimates are regarded as having
converged.}

\item{itmax}{an integer giving the maximum number of iterations allowed when optimizing the parameters.}
}
\value{
\code{DiSSMod} returns an object of class \code{"DiSSMod"},
which is a list containing the following components:

\item{call}{the matched call.}
\item{standard}{a logical value indicating whether standardization was applied.}
\item{st_loglik}{a vector containing the differences between the log-likelihood values and the maximized log-likelihood.}
\item{max_loglik}{the maximized log-likelihood value.}
\item{mle_alpha}{the maximum likelihood estimate of alpha.}
\item{alpha}{a vector containing the grid of alpha values.}
\item{Nalpha}{a vector containing the proper alpha values, that is, those whose
corresponding log-likelihood is not \code{NA}.}
\item{num_NA}{the number of \code{NA} values among the log-likelihoods.}
\item{n_select}{the number of selected response cases.}
\item{n_all}{the number of all response cases.}
\item{estimate_response}{estimated values for the response model.}
\item{std_error_response}{estimated standard errors for the response model.}
\item{estimate_selection}{estimated values for the selection model.}
\item{std_error_selection}{estimated standard errors for the selection model.}
}
\description{
Function \code{DiSSMod} fits sample selection models for discrete random
variables, by suitably extending the formulation of the classical
Heckman model to the case of a discrete response, but retaining the
original conceptual framework. Maximum likelihood estimates are obtained
by Newton-Raphson iteration combined with the use of profile likelihood.
}
\details{
The specification of the two linear models regulating the response variable and
the selection mechanism, as indicated in the \sQuote{Background} section,
is accomplished by two arguments of \code{formula} type,
denoted \code{response} and \code{selection}, respectively.
Each \code{formula} is specified with the same syntax as the corresponding arguments of
standard functions such as \code{lm} and \code{glm}, with the restriction that
the intercept term (which is automatically included) must not be removed.

The distributional assumptions associated with the \code{response} and \code{selection} components
are specified by the arguments \code{resp.dist} and \code{select.dist}, respectively.
Argument \code{select.dist} refers to the unobservable continuous variable of which we
observe only the dichotomous outcome Yes-No.

In this respect, a remark is appropriate about the option \code{"Gumbel"} for \code{select.dist}.
This choice is equivalent to the adoption of an Exponential distribution for the selection variable
combined with an exponential transformation of the linear predictor of the
\code{selection} argument, as presented in Section 3.2 of Azzalini et al. (2019).
It also corresponds to working with the log-transformation of an Exponential variable,
which is essentially a Gumbel type of variable, up to a linear transformation with
respect to its more commonly employed parameterization.

When \code{resp.dist} is \code{"Poisson"} or \code{"NegBinomial"} and \code{trunc.num} is missing,
a default choice is made; this equals \code{1.5*m} or \code{2*m} in the two respective cases,
where \code{m} denotes the maximum observed value of the response variable.
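
As a minimal sketch of this rule, the default can be reproduced by hand. It is assumed here that \code{m} is computed
from the non-missing response values; how the package treats non-selected cases, and any rounding it applies, is not
shown. The names \code{y}, \code{m} and \code{trunc.default} are illustrative only and are not objects of the package.

\preformatted{
# Illustrative counts; NA marks a non-selected unit (an assumption).
y <- c(0, 2, 1, NA, 4, 0, NA, 3)
m <- max(y, na.rm = TRUE)   # maximum observed value of the response
# Default truncation point under each count distribution, as stated above.
trunc.default <- c(poisson = 1.5 * m, negbinomial = 2 * m)
trunc.default
}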

Function \code{DiSSMod} calls lower-level functions, such as \code{nr.bin}, \code{nr.nbinom} and \code{nr.pois},
for the actual numerical maximization of the log-likelihood via a Newton-Raphson iteration.
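
To illustrate how a Newton-Raphson iteration interacts with the stopping rule described for \code{eps} and \code{itmax},
the following sketch fits an ordinary Poisson regression to simulated data. It is a toy example under stated assumptions,
not the code of the lower-level functions: the model has no selection mechanism, and all object names are hypothetical.

\preformatted{
set.seed(2)
x <- rnorm(100)
y <- rpois(100, exp(0.5 + 0.3 * x))   # simulated counts
eps <- 1e-07                          # same role as argument eps
itmax <- 1000                         # same role as argument itmax
beta <- c(0, 0)                       # starting values: intercept, slope
for (it in seq_len(itmax)) {
  mu <- exp(beta[1] + beta[2] * x)
  score <- c(sum(y - mu), sum((y - mu) * x))   # gradient of the log-likelihood
  info <- matrix(c(sum(mu), sum(mu * x),
                   sum(mu * x), sum(mu * x^2)), 2, 2)   # minus the Hessian
  beta.new <- beta + solve(info, score)        # Newton-Raphson step
  # Stop when the sum of absolute parameter changes falls below eps.
  converged <- sum(abs(beta.new - beta)) < eps
  beta <- beta.new
  if (converged) break
}
beta
}

The result can be compared with \code{coef(glm(y ~ x, family = poisson))}, which maximizes the same log-likelihood.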

Notice that the automatic initialization of the \code{alpha} search interval, when this argument is
missing, may change in future versions of the package.
}
\section{Background}{

Function \code{DiSSMod} fits sample selection models for discrete random variables,
by suitably extending the formulation of the classical Heckman model to the case of a discrete response,
but retaining the original conceptual framework.
This logic involves the following key ingredients: (1) a linear model indicating which explanatory variables
influence the response variable; (2) a linear model indicating which (possibly different) explanatory variables,
besides the response variable itself, influence a \sQuote{selection variable}; this variable is intrinsically continuous,
but we only observe a dichotomous outcome from it, of type Yes-No, which determines which response cases are observed;
(3) distributional assumptions on the response and the selection variable.

The data fitting method is maximum likelihood estimation (MLE), which operates in two steps:
(i) for each given value of the parameter \eqn{\alpha}{alpha}, which regulates the level of selection,
MLE is performed for all the remaining parameters, using a Newton-Raphson iteration;
(ii) a scan of the \eqn{\alpha}{alpha} axis builds the profile log-likelihood function, and
its maximum point represents the overall MLE.
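
The two-step logic can be illustrated on a toy problem that is unrelated to the internals of the package. In the sketch
below, the shape parameter of a Gamma sample plays the role of \eqn{\alpha}{alpha} and the rate is the remaining parameter;
for brevity, the inner maximization uses \code{optimize} rather than a Newton-Raphson iteration, and all object names
are hypothetical.

\preformatted{
set.seed(1)
x <- rgamma(200, shape = 3, rate = 2)
# Step (i): for a fixed shape a, maximize the log-likelihood over the rate.
profile_loglik <- function(a) {
  inner <- function(r) sum(dgamma(x, shape = a, rate = r, log = TRUE))
  optimize(inner, interval = c(0.01, 50), maximum = TRUE)$objective
}
# Step (ii): scan a grid of shape values to build the profile log-likelihood
# and take its maximum point as the overall estimate of the shape.
grid <- seq(1, 6, length.out = 51)
pl <- vapply(grid, profile_loglik, numeric(1))
grid[which.max(pl)]
}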

A detailed account of the underlying theory and the operational methodology is provided by Azzalini et al. (2019).
}

\examples{
set.seed(45)
# Binary response (any doctor visit) with a normal selection variable,
# fitted on a random subsample of 600 rows.
data(DoctorRWM, package = "DiSSMod")
n0 <- 600
set.n0 <- sample(1:nrow(DoctorRWM), n0)
reduce_DoctorRWM <- DoctorRWM[set.n0, ]
result0 <- DiSSMod(response = as.numeric(DOCVIS > 0) ~ AGE + INCOME_SCALE + HHKIDS + EDUC + MARRIED,
                   selection = PUBLIC ~ AGE + EDUC + FEMALE,
                   data = reduce_DoctorRWM, resp.dist = "bernoulli", select.dist = "normal",
                   alpha = seq(-5.5, -0.5, length.out = 21), standard = TRUE)

print(result0)

# Poisson count response with a logistic selection variable;
# the distribution names are given in abbreviated form.
data(CreditMDR, package = "DiSSMod")
n1 <- 600
set.n1 <- sample(1:nrow(CreditMDR), n1)
reduce_CreditMDR <- CreditMDR[set.n1, ]
result1 <- DiSSMod(response = MAJORDRG ~ AGE + INCOME + EXP_INC,
                   selection = CARDHLDR ~ AGE + INCOME + OWNRENT + ADEPCNT + SELFEMPL,
                   data = reduce_CreditMDR, resp.dist = "poi", select.dist = "logis",
                   alpha = seq(-0.3, 0.3, length.out = 21), standard = FALSE, verbose = 1)

print(result1)
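
# A usage sketch, not part of the original examples: inspect the fit through
# the components listed in the Value section and the methods listed in the
# See Also section. The layout of the printed output is not shown here.
result1$mle_alpha     # estimate of alpha from the profile log-likelihood
result1$max_loglik    # maximized log-likelihood value
c(selected = result1$n_select, all = result1$n_all)
summary(result1)
coef(result1)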

}
\references{
Azzalini, A., Kim, H.-M. and Kim, H.-J. (2019) Sample selection
models for discrete and other non-Gaussian response variables.
 \emph{Statistical Methods & Applications}, \strong{28}, 27--56. First online 30 March 2018.
\url{https://doi.org/10.1007/s10260-018-0427-1}
}
\seealso{
The functions \code{\link[DiSSMod]{summary.DiSSMod}}, \code{\link[DiSSMod]{coef.DiSSMod}},
\code{\link[DiSSMod]{confint.DiSSMod}} and \code{\link[DiSSMod]{plot.DiSSMod}}
are used to obtain and print a summary, the coefficients, confidence intervals and
a plot of the results.

The generic function \code{\link[stats]{logLik}} is used to obtain the maximized log-likelihood of the
fitted model.

See also \code{\link[stats]{lm}}, \code{\link[stats]{glm}} and
\code{\link[stats]{formula}}.
}
\concept{Heckman model}
\concept{discrete response}
\concept{generalized linear models}
\concept{maximum likelihood}
\concept{sample selection}
\concept{selection models}
