% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/cat_plot.R
\name{cat_plot}
\alias{cat_plot}
\title{Plot interaction effects between categorical predictors.}
\usage{
cat_plot(
  model,
  pred,
  modx = NULL,
  mod2 = NULL,
  data = NULL,
  geom = c("point", "line", "bar"),
  pred.values = NULL,
  modx.values = NULL,
  mod2.values = NULL,
  interval = TRUE,
  plot.points = FALSE,
  point.shape = FALSE,
  vary.lty = FALSE,
  centered = "all",
  int.type = c("confidence", "prediction"),
  int.width = 0.95,
  line.thickness = 1.1,
  point.size = 1.5,
  pred.point.size = 3.5,
  jitter = 0.1,
  geom.alpha = NULL,
  dodge.width = NULL,
  errorbar.width = NULL,
  interval.geom = c("errorbar", "linerange"),
  outcome.scale = "response",
  robust = FALSE,
  cluster = NULL,
  vcov = NULL,
  pred.labels = NULL,
  modx.labels = NULL,
  mod2.labels = NULL,
  set.offset = 1,
  x.label = NULL,
  y.label = NULL,
  main.title = NULL,
  legend.main = NULL,
  colors = "CUD Bright",
  partial.residuals = FALSE,
  point.alpha = 0.6,
  color.class = NULL,
  at = NULL,
  ...
)
}
\arguments{
\item{model}{A regression model. The function is tested with \code{lm},
\code{glm}, \code{\link[survey]{svyglm}}, \code{\link[lme4]{merMod}},
\code{\link[quantreg]{rq}}, \code{\link[brms]{brmsfit}}, and
\code{stanreg} models.
Models from other classes may work as well but are not officially
supported. The model should include the interaction of interest.}

\item{pred}{A categorical predictor variable that will appear on the x-axis.
Note that it is evaluated using \code{rlang}, so programmers can use the \verb{!!}
syntax to pass variables instead of the verbatim names.}

\item{modx}{A categorical moderator variable.}

\item{mod2}{For three-way interactions, the second categorical moderator.}

\item{data}{Optional, default is \code{NULL}. You may provide the data used to
fit the model. This can be a better way to get mean values for centering
and can be crucial for models with variable transformations in the formula
(e.g., \code{log(x)}) or polynomial terms (e.g., \code{poly(x, 2)}). If the
function detects problems that providing the data would likely solve, it
issues a warning and attempts to retrieve the original data from the global
environment.}

\item{geom}{What type of plot should this be? There are several options
here since the best way to visualize categorical interactions varies by
context. Here are the options:
\itemize{
\item \code{"point"}: The default. Simply plots the point estimates. You may
want to use \code{point.shape = TRUE} with this, and you should keep
\code{interval = TRUE} (the default) to visualize uncertainty.
\item \code{"line"}: Connects the predicted values across levels of the
\code{pred} variable with a line. This is a good option when the \code{pred}
variable is ordinal (ordered). \code{point.shape = TRUE} may still be useful,
and \code{interval = TRUE} is still a good idea.
\item \code{"bar"}: A bar chart. Some call this a "dynamite plot."
Many applied researchers advise against this type of plot because it
does not represent the distribution of the observed data or the
uncertainty of the predictions very well. It is best to at least keep
\code{interval = TRUE} (the default) with this geom.
}}

\item{pred.values}{Which values of the predictor should be included in the
plot? By default, all levels are included.}

\item{modx.values}{For which values of the moderator should lines be
plotted? There are two basic options:
\itemize{
\item A vector of values (e.g., \code{c(1, 2, 3)})
\item A single argument asking to calculate a set of values. See details
below.
}

Default is \code{NULL}. If \code{NULL} (or \code{"mean-plus-minus"}),
then the customary +/- 1 standard
deviation from the mean as well as the mean itself are used for continuous
moderators. If \code{"plus-minus"}, plots lines when the moderator is at
+/- 1 standard deviation without the mean. You may also choose \code{"terciles"}
to split the data into equally-sized groups and choose the point at the
mean of each of those groups.

If the moderator is a factor variable and \code{modx.values} is
\code{NULL}, each level of the factor is included. You may specify
any subset of the factor levels (e.g., \code{c("Level 1", "Level 3")}) as long
as there is more than one. The levels will be plotted in the order you
provide them, so this can be used to reorder levels as well.}

\item{mod2.values}{For which values of the second moderator should the plot
be faceted? That is, there will be a separate plot for each level of this
moderator. Defaults are the same as \code{modx.values}.}

\item{interval}{Logical. If \code{TRUE}, plots confidence/prediction
intervals.}

\item{plot.points}{Logical. If \code{TRUE}, plots the actual data points as a
scatterplot on top of the interaction lines. Note that if
\code{geom = "bar"}, this will cause the bars to become transparent so you can
see the points.}

\item{point.shape}{For plotted points (either of observed data or predicted
values with the \code{"point"} or \code{"line"} geoms), should the shape of
the points vary by the levels of the moderator? This is especially useful
for plots that must remain legible in black-and-white print or to
colorblind readers.}

\item{vary.lty}{Should the resulting plot use a different line type for each
line in addition to different colors? Defaults to \code{FALSE}.}

\item{centered}{A vector of quoted variable names that are to be
mean-centered. If \code{"all"}, all non-focal predictors are centered. You
may instead pass a character vector of variables to center. You may
also use \code{"none"} to base all predictions on variables set at 0.
The response variable, \code{pred}, \code{modx}, and \code{mod2} variables are never
centered.}

\item{int.type}{Type of interval to plot. Options are "confidence" or
"prediction". Default is confidence interval.}

\item{int.width}{How large should the interval be, relative to the standard
error? The default, .95, corresponds to roughly 1.96 standard errors and
a .05 alpha level for values outside the range. In other words, for a
confidence interval, .95 is analogous to a 95\% confidence interval.}

\item{line.thickness}{How thick should the plotted lines be? Default is 1.1.}

\item{point.size}{What size should be used for observed data when
\code{plot.points} is TRUE? Default is 1.5.}

\item{pred.point.size}{When \code{geom} is \code{"point"} or \code{"line"},
sets the size of the predicted points. Default is 3.5.
Note the distinction from \code{point.size}, which refers to the
observed data points.}

\item{jitter}{How much should \code{plot.points} observed values be "jittered"
via \code{\link[ggplot2:position_jitter]{ggplot2::position_jitter()}}? When there are many points near each
other, jittering moves them a small amount to keep them from
totally overlapping. In some cases, though, it can add confusion since
it may make points appear to be outside the boundaries of observed
values or cause other visual issues. Default is 0.1, but increase as
needed if your points are overlapping too much or set to 0 for no jitter.
If the argument is a vector with two values, then the first is assumed to
be the jitter for width and the second for the height.}

\item{geom.alpha}{What should the alpha aesthetic be for the plotted
lines/bars? Default is NULL, which means it is set depending on the value
of \code{geom} and \code{plot.points}.}

\item{dodge.width}{What should the \code{width} argument to
\code{\link[ggplot2:position_dodge]{ggplot2::position_dodge()}} be? Default is NULL, which means it is set
depending on the value of \code{geom}.}

\item{errorbar.width}{How wide should the error bars be? Default is NULL,
meaning it is set depending on the value of \code{geom}. Ignored if \code{interval}
is FALSE.}

\item{interval.geom}{For categorical by categorical interactions.
One of "errorbar" or "linerange". If the former,
\code{\link[ggplot2:geom_linerange]{ggplot2::geom_errorbar()}} is used. If the latter,
\code{\link[ggplot2:geom_linerange]{ggplot2::geom_linerange()}} is used.}

\item{outcome.scale}{For nonlinear models (i.e., GLMs), should the outcome
variable be plotted on the link scale (e.g., log odds for logit models) or
the original scale (e.g., predicted probabilities for logit models)? The
default is \code{"response"}, which is the original scale. For the link
scale, which will show straight lines rather than curves, use
\code{"link"}.}

\item{robust}{Should robust standard errors be used to find confidence
intervals for supported models? Default is \code{FALSE}. To use them, specify
the type of sandwich standard errors (e.g., \code{"HC0"}, \code{"HC1"}, and
so on). If \code{TRUE}, defaults to \code{"HC3"} standard errors.}

\item{cluster}{For clustered standard errors, provide the column name of
the cluster variable in the input data frame (as a string). Alternately,
provide a vector of clusters.}

\item{vcov}{Optional. You may supply the variance-covariance matrix of the
coefficients yourself. This is useful if you are using some method for
robust standard error calculation not supported by the \pkg{sandwich}
package.}

\item{pred.labels}{A character vector of equal length to the number of
factor levels of the predictor (or number specified in \code{pred.values}). If
\code{NULL}, the default, the factor labels are used.}

\item{modx.labels}{A character vector of labels for each level of the
moderator values, provided in the same order as the \code{modx.values}
argument. If \code{NULL}, the values themselves are used as labels unless
\code{modx.values} is also \code{NULL}. In that case, "+1 SD" and "-1 SD"
are used.}

\item{mod2.labels}{A character vector of labels for each level of the 2nd
moderator values, provided in the same order as the \code{mod2.values}
argument. If \code{NULL}, the values themselves are used as labels unless
\code{mod2.values} is also \code{NULL}. In that case, "+1 SD" and "-1 SD"
are used.}

\item{set.offset}{For models with an offset (e.g., Poisson models), sets an
offset for the predicted values. All predicted values will have the same
offset. By default, this is set to 1, which makes the predicted values a
proportion. See details for more about offset support.}

\item{x.label}{A character object specifying the desired x-axis label. If
\code{NULL}, the variable name is used.}

\item{y.label}{A character object specifying the desired y-axis label. If
\code{NULL}, the variable name is used.}

\item{main.title}{A character object that will be used as an overall title
for the plot. If \code{NULL}, no main title is used.}

\item{legend.main}{A character object that will be used as the title that
appears above the legend. If \code{NULL}, the name of the moderating
variable is used.}

\item{colors}{Any palette argument accepted by
\code{\link[ggplot2]{scale_colour_brewer}}, or the name of a palette
provided by \pkg{jtools}. Default is \code{"CUD Bright"}.
You may also simply supply a vector of colors accepted by
\code{ggplot2} and of equal length to the number of moderator levels.}

\item{partial.residuals}{Instead of plotting the observed data, you may plot
the partial residuals (controlling for the effects of variables besides
\code{pred}).}

\item{point.alpha}{What should the \code{alpha} aesthetic for plotted points of
observed data be? Default is 0.6, and it can range from 0 (transparent) to
1 (opaque).}

\item{color.class}{Deprecated. Now known as \code{colors}.}

\item{at}{If you want to manually set the values of other variables in the
model, do so by providing a named list where the names are the variables
and the list values are vectors of the values. This can be useful
especially when you are exploring interactions or other conditional
predictions.}

\item{...}{Extra arguments passed to \code{make_predictions}.}
}
\value{
The function returns a \code{ggplot} object, which can be treated
like a user-created plot and expanded upon as such.
}
\description{
\code{cat_plot} is a complementary function to \code{\link[=interact_plot]{interact_plot()}} that is designed
for plotting interactions when both predictor and moderator(s) are
categorical (or, in R terms, factors).
}
\details{
This function provides a means for plotting conditional effects
for the purpose of exploring interactions in the context of regression.
You must have the
package \code{ggplot2} installed to benefit from these plotting functions.

The function is designed for two- and three-way interactions. For
additional terms, the
\code{\link[effects]{effects}} package may be better suited to the task.

This function supports nonlinear and generalized linear models and by
default will plot them on
their original scale (\code{outcome.scale = "response"}).

While mixed effects models from \code{lme4} are supported, only the fixed
effects are plotted. \code{lme4} does not provide confidence intervals,
so they are not supported with this function either.

Note: to use transformed predictors, e.g., \code{log(variable)},
provide only the variable name to \code{pred}, \code{modx}, or \code{mod2} and supply
the original data separately to the \code{data} argument.

\emph{Info about offsets:}

Offsets are partially supported by this function with important
limitations. First of all, only a single offset per model is supported.
Second, it is best in general to specify offsets with the offset argument
of the model fitting function rather than in the formula. You are much
more likely to have success if you provide the data used to fit the model
with the \code{data} argument.
}
\examples{

library(ggplot2)
fit <- lm(price ~ cut * color, data = diamonds)
cat_plot(fit, pred = color, modx = cut, interval = TRUE)
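
# Bar chart ("dynamite plot") with line ranges instead of error bars.
# Illustrative sketch: the built-in warpbreaks data (wool: A/B,
# tension: L/M/H, both factors) is an assumed example dataset.
# modx.values keeps only two tension levels and plots them in the
# order given (H before L).
library(interactions)
fit_wb <- lm(breaks ~ wool * tension, data = warpbreaks)
p_bar <- cat_plot(fit_wb, pred = wool, modx = tension,
                  modx.values = c("H", "L"),
                  geom = "bar", interval.geom = "linerange")
p_bar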

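# Overlay the observed data on the predictions. Illustrative sketch using
# the built-in ToothGrowth data (an assumed example dataset), with dose
# converted to a factor. Because dose is ordinal, the "line" geom fits.
# point.shape = TRUE varies point shapes by supplement type, and
# jitter = c(0.1, 0) jitters points horizontally only (width, height).
library(interactions)
tg <- ToothGrowth
tg$dose <- factor(tg$dose)
fit_tg <- lm(len ~ supp * dose, data = tg)
p_pts <- cat_plot(fit_tg, pred = dose, modx = supp, geom = "line",
                  plot.points = TRUE, point.shape = TRUE,
                  jitter = c(0.1, 0))
p_pts
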
# 3-way interaction

## Will first create a couple dichotomous factors to ensure full rank
mpg2 <- mpg
mpg2$auto <- "auto"
mpg2$auto[mpg2$trans \%in\% c("manual(m5)", "manual(m6)")] <- "manual"
mpg2$auto <- factor(mpg2$auto)
mpg2$fwd <- "2wd"
mpg2$fwd[mpg2$drv == "4"] <- "4wd"
mpg2$fwd <- factor(mpg2$fwd)
## Drop the cars with 5 cylinders (the rest have 4, 6, or 8)
mpg2 <- mpg2[mpg2$cyl != "5",]
mpg2$cyl <- factor(mpg2$cyl)
## Fit the model
fit3 <- lm(cty ~ cyl * fwd * auto, data = mpg2)

# The line geom looks good for an ordered factor predictor
cat_plot(fit3, pred = cyl, modx = fwd, mod2 = auto, geom = "line",
 interval = TRUE)
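
# Poisson GLM: predictions are plotted on the response (count) scale by
# default; outcome.scale = "link" plots them on the log scale instead.
# Illustrative sketch using the built-in warpbreaks data (an assumed
# example dataset, not part of the original examples).
library(interactions)
fit_pois <- glm(breaks ~ wool * tension, data = warpbreaks,
                family = poisson)
p_resp <- cat_plot(fit_pois, pred = tension, modx = wool, geom = "line")
p_link <- cat_plot(fit_pois, pred = tension, modx = wool, geom = "line",
                   outcome.scale = "link")
p_link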

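# Robust standard errors: robust = TRUE requests "HC3" sandwich standard
# errors, which require the sandwich package, so the call is skipped if
# it is not installed. Illustrative sketch using mtcars (an assumed
# example dataset) with cylinders and transmission type as factors.
library(interactions)
mt <- mtcars
mt$cyl <- factor(mt$cyl)
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("automatic", "manual"))
fit_mt <- lm(mpg ~ cyl * am, data = mt)
if (requireNamespace("sandwich", quietly = TRUE)) {
  p_rob <- cat_plot(fit_mt, pred = cyl, modx = am, robust = TRUE)
  print(p_rob)
}
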
}
