API

The exported symbols from this package define its interface. Some symbols from other packages are re-exported for convenience. Fields of objects with composite types should not be accessed directly; the internals of any given structure may change at any time and this would not be considered a breaking change.

Fitting a model

BetaRegression.BetaRegressionModel - Type
BetaRegressionModel{T,L1,L2,V,M} <: RegressionModel

Type representing a regression model for beta-distributed response values in the open interval (0, 1), as described by Ferrari and Cribari-Neto (2004).

The mean response is linked to the linear predictor by a link function with type L1 <: Link01, i.e. the link must map $(0, 1) \mapsto \mathbb{R}$ and use the GLM package's interface for link functions. While there is no canonical link function for the beta regression model as there is for GLMs, logit is the most common choice.

The precision is transformed by a link function with type L2 <: Link which should map $\mathbb{R} \mapsto \mathbb{R}$ or, ideally, $(0, \infty) \mapsto \mathbb{R}$ because the precision must be positive. The most common choices are the identity, log, and square root links.
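
For reference, the mean-precision parameterization of the beta density from Ferrari and Cribari-Neto (2004) can be written as below. The symbols $g$ (mean link), $h$ (precision link), $x_i$ (the $i$th row of the model matrix), and $\theta_{p+1}$ follow the notation used elsewhere on this page; this is a summary of the statistical model, not a description of the package's internals.

\[
f(y; \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma((1 - \mu)\phi)} \, y^{\mu\phi - 1} (1 - y)^{(1 - \mu)\phi - 1}, \quad 0 < y < 1,
\]

\[
\mathrm{E}(y) = \mu, \qquad \mathrm{Var}(y) = \frac{\mu (1 - \mu)}{1 + \phi}, \qquad g(\mu_i) = x_i^\top \beta, \qquad h(\phi) = \theta_{p+1}.
\]

For a fixed mean, a larger precision $\phi$ therefore corresponds to a smaller variance.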

BetaRegression.BetaRegressionModel - Method
BetaRegressionModel(X, y, link=LogitLink(), precisionlink=IdentityLink();
                    weights=nothing, offset=nothing)

Construct a BetaRegressionModel object with the given model matrix X, response y, mean link function link, precision link function precisionlink, and optionally weights and offset. Note that the returned object is not fit until fit! is called on it.

Warning

Support for user-provided weights is currently incomplete; passing a value other than nothing or an empty array for weights will result in an error for now.

StatsAPI.fit - Method
fit(BetaRegressionModel, formula, data, link=LogitLink(), precisionlink=IdentityLink();
    kwargs...)

Fit a BetaRegressionModel to the given table data, which may be any Tables.jl-compatible table (e.g. a DataFrame), using the given formula, which can be constructed using @formula. In this method, the response and model matrix are determined from the formula and table. It is also possible to provide them explicitly.

fit(BetaRegressionModel, X::AbstractMatrix, y::AbstractVector, link=LogitLink(),
    precisionlink=IdentityLink(); kwargs...)

Fit a beta regression model using the provided model matrix X and response vector y. In both of these methods, a link function may be provided, otherwise the default logit link is used. Similarly, a link for the precision may be provided, otherwise the default identity link is used.

Keyword Arguments

  • weights: A vector of weights or nothing (default). User-provided weights are not yet supported, so only nothing or an empty array is accepted.
  • offset: An offset vector to be added to the linear predictor or nothing (default).
  • maxiter: Maximum number of Fisher scoring iterations to use when fitting. Default is 100.
  • atol: Absolute tolerance to use when checking for model convergence. Default is sqrt(eps(T)) where T is the type of the estimates.
  • rtol: Relative tolerance to use when checking for convergence. Default is Base.rtoldefault(T).

Tip

If you experience convergence issues, you may consider trying a different link for the precision; LogLink() is a common choice. Increasing the maximum number of iterations may also be beneficial, especially when working with Float32.

StatsAPI.fit! - Method
fit!(b::BetaRegressionModel{T}; maxiter=100, atol=sqrt(eps(T)), rtol=Base.rtoldefault(T))

Fit the given BetaRegressionModel, updating its values in-place. If model convergence is achieved, b is returned, otherwise a ConvergenceException is thrown.

Fitting the model consists of computing the maximum likelihood estimates for the coefficients and precision parameter via Fisher scoring with analytic derivatives. The model is determined to have converged when the score vector, i.e. the vector of first partial derivatives of the log likelihood with respect to the parameters, is approximately zero. This is determined by isapprox using the specified atol and rtol. maxiter dictates the maximum number of Fisher scoring iterations.
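
The iteration and stopping rule described above can be sketched on a much simpler model. The example below is a toy, assuming an intercept-only logistic regression (not a beta regression) so that the score and Fisher information have closed forms; the function name fisher_scoring_toy is hypothetical and this is not the package's implementation.

```julia
# Toy model, chosen only for illustration: intercept-only logistic regression,
# where the score is sum(y) - n*p and the Fisher information is n*p*(1 - p).
function fisher_scoring_toy(y; maxiter=100, atol=sqrt(eps(Float64)),
                            rtol=Base.rtoldefault(Float64))
    n = length(y)
    theta = 0.0                              # starting value
    for _ in 1:maxiter
        p = 1 / (1 + exp(-theta))            # inverse logit
        score = [sum(y) - n * p]             # first derivative of the log likelihood
        # Converged when the score vector is approximately zero
        isapprox(score, zero(score); atol=atol, rtol=rtol) && return theta
        info = n * p * (1 - p)               # expected (Fisher) information
        theta += score[1] / info             # Fisher scoring update
    end
    error("no convergence after $maxiter iterations")
end

theta_hat = fisher_scoring_toy([1, 0, 1, 1, 0, 1, 1, 1])   # sample mean is 0.75
```

Here the maximum likelihood estimate is the logit of the sample mean, log(3). Because isapprox compares the norm of the difference against max(atol, rtol * max(norm(x), norm(y))), a comparison against an all-zero vector is governed by atol in practice.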


Properties of a model

StatsAPI.aic - Function
aic(model::StatisticalModel)

Akaike's Information Criterion, defined as $-2 \log L + 2k$, with $L$ the likelihood of the model, and $k$ its number of consumed degrees of freedom (as returned by dof).

StatsAPI.aicc - Function
aicc(model::StatisticalModel)

Corrected Akaike's Information Criterion for small sample sizes (Hurvich and Tsai 1989), defined as $-2 \log L + 2k + 2k(k+1)/(n-k-1)$, with $L$ the likelihood of the model, $k$ its number of consumed degrees of freedom (as returned by dof), and $n$ the number of observations (as returned by nobs).

StatsAPI.bic - Function
bic(model::StatisticalModel)

Bayesian Information Criterion, defined as $-2 \log L + k \log n$, with $L$ the likelihood of the model, $k$ its number of consumed degrees of freedom (as returned by dof), and $n$ the number of observations (as returned by nobs).
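
As a quick arithmetic check of the three criteria, the sketch below evaluates them directly from a log likelihood, a parameter count, and a number of observations. The function names and input values are hypothetical, and the AICc line uses the standard small-sample correction term $2k(k+1)/(n-k-1)$.

```julia
# Information criteria from a log likelihood `ll`, parameter count `k`,
# and number of observations `n` (all hypothetical inputs).
aic_value(ll, k) = -2 * ll + 2 * k
aicc_value(ll, k, n) = aic_value(ll, k) + 2 * k * (k + 1) / (n - k - 1)
bic_value(ll, k, n) = -2 * ll + k * log(n)

ll, k, n = -12.5, 3, 50
criteria = (aic_value(ll, k), aicc_value(ll, k, n), bic_value(ll, k, n))
```

For a fitted model, the corresponding inputs would come from loglikelihood, dof, and nobs.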

StatsAPI.coefnames - Method
coefnames(model::TableRegressionModel{<:BetaRegressionModel})

For a BetaRegressionModel fit using a table and @formula, return the names of the coefficients as a vector of strings. The precision term is included as the last element in the array and has name "(Precision)".

StatsAPI.coeftable - Method
coeftable(model::BetaRegressionModel; level=0.95)

Return a table of the point estimates of the model parameters, their respective standard errors, $z$-statistics, Wald $p$-values, and confidence intervals at the given level. The precision parameter is included as the last row in the table.

The object returned by this function implements the Tables.jl interface for tabular data.

StatsAPI.confint - Method
confint(model::BetaRegressionModel; level=0.95)

For a model with $p$ regression coefficients, return a $(p + 1) \times 2$ matrix of confidence intervals for the estimated coefficients and precision at the given level.
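
The shape of the result can be illustrated with a small sketch. It assumes Wald-type intervals built from a standard normal quantile, which matches the $z$-statistics reported by coeftable but is an assumption about how the intervals are formed; the estimates and standard errors are hypothetical.

```julia
# Hypothetical estimates and standard errors for p = 2 coefficients plus the precision
estimates = [0.40, -1.10, 12.0]
stderrs   = [0.10,  0.25,  2.0]

# 0.975 quantile of the standard normal distribution, for level = 0.95
z = 1.959963984540054

# (p + 1) × 2 matrix: lower bounds in column 1, upper bounds in column 2
ci = hcat(estimates .- z .* stderrs, estimates .+ z .* stderrs)
```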

StatsAPI.deviance - Method
deviance(model::BetaRegressionModel)

Compute the deviance of the model, defined as the sum of the squared deviance residuals.

See also: devresid

GLM.devresid - Method
devresid(model::BetaRegressionModel)

Compute the signed deviance residuals of the model,

\[\mathrm{sgn}(y_i - \hat{y}_i) \sqrt{2 \lvert \ell(y_i, \hat{\phi}) - \ell(\hat{y}_i, \hat{\phi}) \rvert}\]

where $\ell$ denotes the log likelihood, $y_i$ is the $i$th observed value of the response, $\hat{y}_i$ is the $i$th fitted value, and $\hat{\phi}$ is the estimated common precision parameter.
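
To make the formula concrete, the sketch below applies it to hypothetical per-observation log-likelihood values and then forms the deviance as the sum of squared residuals. All numbers are made up for illustration; in practice the log-likelihood terms come from the beta density.

```julia
# Hypothetical inputs: observed responses, fitted values, and the per-observation
# log likelihood evaluated at the observed value and at the fitted value.
y         = [0.20, 0.55, 0.90]
yhat      = [0.25, 0.50, 0.80]
ll_at_y   = [1.10, 0.95, 1.60]
ll_at_fit = [1.02, 0.90, 1.20]

# Signed deviance residuals, following the formula above
r = sign.(y .- yhat) .* sqrt.(2 .* abs.(ll_at_y .- ll_at_fit))

# Deviance as the sum of squared deviance residuals
dev = sum(abs2, r)
```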

See also: deviance

StatsAPI.dof - Method
dof(model::BetaRegressionModel)

Return the number of estimated parameters in the model. For a model with $p$ independent variables, this is $p + 1$, since the precision must also be estimated.

StatsAPI.fitted - Function
fitted(model::RegressionModel)

Return the fitted values of the model.

StatsAPI.informationmatrix - Method
informationmatrix(model::BetaRegressionModel; expected=true)

Compute the information matrix of the model. By default, this is the Fisher information, i.e. the negative of the expected value of the matrix of second partial derivatives of loglikelihood with respect to each element of params. Set expected to false to obtain the observed information.

See also: vcov, score

StatsAPI.linearpredictor - Function
linearpredictor(model::RegressionModel)

Return the model's linear predictor, Xβ, where X is the model matrix and β is the vector of coefficients, or Xβ + offset if the model was fit with an offset.

GLM.Link - Method
Link(model::BetaRegressionModel)

Return the link function $g$ that links the mean $\mu$ to the linear predictor $\eta$ by $\mu = g^{-1}(\eta)$.

StatsAPI.loglikelihood - Function
loglikelihood(model::StatisticalModel)
loglikelihood(model::StatisticalModel, observation)

Return the log-likelihood of the model.

With an observation argument, return the contribution of observation to the log-likelihood of model.

If observation is a Colon, return a vector of each observation's contribution to the log-likelihood of the model. In other words, this is the vector of the pointwise log-likelihood contributions.

In general, sum(loglikelihood(model, :)) == loglikelihood(model).

StatsAPI.modelmatrix - Function
modelmatrix(model::RegressionModel)

Return the model matrix (a.k.a. the design matrix).

StatsAPI.nobs - Method
nobs(model::BetaRegressionModel)

Return the effective number of observations used to fit the model. For weighted models, this is the number of nonzero weights, otherwise it's the number of elements of the response (or equivalently, the number of rows in the model matrix).

StatsAPI.offset - Function
offset(model::RegressionModel)

Return the offset used in the model, i.e. the term added to the linear predictor with known coefficient 1, or nothing if the model was not fit with an offset.

StatsAPI.params - Method
params(model::BetaRegressionModel)

Return the vector of estimated model parameters $\theta = [\beta_1, \ldots, \beta_p, \phi]$, i.e. the regression coefficients and precision.

Danger

Mutating this array may invalidate the model object.

See also: coef, precision

Base.precision - Method
precision(model::BetaRegressionModel)

Return the estimated precision parameter, $\phi$, for the model. This function returns $\phi$ on the natural scale, not on the precision link scale. This parameter is estimated alongside the regression coefficients and is included in coefficient tables, where it is displayed on the precision link scale.

See also: coef, params

BetaRegression.precisionlink - Function
precisionlink(model::BetaRegressionModel)

Return the link function $h$ that links the precision $\phi$ to the estimated constant parameter $\theta_{p+1}$ such that $\phi = h^{-1}(\theta_{p+1})$.
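
The relationship $\phi = h^{-1}(\theta_{p+1})$ can be sketched for the three precision links mentioned earlier. The inverses below are hand-written for illustration only (in GLM's interface the inverse link is applied with linkinv), and the value of $\theta_{p+1}$ is hypothetical.

```julia
# Hand-written inverses of three common precision links (illustrative only)
inverse_precision_links = Dict(
    "identity" => θ -> θ,        # h(φ) = φ
    "log"      => θ -> exp(θ),   # h(φ) = log(φ)
    "sqrt"     => θ -> θ^2,      # h(φ) = sqrt(φ)
)

θ_last = 1.2   # hypothetical value of θ_{p+1}, the parameter on the link scale
φ = Dict(name => hinv(θ_last) for (name, hinv) in inverse_precision_links)
```

Note that the log and square root links give a nonnegative $\phi$ for any real $\theta_{p+1}$, whereas the identity link does not constrain the sign.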

StatsAPI.predict - Function
predict(model::RegressionModel, [newX])

Form the predicted response of model. An object with new covariate values newX can be supplied, which should have the same type and structure as that used to fit model; e.g. for a GLM it would generally be a DataFrame with the same variable names as the original predictors.

StatsAPI.r2 - Method
r2(model::BetaRegressionModel)
r²(model::BetaRegressionModel)

Return the squared Pearson correlation between the linear predictor $\eta$ and the link-transformed response $g(y)$.

StatsAPI.residuals - Function
residuals(model::RegressionModel)

Return the residuals of the model.

StatsAPI.response - Function
response(model::RegressionModel)

Return the model response (a.k.a. the dependent variable).

StatsAPI.responsename - Method
responsename(model::TableRegressionModel{<:BetaRegressionModel})

For a BetaRegressionModel fit using a table and @formula, return a string containing the left hand side of the formula, i.e. the model's response.

StatsAPI.stderror - Method
stderror(model::BetaRegressionModel)

Return the standard errors of the estimated model parameters, including both the regression coefficients and the precision.
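
As a sketch of how standard errors relate to vcov and informationmatrix under standard maximum likelihood theory: the variance-covariance matrix is the inverse of an information matrix, and the standard errors are the square roots of its diagonal. Whether the package inverts the expected or the observed information is not stated here, and the matrix below is hypothetical.

```julia
using LinearAlgebra

# Hypothetical 2×2 information matrix (e.g. one coefficient plus the precision)
info = [4.0  0.0;
        0.0 25.0]

vcov_matrix = inv(info)                 # asymptotic variance-covariance matrix
std_errors  = sqrt.(diag(vcov_matrix))  # approximately [0.5, 0.2]
```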

See also: vcov

StatsAPI.weights - Function
weights(model::StatisticalModel)

Return the weights used in the model.

There is a subtlety here that bears repeating. The function coef does not include the precision term, only the regression coefficients, so for a model with $p$ independent variables, coef will return a vector of length $p$. A number of other functions, such as informationmatrix, vcov, stderror, etc., do include the precision term, and thus will return an array with (non-singleton) dimension $p + 1$. While this difference may seem strange at first blush, the design was chosen intentionally to ensure that the model matrix and regression coefficient vector are conformable for multiplication. Use params to retrieve the full parameter vector with length $p + 1$.
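
The conformability argument can be seen with plain arrays. The sketch below uses hypothetical values standing in for what coef and params would return for a model with $p = 2$; it does not call the package.

```julia
# Hypothetical model with n = 4 observations and p = 2 coefficients
X = [1.0 0.1;
     1.0 0.4;
     1.0 0.7;
     1.0 1.0]
β = [0.5, -1.0]        # stands in for coef: length p
θ = vcat(β, 10.0)      # stands in for params: length p + 1

η = X * β              # conformable: (4 × 2) times a length-2 vector

# X * θ is not conformable and throws a DimensionMismatch
mismatch = try
    X * θ
    false
catch err
    err isa DimensionMismatch
end
```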

Link functions

This package employs the system for link functions defined by the GLM.jl package. In short, each link function has its own concrete type which subtypes Link. Some may actually subtype Link01, which is itself a subtype of Link; this denotes that the function's domain is the open unit interval, $(0, 1)$. Link functions are applied with linkfun and their inverse is applied with linkinv. Relevant docstrings from GLM.jl are reproduced below.

Any mention of "the" link function for a BetaRegressionModel refers to that applied to the mean (at least in this document). However, despite only having one linear predictor, BetaRegressionModels actually have two link functions: one for the mean and one for the precision.

Mean

GLM.Link01 - Type
Link01

An abstract subtype of Link for link functions defined on (0, 1).

GLM.CloglogLink - Type
CloglogLink

A Link01 corresponding to the extreme value (or log-Weibull) distribution. The link is the complementary log-log transformation, log(-log(1 - μ)).
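
A minimal sketch of the complementary log-log transformation, $\eta = \log(-\log(1 - \mu))$, and its inverse, $\mu = 1 - \exp(-\exp(\eta))$, follows. The function names are illustrative and are not GLM's; with GLM, the same operations go through linkfun and linkinv.

```julia
# Complementary log-log link and its inverse, written with log1p/expm1
# for accuracy near the ends of (0, 1). Function names are illustrative.
cloglog(μ) = log(-log1p(-μ))          # η = log(-log(1 - μ))
cloglog_inv(η) = -expm1(-exp(η))      # μ = 1 - exp(-exp(η))

μ = 0.3
η = cloglog(μ)
μ_back = cloglog_inv(η)               # recovers μ up to rounding error
```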

Precision

GLM.IdentityLink - Type
IdentityLink

The canonical Link for the Normal distribution, defined as η = μ.

Developer documentation

This section documents some functions that are not user facing (and are thus not exported) and may be removed at any time. They're included here for the benefit of anyone looking to contribute to the package and wondering how certain internals work. Other internal functions may be documented with comments in the source code rather than with docstrings; read the source directly for more information on those.

BetaRegression.dmueta - Function
dmueta(link::Link, η)

Return the second derivative of linkinv, $\frac{\partial^2 \mu}{\partial \eta^2}$, of the link function link evaluated at the linear predictor value η. A method of this function must be defined for a particular link function in order to compute the observed information matrix.
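
For intuition, the logit link admits a closed form: with $\mu = 1/(1 + e^{-\eta})$, the first derivative is $\mu(1 - \mu)$ and the second is $\mu(1 - \mu)(1 - 2\mu)$. The sketch below checks this against a finite difference; the function names are hypothetical and this is not the package's method.

```julia
# Inverse logit link and its second derivative with respect to η.
# `dmueta_logit` is a hypothetical name, not the package's method.
invlogit(η) = 1 / (1 + exp(-η))

function dmueta_logit(η)
    μ = invlogit(η)
    return μ * (1 - μ) * (1 - 2 * μ)   # ∂²μ/∂η²
end

# Central finite-difference check of the second derivative
η0, h = 0.7, 1e-4
fd = (invlogit(η0 + h) - 2 * invlogit(η0) + invlogit(η0 - h)) / h^2
```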

BetaRegression.initialize! - Function
initialize!(b::BetaRegressionModel)

Initialize the given BetaRegressionModel by computing starting points for the parameter estimates and return the updated model object. The initial estimates are based on those from a linear regression model with the same model matrix as b but with linkfun.(Link(b), response(b)) as the response.

If the initial estimate of the precision is invalid (not strictly positive) then it is taken instead to be 1 prior to applying the precision link function.
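
The starting values for the coefficients can be sketched as an ordinary least-squares fit on the link scale. The example assumes a logit mean link and noise-free data so that the true coefficients are recovered exactly; all names are illustrative, and how the initial precision estimate itself is computed is not shown because it is not described above.

```julia
using LinearAlgebra

# Starting values by ordinary least squares on the link scale, assuming a logit
# mean link. Names are illustrative; this is not the package's implementation.
logit(μ) = log(μ / (1 - μ))
invlogit(η) = 1 / (1 + exp(-η))

X = [ones(5) [-1.0, -0.5, 0.0, 0.5, 1.0]]   # hypothetical model matrix
β_true = [0.3, 1.2]
y = invlogit.(X * β_true)                   # noise-free responses in (0, 1)

β_start = X \ logit.(y)                     # least-squares fit of g(y) on X

# Fallback described above: replace an invalid precision estimate with 1
valid_precision(φ) = φ > 0 ? φ : one(φ)
```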
