API
The exported symbols from this package define its interface. Some symbols from other packages are re-exported for convenience. Fields of objects with composite types should not be accessed directly; the internals of any given structure may change at any time and this would not be considered a breaking change.
Fitting a model
BetaRegression.BetaRegressionModel - Type
BetaRegressionModel{T,L1,L2,V,M} <: RegressionModel
Type representing a regression model for beta-distributed response values in the open interval (0, 1), as described by Ferrari and Cribari-Neto (2004).
The mean response is linked to the linear predictor by a link function with type L1 <: Link01, i.e. the link must map $(0, 1) \to \mathbb{R}$ and use the GLM package's interface for link functions. While there is no canonical link function for the beta regression model as there is for GLMs, logit is the most common choice.
The precision is transformed by a link function with type L2 <: Link, which should map $\mathbb{R} \to \mathbb{R}$ or, ideally, $(0, \infty) \to \mathbb{R}$ because the precision must be positive. The most common choices are the identity, log, and square root links.
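For reference, the mean-precision parameterization of Ferrari and Cribari-Neto (2004) that this type represents can be written out as below. The symbols $f$, $\Gamma$ (the gamma function), $g$ (the mean link), and $x_i$ (the $i$th row of the model matrix) are notation introduced for this sketch, not names used by the package.

\[\begin{aligned}
y_i &\sim \mathrm{Beta}\bigl(\mu_i \phi,\ (1 - \mu_i)\phi\bigr), \\
f(y; \mu, \phi) &= \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma((1 - \mu)\phi)}\, y^{\mu\phi - 1} (1 - y)^{(1 - \mu)\phi - 1}, \quad 0 < y < 1, \\
\mathbb{E}[y_i] &= \mu_i, \qquad \mathrm{Var}[y_i] = \frac{\mu_i (1 - \mu_i)}{1 + \phi}, \qquad g(\mu_i) = x_i^\top \beta
\end{aligned}\]

A larger $\phi$ therefore means a smaller variance for a given mean, which is why $\phi$ is called the precision.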
BetaRegression.BetaRegressionModel - Method
BetaRegressionModel(X, y, link=LogitLink(), precisionlink=IdentityLink();
                    weights=nothing, offset=nothing)
Construct a BetaRegressionModel object with the given model matrix X, response y, mean link function link, precision link function precisionlink, and optionally weights and offset. Note that the returned object is not fit until fit! is called on it.
Support for user-provided weights is currently incomplete; passing a value other than nothing or an empty array for weights will result in an error for now.
StatsAPI.fit - Method
fit(BetaRegressionModel, formula, data, link=LogitLink(), precisionlink=IdentityLink();
    kwargs...)
Fit a BetaRegressionModel to the given table data, which may be any Tables.jl-compatible table (e.g. a DataFrame), using the given formula, which can be constructed using @formula. In this method, the response and model matrix are determined from the formula and table. It is also possible to provide them explicitly.
fit(BetaRegressionModel, X::AbstractMatrix, y::AbstractVector, link=LogitLink(),
    precisionlink=IdentityLink(); kwargs...)
Fit a beta regression model using the provided model matrix X and response vector y. In both of these methods, a link function may be provided, otherwise the default logit link is used. Similarly, a link for the precision may be provided, otherwise the default identity link is used.
Keyword Arguments
weights: A vector of weights or nothing (default). Currently only nothing is accepted.
offset: An offset vector to be added to the linear predictor or nothing (default).
maxiter: Maximum number of Fisher scoring iterations to use when fitting. Default is 100.
atol: Absolute tolerance to use when checking for model convergence. Default is sqrt(eps(T)) where T is the type of the estimates.
rtol: Relative tolerance to use when checking for convergence. Default is the Base default relative tolerance for T.
If you experience convergence issues, you may consider trying a different link for the precision; LogLink() is a common choice. Increasing the maximum number of iterations may also be beneficial, especially when working with Float32.
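As a rough usage sketch of the matrix method described above, the following assumes that BetaRegression and GLM are both installed and that the link types come from GLM. The data are invented for illustration, and no estimates are shown because they have not been verified; whether Fisher scoring converges on other data depends on the data and the chosen links.

```julia
using BetaRegression  # assumed to be installed
using GLM             # provides LogitLink and LogLink

# Toy data (made up): an intercept column plus one predictor, with a
# response that stays strictly inside (0, 1).
n = 50
x = collect(range(-1.0, 1.0; length=n))
X = hcat(ones(n), x)
μ = @. 1 / (1 + exp(-(0.3 + 0.8 * x)))
noise = [0.05 * sin(7i) for i in 1:n]
y = clamp.(μ .+ noise, 0.01, 0.99)

# Logit link for the mean, log link for the precision, and a larger
# iteration budget than the default of 100.
model = fit(BetaRegressionModel, X, y, LogitLink(), LogLink(); maxiter=200)

coef(model)       # regression coefficients only (one per column of X)
precision(model)  # estimated precision on the natural scale
```

The log precision link is used here only because the text above suggests it when convergence is a concern; the default IdentityLink() is selected by omitting the last two positional arguments.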
StatsAPI.fit! - Method
fit!(b::BetaRegressionModel{T}; maxiter=100, atol=sqrt(eps(T)), rtol=Base.rtoldefault(T))
Fit the given BetaRegressionModel, updating its values in-place. If model convergence is achieved, b is returned, otherwise a ConvergenceException is thrown.
Fitting the model consists of computing the maximum likelihood estimates for the coefficients and precision parameter via Fisher scoring with analytic derivatives. The model is determined to have converged when the score vector, i.e. the vector of first partial derivatives of the log likelihood with respect to the parameters, is approximately zero. This is determined by isapprox using the specified atol and rtol. maxiter dictates the maximum number of Fisher scoring iterations.
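The convergence criterion can be illustrated with plain Julia. This is a minimal sketch that assumes the score vector is compared against a zero vector of the same shape; score_is_zero is a hypothetical helper, not a function of the package, and the package's actual check may differ in detail.

```julia
using LinearAlgebra  # norm-based isapprox for arrays

# Hypothetical helper: is the score vector approximately zero under the
# given tolerances? The defaults mirror those in the fit! signature.
function score_is_zero(score::AbstractVector{T};
                       atol=sqrt(eps(T)), rtol=Base.rtoldefault(T)) where {T<:AbstractFloat}
    return isapprox(score, zero(score); atol=atol, rtol=rtol)
end

score_is_zero([1e-10, -2e-9])  # true: the norm is well below sqrt(eps(Float64))
score_is_zero([1e-3, 0.0])     # false: the first component is still far from zero
```

Because one side of the comparison is exactly zero, the relative tolerance contributes little in this sketch and the absolute tolerance effectively decides the outcome.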
Properties of a model
StatsAPI.aic - Function
aic(model::StatisticalModel)
Akaike's Information Criterion, defined as $-2 \log L + 2k$, with $L$ the likelihood of the model, and $k$ its number of consumed degrees of freedom (as returned by dof).
StatsAPI.aicc - Function
aicc(model::StatisticalModel)
Corrected Akaike's Information Criterion for small sample sizes (Hurvich and Tsai 1989), defined as $-2 \log L + 2k + 2k(k+1)/(n-k-1)$, with $L$ the likelihood of the model, $k$ its number of consumed degrees of freedom (as returned by dof), and $n$ the number of observations (as returned by nobs).
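The two criteria can be sketched as plain functions of the log likelihood, $k$, and $n$, using the usual Hurvich and Tsai small-sample correction term $2k(k+1)/(n-k-1)$. The helper names aic_from and aicc_from are hypothetical; in practice you would call aic(model) and aicc(model) on a fitted model.

```julia
# Hypothetical helpers that take the log likelihood directly.
aic_from(loglik, k) = -2 * loglik + 2 * k
aicc_from(loglik, k, n) = aic_from(loglik, k) + 2 * k * (k + 1) / (n - k - 1)

aic_from(-10.0, 3)       # 26.0
aicc_from(-10.0, 3, 20)  # 26.0 + 24/16 = 27.5
```

The correction term shrinks toward zero as $n$ grows relative to $k$, so the two criteria agree for large samples.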
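StatsAPI.bic - Function
bic(model::StatisticalModel)
Bayesian Information Criterion, defined as $-2 \log L + k \log n$, with $L$ the likelihood of the model, $k$ its number of consumed degrees of freedom (as returned by dof), and $n$ the number of observations (as returned by nobs).
StatsAPI.coef - Method
coef(model::BetaRegressionModel)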
Return a copy of the vector of regression coefficients $\mathbf{\beta}$.
StatsAPI.coefnames - Method
coefnames(model::TableRegressionModel{<:BetaRegressionModel})
For a BetaRegressionModel fit using a table and @formula, return the names of the coefficients as a vector of strings. The precision term is included as the last element in the array and has name "(Precision)".
StatsAPI.coeftable - Method
coeftable(model::BetaRegressionModel; level=0.95)
Return a table of the point estimates of the model parameters, their respective standard errors, $z$-statistics, Wald $p$-values, and confidence intervals at the given level. The precision parameter is included as the last row in the table.
The object returned by this function implements the Tables.jl interface for tabular data.
StatsAPI.confint - Method
confint(model::BetaRegressionModel; level=0.95)
For a model with $p$ regression coefficients, return a $(p + 1) \times 2$ matrix of confidence intervals for the estimated coefficients and precision at the given level.
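The shape of the result can be sketched as follows, under the assumption (not confirmed by the text above) that the intervals are Wald intervals built from standard normal quantiles. wald_intervals is a hypothetical helper and the estimates and standard errors are made up.

```julia
# Hypothetical helper, not the package's implementation: Wald intervals
# estimate ± z * se, returned as an m × 2 matrix.
# z defaults to the 0.975 standard normal quantile, i.e. level = 0.95.
function wald_intervals(estimates, stderrs; z=1.959963984540054)
    return hcat(estimates .- z .* stderrs, estimates .+ z .* stderrs)
end

# Two regression coefficients plus the precision, so p + 1 = 3 rows.
ci = wald_intervals([0.5, -1.2, 30.0], [0.1, 0.3, 4.0])
size(ci)  # (3, 2)
```

The first column holds the lower bounds and the second the upper bounds, one row per parameter.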
StatsAPI.deviance - Method
deviance(model::BetaRegressionModel)
Compute the deviance of the model, defined as the sum of the squared deviance residuals.
See also: devresid
GLM.devresid - Method
devresid(model::BetaRegressionModel)
Compute the signed deviance residuals of the model,
\[\mathrm{sgn}(y_i - \hat{y}_i) \sqrt{2 \lvert \ell(y_i, \hat{\phi}) - \ell(\hat{y}_i, \hat{\phi}) \rvert}\]
where $\ell$ denotes the log likelihood, $y_i$ is the $i$th observed value of the response, $\hat{y}_i$ is the $i$th fitted value, and $\hat{\phi}$ is the estimated common precision parameter.
See also: deviance
StatsAPI.dof - Method
dof(model::BetaRegressionModel)
Return the number of estimated parameters in the model. For a model with $p$ independent variables, this is $p + 1$, since the precision must also be estimated.
StatsAPI.dof_residual - Method
dof_residual(model::BetaRegressionModel)
Return the residual degrees of freedom for the model, defined as nobs minus dof.
StatsAPI.fitted - Function
fitted(model::RegressionModel)
Return the fitted values of the model.
StatsAPI.informationmatrix - Method
informationmatrix(model::BetaRegressionModel; expected=true)
Compute the information matrix of the model. By default, this is the Fisher information, i.e. the expected value of the matrix of second partial derivatives of loglikelihood with respect to each element of params. Set expected to false to obtain the observed information.
StatsAPI.linearpredictor - Function
linearpredictor(model::RegressionModel)
Return the model's linear predictor, Xβ where X is the model matrix and β is the vector of coefficients, or Xβ + offset if the model was fit with an offset.
GLM.Link - Method
Link(model::BetaRegressionModel)
Return the link function $g$ that links the mean $\mu$ to the linear predictor $\eta$ by $\mu = g^{-1}(\eta)$.
StatsAPI.loglikelihood - Function
loglikelihood(model::StatisticalModel)
loglikelihood(model::StatisticalModel, observation)
Return the log-likelihood of the model.
With an observation argument, return the contribution of observation to the log-likelihood of model.
If observation is a Colon, return a vector of each observation's contribution to the log-likelihood of the model. In other words, this is the vector of the pointwise log-likelihood contributions.
In general, sum(loglikelihood(model, :)) == loglikelihood(model).
StatsAPI.modelmatrix - Function
modelmatrix(model::RegressionModel)
Return the model matrix (a.k.a. the design matrix).
StatsAPI.nobs - Method
nobs(model::BetaRegressionModel)
Return the effective number of observations used to fit the model. For weighted models, this is the number of nonzero weights, otherwise it's the number of elements of the response (or equivalently, the number of rows in the model matrix).
StatsAPI.offset - Function
offset(model::RegressionModel)
Return the offset used in the model, i.e. the term added to the linear predictor with known coefficient 1, or nothing if the model was not fit with an offset.
StatsAPI.params - Method
params(model::BetaRegressionModel)
Return the vector of estimated model parameters $\theta = [\beta_1, \ldots, \beta_p, \phi]$, i.e. the regression coefficients and precision.
Mutating this array may invalidate the model object.
Base.precision - Method
precision(model::BetaRegressionModel)
Return the estimated precision parameter, $\phi$, for the model. This function returns $\phi$ on the natural scale, not on the precision link scale. This parameter is estimated alongside the regression coefficients and is included in coefficient tables, where it is displayed on the precision link scale.
BetaRegression.precisionlink - Function
precisionlink(model::BetaRegressionModel)
Return the link function $h$ that links the precision $\phi$ to the estimated constant parameter $\theta_{p+1}$ such that $\phi = h^{-1}(\theta_{p+1})$.
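The relationship between the two scales can be sketched in plain Julia for a log precision link, where $h(\phi) = \log \phi$ and $h^{-1}(\theta) = \exp(\theta)$. The functions h and hinv and the value of θ_last are made up for illustration and are not part of the package.

```julia
# Log precision link and its inverse, written out by hand.
h(φ) = log(φ)
hinv(θ) = exp(θ)

θ_last = 3.2       # the estimated constant θ_{p+1}, on the link scale
φ = hinv(θ_last)   # precision on the natural scale, always positive
h(φ) ≈ θ_last      # true: applying the link recovers θ_{p+1}
```

With the default identity link the two scales coincide, which is why the distinction only matters once a non-identity precision link is chosen.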
StatsAPI.predict - Function
predict(model::RegressionModel, [newX])
Form the predicted response of model. An object with new covariate values newX can be supplied, which should have the same type and structure as that used to fit model; e.g. for a GLM it would generally be a DataFrame with the same variable names as the original predictors.
StatsAPI.r2 - Method
r2(model::BetaRegressionModel)
r²(model::BetaRegressionModel)
Return the squared Pearson correlation between the linear predictor $\eta$ and the link-transformed response $g(y)$, i.e. the pseudo $R^2$ of Ferrari and Cribari-Neto (2004).
StatsAPI.residuals - Function
residuals(model::RegressionModel)
Return the residuals of the model.
StatsAPI.response - Function
response(model::RegressionModel)
Return the model response (a.k.a. the dependent variable).
StatsAPI.responsename - Method
responsename(model::TableRegressionModel{<:BetaRegressionModel})
For a BetaRegressionModel fit using a table and @formula, return a string containing the left hand side of the formula, i.e. the model's response.
StatsAPI.score - Method
score(model::BetaRegressionModel)
Compute the score vector of the model, i.e. the vector of first partial derivatives of loglikelihood with respect to each element of params.
See also: informationmatrix
StatsAPI.stderror - Method
stderror(model::BetaRegressionModel)
Return the standard errors of the estimated model parameters, including both the regression coefficients and the precision.
See also: vcov
StatsAPI.vcov - Method
vcov(model::BetaRegressionModel)
Compute the variance-covariance matrix of the model, i.e. the inverse of the Fisher information matrix.
See also: stderror, informationmatrix
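The link between these three quantities can be sketched with a made-up information matrix. The sketch assumes the common convention that standard errors are the square roots of the diagonal of the variance-covariance matrix; the numbers are invented and the variable names are not the package's.

```julia
using LinearAlgebra

# Made-up 2 × 2 information matrix for illustration.
info = [4.0 0.0; 0.0 25.0]

V = inv(info)         # variance-covariance matrix
se = sqrt.(diag(V))   # standard errors, approximately [0.5, 0.2]
```

For a fitted model the matrix would have $p + 1$ rows and columns, since the precision is estimated along with the $p$ regression coefficients.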
StatsAPI.weights - Function
weights(model::StatisticalModel)
Return the weights used in the model.
There is a subtlety here that bears repeating. The function coef does not include the precision term, only the regression coefficients, so for a model with $p$ independent variables, coef will return a vector of length $p$. A number of other functions, such as informationmatrix, vcov, stderror, etc., do include the precision term, and thus will return an array with (non-singleton) dimension $p + 1$. While this difference may seem strange at first blush, the design was chosen intentionally to ensure that the model matrix and regression coefficient vector are conformable for multiplication. Use params to retrieve the full parameter vector with length $p + 1$.
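The conformability argument can be seen with a few made-up numbers in plain Julia. Here β stands in for what coef would return and θ for what params would return; none of the values come from a real fit.

```julia
# n = 3 observations and p = 2 columns in the model matrix.
X = [1.0 0.2; 1.0 0.5; 1.0 0.9]
β = [0.3, 0.8]       # stands in for coef(model): length p
θ = vcat(β, 25.0)    # stands in for params(model): length p + 1

η = X * β            # a (3 × 2) matrix times a length-2 vector gives length 3
length(η)            # 3
# X * θ would throw a DimensionMismatch, since θ has p + 1 elements.
```

Keeping the precision out of coef is what lets the first product work without slicing.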
Link functions
This package employs the system for link functions defined by the GLM.jl package. In short, each link function has its own concrete type which subtypes Link. Some may actually subtype Link01, which is itself a subtype of Link; this denotes that the function's domain is the open unit interval, $(0, 1)$. Link functions are applied with linkfun and their inverse is applied with linkinv. Relevant docstrings from GLM.jl are reproduced below.
Any mention of "the" link function for a BetaRegressionModel refers to that applied to the mean (at least in this document). However, despite only having one linear predictor, BetaRegressionModels actually have two link functions: one for the mean and one for the precision.
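A short sketch of applying links with linkfun and linkinv, assuming GLM is installed. The functions are qualified with the GLM module name to avoid relying on what is exported, and the variable names are chosen only for this example.

```julia
using GLM  # assumed to be installed; provides the link types

mean_link = LogitLink()   # a Link01, suitable for the mean
prec_link = LogLink()     # a Link, one option for the precision

η = GLM.linkfun(mean_link, 0.5)    # logit(0.5) = 0.0
μ = GLM.linkinv(mean_link, η)      # back to 0.5

θ = GLM.linkfun(prec_link, 10.0)   # log(10.0)
φ = GLM.linkinv(prec_link, θ)      # approximately 10.0

# The type hierarchy described above.
LogitLink <: GLM.Link01 && GLM.Link01 <: GLM.Link  # true
```

Both functions act on scalars, so use broadcasting, e.g. GLM.linkfun.(mean_link, y), to transform a whole vector.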
Mean
GLM.Link01 - Type
Link01
An abstract subtype of Link for links that are defined on (0, 1).
GLM.LogitLink - Type
LogitLink
The canonical Link01 for Distributions.Bernoulli and Distributions.Binomial. The inverse link, linkinv, is the c.d.f. of the standard logistic distribution, Distributions.Logistic.
GLM.CauchitLink - Type
CauchitLink
A Link01 corresponding to the standard Cauchy distribution, Distributions.Cauchy.
GLM.CloglogLink - Type
CloglogLink
A Link01 corresponding to the extreme value (or log-Weibull) distribution. The link is the complementary log-log transformation, log(-log(1 - μ)).
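The complementary log-log transformation, log(-log(1 - μ)), and its inverse can be written out in plain Julia as a check. The function names cloglog and cloglog_inv are hypothetical and are not GLM's API.

```julia
# Pure-Julia sketch; log1p and expm1 are used for accuracy near 0.
cloglog(μ) = log(-log1p(-μ))       # log(-log(1 - μ)), for μ in (0, 1)
cloglog_inv(η) = -expm1(-exp(η))   # 1 - exp(-exp(η))

η = cloglog(0.3)
cloglog_inv(η) ≈ 0.3               # true: the two functions are inverses
```

Unlike the logit, this transformation is not symmetric about μ = 0.5.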
GLM.ProbitLink - Type
ProbitLink
A Link01 whose linkinv is the c.d.f. of the standard normal distribution, Distributions.Normal().
Precision
GLM.IdentityLink - Type
IdentityLink
The canonical Link for the Normal distribution, defined as η = μ.
GLM.InverseLink - Type
InverseLink
The canonical Link for the Distributions.Gamma distribution, defined as η = inv(μ).
GLM.InverseSquareLink - Type
InverseSquareLink
The canonical Link for the Distributions.InverseGaussian distribution, defined as η = inv(abs2(μ)).
GLM.LogLink - Type
LogLink
The canonical Link for Distributions.Poisson, defined as η = log(μ).
GLM.PowerLink - Type
PowerLink
A Link defined as η = μ^λ when λ ≠ 0, and as η = log(μ) when λ = 0, i.e. the class of transforms that use a power function or logarithmic function.
Many other links are special cases of PowerLink:
IdentityLink when λ = 1.
SqrtLink when λ = 0.5.
LogLink when λ = 0.
InverseLink when λ = -1.
InverseSquareLink when λ = -2.
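The special cases listed above can be checked with a small stand-in for the transformation. power_transform is a hypothetical function written for this sketch; it is not the GLM.PowerLink type itself.

```julia
# η = μ^λ when λ ≠ 0, and η = log(μ) when λ = 0.
power_transform(μ, λ) = λ == 0 ? log(μ) : μ^λ

μ = 4.0
power_transform(μ, 1)     # 4.0    (identity)
power_transform(μ, 0.5)   # 2.0    (square root)
power_transform(μ, 0)     # log(4.0)
power_transform(μ, -1)    # 0.25   (inverse)
power_transform(μ, -2)    # 0.0625 (inverse square)
```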
GLM.SqrtLink - Type
SqrtLink
A Link defined as η = √μ.
Developer documentation
This section documents some functions that are not user facing (and are thus not exported) and may be removed at any time. They're included here for the benefit of anyone looking to contribute to the package and wondering how certain internals work. Other internal functions may be documented with comments in the source code rather than with docstrings; read the source directly for more information on those.
BetaRegression.dmueta - Function
dmueta(link::Link, η)
Return the second derivative of linkinv, $\frac{\partial^2 \mu}{\partial \eta^2}$, of the link function link evaluated at the linear predictor value η. A method of this function must be defined for a particular link function in order to compute the observed information matrix.
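To make the quantity concrete, the second derivative for the logit link can be derived by hand and checked numerically. This is a pure-Julia sketch; logistic and d2mu_logit are hypothetical names and do not reflect how the package implements its dmueta methods.

```julia
# Inverse of the logit link.
logistic(η) = 1 / (1 + exp(-η))

# With μ = logistic(η): dμ/dη = μ(1 - μ), so d²μ/dη² = μ(1 - μ)(1 - 2μ).
function d2mu_logit(η)
    μ = logistic(η)
    return μ * (1 - μ) * (1 - 2μ)
end

# Central finite-difference check of the analytic expression.
η, h = 0.7, 1e-4
fd = (logistic(η + h) - 2 * logistic(η) + logistic(η - h)) / h^2
isapprox(d2mu_logit(η), fd; atol=1e-6)  # true
```

The second derivative is zero at η = 0, where μ = 0.5 and the logistic curve has its inflection point.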
BetaRegression.initialize! - Function
initialize!(b::BetaRegressionModel)
Initialize the given BetaRegressionModel by computing starting points for the parameter estimates and return the updated model object. The initial estimates are based on those from a linear regression model with the same model matrix as b but with linkfun.(Link(b), response(b)) as the response.
If the initial estimate of the precision is invalid (not strictly positive) then it is taken instead to be 1 prior to applying the precision link function.
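The regression step can be sketched in plain Julia for a logit link. Only the coefficient starting values and the fallback for the precision are shown; how the initial precision estimate itself is computed is not covered here. The data, the logit helper, and safe_precision are all made up for this sketch.

```julia
using LinearAlgebra

logit(μ) = log(μ / (1 - μ))

# Made-up model matrix (intercept plus one predictor) and response in (0, 1).
X = [1.0 -1.0; 1.0 0.0; 1.0 1.0; 1.0 2.0]
y = [0.2, 0.4, 0.6, 0.8]

# Least-squares coefficients from regressing g(y) on X.
β_start = X \ logit.(y)

# Fallback described above: replace a precision estimate that is not
# strictly positive with 1.
safe_precision(φ) = φ > 0 ? φ : one(φ)
```

These values only seed Fisher scoring; the maximum likelihood estimates are generally different.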