| mboost-package {mboost} | R Documentation |
Functional gradient descent algorithm (boosting) for optimizing general risk functions utilizing component-wise (penalised) least squares estimates or regression trees as base-learners for fitting generalized linear, additive and interaction models to potentially high-dimensional data.
| Package: | mboost |
| Type: | Package |
| Version: | 2.2-3 |
| Date: | 2013-09-09 |
| License: | GPL-2 |
| LazyLoad: | yes |
| LazyData: | yes |
This package is intended for modern regression modelling and sits
between classical generalized linear and additive models, as implemented
for example by lm, glm, or gam,
and machine-learning approaches for complex interaction models,
most prominently represented by gbm and
randomForest.
All functionality in this package is based on the generic
implementation of the optimization algorithm (function
mboost_fit) that allows for fitting linear, additive,
and interaction models (and mixtures of those) in low and
high dimensions. The response may be numeric, binary, ordered,
censored or count data.
Both theory and applications are discussed by Buehlmann and Hothorn (2007).
UseRs without a basic knowledge of boosting methods are asked
to read that paper as an introduction before analyzing data using this package.
The examples presented in the paper are available as the package vignette
mboost_illustrations.
Note that the model fitting procedures in this package DO NOT automatically determine an appropriate model complexity. This task is the responsibility of the data analyst.
Starting from version 2.2, the default definition of the degrees of freedom has changed. By default, the degrees of freedom are now defined as
df(λ) = trace(2S - S'S),
with smoother matrix
S = X(X'X + λK)^(-1) X'
(see Hofner et al., 2011). Earlier versions used the trace of the
smoother matrix, df(λ) = trace(S), as
degrees of freedom. The earlier definition can be restored with
options(mboost_dftraceS = TRUE) (see also bols).
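To make the difference between the two definitions concrete, the following base R sketch computes both for a toy design matrix. It does not use mboost internals; the simulated data, the identity penalty matrix K, and the value of lambda are assumptions chosen for illustration only.

```r
## Toy example (all values below are assumptions for illustration)
set.seed(1)
n <- 50
p <- 5
X <- matrix(rnorm(n * p), nrow = n, ncol = p)  # simulated design matrix
K <- diag(p)                                   # assumed penalty matrix (ridge-type)
lambda <- 2                                    # assumed smoothing parameter

## smoother matrix S = X (X'X + lambda K)^(-1) X'
S <- X %*% solve(crossprod(X) + lambda * K, t(X))

## degrees of freedom under both definitions
df_new <- sum(diag(2 * S - crossprod(S)))  # trace(2S - S'S), default since 2.2
df_old <- sum(diag(S))                     # trace(S), used by earlier versions

c(df_new = df_new, df_old = df_old)
```

Because the eigenvalues of S lie between 0 and 1, trace(2S - S'S) is never smaller than trace(S), so the same λ corresponds to more degrees of freedom under the new definition. Both definitions coincide at p when lambda = 0.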
Other important changes include:
We switched from packages multicore and snow to
parallel
We changed the behavior of bols(x, intercept = FALSE)
when x is a factor: now the intercept is simply dropped from
the design matrix and the coding can be specified as usual for
factors. Additionally, a new contrast is introduced:
"contr.dummy" (see bols for details).
We changed the computation of the B-spline basis at the boundaries; by default, B-splines now also use equidistant knots at the boundaries.
For more changes, see the NEWS file.
In the 2.1 series, we added multiple new base-learners including
bmono (monotonic effects), brad (radial
basis functions) and bmrf (Markov random fields), and
extended bbs to incorporate cyclic splines (via argument
cyclic = TRUE). We also changed the default df for
bspatial to 6.
Starting from this series, the variables in glmboost are also
centered automatically by default (argument center = TRUE).
A complete list of changes can be found in the NEWS file.
Version 2.0 comes with new features and is faster and more accurate
in some aspects. In addition, some changes to the user interface
were necessary: Subsetting mboost objects changes the object.
At any time, a model is associated with a number of boosting iterations,
which can be changed (increased or decreased) using the subset operator.
The center argument in bols was renamed
to intercept. The argument z was renamed to by.
The base-learners bns and bss are deprecated
and replaced by bbs (which results in qualitatively the
same models but is computationally much more attractive).
New features include new families (for example for ordinal regression)
and the which argument to the coef and predict
methods for selecting interesting base-learners. Predict
methods are much faster now.
Memory consumption was reduced considerably,
thanks to sparse matrix technology in package Matrix.
Resampling procedures run automatically in parallel
on OSes where parallelization via package parallel is available.
The most important advancement is a generic implementation
of the optimizer in function mboost_fit.
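The subsetting behavior and the which argument described above can be sketched as follows. This is a minimal illustration, not taken from the package documentation: it assumes that mboost is installed and uses the cars data shipped with R purely as a convenient stand-in.

```r
library("mboost")

## 'cars' ships with R; it is used here only as an assumed toy data set
mod <- glmboost(dist ~ speed, data = cars,
                control = boost_control(mstop = 100))
mstop(mod)                  # iterations currently associated with 'mod'

## subsetting changes 'mod' itself; no assignment is needed
mod[20, return = FALSE]
mstop(mod)

## the number of iterations can also be increased again
mod[150, return = FALSE]
mstop(mod)

## restrict the coefficients to a selected base-learner
coef(mod, which = "speed")
```

Since the object is modified in place, it is worth keeping track of the current mstop value before calling coef or predict on the model.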
Torsten Hothorn Torsten.Hothorn@R-project.org,
Peter Buehlmann, Thomas Kneib, Matthias Schmid and
Benjamin Hofner
Peter Buehlmann and Torsten Hothorn (2007), Boosting algorithms: regularization, prediction and model fitting. Statistical Science, 22(4), 477–505.
Torsten Hothorn, Peter Buehlmann, Thomas Kneib, Matthias Schmid and Benjamin Hofner (2010), Model-based Boosting 2.0. Journal of Machine Learning Research, 11, 2109–2113.
Benjamin Hofner, Torsten Hothorn, Thomas Kneib, and Matthias Schmid (2011), A framework for unbiased model selection based on boosting. Journal of Computational and Graphical Statistics, 20, 956–971.
Benjamin Hofner, Andreas Mayr, Nikolay Robinzonov and Matthias Schmid
(2012), Model-based Boosting in R: A Hands-on Tutorial Using the R
Package mboost. Department of Statistics, Technical Report No. 120.
http://epub.ub.uni-muenchen.de/12754/
Available as vignette via: vignette(package = "mboost", "mboost_tutorial")
The main fitting functions include:
gamboost for boosted (generalized) additive models,
glmboost for boosted linear models and
blackboost for boosted trees.
See there for more details and further links.
library("mboost")

data("bodyfat")
set.seed(290875)

### model the conditional expectation of DEXfat given age,
### hip and waist circumference, and knee breadth
model <- mboost(DEXfat ~
    bols(age) +                 ### a linear function of age
    btree(hipcirc, waistcirc) + ### a non-linear interaction of
                                ### hip and waist circumference
    bbs(kneebreadth),           ### a smooth function of kneebreadth
    data = bodyfat, control = boost_control(mstop = 100))

### bootstrap for assessing `optimal' number of boosting iterations
cvm <- cvrisk(model, papply = lapply)

### restrict model to mstop(cvm)
model[mstop(cvm), return = FALSE]
mstop(model)

### plot age and kneebreadth
layout(matrix(1:2, ncol = 2))
plot(model, which = c("age", "kneebreadth"))

### plot interaction of hip and waist circumference
### on a 100 x 100 grid spanning the observed ranges
h <- seq(from = min(bodyfat$hipcirc), to = max(bodyfat$hipcirc),
         length.out = 100)
w <- seq(from = min(bodyfat$waistcirc), to = max(bodyfat$waistcirc),
         length.out = 100)
nd <- expand.grid(hipcirc = h, waistcirc = w)
plot(model, which = 2, newdata = nd)

### customized plot
layout(1)
pr <- predict(model, which = "hip", newdata = nd)
persp(x = h, y = w, z = matrix(pr, nrow = 100, ncol = 100))