| glmFit {edgeR} | R Documentation |
Fit a negative binomial generalized log-linear model to the read counts for each gene or transcript. Conduct genewise statistical tests for a given coefficient or coefficient contrast.
## S3 method for class 'DGEList'
glmFit(y, design=NULL, dispersion=NULL, offset=NULL, weights=NULL, lib.size=NULL,
prior.count=0.125, start=NULL, method="auto", ...)
glmLRT(glmfit, coef=ncol(glmfit$design), contrast=NULL, test="chisq")
glmQLFTest(y, design=NULL, dispersion=NULL, coef=ncol(glmfit$design), contrast=NULL, abundance.trend=TRUE, robust=FALSE, winsor.tail.p=c(0.05,0.1), plot=FALSE)
y |
an object that contains the raw counts for each library (the measure of expression level); alternatively, a matrix of counts, or a |
design |
numeric matrix giving the design matrix for the tagwise linear models. Must be of full column rank. Defaults to a single column of ones, equivalent to treating the columns as replicate libraries. |
dispersion |
numeric scalar or vector of negative binomial dispersions. Can be a common value for all tags, or a vector of values can provide a unique dispersion value for each tag. If |
offset |
numeric matrix of same size as |
weights |
optional numeric matrix giving prior weights for the observations (for each library and transcript) to be used in the GLM calculations. Not supported by methods |
lib.size |
numeric vector of length |
prior.count |
average prior count to be added to observation to shrink the estimated log-fold-changes towards zero. |
start |
optional numeric matrix of initial estimates for the linear model coefficients. |
method |
which fitting algorithm to use. Possible values are |
... |
other arguments are passed to lower-level functions, for example to |
glmfit |
a |
coef |
integer or character vector indicating which coefficients of the linear model are to be tested equal to zero. Values must be columns or column names of |
contrast |
numeric vector or matrix specifying one or more contrasts of the linear model coefficients to be tested equal to zero. Number of rows must equal to the number of columns of |
test |
which test (distribution) to use in calculating the p-values. Possible values are |
abundance.trend |
logical, whether to allow an abundance-dependent trend when estimating the prior values for the quasi-likelihood multiplicative dispersion parameter. |
robust |
logical, whether to estimate the prior.df robustly. |
winsor.tail.p |
numeric vector of length 2 giving proportion to trim (Winsorize) from lower and upper tail of the distribution of genewise deviances when estimating the hyperparameters. Positive values produce robust empirical Bayes ignoring outlier small or large deviances. Only used when |
plot |
logical, whether to plot the quasi-likelihood dispersion estimates vs abundance |
glmFit and glmLRT implement generalized linear model (glm) methods developed by McCarthy et al (2012).
glmFit fits genewise negative binomial glms, all with the same design matrix but possibly different dispersions, offsets and weights.
When the design matrix defines a one-way layout, or can be re-parametrized to a one-way layout, the glms are fitting very quickly using mglmOneGroup.
Otherwise the default fitting method, implemented in mglmLS, is a parallelized line search algorithm described by McCarthy et al (2012).
Other possible fitting methods are mglmLevenberg and mglmSimple.
Positive prior.count cause the returned coefficients to be shrunk in such a way that fold-changes between the treatment conditions are decreased.
In particular, infinite fold-changes are avoided.
Larger values cause more shrinkage.
The returned coefficients are affected but not the likelihood ratio tests or p-values.
glmLRT conducts likelihood ratio tests for one or more coefficients in the linear model.
If coef is used, the null hypothesis is that all the coefficients indicated by coef are equal to zero.
If contrast is non-null, then the null hypothesis is that the specified contrast of the coefficients is equal to zero.
For example, a contrast of c(0,1,-1), assuming there are three coefficients, would test the hypothesis that the second and third coefficients are equal.
glmQLFTest implements the quasi-likelihood method of Lund et al (2012).
It behaves the same as glmLRT except that it replaces likelihood ratio tests with
quasi-likelihood F-tests for coefficients in the linear model.
This function calls the limma function squeezeVar to conduct empirical Bayes smoothing of the genewise multiplicative dispersions.
Note that the QuasiSeq package provides a alternative implementation of Lund et al (2012),
with slightly different glm, trend and FDR methods.
glmFit produces an object of class DGEGLM containing components counts, samples, genes and abundance from y plus the following new components:
design |
design matrix as input. |
weights |
matrix of weights as input. |
df.residual |
numeric vector of residual degrees of freedom, one for each tag. |
offset |
numeric matrix of linear model offsets. |
dispersion |
vector of dispersions used for the fit. |
coefficients |
numeric matrix of estimated coefficients from the glm fits, on the natural log scale, of size |
fitted.values |
matrix of fitted values from glm fits, same number of rows and columns as |
deviance |
numeric vector of deviances, one for each tag. |
glmLRT and glmQFTest produce objects of class DGELRT with the same components as for glmfit plus the following:
table |
data frame with the same rows as |
comparison |
character string describing the coefficient or the contrast being tested. |
The data frame table contains the following columns:
logFC |
log2-fold change of expression between conditions being tested. |
logCPM |
average log2-counts per million, the average taken over all libraries in |
LR |
likelihood ratio statistics (only for |
F |
F-statistics (only for |
PValue |
p-values. |
Davis McCarthy and Gordon Smyth
McCarthy, DJ, Chen, Y, Smyth, GK (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40, 4288-4297. http://nar.oxfordjournals.org/content/40/10/4288
Lund, SP, Nettleton, D, McCarthy, DJ, and Smyth, GK (2012). Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates. Statistical Applications in Genetics and Molecular Biology Volume 11, Issue 5, Article 8. http://www.statsci.org/smyth/pubs/QuasiSeqPreprint.pdf
Low-level computations are done by mglmOneGroup, mglmLS, mglmLevenberg or mglmSimple.
topTags displays results from glmLRT or glmQLFTest.
The QuasiSeq package gives an alternative implementation of glmQLFTest based on the same statistical ideas.
nlibs <- 3
ntags <- 100
dispersion.true <- 0.1
# Make first transcript respond to covariate x
x <- 0:2
design <- model.matrix(~x)
beta.true <- cbind(Beta1=2,Beta2=c(2,rep(0,ntags-1)))
mu.true <- 2^(beta.true %*% t(design))
# Generate count data
y <- rnbinom(ntags*nlibs,mu=mu.true,size=1/dispersion.true)
y <- matrix(y,ntags,nlibs)
colnames(y) <- c("x0","x1","x2")
rownames(y) <- paste("Gene",1:ntags,sep="")
d <- DGEList(y)
# Normalize
d <- calcNormFactors(d)
# Fit the NB GLMs
fit <- glmFit(d, design, dispersion=dispersion.true)
# Likelihood ratio tests for trend
results <- glmLRT(fit, coef=2)
topTags(results)
# Estimate the dispersion (may be unreliable with so few tags)
d <- estimateGLMCommonDisp(d, design, verbose=TRUE)