| npsigtest {np} | R Documentation |
npsigtest implements a consistent test of significance of
an explanatory variable in a nonparametric regression setting that is
analogous to a simple t-test in a parametric regression
setting. The test is based on Racine, Hart, and Li (2006) and
Racine (1997).
npsigtest(bws, ...)
## S3 method for class 'formula'
npsigtest(bws, data = NULL, ...)
## S3 method for class 'call'
npsigtest(bws, ...)
## S3 method for class 'npregression'
npsigtest(bws, ...)
## Default S3 method:
npsigtest(bws, xdat, ydat, ...)
## S3 method for class 'rbandwidth'
npsigtest(bws,
xdat = stop("data xdat missing"),
ydat = stop("data ydat missing"),
boot.num = 399,
boot.method = c("iid","wild","wild-rademacher"),
boot.type = c("I","II"),
index = seq(1,ncol(xdat)),
random.seed = 42,
...)
bws |
a bandwidth specification. This can be set as a |
data |
an optional data frame, list or environment (or object
coercible to a data frame by |
xdat |
a |
ydat |
a one (1) dimensional numeric or integer vector of dependent data, each
element |
boot.method |
a character string used to specify the bootstrap method.
|
boot.num |
an integer value specifying the number of bootstrap replications to
use. Defaults to 399. |
boot.type |
a character string specifying whether to use a ‘Bootstrap I’ or
‘Bootstrap II’ method (see Racine, Hart, and Li (2006) for
details). The ‘Bootstrap II’ method re-runs cross-validation for
each bootstrap replication and uses the new cross-validated
bandwidth for variable |
index |
a vector of indices for the columns of xdat for which the test is conducted. Defaults to seq(1,ncol(xdat)), that is, every column. |
random.seed |
an integer used to seed R's random number generator. This is to ensure replicability. Defaults to 42. |
... |
additional arguments supplied to specify the bandwidth type, kernel types, selection methods, and so on, detailed below. |
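As a rough illustration of how these arguments combine, the sketch below tests only the first and third regressors using the wild bootstrap and a user-chosen seed. It assumes the np package is installed. The simulated data mirror the examples on this page, while the sample size, the seed value 123, and the object names bw_args and sig_args are arbitrary choices made for the illustration.

```r
# Sketch only: the block is skipped if the np package is not installed.
if (requireNamespace("np", quietly = TRUE)) {
  library(np)
  set.seed(12345)
  n <- 50                      # small sample to keep the run short (assumption)
  z <- rbinom(n, 1, .5)        # irrelevant discrete variable
  x1 <- rnorm(n)
  x2 <- runif(n, -2, 2)
  y <- x1 + x2 + rnorm(n)

  # Cross-validated bandwidths for a local linear regression
  bw_args <- npregbw(formula = y ~ factor(z) + x1 + x2,
                     regtype = "ll", bwmethod = "cv.aic")

  # Test variables 1 and 3 only, with a wild bootstrap,
  # fewer replications, and a non-default seed
  sig_args <- npsigtest(bws = bw_args,
                        index = c(1, 3),
                        boot.method = "wild",
                        boot.num = 99,
                        random.seed = 123)
  summary(sig_args)
}
```

Restricting index to the variables of interest reduces the computation, since a null distribution is bootstrapped separately for each tested variable.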
npsigtest returns an object of type
sigtest, which summary supports. The object
has the
following components:
In |
the vector of statistics In |
P |
the vector of P-values for each statistic in In |
In.bootstrap |
contains a matrix of the bootstrap
replications of the vector In |
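The components listed above can be inspected directly on the returned object. The following is a minimal sketch, assuming the np package is installed. The data-generating process is borrowed from the examples on this page, and the names bw_val and sig_val, the sample size, and boot.num = 99 are illustrative choices only.

```r
# Sketch only: the block is skipped if the np package is not installed.
if (requireNamespace("np", quietly = TRUE)) {
  library(np)
  set.seed(12345)
  n <- 50                      # small sample to keep the run short (assumption)
  z <- rbinom(n, 1, .5)
  x1 <- rnorm(n)
  x2 <- runif(n, -2, 2)
  y <- x1 + x2 + rnorm(n)

  bw_val <- npregbw(formula = y ~ factor(z) + x1 + x2,
                    regtype = "ll", bwmethod = "cv.aic")
  sig_val <- npsigtest(bws = bw_val, boot.num = 99)

  print(sig_val$In)                 # test statistics, one per tested variable
  print(sig_val$P)                  # bootstrap P-values for those statistics
  print(dim(sig_val$In.bootstrap))  # dimensions of the bootstrap replication matrix
  summary(sig_val)                  # formatted report
}
```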
If you are using data of mixed types, then it is advisable to use the
data.frame function to construct your input data and not
cbind, since cbind will typically not work as
intended on mixed data types and will coerce the data to the same
type.
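The coercion described above is easy to see in base R. In this small sketch (the variable names are invented for the illustration), cbind turns a factor into its integer codes and returns a numeric matrix, whereas data.frame keeps each column's type.

```r
z <- factor(c("a", "b", "a"))   # categorical regressor
x <- c(0.1, 0.2, 0.3)           # continuous regressor

m <- cbind(z, x)                # matrix: everything coerced to one type
d <- data.frame(z, x)           # data frame: column types preserved

is.numeric(m)                   # the factor was reduced to its integer codes
is.factor(d$z)                  # the factor survives in the data frame
sapply(d, class)                # one class per column
```

Because the kernel used for each variable depends on its type, a factor that has been silently converted to numeric codes would be treated as continuous.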
Caution: bootstrap methods are, by their nature, computationally
intensive. This can be frustrating for users possessing large
datasets. For exploratory purposes, you may wish to override the
default number of bootstrap replications, say, setting them to
boot.num=99. A version of this package using the Rmpi wrapper is
under development that allows one to deploy this software in a
clustered computing environment to facilitate computation involving
large datasets.
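For an exploratory run of the kind suggested above, the number of replications can be lowered and the elapsed time recorded. This is a sketch under the assumption that the np package is installed. The object names bw_quick and sig_quick and the sample size are illustrative, and timings will vary by machine.

```r
# Sketch only: the block is skipped if the np package is not installed.
if (requireNamespace("np", quietly = TRUE)) {
  library(np)
  set.seed(12345)
  n <- 50                      # small sample to keep the run short (assumption)
  z <- rbinom(n, 1, .5)
  x1 <- rnorm(n)
  x2 <- runif(n, -2, 2)
  y <- x1 + x2 + rnorm(n)

  bw_quick <- npregbw(formula = y ~ factor(z) + x1 + x2,
                      regtype = "ll", bwmethod = "cv.aic")

  # Override the default of 399 replications and time the call
  elapsed <- system.time(
    sig_quick <- npsigtest(bws = bw_quick, boot.num = 99)
  )
  print(elapsed["elapsed"])
  summary(sig_quick)
}
```

Once the exploratory results look sensible, the test can be rerun with the default number of replications for reporting.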
Tristen Hayfield hayfield@mpia.de, Jeffrey S. Racine racinej@mcmaster.ca
Aitchison, J. and C.G.G. Aitken (1976), “Multivariate binary discrimination by the kernel method,” Biometrika, 63, 413-420.
Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.
Racine, J.S., J. Hart, and Q. Li (2006), “Testing the significance of categorical predictor variables in nonparametric regression models,” Econometric Reviews, 25, 523-544.
Racine, J.S. (1997), “Consistent significance testing for nonparametric regression,” Journal of Business and Economic Statistics, 15, 369-379.
Wang, M.C. and J. van Ryzin (1981), “A class of smooth estimators for discrete distributions,” Biometrika, 68, 301-309.
## Not run:
# EXAMPLE 1 (INTERFACE=FORMULA): For this example, we simulate 100 draws
# from a DGP in which z, the first column of X, is an irrelevant
# discrete variable
library(np)
set.seed(12345)
n <- 100
z <- rbinom(n,1,.5)
x1 <- rnorm(n)
x2 <- runif(n,-2,2)
y <- x1 + x2 + rnorm(n)
# Next, we must compute bandwidths for our regression model. In this
# case we conduct local linear regression. Note - this may take a few
# minutes depending on the speed of your computer...
bw <- npregbw(formula=y~factor(z)+x1+x2,regtype="ll",bwmethod="cv.aic")
# We then compute a vector of tests corresponding to the columns of
# X. Note - this may take a few minutes depending on the speed of your
# computer... we have to generate the null distribution of the statistic
# for each variable whose significance is being tested using 399
# bootstrap replications for each...
npsigtest(bws=bw)
# If you wished, you could conduct the test for, say, variables 1 and 3
# only, as in
npsigtest(bws=bw,index=c(1,3))
# EXAMPLE 1 (INTERFACE=DATA FRAME): For this example, we simulate 100
# draws from a DGP in which z, the first column of X, is an irrelevant
# discrete variable
set.seed(12345)
n <- 100
z <- rbinom(n,1,.5)
x1 <- rnorm(n)
x2 <- runif(n,-2,2)
X <- data.frame(factor(z),x1,x2)
y <- x1 + x2 + rnorm(n)
# Next, we must compute bandwidths for our regression model. In this
# case we conduct local linear regression. Note - this may take a few
# minutes depending on the speed of your computer...
bw <- npregbw(xdat=X,ydat=y,regtype="ll",bwmethod="cv.aic")
# We then compute a vector of tests corresponding to the columns of
# X. Note - this may take a few minutes depending on the speed of your
# computer... we have to generate the null distribution of the statistic
# for each variable whose significance is being tested using 399
# bootstrap replications for each...
npsigtest(bws=bw)
# If you wished, you could conduct the test for, say, variables 1 and 3
# only, as in
npsigtest(bws=bw,index=c(1,3))
## End(Not run)