dtrace {dtrace} | R Documentation |
This function computes kernel density trace objects. Related methods can be used to plot and manipulate this kind of object.
dtrace(x, bw, sd = "nrd0", kernel = "gaussian", ...)
x |
A specification of the data to be used to produce the density traces. This can be a numeric vector, a list of numeric vectors, or a modelling formula which specifies grouping for a numeric vector. |
bw |
The bandwidth of the kernel used to estimate the
density. This is the standard deviation of the kernel multiplied by
sqrt(12). This measure has been chosen so that the
bandwidth of a rectangular kernel is the width of the kernel. This
makes direct comparison with histograms possible. If several
densities are to be computed with a single call to
dtrace, a vector of bw values can be
specified. These are recycled if necessary. |
sd |
The standard deviation of the smoothing kernel. This can be
used in place of bw to specify the amount of smoothing which
is to occur. This can either be a number, or a character string
which specifies an automatic method for choosing the standard
deviation (see bw.nrd). If several densities are
to be computed with a single call to dtrace, a vector
of sd values can be specified. These are recycled if necessary. |
kernel |
The smoothing kernel to be used. A list of possible
kernels is given in density. If several densities are
to be computed with a single call to dtrace, a vector
of kernel values can be specified. These are recycled if necessary. |
... |
Additional arguments which are passed to the
density function. |
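The relationship between bw and sd can be illustrated with a few lines of base R. A rectangular kernel of width w has standard deviation w / sqrt(12), so the two measures differ only by that factor. In this sketch the helper names sd_to_bw and bw_to_sd are hypothetical (they are not part of this package), and it is assumed that the character-string selectors correspond to the bw.* functions in the stats package.

```r
# Convert between kernel standard deviation and equivalent bandwidth.
# sd_to_bw and bw_to_sd are hypothetical helpers used for illustration only.
sd_to_bw <- function(sd) sd * sqrt(12)
bw_to_sd <- function(bw) bw / sqrt(12)

# The automatic selectors in the stats package return a kernel
# standard deviation (assumed here to be what sd = "nrd0" or "sj" selects).
x <- faithful$eruptions
sd_nrd0 <- bw.nrd0(x)   # rule-of-thumb selector
sd_sj   <- bw.SJ(x)     # Sheather-Jones selector

# The same amounts of smoothing expressed as equivalent bandwidths.
equiv_bw <- c(nrd0 = sd_to_bw(sd_nrd0), sj = sd_to_bw(sd_sj))
equiv_bw
```

A bandwidth of 1 corresponds to a kernel standard deviation of about 0.2887, whichever kernel is used.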
This function provides a way to estimate probability density functions
for one or more samples of observations. The estimation is carried
out by using a smoothing kernel, which is convolved with the empirical
distribution of the data values to produce a smooth density
estimate. A variety of different smoothing kernels are available,
with the default being gaussian.
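The idea of smoothing the data with a kernel can be sketched directly: the estimate at a point is the average of kernels centred on each observation. The following is a minimal illustration with a Gaussian kernel, not the code used by this function; kde_gauss is a hypothetical helper, and the comparison with density() from the stats package is only approximate because density() works on a binned grid.

```r
# Hand-rolled Gaussian kernel density estimate (illustrative only).
# kde_gauss is a hypothetical helper, not part of this package.
kde_gauss <- function(x, grid, sd) {
  # At each grid point, average the Gaussian kernels centred on the data.
  vapply(grid, function(g) mean(dnorm(g, mean = x, sd = sd)), numeric(1))
}

set.seed(1)
x <- rnorm(100)
grid <- seq(min(x) - 3, max(x) + 3, length.out = 512)
est <- kde_gauss(x, grid, sd = 0.4)

# A density estimate should integrate to approximately 1 (Riemann sum).
area <- sum(est) * (grid[2] - grid[1])

# For comparison: density() takes the kernel standard deviation as bw.
ref <- density(x, bw = 0.4, kernel = "gaussian",
               from = min(grid), to = max(grid), n = 512)
```

Plotting est against grid and overlaying ref should show two nearly coincident curves.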
There are two ways in which to specify the degree of smoothing
provided by the kernel. The first is to use the argument sd to
specify the kernel standard deviation. The second is to use the
argument bw to specify the bandwidth of the kernel.
The bandwidth is defined as the width of the rectangular kernel
which has the same standard deviation as the kernel in use. Specifying a
rectangular kernel (kern="r") means that bw gives the
cell-width for a moving cell histogram estimate.
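A moving cell histogram can be written out in a few lines: the estimate at a point is the fraction of observations falling in a cell of width bw centred on that point, divided by bw. This is a sketch under that definition only; moving_cell is a hypothetical helper and is not how this function is implemented.

```r
# Moving-cell histogram sketch (illustrative only).
# moving_cell is a hypothetical helper, not part of this package.
moving_cell <- function(x, grid, bw) {
  # Density at g = (fraction of observations within bw/2 of g) / bw.
  vapply(grid, function(g) mean(abs(x - g) <= bw / 2) / bw, numeric(1))
}

# A single observation at 0 with bw = 1 gives a unit-height box
# on [-0.5, 0.5].
box <- moving_cell(0, c(-1, -0.25, 0, 0.25, 1), bw = 1)
box

# The same smoothing expressed for density() in the stats package, whose
# bw argument is the kernel standard deviation: sd = bw / sqrt(12).
d <- density(0, bw = 1 / sqrt(12), kernel = "rectangular")
```

Here box is 0 outside the cell and 1 inside it, which is why bw = 1 can be read directly off a plot of the rectangular kernel.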
An object of class dtrace. Summary, plot and
subsetting methods exist for this type of object.
It can be useful to know that the object is implemented as a list with
components:
density |
A list containing the individual density estimates |
xlim |
The x range spanned by the abscissae of the estimates. |
ylim |
The y range spanned by the ordinates of the estimates. |
nobs |
The sample size. |
kernel |
The kernel(s) used. |
bw |
The kernel bandwidth(s). |
sd |
The kernel standard deviation(s). |
Note that I (Ross Ihaka) take full blame for the use of bw in
this function to describe the amount of kernel smoothing. Although
this might be non-standard, I find it pedagogically useful, to
motivate the shift from fixed-cell histograms to moving-cell
histograms and thence to general kernel density estimates.
I think it also provides a degree of intuition which helps in
understanding the size of feature which is likely to be smoothed
away by the kernel.
Ross Ihaka. The underlying density estimation code was written by Ross Ihaka, Martin Maechler and Brian Ripley.
Scott, D. W. (1992) Multivariate Density Estimation. Theory, Practice and Visualization. New York: Wiley.
Sheather, S. J. and Jones, M. C. (1991) A reliable data-based bandwidth selection method for kernel density estimation. J. Roy. Statist. Soc. B, 53, 683-690.
Silverman, B. W. (1986) Density Estimation. London: Chapman and Hall.
Venables, W. N. and Ripley, B. D. (1999) Modern Applied Statistics with S-PLUS. New York: Springer.
density, hist, bw.nrd.
plot.dtrace, print.dtrace, summary.dtrace, print.summary.dtrace, [.dtrace, [[.dtrace.
## The rectangular kernel with bw = 1.
plot(dtrace(0, bw=1, kern="r"),
     main = "Rectangular Kernel (bw = 1)",
     xlab = "This is the output from ``dtrace''")

## Compare this with the density function.
## It's hard to see how bw = 1 here..
plot(density(0, bw=1, kern="r"),
     main = "Rectangular Kernel (bw = 1)",
     xlab = "This is the output from ``density''")

## The Gaussian kernel with sd = 1.
plot(dtrace(0, sd=1, kern="g"),
     main = "Gaussian Kernel (sd = 1)")

## A moving-cell histogram and an equivalent
## fixed-cell histogram. Note that the fixed-cell
## variant is less stable than it appears.
x <- rnorm(100)
d <- dtrace(x, bw=.5, kern = "r", n=1024)
hist(x, breaks = seq(floor(min(x)), ceiling(max(x)), by = .5),
     prob = TRUE, ylim = d$ylim, border = "gray40",
     main = "Fixed and Moving-Cell Histograms")
plot(d, col = "red", add = TRUE)

## A comparison of rectangular and Gaussian kernels
## with the same bandwidth.
d <- dtrace(list(Rectangular = x, Gaussian = x),
            bw = 1, kern = c("r", "g"))
plot(d, col = c("green4", "red"), lty = 1,
     main = "``Equivalent Bandwidth'' Kernels",
     xlab = "Rectangular and Gaussian, (bw = 1, sd = 0.2887)")

## Simple density estimation.
## Here we are using automatic bandwidth selection.
data(faithful)
d <- dtrace(faithful$eruptions, sd = "sj")
summary(d)
plot(d, xlab = "Eruption Time in Minutes",
     main = "Old Faithful Data")

## A customized plot. This is just to show how
## to access the underlying density structure.
plot(d$density[[1]], type = "n",
     xlab = "Eruption Time in Minutes",
     ylab = "Density",
     main = "Old Faithful Data")
polygon(d$density[[1]], col = "wheat")

## Estimation and plotting of three densities.
## A demonstration of the formula-based interface.
data(iris)
d <- dtrace(Petal.Width ~ Species, data = iris)
plot(d, lty = 1, col = c("red", "green4", "blue"),
     legend = c("Setosa", "Versicolor", "Virginica"),
     main = "The Distribution of Iris Petal Width",
     xlab = "Petal Width (cm)")