dtrace {dtrace}    R Documentation

Density Traces

Description

This function computes kernel density trace objects. Related methods can be used to plot and manipulate this kind of object.

Usage

dtrace(x,  bw, sd = "nrd0",
    kernel = "gaussian", ...)

Arguments

x A specification of the data to be used to produce the density traces. This can be a numeric vector, a list of numeric vectors, or a modelling formula which specifies grouping for a numeric vector.
bw The bandwidth of the kernel used to estimate the density. This is the standard deviation of the kernel multiplied by sqrt(12). This measure has been chosen so that the bandwidth of a rectangular kernel is the width of the kernel. This makes direct comparison with histograms possible. If several densities are to be computed with a single call to dtrace, a vector of bw values can be specified. These are recycled if necessary.
sd The standard deviation of the smoothing kernel. This can be used in place of bw to specify the amount of smoothing which is to occur. This can either be a number, or a character string which specifies an automatic method for choosing the standard deviation (see bw.nrd). If several densities are to be computed with a single call to dtrace, a vector of sd values can be specified. These are recycled if necessary.
kernel The smoothing kernel to be used. A list of possible kernels is given in density. If several densities are to be computed with a single call to dtrace, a vector of kernel values can be specified. These are recycled if necessary.
... Additional arguments which are passed to the density function.
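
The relationship between bw and sd can be checked numerically. The sketch below uses only base R. The helper names bw_to_sd and sd_to_bw are hypothetical and are not part of the dtrace package. It assumes the convention implied by the Examples, where bw = 1 corresponds to sd = 0.2887.

```r
## Hypothetical helpers (not part of dtrace): convert between the
## rectangular-equivalent bandwidth and the kernel standard deviation.
bw_to_sd <- function(bw) bw / sqrt(12)
sd_to_bw <- function(sd) sd * sqrt(12)

bw_to_sd(1)            # approximately 0.2887
sd_to_bw(bw_to_sd(1))  # 1

## Why sqrt(12): a rectangular (uniform) kernel of width w has
## variance w^2 / 12.  Check this by numerical integration for w = 1.
w <- 1
v <- integrate(function(u) u^2 / w, -w / 2, w / 2)$value
v                      # approximately 1/12
```

The same conversion applies to any kernel: a Gaussian kernel with bw = 1 under this convention is one whose standard deviation equals that of a rectangular kernel of width 1.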

Details

This function provides a way to estimate probability density functions for one or more samples of observations. The estimation is carried out by using a smoothing kernel, which is convolved with the empirical distribution of the data values to produce a smooth density estimate. A variety of different smoothing kernels are available, with the default being Gaussian.
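
The idea of kernel smoothing can be illustrated directly in base R. The following is a minimal sketch, not the code used by dtrace: it evaluates a Gaussian kernel estimate from its definition (the average of kernel densities centred at the observations) and compares it with density. The sample x, the standard deviation s and the grid are arbitrary choices made for illustration.

```r
## Simulated data and an arbitrary kernel standard deviation.
set.seed(1)
x <- rnorm(50)
s <- 0.4

## Evaluate the estimate on a grid: at each point t the estimate is
## the mean of Gaussian densities centred at the observations.
grid <- seq(min(x) - 3 * s, max(x) + 3 * s, length.out = 201)
f_direct <- sapply(grid, function(t) mean(dnorm(t, mean = x, sd = s)))

## density() uses a binned approximation, so agreement with the
## direct calculation is close but not exact.  For density(), bw is
## the standard deviation of the kernel.
f_dens <- density(x, bw = s, kernel = "gaussian",
                  from = min(grid), to = max(grid), n = 201)
max(abs(f_direct - f_dens$y))

## The estimate should integrate to roughly 1 (simple Riemann sum).
sum(f_direct) * diff(grid)[1]
```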

There are two ways to specify the degree of smoothing provided by the kernel. The first is to use the argument sd to specify the kernel standard deviation. The second is to use the argument bw to specify the bandwidth of the kernel. The bandwidth is defined as the width of the rectangular kernel which has the same standard deviation as the kernel in use. Specifying a rectangular kernel (kern="r") means that bw gives the cell-width for a moving cell histogram estimate.
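
A moving cell histogram can be written out in a few lines of base R. This is a sketch under stated assumptions rather than the dtrace implementation: the function name moving_cell is hypothetical, and the data and cell width are arbitrary. At each point t the estimate is the proportion of observations falling in a cell of width bw centred at t, divided by bw.

```r
## Hypothetical helper (not part of dtrace): moving-cell histogram.
moving_cell <- function(t, x, bw) {
  sapply(t, function(ti) mean(abs(x - ti) <= bw / 2) / bw)
}

set.seed(1)
x <- rnorm(100)
bw <- 0.5
grid <- seq(min(x) - bw, max(x) + bw, length.out = 512)
f <- moving_cell(grid, x, bw)

## The estimate integrates to roughly 1 (simple Riemann sum).
sum(f) * diff(grid)[1]

## A comparable estimate from density(), whose bw argument is the
## kernel standard deviation, so the cell width is divided by sqrt(12).
## Agreement is only approximate near the jumps, because density()
## works with binned data.
f_dens <- density(x, bw = bw / sqrt(12), kernel = "rectangular",
                  from = min(grid), to = max(grid), n = 512)
plot(grid, f, type = "s", xlab = "x", ylab = "Density")
lines(f_dens, col = "red")
```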

Value

An object of class dtrace. Summary, plot and subsetting methods exist for this type of object.
It can be useful to know that the object is implemented as a list with components:

density A list containing the individual density estimates.
xlim The x range spanned by the abscissae of the estimates.
ylim The y range spanned by the ordinates of the estimates.
nobs The sample size.
kernel The kernel(s) used.
bw The kernel bandwidth(s).
sd The kernel standard deviation(s).

Note

Note that I (Ross Ihaka) take full blame for the use of bw in this function to describe the amount of kernel smoothing. Although this might be non-standard, I find it pedagogically useful, to motivate the shift from fixed-cell histograms to moving-cell histograms and thence to general kernel density estimates. I think it also provides a degree of intuition which helps in understanding the size of feature which is likely to be smoothed away by the kernel. Most of this is standard in the frequency-domain analysis of time series.

Author(s)

Ross Ihaka. The underlying density estimation code was written by Ross Ihaka, Martin Maechler and Brian Ripley.

References

Scott, D. W. (1992) Multivariate Density Estimation. Theory, Practice and Visualization. New York: Wiley.

Sheather, S. J. and Jones, M. C. (1991) A reliable data-based bandwidth selection method for kernel density estimation. J. Roy. Statist. Soc. B, 53, 683–690.

Silverman, B. W. (1986) Density Estimation. London: Chapman and Hall.

Venables, W. N. and Ripley, B. D. (1999) Modern Applied Statistics with S-PLUS. New York: Springer.

See Also

density, hist, bw.nrd, plot.dtrace, print.dtrace, summary.dtrace, print.summary.dtrace, [.dtrace, [[.dtrace.

Examples

## The rectangular kernel with bw = 1.
plot(dtrace(0, bw=1, kern="r"),
     main = "Rectangular Kernel (bw = 1)",
     xlab = "This is the output from ``dtrace''")

## Compare this with the density function.
## It's hard to see how bw = 1 here.
plot(density(0, bw=1, kern="r"),
   main = "Rectangular Kernel (bw = 1)",
   xlab = "This is the output from ``density''")

## The Gaussian kernel with sd = 1.
plot(dtrace(0, sd=1, kern="g"),
     main = "Gaussian Kernel (sd = 1)")

## A moving-cell histogram and an equivalent
## fixed-cell histogram.  Note that the fixed-cell
## variant is less stable than it appears.
x <- rnorm(100)
d <- dtrace(x, bw=.5, kern = "r", n=1024)
hist(x, breaks = seq(floor(min(x)), ceiling(max(x)), by = .5),
     prob = TRUE, ylim = d$ylim, border = "gray40",
     main = "Fixed and Moving-Cell Histograms")
plot(d, col = "red", add = TRUE)

## A comparison of rectangular and Gaussian kernels
## with the same bandwidth.
d <- dtrace(list(Rectangular = x,
                 Gaussian = x),
            bw = 1, kern = c("r", "g"))
plot(d, col = c("green4", "red"), lty = 1,
     main = "``Equivalent Bandwidth'' Kernels",
     xlab = "Rectangular and Gaussian, (bw = 1, sd = 0.2887)")

## Simple density estimation.
## Here we are using automatic bandwidth selection.
data(faithful)
d <- dtrace(faithful$eruptions, sd = "sj")
summary(d)
plot(d, xlab = "Eruption Time in Minutes",
     main = "Old Faithful Data")

## A customized plot.  This is just to show how
## to access the underlying density structure.
plot(d$density[[1]], type = "n",
     xlab = "Eruption Time in Minutes",
     ylab = "Density",
     main = "Old Faithful Data")
polygon(d$density[[1]], col = "wheat")

## Estimation and plotting of three densities.
## A demonstration of the formula-based interface.
data(iris)
d <- dtrace(Petal.Width ~ Species, data = iris)
plot(d, lty = 1, col = c("red","green4", "blue"),
     legend = c("Setosa", "Versicolor", "Virginica"),
     main = "The Distribution of Iris Petal Width",
     xlab = "Petal Width (cm)")

[Package dtrace version 0.1 Index]