dtrace                package:dtrace                R Documentation

_D_e_n_s_i_t_y _T_r_a_c_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     This function computes kernel density trace objects. Related
     methods can be used to plot and manipulate this kind of object.

_U_s_a_g_e:

     dtrace(x, bw, sd = "nrd0",
         kernel = "gaussian", ...)

_A_r_g_u_m_e_n_t_s:

       x: A specification of the data to be used to produce the density
          traces. This can be a numeric vector, a list of numeric
          vectors or a modelling formula which specifies grouping for a
          numeric vector.

      bw: The bandwidth of the kernel used to estimate the density.
          This is the standard deviation of the kernel multiplied by
          sqrt(12). This measure has been chosen so that the bandwidth
          of a rectangular kernel is the width of the kernel. This
          makes direct comparison with histograms possible. If several
          densities are to be computed with a single call to `dtrace',
          a vector of `bw' values can be specified. These are recycled
          if necessary.

      sd: The standard deviation of the smoothing kernel. This can be
          used in place of `bw' to specify the amount of smoothing
          which is to occur.  This can either be a number, or a
          character string which specifies an automatic method for
          choosing the standard deviation (see `bw.nrd').  If several
          densities are to be computed with a single call to `dtrace',
          a vector of `sd' values can be specified. These are recycled
          if necessary.

  kernel: The smoothing kernel to be used.  A list of possible kernels
          is given in `density'.  If several densities are to be
          computed with a single call to `dtrace', a vector of `kernel'
          values can be specified. These are recycled if necessary.

     ...: Additional arguments which are passed to the `density'
          function.
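
     The recycling of `bw', `sd' and `kernel' described above can be
     illustrated with base R alone.  This is only a sketch of how
     such recycling might work, not the code used by `dtrace'; the
     names `samples', `bw_full', `kernel_full' and `fits' are
     hypothetical, and `density' is called with the kernel standard
     deviation bw / sqrt(12).

```r
# Three samples but only two bandwidths and one kernel name.
samples <- list(a = c(1, 2, 3, 4), b = c(2, 4, 6, 8), c = c(0, 1, 1, 2))
bw <- c(1, 2)
kernel <- "gaussian"

# Recycle the shorter arguments to one value per sample.
k <- length(samples)
bw_full <- rep_len(bw, k)
kernel_full <- rep_len(kernel, k)

# One base-R density estimate per sample.  density() expects the
# kernel standard deviation, so the bandwidth is divided by sqrt(12).
fits <- Map(function(x, b, kern) density(x, bw = b / sqrt(12), kernel = kern),
            samples, bw_full, kernel_full)
```

     Here the third sample reuses the first bandwidth, because the
     vector of two values is recycled to length three.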

_D_e_t_a_i_l_s:

     This function provides a way to estimate probability density
     functions for one or more samples of observations.  The estimation
     is carried out by using a smoothing kernel, which is convolved
     with the empirical distribution of the data values to
     produce a smooth density estimate.  A variety of different
     smoothing kernels are available, with the default being
     `gaussian'.
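
     The smoothing described above can be sketched by hand for the
     Gaussian kernel: the estimate at a point is the average of
     kernel densities centred at the observations.  The code below
     uses only base R and is not the implementation used by
     `dtrace'; the names `kernel_sd', `grid', `manual' and `ref' are
     hypothetical, and the comparison with `density' is approximate
     because `density' works on a binned grid.

```r
set.seed(1)
x <- rnorm(50)
kernel_sd <- 0.4   # standard deviation of the Gaussian kernel

# Evaluation grid extending four kernel standard deviations past the data.
grid <- seq(min(x) - 4 * kernel_sd, max(x) + 4 * kernel_sd,
            length.out = 401)

# Direct estimate: average of Gaussian kernels centred at each observation.
manual <- vapply(grid,
                 function(t) mean(dnorm(t, mean = x, sd = kernel_sd)),
                 numeric(1))

# Reference estimate from base R on the same grid.
ref <- density(x, bw = kernel_sd, kernel = "gaussian",
               from = min(grid), to = max(grid), n = 401)

# The estimate should integrate to roughly one and agree with density().
area <- sum(manual) * (grid[2] - grid[1])
max_diff <- max(abs(manual - ref$y))
```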

     There are two ways in which to specify the degree of smoothing
     provided by the kernel.  The first is to use the argument `sd' to
     specify the kernel standard deviation.  The second is to use the
     argument `bw' to specify the bandwidth of the kernel. The
     bandwidth is defined as the width of the rectangular kernel
     which has the same standard deviation as the kernel in use, so
     that bw = sqrt(12) * sd. Specifying a rectangular kernel
     (`kern="r"') means that `bw' gives the cell-width for a moving
     cell histogram estimate.
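
     The relationship between the two smoothing measures can be
     checked with base R.  This is a sketch which does not use
     `dtrace' itself; the helper names `sd_from_bw' and `bw_from_sd'
     are hypothetical.  It relies on the fact that `density' in
     current versions of R takes the kernel standard deviation as
     its `bw' argument.

```r
# Convert between the two smoothing measures: bw = sqrt(12) * sd.
sd_from_bw <- function(bw) bw / sqrt(12)
bw_from_sd <- function(sd) sd * sqrt(12)

kernel_sd <- sd_from_bw(1)   # about 0.2887

# A rectangular kernel centred at a single observation at 0.
d <- density(0, bw = kernel_sd, kernel = "rectangular")

# The kernel should be a box of height close to 1 and width close to 1.
height <- max(d$y)
width <- diff(range(d$x[d$y > 0.5]))
```

     A box of unit width and unit height is what a moving-cell
     histogram with cell-width 1 places on each observation.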

_V_a_l_u_e:

     An object of class `dtrace'.  Summary, plot and subsetting methods
     exist for this type of object.

     It can be useful to know that the object is implemented as a list
     with components: 

 density: A list containing the individual density estimates

    xlim: The x range spanned by the abscissas of the estimates.

    ylim: The y range spanned by the ordinates of the estimates.

    nobs: The sample size.

  kernel: The kernel(s) used.

      bw: The kernel bandwidth(s).

      sd: The kernel standard deviation(s).

_N_o_t_e:

     Note that I (Ross Ihaka) take full blame for the use of `bw' in
     this function to describe the amount of kernel smoothing. 
     Although this might be non-standard, I find it pedagogically
     useful, to motivate the shift from fixed-cell histograms to
     moving-cell histograms and thence to general kernel density
     estimates. I think it also provides a degree of intuition which
     helps in understanding the size of feature which is likely to be
     smoothed away by the kernel.

_A_u_t_h_o_r(_s):

     Ross Ihaka.  The underlying density estimation code was written by
     Ross Ihaka, Martin Maechler and Brian Ripley.

_R_e_f_e_r_e_n_c_e_s:

     Scott, D. W. (1992) Multivariate Density Estimation. Theory,
     Practice and Visualization. New York: Wiley.

     Sheather, S. J. and Jones, M. C. (1991) A reliable data-based
     bandwidth selection method for kernel density estimation. J. Roy.
     Statist. Soc. B, 53, 683-690.

     Silverman, B. W. (1986) Density Estimation. London: Chapman and
     Hall.

     Venables, W. N. and Ripley, B. D. (1999) Modern Applied Statistics
     with S-PLUS. New York: Springer.

_S_e_e _A_l_s_o:

     `density', `hist', `bw.nrd', `plot.dtrace', `print.dtrace',
     `summary.dtrace', `print.summary.dtrace', `[.dtrace', `[[.dtrace'.

_E_x_a_m_p_l_e_s:

     ## The rectangular kernel with bw = 1.
     plot(dtrace(0, bw=1, kern="r"),
          main = "Rectangular Kernel (bw = 1)",
          xlab = "This is the output from ``dtrace''")

     ## Compare this with the output of the density function.
     ## It is hard to see that bw = 1 here.
     plot(density(0, bw=1, kern="r"),
          main = "Rectangular Kernel (bw = 1)",
          xlab = "This is the output from ``density''")

     ## The Gaussian kernel with sd = 1.
     plot(dtrace(0, sd=1, kern="g"),
          main = "Gaussian Kernel (sd = 1)")

     ## A moving-cell histogram and an equivalent
     ## fixed-cell histogram.  Note that the fixed-cell
     ## variant is less stable than it appears.
     x <- rnorm(100)
     d <- dtrace(x, bw=.5, kern = "r", n=1024)
     hist(x, breaks = seq(floor(min(x)), ceiling(max(x)), by = .5),
          prob = TRUE, ylim = d$ylim, border = "gray40",
          main = "Fixed and Moving-Cell Histograms")
     plot(d, col = "red", add = TRUE)

     ## A comparison of rectangular and Gaussian kernels
     ## with the same bandwidth.
     d <- dtrace(list(Rectangular = x,
                      Gaussian = x),
                 bw = 1, kern = c("r", "g"))
     plot(d, col = c("green4", "red"), lty = 1,
          main = "``Equivalent Bandwidth'' Kernels",
          xlab = "Rectangular and Gaussian, (bw = 1, sd = 0.2887)")

     ## Simple density estimation.
     ## Here we are using automatic bandwidth selection.
     data(faithful)
     d <- dtrace(faithful$eruptions, sd = "sj")
     summary(d)
     plot(d, xlab = "Eruption Time in Minutes",
          main = "Old Faithful Data")

     ## A customized plot.  This is just to show how
     ## to access the underlying density structure.
     plot(d$density[[1]], type = "n",
          xlab = "Eruption Time in Minutes",
          ylab = "Density",
          main = "Old Faithful Data")
     polygon(d$density[[1]], col = "wheat")

     ## Estimation and plotting of three densities.
     ## A demonstration of the formula-based interface.
     data(iris)
     d <- dtrace(Petal.Width ~ Species, data = iris)
     plot(d, lty = 1, col = c("red","green4", "blue"),
          legend = c("Setosa", "Versicolor", "Virginica"),
          main = "The Distribution of Iris Petal Width",
          xlab = "Petal Width (cm)")

