dtrace                package:dtrace                R Documentation

_D_e_n_s_i_t_y _T_r_a_c_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     This function computes kernel density trace objects. Related
     methods can be used to plot and manipulate objects of this kind.

_U_s_a_g_e:

     dtrace(x, bw, sd = "nrd0",
            kernel = "gaussian", ...)

_A_r_g_u_m_e_n_t_s:

       x: The data from which the density traces are computed. This can
          be a numeric vector, a list of numeric vectors, or a model
          formula (for example 'y ~ g') that splits a numeric vector
          into groups.

      bw: The bandwidth of the kernel used to estimate the density.
          This is the standard deviation of the kernel multiplied by
          sqrt(12). This measure has been chosen so that the bandwidth
          of a rectangular kernel is the width of the kernel, which
          makes direct comparison with histograms possible. If several
          densities are to be computed with a single call to 'dtrace',
          a vector of 'bw' values can be specified. These are recycled
          if necessary.

      sd: The standard deviation of the smoothing kernel. This can be
          used in place of 'bw' to specify the amount of smoothing
          which is to occur.  This can either be a number, or a
          character string which specifies an automatic method for
          choosing the standard deviation (see 'bw.nrd').  If several
          densities are to be computed with a single call to 'dtrace',
          a vector of 'sd' values can be specified. These are recycled
          if necessary.

  kernel: The smoothing kernel to be used.  A list of possible kernels
          is given in 'density'.  If several densities are to be
          computed with a single call to 'dtrace', a vector of 'kernel'
          values can be specified. These are recycled if necessary.

     ...: Additional arguments which are passed to the 'density'
          function.
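     The 'x' and 'sd' arguments can be pictured with base R alone. In
     this sketch, which is an assumption about the behaviour and not
     the package's own code, a formula such as 'Petal.Width ~ Species'
     is turned into a named list of samples with 'model.frame' and
     'split', and a character rule for 'sd' is resolved by
     'stats::density', whose own 'bw' argument is a kernel standard
     deviation. Multiplying by sqrt(12) expresses the result on the
     'bw' scale used by 'dtrace'.

```r
## Sketch only (an assumption, not the dtrace package's own code).
## Formula input: split the response by the grouping factor, which
## gives a named list of numeric vectors, the same shape as list input.
mf <- model.frame(Petal.Width ~ Species, data = iris)
samples <- split(mf[[1]], mf[[2]])
names(samples)                          # "setosa" "versicolor" "virginica"

## Character 'sd' values name automatic rules.  stats::density()
## accepts the same strings for its own 'bw' argument, which is a
## kernel standard deviation, and returns the chosen value in $bw.
x <- faithful$eruptions
sd_nrd0 <- density(x, bw = "nrd0")$bw   # equals bw.nrd0(x)
sd_sj   <- density(x, bw = "sj")$bw     # equals bw.SJ(x)

## The same amounts of smoothing on the width-based 'bw' scale.
sqrt(12) * c(nrd0 = sd_nrd0, sj = sd_sj)
```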

_D_e_t_a_i_l_s:

     This function provides a way to estimate probability density
     functions for one or more samples of observations.  The estimation
     is carried out by using a smoothing kernel, which is convolved
     with the empirical distribution of the data values to produce a
     smooth density estimate.  Several smoothing kernels are
     available, with the default being 'gaussian'.
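     For the Gaussian kernel, convolving with the empirical
     distribution amounts to averaging one kernel centred at each
     observation. The R sketch below writes that average out directly
     and compares it with base R's 'density'. The helper 'kde_gauss'
     is hypothetical and not part of 'dtrace'; because 'density' uses
     a binned FFT approximation, the two agree only approximately.

```r
## Hypothetical helper (not part of dtrace): the kernel density estimate
## at points t, i.e. the mean of Gaussian kernels centred at the data.
kde_gauss <- function(t, x, sd) {
  vapply(t, function(ti) mean(dnorm(ti, mean = x, sd = sd)), numeric(1))
}

x  <- faithful$eruptions
s  <- bw.nrd0(x)                      # kernel standard deviation
tt <- seq(2, 5, by = 0.5)             # evaluation points
by_hand <- kde_gauss(tt, x, s)

## density() returns the estimate on a grid; interpolate it at tt.
d <- density(x, bw = s, kernel = "gaussian")
via_density <- approx(d$x, d$y, xout = tt)$y
max(abs(by_hand - via_density))       # small, but not exactly zero
```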

     There are two ways to specify the degree of smoothing provided by
     the kernel.  The first is to use the argument 'sd' to specify the
     kernel standard deviation.  The second is to use the argument
     'bw' to specify the _bandwidth_ of the kernel, defined as the
     width of the rectangular kernel which has the same standard
     deviation as the chosen kernel; that is, bw = sqrt(12) * sd.
     Specifying a rectangular kernel ('kernel = "r"') means that 'bw'
     gives the cell width for a moving-cell histogram estimate.
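     A rectangular kernel of width w is uniform on (-w/2, w/2), so its
     standard deviation is w/sqrt(12). The R sketch below checks that
     relation numerically and builds a moving-cell histogram by hand:
     at each point it counts the observations within bw/2 and divides
     by n * bw. The helpers 'bw_to_sd', 'sd_to_bw' and
     'moving_cell_hist' are hypothetical. The last line shows the
     corresponding call to base R's 'density', which expects a
     standard deviation rather than a width.

```r
## Check: a rectangular kernel of width w (uniform on (-w/2, w/2)) has
## standard deviation w / sqrt(12).
w <- 0.5
uniform_sd <- sqrt(integrate(function(t) t^2 / w, -w / 2, w / 2)$value)
all.equal(uniform_sd, w / sqrt(12))     # TRUE

## Hypothetical helpers for moving between the two scales.
bw_to_sd <- function(bw) bw / sqrt(12)
sd_to_bw <- function(sd) sqrt(12) * sd

## Hypothetical moving-cell histogram: the number of observations in the
## cell of width bw centred at t, divided by n * bw.
moving_cell_hist <- function(t, x, bw) {
  counts <- vapply(t, function(ti) sum(abs(x - ti) < bw / 2), integer(1))
  counts / (length(x) * bw)
}

set.seed(1)
x  <- rnorm(100)
bw <- 0.5
step <- 0.001
grid <- seq(min(x) - bw, max(x) + bw, by = step)
f <- moving_cell_hist(grid, x, bw)
sum(f) * step                           # close to 1, as for any density

## The same smoothing requested from stats::density(), whose 'bw'
## argument is the kernel standard deviation rather than the width.
d <- density(x, bw = bw_to_sd(bw), kernel = "rectangular")
```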

_V_a_l_u_e:

     An object of class 'dtrace'.  Summary, plot and subsetting methods
     exist for this type of object.

     It can be useful to know that the object is implemented as a list
     with components: 

 density: A list containing the individual density estimates.

    xlim: The x range spanned by the abscissae of the estimates.

    ylim: The y range spanned by the ordinates of the estimates.

    nobs: The sample size(s).

  kernel: The kernel(s) used.

      bw: The kernel bandwidth(s).

      sd: The kernel standard deviation(s).
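     To make the list layout concrete, the R sketch below assembles an
     object with the same components from 'stats::density' fits. The
     constructor 'make_dtrace_like' and the class name 'dtrace_like'
     are hypothetical, the component name 'sd' for the standard
     deviations is assumed, and recycling of per-sample settings with
     'rep_len' is one plausible reading of "recycled if necessary".
     This is an illustration, not the package's implementation.

```r
## Hypothetical constructor: NOT the dtrace package's code.  It only
## illustrates the list layout described above, using stats::density().
make_dtrace_like <- function(samples, sd = "nrd0", kernel = "gaussian") {
  k <- length(samples)
  sd <- rep_len(sd, k)                  # per-sample settings are recycled
  kernel <- rep_len(kernel, k)
  dens <- mapply(function(x, s, kern) density(x, bw = s, kernel = kern),
                 samples, sd, kernel, SIMPLIFY = FALSE)
  sds <- vapply(dens, function(d) d$bw, numeric(1))
  structure(list(density = dens,
                 xlim = range(unlist(lapply(dens, function(d) d$x))),
                 ylim = range(unlist(lapply(dens, function(d) d$y))),
                 nobs = vapply(samples, length, integer(1)),
                 kernel = kernel,
                 bw = sqrt(12) * sds,   # width-based bandwidth
                 sd = sds),             # kernel standard deviation
            class = "dtrace_like")
}

e <- faithful$eruptions
obj <- make_dtrace_like(list(short = e[e < 3], long = e[e >= 3]),
                        sd = 0.2, kernel = c("rectangular", "gaussian"))
str(obj[c("xlim", "ylim", "nobs", "kernel", "bw", "sd")])
```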

_N_o_t_e:

     I (Ross Ihaka) take full blame for the use of 'bw' in this
     function to describe the amount of kernel smoothing.  Although
     this might be non-standard, I find it pedagogically useful to
     motivate the shift from fixed-cell histograms to moving-cell
     histograms and thence to general kernel density estimates.  I
     think it also provides a degree of intuition which helps in
     understanding the size of features likely to be smoothed away by
     the kernel.  Most of this is standard in frequency-domain
     analysis of time series.

_A_u_t_h_o_r(_s):

     Ross Ihaka.  The underlying density estimation code was written by
     Ross Ihaka, Martin Maechler and Brian Ripley.

_R_e_f_e_r_e_n_c_e_s:

     Scott, D. W. (1992) _Multivariate Density Estimation. Theory,
     Practice and Visualization_. New York: Wiley.

     Sheather, S. J. and Jones, M. C. (1991) A reliable data-based
     bandwidth selection method for kernel density estimation. _J. Roy.
     Statist. Soc._ *B*, 683-690.

     Silverman, B. W. (1986) _Density Estimation_. London: Chapman and
     Hall.

     Venables, W. N. and Ripley, B. D. (1999) _Modern Applied
     Statistics with S-PLUS_. New York: Springer.

_S_e_e _A_l_s_o:

     'density', 'hist', 'bw.nrd', 'plot.dtrace', 'print.dtrace',
     'summary.dtrace', 'print.summary.dtrace', '[.dtrace', '[[.dtrace'.

_E_x_a_m_p_l_e_s:

     ## The rectangular kernel with bw = 1.
     plot(dtrace(0, bw=1, kern="r"),
          main = "Rectangular Kernel (bw = 1)",
          xlab = "This is the output from ``dtrace''")

     ## Compare this with the density function.  In 'density', 'bw' is
     ## the kernel standard deviation, so this rectangular kernel has
     ## width sqrt(12), about 3.46, rather than 1.
     plot(density(0, bw=1, kern="r"),
          main = "Rectangular Kernel (bw = 1)",
          xlab = "This is the output from ``density''")

     ## The Gaussian kernel with sd = 1.
     plot(dtrace(0, sd=1, kern="g"),
          main = "Gaussian Kernel (sd = 1)")

     ## A moving-cell histogram and an equivalent
     ## fixed-cell histogram.  Note that the fixed-cell
     ## variant is less stable than it appears.
     x <- rnorm(100)
     d <- dtrace(x, bw=.5, kern = "r", n=1024)
     hist(x, breaks = seq(floor(min(x)), ceiling(max(x)), by = .5),
          prob = TRUE, ylim = d$ylim, border = "gray40",
          main = "Fixed and Moving-Cell Histograms")
     plot(d, col = "red", add = TRUE)

     ## A comparison of rectangular and Gaussian kernels
     ## with the same bandwidth.
     d <- dtrace(list(Rectangular = x, Gaussian = x),
                 bw = 1, kern = c("r", "g"))
     plot(d, col = c("green4", "red"), lty = 1,
          main = "``Equivalent Bandwidth'' Kernels",
          xlab = "Rectangular and Gaussian, (bw = 1, sd = 0.2887)")

     ## Simple density estimation.
     ## Here we are using automatic bandwidth selection.
     data(faithful)
     d <- dtrace(faithful$eruptions, sd = "sj")
     summary(d)
     plot(d, xlab = "Eruption Time in Minutes",
          main = "Old Faithful Data")

     ## A customized plot.  This is just to show how
     ## to access the underlying density structure.
     plot(d$density[[1]], type = "n",
          xlab = "Eruption Time in Minutes",
          ylab = "Density",
          main = "Old Faithful Data")
     polygon(d$density[[1]], col = "wheat")

     ## Estimation and plotting of three densities.
     ## A demonstration of the formula-based interface.
     data(iris)
     d <- dtrace(Petal.Width ~ Species, data = iris)
     plot(d, lty = 1, col = c("red","green4", "blue"),
          legend = c("Setosa", "Versicolor", "Virginica"),
          main = "The Distribution of Iris Petal Width",
          xlab = "Petal Width (cm)")

