A Technical Guide to Smart Prediction ======================================= This document gives technical details about the smartpred library. It should be read by users needing to write a "smart" function. However, serious users of lm() and glm() in S and/or R should read it too. Unlike the Dummies Guide to Smart Prediction (DummiesGuide.txt, which should be read first), it is in the format of a number of notes. -------------------------------------------------------------------- 1. Smart prediction works for terms such as bs(x), ns(x), poly(x), bs(x):bs(x2), bs(scale(x)), and it also handles multivariate responses (class "mlm"). More extensive testing is needed though. It is no longer a requirement that bs() and ns() be called with both knots and Boundary.knots arguments supplied to obtain correct prediction. Smart prediction removes the need for predict.gam() with lm() and glm() objects in Splus. R 1.6.0 gives prediction for terms such as bs(x), ns(x), poly(x), but fails for terms such as bs(scale(x)), I(scale(x)). -------------------------------------------------------------------- 2. Example. Suppose one wishes to predict on the model lm(y ~ I((x-min(x))^2)). Then, with smart prediction, it would be necessary to write a function such as "my1" <- function(x, minx=min(x)) { x <- x # Evaluate x if(smart.mode.is("read")) { smart <- get.smart() minx <- smart$minx # Overwrite its value } else if(smart.mode.is("write")) put.smart(list(minx=minx)) (x-minx)^2 } attr(my1, "smart") <- TRUE The functions put.smart() and get.smart() are opposites--the former writes a list to a specified location, and the latter retrieves it. The function smart.mode.is() returns TRUE or FALSE depending on its first argument. The function my1() can be tested using the following code. n <- 20 set.seed(99) x <- sort(runif(n)) y <- 1 + 2*x^2 + rnorm(n, sd=0.1) fit <- lm(y ~ my1(x)) newdat <- data.frame(x=seq(from=-0.1, to=1.1, len=n)) p <- predict(fit, newdat) It works, as shown by plot(x, y) lines(x, fitted(fit)) points(newdat$x, p, type="b", col=2, err=-1) More details about writing smart functions are below. -------------------------------------------------------------------- 3. Another example. The function "scale()" was modifed so that models such as lm(y ~ I( (x-mean(x))/sqrt(var(x)) )) could be predicted from. For illustration, we have written a similar function to scale() called "stdze1()" which only standardizes a numerical vector (scale() handles matrices). It does no error checking, and differs from my1() in that the data-dependent parameters are not evaluated at the argument list of the function but inside the body of the function. Here is what it looks like: "stdze1" <- function(x, center=TRUE, scale=TRUE) { x <- x # Evaluate x if(smart.mode.is("read")) { smart <- get.smart() return((x-smart$center)/smart$scale) } if(is.logical(center)) center <- if(center) mean(x) else 0 if(is.logical(scale)) scale <- if(scale) sqrt(var(x)) else 1 if(smart.mode.is("write")) put.smart(list(center=center, scale=scale)) # Normal use (x-center)/scale } attr(stdze1, "smart") <- TRUE It can be tested by n <- 20 set.seed(86) x <- sort(runif(n)) y <- sort(runif(n)) fit <- lm(y ~ stdze1(x)) newx <- x[1:5] # A subset of the original data abs(max(predict(fit)[1:5] - predict(fit, data.frame(x=newx)))) # 0 Note that fit2 <- lm(y ~ stdze1(stdze1(x))) abs(max(predict(fit2)[1:5] - predict(fit2, data.frame(x=newx)))) # 0 is obtained, as expected, because a standardized variable that is standardized is unchanged. -------------------------------------------------------------------- 4. Here are some technical details if you need to write a non-trivial smart function. A smart function operates in three modes: "neutral", "write" and "read". In "neutral" mode (assumed so unless "write" or "read"), it operates like an ordinary function and simply returns the result, e.g., x-min(x). In "write" mode (at fitting time), it writes out the data-dependent parameters that need saving into a special data structure called ".smart.prediction" in frame 1 (S-PLUS) or the workspace (R), and then returns the result as well. When the fitting function is finishing, .smart.prediction is attached to the object. It appears as object$smart.prediction. In "read" mode (at prediction time), the object$smart.prediction is copied into frame 1 or the workspace (and called .smart.prediction) by the prediction function. It is now available for reading by the smart function, which is invoked the second time when the model matrix and/or data frame is computed. If the original parameters are needed, the smart function will access them from .smart.prediction. In "read" mode, smart functions can be programmed in two ways. The first way is if the smart function is written so that it is not necessary for the smart function to call itself. The function "stdze1" is an example. The second way is by recursion, i.e., once the original parameters have been reinstated, a "do.call" is done. The cleanest way is if the information originally written is in a list with components whose names match the function's arguments. See "scale", "stdze2", "bs" and "ns" as examples. If there are only one or two parameters, or if the expression is simple, then the first option is usually the best. A line such as "x <- x" (where x is the primary argument of the smart function) is needed at the very beginning of the function to cause lazy evaluation to work immediately. This is needed because of terms such as bs(stdze1(x)), where a smart function calls another smart function. The inner "stdze1" needs to be evaluated using its smart parameters before the outer "bs" is evaluated. The statement "x <- x" ensures that. Writing smart function has potential pitfalls for inexperienced programmers. We recommend making a copy of "my1" and editing that for your needs. Here is an equivalent function of stdze1, called stdze2: "stdze2" <- function(x, center=TRUE, scale=TRUE) { x <- x # Evaluate x if(smart.mode.is("read")) { return(eval(smart.expression)) } if(is.logical(center)) center <- if(center) mean(x) else 0 if(is.logical(scale)) scale <- if(scale) sqrt(var(x)) else 1 if(smart.mode.is("write")) put.smart(list(center=center, scale=scale, match.call=match.call())) (x-center)/scale } attr(stdze2, "smart") <- TRUE It is runs in the second way by recursion. The expression smart.expression invokes the recursion. In order for smart.expression to work it is crucial that: (a) the list in put.smart() to have exactly the same names as the arguments of the smart function (here it is "center" and "scale"), and (b) the primary argument of the smart function is called "x", and (c) one of the list components in the put.smart is "match.call", which has value match.call(). This saves the name of the smart function, and during prediction, is used by the do.call() to call the same function. [This is now needed because R1.7.1's match.call() gives an error message.] Note that this list component is needed only if smart.expression is used. -------------------------------------------------------------------- 5. In the S-PLUS version, smartpred writes three data structures to frame 1. These are ".smart.prediction", ".smart.prediction.counter", and ".smart.prediction.mode". They are deleted after both fitting and prediction is complete. The user should be oblivious to their existence. They are written to frame 1 because these data structures will be deleted even if lm() or glm() or predict() etc. give an error. In the R version, smartpred writes these data structures to the workspace. This is a bit messy, and potentially dangerous, but it seems to be the only practical solution. The variable .smart.prediction.mode equals "read", "write" or "neutral". It is in "write" mode when the smart functions write their arguments out (i.e., when model is originally fitted), and in "read" mode while predicting, and in "neutral" otherwise, e.g., at the command line, or while predicting but needing to be momentarily out of "read" and "write" mode so that it acts like an ordinary function. On set up, .smart.prediction is a list with 30 empty components and the variable .smart.prediction.counter is assigned 0. When a smart function writes out its data-dependent parameters, .smart.prediction.counter is incremented, and the argument of put.smart() is written to .smart.prediction[[.smart.prediction.counter]]. If more than 30 components are used up then .smart.prediction is lengthened. When .smart.prediction is attached to the object any unused components are trimmed off. Similarly, on prediction, .smart.prediction (object$smart.prediction) is placed back in frame 1 or the workspace, and .smart.prediction.counter is assigned 0. When in read mode, .smart.prediction.counter is incremented, and .smart.prediction[[.smart.prediction.counter]] is returned by get.smart(). In computer science jargon, the data structure .smart.prediction mimics a queue, which is also known as a First-In First-Out stack. -------------------------------------------------------------------- 6. The function is.smart() is also provided. Its usage is, e.g., is.smart(bs), which returns a TRUE or FALSE. In order for this to work on smart functions, one had to assign the value TRUE to the attribute "smart", e.g., attr(mysmartfunction, "smart") <- TRUE. To check that smart prediction works, it is sufficient to check that the modelling function, prediction function and the data-dependent functions are all smart. For example, is.smart(glm), is.smart(predict.glm), is.smart(poly). If all three are true then smart prediction should work with glm(y ~ poly(x, 3), binomial), say. Be careful, however, because scale() actually calls scale.default() because scale() in R is a generic function. Then scale.default() should be smart but scale cannot be assigned to be smart in general. For functions poly() and poly.raw() in Splus6, the former calls the latter, so both are assigned to be smart. -------------------------------------------------------------------- 7. The functions setup.smart(), get.smart.prediction() and wrapup.smart() can be used by writers of modeling functions to implement smart prediction. You need to examine lm() and predict.lm() in detail to know what snippets of code to insert and where. Typically, only a line or two are needed. The smart functions provided (bs, ns, poly, scale) should remain unchanged. Predict methods that inherit from lm, such as predict.glm(), need no modification because the processing for smart prediction is all done in predict.lm(). -------------------------------------------------------------------- 8. We have attempted to write a smart function called "ordcut" which mimics the effect of ordered(cut(x, breaks, labels, include.lowest = F)). Such a term is relevant because it breaks up a continuous variable into an ordered factor, something particularly useful in practice. For the S-PLUS version however, one has to be careful about the endpoints at prediction time, because if they lie outside the smallest or largest breakpoint then NAs will occur, resulting in an error. To stop this happening, use one could add an extend=T argument in ordcut() so that at the time of fitting, and at the time of prediction, the smallest and largest breakpoints will be adjusted to accommodate the new data, e.g., extended to min(x)-1 and max(x)+1. As an exercise, you might want to attempt to write ordcut() yourself, and produce both an S-PLUS and R version. We have not included our ordcut() in the smartpred library. -------------------------------------------------------------------- 9. Acknowledgements: The smartpred library implements an algorithm devised by John Chambers and Trevor Hastie at the time lm() and glm() etc. were written. We also thank Drs W.N. Venables and B.D. Ripley for helpful comments. -------------------------------------------------------------------- 10. Authors: Thomas Yee (yee@stat.auckland.ac.nz) and Trevor Hastie (hastie@stat.Stanford.EDU). Please send any suggestions and bug reports to Thomas. The smart prediction library is available at http://www.stat.auckland.ac.nz/~yee This document last modified: 9 October 2002