A Technical Guide to Smart Prediction 
                   =======================================

This document gives technical details about the smartpred library.  It
should be read by users needing to write a "smart" function.  However,
serious users of lm() and glm() in S and/or R should read it too.
Unlike the Dummies Guide to Smart Prediction (DummiesGuide.txt, which
should be read first), it is in the format of a number of notes.

--------------------------------------------------------------------
1.  Smart prediction works for terms such as

bs(x), ns(x), poly(x), bs(x):bs(x2), bs(scale(x)),

and it also handles multivariate responses (class "mlm").  More
extensive testing is needed though.  It is no longer a requirement that
bs() and ns() be called with both knots and Boundary.knots arguments
supplied to obtain correct prediction.

Smart prediction removes the need for predict.gam() with lm() and glm()
objects in Splus.

R 1.6.0 gives prediction for terms such as bs(x), ns(x), poly(x),
but fails for terms such as bs(scale(x)), I(scale(x)). 


--------------------------------------------------------------------
2.  Example. Suppose one wishes to predict on the model 

lm(y ~ I((x-min(x))^2)).

Then, with smart prediction, it would be necessary to write a function
such as

"my1" <- function(x, minx=min(x)) {

    x <- x  # Evaluate x

    if(smart.mode.is("read")) {
        smart  <- get.smart()
        minx <- smart$minx       # Overwrite its value
    } else
    if(smart.mode.is("write"))
        put.smart(list(minx=minx))

    (x-minx)^2
}
attr(my1, "smart") <- TRUE


The functions put.smart() and get.smart() are opposites--the former
writes a list to a specified location, and the latter retrieves it.
The function smart.mode.is() returns TRUE or FALSE depending on its
first argument.

The function my1() can be tested using the following code.

n <- 20
set.seed(99)
x <- sort(runif(n))
y <- 1 + 2*x^2 + rnorm(n, sd=0.1)
fit <- lm(y ~ my1(x))
newdat <- data.frame(x=seq(from=-0.1, to=1.1, len=n))
p <- predict(fit, newdat)

It works, as shown by

plot(x, y)
lines(x, fitted(fit))
points(newdat$x, p, type="b", col=2, err=-1)

More details about writing smart functions are below.


--------------------------------------------------------------------
3. Another example.
The function "scale()" was modifed so that models such as 

lm(y ~ I( (x-mean(x))/sqrt(var(x)) ))

could be predicted from. For illustration, we have written a similar
function to scale() called "stdze1()" which only standardizes a numerical
vector (scale() handles matrices). It does no error checking, and differs
from my1() in that the data-dependent parameters are not evaluated at
the argument list of the function but inside the body of the function.
Here is what it looks like:

"stdze1" <- function(x, center=TRUE, scale=TRUE) {

    x <- x  # Evaluate x

    if(smart.mode.is("read")) {
        smart  <- get.smart()
        return((x-smart$center)/smart$scale) 
    }

    if(is.logical(center))
        center <- if(center) mean(x) else 0
    if(is.logical(scale))
        scale <- if(scale) sqrt(var(x)) else 1

    if(smart.mode.is("write"))
        put.smart(list(center=center,
                       scale=scale))
    # Normal use
    (x-center)/scale
}
attr(stdze1, "smart") <- TRUE


It can be tested by 

n <- 20
set.seed(86)
x <- sort(runif(n))
y <- sort(runif(n))
fit <- lm(y ~ stdze1(x))
newx <- x[1:5]             # A subset of the original data

abs(max(predict(fit)[1:5] - predict(fit, data.frame(x=newx)))) # 0

Note that 

fit2 <- lm(y ~ stdze1(stdze1(x)))
abs(max(predict(fit2)[1:5] - predict(fit2, data.frame(x=newx)))) # 0

is obtained, as expected, because a standardized variable that is
standardized is unchanged.


--------------------------------------------------------------------
4.
Here are some technical details if you need to write a non-trivial
smart function.  A smart function operates in three modes: "neutral",
"write" and "read".

In "neutral" mode (assumed so unless "write" or "read"), it operates
like an ordinary function and simply returns the result, e.g.,
x-min(x).

In "write" mode (at fitting time), it writes out the data-dependent
parameters that need saving into a special data structure called
".smart.prediction" in frame 1 (S-PLUS) or the workspace (R), and then
returns the result as well.  When the fitting function is finishing,
.smart.prediction is attached to the object. It appears as
object$smart.prediction.

In "read" mode (at prediction time), the object$smart.prediction is
copied into frame 1 or the workspace (and called .smart.prediction) by
the prediction function. It is now available for reading by the smart
function, which is invoked the second time when the model matrix and/or
data frame is computed.  If the original parameters are needed, the
smart function will access them from .smart.prediction.

In "read" mode, smart functions can be programmed in two ways.  The
first way is if the smart function is written so that it is not
necessary for the smart function to call itself. The function "stdze1"
is an example.  The second way is by recursion, i.e., once the original
parameters have been reinstated, a "do.call" is done. The cleanest way
is if the information originally written is in a list with components
whose names match the function's arguments. See "scale", "stdze2", "bs"
and "ns" as examples.  If there are only one or two parameters, or if
the expression is simple, then the first option is usually the best.

A line such as "x <- x" (where x is the primary argument of the smart
function) is needed at the very beginning of the function to cause lazy
evaluation to work immediately.  This is needed because of terms such
as bs(stdze1(x)), where a smart function calls another smart function.
The inner "stdze1" needs to be evaluated using its smart parameters
before the outer "bs" is evaluated.  The statement "x <- x" ensures
that. Writing smart function has potential pitfalls for inexperienced
programmers. We recommend making a copy of "my1" and editing that for
your needs.

Here is an equivalent function of stdze1, called stdze2:

"stdze2" <- function(x, center=TRUE, scale=TRUE) {

    x <- x  # Evaluate x

    if(smart.mode.is("read")) {
        return(eval(smart.expression))
    }

    if(is.logical(center))
        center <- if(center) mean(x) else 0
    if(is.logical(scale))
        scale <- if(scale) sqrt(var(x)) else 1

    if(smart.mode.is("write"))
        put.smart(list(center=center,
                       scale=scale,
                       match.call=match.call()))

    (x-center)/scale
}
attr(stdze2, "smart") <- TRUE


It is runs in the second way by recursion.  The expression
smart.expression invokes the recursion. In order for smart.expression
to work it is crucial that:
(a) the list in put.smart() to have exactly the same names as the
    arguments of the smart function (here it is "center" and "scale"), and
(b) the primary argument of the smart function is called "x", and 
(c) one of the list components in the put.smart is "match.call", which
    has value match.call(). This saves the name of the smart function,
    and during prediction, is used by the do.call() to call the same
    function. [This is now needed because R1.7.1's match.call() gives
    an error message.] Note that this list component is needed only if
    smart.expression is used. 


--------------------------------------------------------------------
5. In the S-PLUS version, smartpred writes three data structures to
frame 1.  These are ".smart.prediction", ".smart.prediction.counter",
and ".smart.prediction.mode".  They are deleted after both fitting and
prediction is complete. The user should be oblivious to their
existence.  They are written to frame 1 because these data structures
will be deleted even if lm() or glm() or predict() etc. give an error.
In the R version, smartpred writes these data structures to the
workspace. This is a bit messy, and potentially dangerous, but it seems
to be the only practical solution.

The variable .smart.prediction.mode equals "read", "write" or
"neutral".  It is in "write" mode when the smart functions write their
arguments out (i.e., when model is originally fitted), and in "read"
mode while predicting, and in "neutral" otherwise, e.g., at the command
line, or while predicting but needing to be momentarily out of "read"
and "write" mode so that it acts like an ordinary function.

On set up, .smart.prediction is a list with 30 empty components and the
variable .smart.prediction.counter is assigned 0. When a smart function
writes out its data-dependent parameters, .smart.prediction.counter is
incremented, and the argument of put.smart() is written to
.smart.prediction[[.smart.prediction.counter]].  If more than 30
components are used up then .smart.prediction is lengthened.  When
.smart.prediction is attached to the object any unused components are
trimmed off.

Similarly, on prediction, .smart.prediction (object$smart.prediction)
is placed back in frame 1 or the workspace, and
.smart.prediction.counter is assigned 0.  When in read mode,
.smart.prediction.counter is incremented, and
.smart.prediction[[.smart.prediction.counter]] is returned by
get.smart().

In computer science jargon, the data structure .smart.prediction mimics
a queue, which is also known as a First-In First-Out stack.


--------------------------------------------------------------------
6. The function is.smart() is also provided. 
Its usage is, e.g., is.smart(bs), which returns a TRUE or FALSE. In order
for this to work on smart functions, one had to assign the value TRUE
to the attribute "smart", e.g., attr(mysmartfunction, "smart") <- TRUE.
To check that smart prediction works, it is sufficient to check
that the modelling function, prediction function and the data-dependent
functions are all smart. 
For example, is.smart(glm), is.smart(predict.glm), is.smart(poly). 
If all three are true then smart prediction should work with
glm(y ~ poly(x, 3), binomial), say. 

Be careful, however, because scale() actually calls scale.default()
because scale() in R is a generic function. Then
scale.default() should be smart but scale cannot be assigned to be
smart in general.  For functions poly() and poly.raw() in Splus6,
the former calls the latter, so both are assigned to be smart.


--------------------------------------------------------------------
7.  The functions setup.smart(), get.smart.prediction() and
wrapup.smart() can be used by writers of modeling functions to
implement smart prediction. You need to examine lm() and predict.lm()
in detail to know what snippets of code to insert and where. Typically,
only a line or two are needed.  The smart functions provided (bs, ns,
poly, scale) should remain unchanged.

Predict methods that inherit from lm, such as predict.glm(), need no
modification because the processing for smart prediction is all done in
predict.lm().


--------------------------------------------------------------------
8.  We have attempted to write a smart function called "ordcut"
which mimics the effect of

ordered(cut(x, breaks, labels, include.lowest = F)). 

Such a term is relevant because it breaks up a continuous variable into
an ordered factor, something particularly useful in practice.  For the
S-PLUS version however, one has to be careful about the endpoints at
prediction time, because if they lie outside the smallest or largest
breakpoint then NAs will occur, resulting in an error. To stop this
happening, use one could add an extend=T argument in ordcut() so that
at the time of fitting, and at the time of prediction, the smallest and
largest breakpoints will be adjusted to accommodate the new data, e.g.,
extended to min(x)-1 and max(x)+1.  As an exercise, you might want to
attempt to write ordcut() yourself, and produce both an S-PLUS and R
version. We have not included our ordcut() in the smartpred library.


--------------------------------------------------------------------
9. Acknowledgements:
The smartpred library implements an algorithm devised by John Chambers
and Trevor Hastie at the time lm() and glm() etc. were written. We also
thank Drs W.N. Venables and B.D. Ripley for helpful comments.


--------------------------------------------------------------------
10. Authors: Thomas Yee (yee@stat.auckland.ac.nz) and Trevor Hastie
(hastie@stat.Stanford.EDU).  Please send any suggestions and bug
reports to Thomas.  The smart prediction library is available at
http://www.stat.auckland.ac.nz/~yee

This document last modified: 9 October 2002