%
% NOTE -- ONLY EDIT S4Objects.Rnw!!!
% S4Objects.tex file will get overwritten.
%
%\VignetteIndexEntry{S4 Classes and Methods}
%\VignetteDepends{methods}
%\VignetteKeywords{programming, methods}

\def\Rfunction#1{\texttt{#1()}}
\def\Robject#1{\texttt{#1}}

\def\Rpackage#1{\textit{#1}}
\def\Rclass#1{\textit{#1}}

 
\documentclass[11pt]{article}


\usepackage[authoryear,round]{natbib}
\usepackage{times}

\usepackage{hyperref}

\begin{document}
\title{S4 Classes in 15 pages, more or less}
\maketitle


\section*{Overview}

The preferred mechanism for object-oriented programming in R
is described in \citet{Chamb1998}.
The actual implementation differs slightly and was influenced in
places by Dylan \citep{Shalit96}.
This document provides a short
introduction to programming using these classes and methods.
R also supports an older class system (as
described in \citet{CH92}), but we recommend developing all new programs
using the new system and will not refer to the older system further.

The object system is class-based, with multiple inheritance and
generic functions that can dispatch on any set of arguments (the
\textit{signature}).
The concepts of classes and methods are to some extent distinct, and we
deal with them separately.
Later sections deal with documenting classes and methods and with
using them in R packages.

\section*{Classes}

A class is defined using \verb+setClass+.
\begin{verbatim}
  setClass(Class, representation, prototype,
    contains=character(), validity, access, where=1, version=FALSE)
\end{verbatim}
The arguments are:
\begin{description}
\item[Class] character string; the name of the class
\item[representation]  the slots that the new class should have and/or other
          classes that this class extends.  Usually a call to the
          `representation' function.
%'
\item[prototype] An object providing default data for the slots. New
  instances will pick up these values unless explicitly overridden at time
of construction.
\item[contains] The classes that this class extends.  All slots of parent
classes will be propagated to new instances of the extending class.
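\item[validity] An optional function that checks whether an object of
  the class is valid; it should return {\tt TRUE} or a character string
  describing the problem.
\item[access, version] Control arguments included for
  compatibility with S-Plus.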
\item[where] The environment to use to store or remove the definition as
          metadata.
\end{description}

Once a class has been defined \textit{instances} of that class can be created
using the function \verb+new+.
The class defines the structure of an object. The instances of that
class represent the objects themselves.
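As a small illustration of {\tt new}, the sketch below also uses the
{\tt validity} argument of {\tt setClass}, which takes a function
returning {\tt TRUE} for a valid object and a character string
describing the problem otherwise. The class \Rclass{nonNeg} is
hypothetical. Because the default initialization calls {\tt validObject}
whenever slot values are supplied to {\tt new}, the invalid object is
never created.

<<validityExample>>=
library(methods)
## hypothetical class: slot x may only hold non-negative numbers
setClass("nonNeg", representation(x = "numeric"),
         validity = function(object) {
             if (any(object@x < 0)) "slot 'x' contains negative values"
             else TRUE
         })
ok <- new("nonNeg", x = c(1, 2, 3))   # passes the check
ok@x
bad <- try(new("nonNeg", x = -1), silent = TRUE)
inherits(bad, "try-error")            # the validity check rejected it
@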

Classes can provide an abstraction of complex objects that helps to
simplify programming with them.
Appropriate design is essential, and this, rather than the actual
coding, is where much of the programming effort should be focused.

A class can extend one or more other classes.  We can think of the
class hierarchy as a directed graph (a tree when each class has at most
one parent).  The extended
classes are considered to be parents of the class that extends them.
The graph must be acyclic: no class can be its own ancestor.
Extension means that the new class will contain all of the slots that
the parent classes have.  It is an error to create a class with
duplicate slot names (either by inclusion or directly).

<<>>=
  library(methods)
  setClass("foo", representation(a="character", b="numeric"))
  setClass("bar", representation(d="numeric", c="numeric"))
  setClass("baz", contains=c("foo", "bar"))
  getClass("baz")
@

Now we can create an instance of \Rclass{baz}. You do not need instances
of \Rclass{foo} or \Rclass{bar} to do so; \Rclass{baz} objects can be
instantiated directly.

Access to the values in a slot is through a special operator, the
\verb+@+ symbol.  There are many advantages to accessing slot values
through specially defined accessor methods rather than directly. We
will discuss some of these issues in the next section, where methods
are introduced.

<<>>=
 x <- new("baz", a="xxx", b=5, c=10)
 x
 x@a
@
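The relationships created by {\tt contains} can be queried directly. The
sketch below uses hypothetical classes \Rclass{P1}, \Rclass{P2} and
\Rclass{Kid}, mirroring \Rclass{foo}, \Rclass{bar} and \Rclass{baz};
{\tt is} and {\tt extends} report inheritance, {\tt slotNames} lists the
slots (including inherited ones) and {\tt slot} is a functional
equivalent of \verb+@+.

<<inheritanceQueries>>=
library(methods)
## hypothetical classes mirroring foo, bar and baz
setClass("P1", representation(a = "character"))
setClass("P2", representation(d = "numeric"))
setClass("Kid", contains = c("P1", "P2"))
k <- new("Kid", a = "xxx", d = 1)
is(k, "P1")            # Kid extends P1
extends("Kid", "P2")   # and P2
slotNames("Kid")       # slots inherited from both parents
slot(k, "a")           # same value as k@a
@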


\subsection*{Virtual Classes}

Virtual classes are classes for which no instances can be
created.  Rather, they are used to link together classes which may
have distinct representations (and hence cannot inherit from each
other) but for which we want to provide similar functionality.  A
fairly standard mechanism is to create a virtual class and then to
have several other classes extend it.

In practice the virtual classes are then used in different ways:
\begin{enumerate}
\item Methods for the virtual class will apply to any of the classes
  that extend the virtual class.
\item A slot in a new class can have as its type the virtual
  class.  This allows slots to be polymorphic.
%FIXME:
% Is this also
% permitted by the "ANY" and "or" constructions in slot-typing?
%%Yes, but in the first case you have no control over what goes in there
\item If a virtual class has slots they will be common to
  all classes that extend the virtual class.
\end{enumerate}
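The first and third uses can be sketched together. In the hypothetical
example below, the virtual class \Rclass{shape} has a slot {\tt name}
that every extending class inherits, and a single method written for
\Rclass{shape} (for the hypothetical generic {\tt describeShape}) serves
both \Rclass{circle} and \Rclass{square}. Generics and methods are
discussed in more detail in a later section.

<<virtualMethod>>=
library(methods)
## hypothetical virtual class with a slot shared by all extending classes
setClass("shape", representation(name = "character", "VIRTUAL"))
setClass("circle", representation(r = "numeric"), contains = "shape")
setClass("square", representation(side = "numeric"), contains = "shape")
## one method, defined for the virtual class, applies to every subclass
setGeneric("describeShape", function(s) standardGeneric("describeShape"))
setMethod("describeShape", "shape",
          function(s) paste("a shape called", s@name))
describeShape(new("circle", name = "wheel", r = 1))
describeShape(new("square", name = "tile", side = 2))
@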

Suppose that we want to define a class that represents a dendrogram.
Each non-terminal node in a dendrogram has three values associated with
it: the height, the left node and the right node. Terminal nodes are
different: they have a height and a value (possibly an instance of yet
another class), and in the code below also a label.

One way to implement this structure is to use a virtual class called,
say, \verb+dendNode+. We can then define two classes that extend this
virtual class: one for terminal and one for non-terminal nodes.

<<>>=
setClass("dendNode")
setClass("dnode", representation(left="dendNode", right="dendNode",
   height="numeric"), contains="dendNode")
setClass("tnode", representation(height="numeric", value="numeric",
   label="character"), contains="dendNode")
@

Now we can create dendrograms whose nodes are either terminal or
non--terminal. The virtual class {\em dendNode} has been used to allow
two different classes of objects as the \verb+left+ and
\verb+right+ nodes of a dendrogram.
This design makes recursive manipulation of dendrograms
somewhat simpler.
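As a sketch of such a recursive manipulation, the hypothetical generic
{\tt nLeaves} below counts the terminal nodes of a dendrogram: the
\Rclass{tnode} method ends the recursion and the \Rclass{dnode} method
recurses into both children. The class definitions are repeated so that
the chunk can be run on its own.

<<dendRecursion>>=
library(methods)
## class definitions as above, repeated so the chunk runs on its own
setClass("dendNode")
setClass("dnode", representation(left="dendNode", right="dendNode",
   height="numeric"), contains="dendNode")
setClass("tnode", representation(height="numeric", value="numeric",
   label="character"), contains="dendNode")
## hypothetical generic: number of terminal nodes below a node
setGeneric("nLeaves", function(node) standardGeneric("nLeaves"))
setMethod("nLeaves", "tnode", function(node) 1L)
setMethod("nLeaves", "dnode",
          function(node) nLeaves(node@left) + nLeaves(node@right))
## a small tree: ((A, B), C)
tA <- new("tnode", height=0, value=1, label="A")
tB <- new("tnode", height=0, value=2, label="B")
tC <- new("tnode", height=0, value=3, label="C")
root <- new("dnode", height=2, right=tC,
            left=new("dnode", left=tA, right=tB, height=1))
nLeaves(root)
@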

A situation that seems to arise frequently is the desire to allow a
slot in an object either to contain a \Rclass{list} or to be \Robject{NULL}.
Since the object \Robject{NULL} is not itself a list,
these cannot ordinarily {\em share} a slot.  To overcome this we can
create a new virtual class that both the \Rclass{list} class and
the \Robject{NULL} class extend.


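As described on page 294 of \citet{Chamb1998}, this idea can be
operationalized by declaring, with \verb+setIs+, that both \Rclass{list}
and \Rclass{NULL} extend a new virtual class. In current versions of R
the basic classes are sealed, so \verb+setIs+ cannot add superclasses to
them; the supported way to express the same idea is \verb+setClassUnion+:
<<listornull>>=
## a virtual class whose members are "list" and "NULL"
setClassUnion("listOrNULL", c("list", "NULL"))
@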
Now if we define an object with a slot that is of type {\tt
  listOrNULL}, it should accommodate either.
<<loNex>>=
setClass("c1", representation(value="listOrNULL"))
y<-new("c1", value=NULL)
y
y2 <- new("c1", value=list(a=10))
y2
@
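A class union, created with {\tt setClassUnion}, is a virtual class
whose members are the listed classes, and a slot of that class accepts
only values from those members. In the hypothetical sketch below, a slot
of class \Rclass{charOrNULL} accepts a character vector or {\tt NULL},
while a numeric value is rejected when the object is created.

<<classUnion>>=
library(methods)
## hypothetical union: a slot that may hold a character vector or NULL
setClassUnion("charOrNULL", c("character", "NULL"))
setClass("note", representation(text = "charOrNULL"))
n1 <- new("note", text = NULL)
n2 <- new("note", text = "hello")
is(NULL, "charOrNULL")
bad <- try(new("note", text = 1), silent = TRUE)
inherits(bad, "try-error")       # numeric is not a member of the union
@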

One can create a virtual class with \verb+setClass+ in two different
ways. First, as shown above, if the call to \verb+setClass+ has no
representation then the class will be a virtual class. Second, one can
include the class \Rclass{VIRTUAL} in the representation.

<<virtualClass>>=
setClass("myVclass", representation(a="character", "VIRTUAL"))
getClass("myVclass")
@
And we see that \verb+myVclass+ is indeed a virtual class with a
single slot, \verb+a+.
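Because \verb+myVclass+ is virtual, {\tt new} refuses to create
instances of it; a class that extends it can be instantiated and
inherits the slot \verb+a+. The class \Rclass{myRealClass} and its slot
{\tt n} below are hypothetical, and \verb+myVclass+ is redefined so that
the chunk can be run on its own.

<<virtualNew>>=
library(methods)
## myVclass as defined above, repeated so the chunk runs on its own
setClass("myVclass", representation(a="character", "VIRTUAL"))
bad <- try(new("myVclass", a = "x"), silent = TRUE)
inherits(bad, "try-error")          # virtual: no instances allowed
## hypothetical concrete class extending it, with one extra slot
setClass("myRealClass", representation(n = "numeric"), contains = "myVclass")
obj <- new("myRealClass", a = "hello", n = 1)
obj@a                               # slot a is inherited
isVirtualClass("myVclass")
@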

\subsection*{Initialization and prototyping}

In some situations it is desirable to control the creation of new
instances of a class; for example, we might want to perform some
computations or other initialization processes.

There are two different mechanisms for doing this. The first is to use
the {\tt prototype} argument of {\tt setClass}.
Using this argument, any of the slots can be provided with initial values.

<<prototypes>>=
setClass("xx", representation(a="numeric", b="character"),
  prototype(a=3, b="hi there"))

new("xx")
@
And we see that new instances of the class {\tt xx} will all have
the values {\tt 3} and {\tt "hi there"} associated with the slots {\tt
  a} and {\tt b}, respectively.

In some cases this is not sufficiently general and we might, for
example, want more control and flexibility.
To achieve this we define an {\tt initialize} method for our
class. If such a method is defined, it is applied to the object
created from the {\tt prototype} (if one was given) each time
{\tt new} is called.

<<initialize>>=
setMethod("initialize", "xx", function(.Object, b) {
   ## .Object arrives holding the prototype values
   .Object@b <- b
   .Object@a <- nchar(b)   # a is derived from b
   .Object                 # return the modified object
  })

 new("xx", b="yowser")

@

Note that the initialize method must return the object: its last
statement should be the (modified) {\em object} itself.
This is required because of the pass-by-value semantics of R. While
{\tt initialize} gives the appearance of changing the values of the
slots of its argument, it does not do so. Rather, it modifies a copy of
the object and returns that copy, which {\tt new} then returns.
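The method above replaces the default initialization entirely, so
\verb+new("xx")+ without a {\tt b} argument now fails, and other slots
can no longer be set through {\tt new}. A common alternative, sketched
here with a hypothetical class \Rclass{yy} that has the same slots and
prototype as \Rclass{xx}, is to let {\tt callNextMethod} perform the
default initialization and then adjust the result; the object is again
returned at the end.

<<initializeNext>>=
library(methods)
## hypothetical class with the same slots and prototype as xx
setClass("yy", representation(a = "numeric", b = "character"),
  prototype(a = 3, b = "hi there"))
setMethod("initialize", "yy", function(.Object, ...) {
    .Object <- callNextMethod(.Object, ...)  # default: fill slots from ...
    .Object@a <- nchar(.Object@b)            # then derive a from b
    .Object                                  # return the modified object
})
new("yy")@a                  # derived from the prototype's b
new("yy", b = "yowser")@a
@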

\section*{Generics and Methods}

An equally important aspect of object-oriented programming is the use
of generic functions and methods.  A generic function is essentially a
dispatching mechanism.  The methods are specialized functions that
perform the required task on a specific type of input. The job of the
generic function is to determine which of the methods is most
applicable for a given set of arguments. Once this decision has been
made the method selected is invoked with the appropriate arguments.

Much of the S language already uses generic functions and methods. The
new system is a big improvement for several reasons:
\begin{itemize}
\item Dispatching can be done on multiple arguments.
\item Methods should not be called directly but only via the generic.
\item The old style of dispatching, based on function names of the form
  \verb+generic.class+, made it virtually impossible to determine the
  appropriate dispatch for a function named \verb+foo.bar.baz+: it could
  be the \verb+bar.baz+ method for \verb+foo+ or the \verb+baz+ method
  for \verb+foo.bar+.
\end{itemize}
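Dispatch on multiple arguments can be sketched with a hypothetical
generic {\tt join}; the method chosen depends on the classes of both
{\tt x} and {\tt y}.

<<multipleDispatch>>=
library(methods)
## hypothetical generic that dispatches on the classes of both x and y
setGeneric("join", function(x, y) standardGeneric("join"))
setMethod("join", signature(x = "numeric", y = "numeric"),
          function(x, y) x + y)
setMethod("join", signature(x = "character", y = "character"),
          function(x, y) paste(x, y))
setMethod("join", signature(x = "character", y = "numeric"),
          function(x, y) paste(x, format(y)))
join(1, 2)
join("a", "b")
join("n =", 5)
@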

When creating methods you usually need to first determine whether a
generic function exists. If there is no generic function you must
create one.  An issue that may arise is the signature of the
generic. The signature of the generic function limits the signatures
of all methods defined for that generic function.

Once the signature for the generic has been specified, methods can
implement some or all of the arguments of the generic function, but
they cannot add new arguments unless the generic has a \verb+...+
argument, in whose place a method may accept additional named
arguments.  This implies that the construction
of the argument list for the generic is especially important.  All
potentially relevant arguments should be included.
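The sketch below shows a hypothetical generic {\tt describe} whose
\verb+...+ argument leaves room for a method-specific argument
{\tt digits}; if the generic had no \verb+...+, {\tt setMethod} would
not accept a method with the extra argument.

<<genericDots>>=
library(methods)
## hypothetical generic; "..." leaves room for method-specific arguments
setGeneric("describe", function(object, ...) standardGeneric("describe"))
setMethod("describe", "numeric", function(object, digits = 2, ...) {
    paste("mean:", format(round(mean(object), digits)))
})
describe(c(1, 2, 4))
describe(c(1, 2, 4), digits = 0)
@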

\subsection*{Accessor Functions}

Slots can always be accessed directly using the \verb+@+
operator. This practice is not recommended (for Bioconductor) since it
produces code that depends on the actual class representation. Should
that representation change (for example, if some slots are dropped in
favour of computing their values, or are renamed), then all of the
code that uses the \verb+@+ operator must be identified and modified.
Instead we recommend the convention of accessing slots via
methods; while this produces a lot of generic functions, the
abstraction gained more than offsets this.
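The benefit can be seen in a small hypothetical sketch. Callers use only
the accessor {\tt celsius}; whether a class stores the temperature
directly (\Rclass{tempV1}) or computes it from another slot
(\Rclass{tempV2}) is hidden behind the methods.

<<accessorHiding>>=
library(methods)
setGeneric("celsius", function(object) standardGeneric("celsius"))
## version 1 (hypothetical): the value is stored in a slot
setClass("tempV1", representation(degC = "numeric"))
setMethod("celsius", "tempV1", function(object) object@degC)
## version 2 (hypothetical): the slot is dropped and the value computed
setClass("tempV2", representation(degF = "numeric"))
setMethod("celsius", "tempV2", function(object) (object@degF - 32) * 5 / 9)
## calling code is the same for both representations
celsius(new("tempV1", degC = 100))
celsius(new("tempV2", degF = 212))
@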

Suppose that the class \verb+foo+ has a slot named \verb+a+. To create
an accessor function for this slot you can proceed as shown below.
%FIXME: is this really what we want to do?
%it seems that there must be a nicer way....
<<>>=
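 setClass("foo", representation(a="ANY"))

 ## create the generic "a" only if it does not already exist; an
 ## ordinary function called "a", if there is one, becomes the default
 if( !isGeneric("a") ) {
   if( exists("a", mode="function") )
          fun <- get("a", mode="function")
   else
          fun <- function(object) standardGeneric("a")
   setGeneric("a", fun)
 }

 setMethod("a", "foo", function(object) object@a)

 b <- new("foo", a=10)

 a(b)
@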

\subsection*{Replacement Methods}

As noted previously, one of the conceptual hurdles (and hence a real
one) is dealing with the pass-by-value semantics of the S language.
One of the reasons for object-oriented programming is to use the computer
representation to mimic real-world objects. Since real-world objects
are mutable and their behavior depends on their history, we would like
to mimic that in the computer representation.

However, R has pass-by-value semantics and so every function operates
on a copy of its arguments. Thus, change cannot occur through
function calls in a straightforward manner.  Replacement methods
are one mechanism that can be used to provide the appearance of
pass-by-reference semantics.
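Pass-by-value can be seen with an ordinary function. In the sketch below
(the function name {\tt zeroFirst} is hypothetical), the assignment
inside the function changes only its local copy, so the caller's vector
is unchanged unless the result is assigned back.

<<passByValue>>=
## hypothetical function that appears to change its argument
zeroFirst <- function(v) {
    v[1] <- 0   # modifies the function's local copy only
    v
}
v <- c(5, 6, 7)
w <- zeroFirst(v)
v   # still 5 6 7
w   # 0 6 7
@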

Basically a replacement method \textit{silently} replaces the whole
object with a suitably modified copy of itself.  For example, in
\verb+x[1]<-10+, it appears to the user that the 1st element of
\verb+x+ has had its value changed to 10.  But that is not what
happens. First an analysis is done to determine that the symbol being
operated on is \Robject{x}. Then a copy of \Robject{x} is made, the
first value of the copy is changed, and finally the binding for
\Robject{x} is changed to the new value.
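The steps just described amount, roughly, to evaluating
\verb+x <- `[<-`(x, 1, value = 10)+, that is, calling the replacement
function \verb+[<-+ and rebinding the result to the name. The sketch
below, which uses new variables {\tt x1} and {\tt x2}, checks that the
two forms give the same result.

<<replacementDesugar>>=
x1 <- c(1, 2, 3)
x1[1] <- 10                        # the usual form
x2 <- c(1, 2, 3)
x2 <- `[<-`(x2, 1, value = 10)     # the same step written out
identical(x1, x2)
@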

This process is automatic and most users are not aware of the true
underlying semantics. The procedure is largely conventional.  One
requirement is that any replacement method (or function) must return
the whole, modified object, so its last statement is usually the object
itself.  A second requirement is that the
last argument must be named \verb+value+. This ensures that R
can always identify the value that is going to be assigned.
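Both requirements can be seen in a plain replacement function. The
function \verb+second<-+ below is hypothetical: it takes the object and
{\tt value}, modifies a copy, and returns the whole object, which R then
binds back to the name used in the call.

<<replacementFunction>>=
## hypothetical replacement function: sets the second element
`second<-` <- function(x, value) {
    x[2] <- value
    x                  # return the whole, modified object
}
z <- c(1, 2, 3)
second(z) <- 20
z
@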

We continue the example given above.
<<replacement>>=

 setGeneric("a<-", function(x, value) 
            standardGeneric("a<-"))

 setReplaceMethod("a", "foo", 
  function(x, value) {
    x@a <- value
    x
  })

  a(b) <- 32
  a(b)
@

We also had to define an appropriate generic function. In this case
the name of the generic is \verb+a<-+, following the convention for
replacement functions: the name of the function for which an
assignment version is being defined, followed by the assignment
operator \verb+<-+.

Finally, note that the value in the \verb+a+ slot of the object
\verb+b+ has been {\em changed} (as noted really the whole object has
been changed).


\section*{Documentation}

Documentation of S4 classes and methods is aided by two
functions in the {\em methods} package, {\tt promptClass} and
{\tt promptMethods}. Both create documentation templates in the .Rd
markup dialect.
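A minimal sketch of calling {\tt promptClass} for a hypothetical class
\Rclass{track}, writing the template to a temporary file (via the
{\tt filename} argument) rather than to the working directory, and
displaying its first lines:

<<promptClassExample>>=
library(methods)
## hypothetical class to document
setClass("track", representation(x = "numeric", y = "numeric"))
rdFile <- file.path(tempdir(), "track-class.Rd")
promptClass("track", filename = rdFile)
head(readLines(rdFile))
@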

First consider the top of the file {\tt iter-methods.Rd} in the {\em
  Biobase} package.
\begin{verbatim}
\name{iter-methods}
\docType{methods}
\title{ Methods for the generic iter.}
\alias{iter-methods}
\alias{iter}
\end{verbatim}
Here we have the name of the file, the documentation type (here
{\tt methods}, marking the file as documentation for methods rather
than for a function, class or data set), the title and some
{\tt alias}es.
The aliases are used by R's help system to find the correct
documentation.
%'
From the {\tt help} man page,
\begin{verbatim}
     help(topic, offline = FALSE, package = .packages(),
          lib.loc = NULL, verbose = getOption("verbose"),
          try.all.packages = getOption("help.try.all.packages"),
          htmlhelp = getOption("htmlhelp"),
          pager = getOption("pager"))
     ?topic
     type?topic
\end{verbatim}

We can see how the \verb+?+ operator works. It is either monadic or
dyadic. In the dyadic form, \verb+type?topic+ constructs a call to
\verb+help+ for the string \verb+topic-type+; for example,
\verb+methods?iter+ looks up the alias \verb+iter-methods+.


Consider the {\tt aggregator}
class in the Biobase package.  When R receives the command
{\tt promptClass("aggregator")}, it writes the following
text to the file {\tt aggregator-class.Rd}:

\begin{verbatim}
\name{aggregator-class}
\docType{class}
\alias{aggregator-class}
\title{Class aggregator, ~~class for ... ~~ }
\description{  ~~ A concise (1-5 lines) description of what the class is  ~~}
\section{Creating Objects}{
\code{  new('aggregator',}\cr
\code{    aggenv  = ...., # Object of class environment}\cr
\code{    initfun = ...., # Object of class function}\cr
\code{    aggfun  = ...., # Object of class function}\cr
\code{  )}}
\section{Slots}{
  \describe{
    \item{\code{aggenv}:}{Object of class "environment" ~~ }
    \item{\code{initfun}:}{Object of class "function" ~~ }
    \item{\code{aggfun}:}{Object of class "function" ~~ }
  }
}

\section{Prototype}{
  \describe{
    \item{environment aggenv}{<environment> }
    \item{function initfun}{function (name, val) 1 }
    \item{function aggfun}{function (name, current, val) current + 1 }
  }
}
\section{Methods}{
  \describe{
    \item{aggenv}{(aggregator): ... }
    \item{aggfun}{(aggregator): ... }
    \item{initfun}{(aggregator): ... }
  }
}
\keyword{methods}

\end{verbatim}

The developer should add narrative content to this skeleton and
copy the final file to the {\tt man} directory of the
package implementing the class.  See {\tt Biobase/man/aggregator-class.Rd}
for an example.  Note that the {\tt xyz-class} naming convention
permits use of the binary version of {\tt ?}.  Thus
{\tt class?aggregator} is the command that retrieves the manual
page for class aggregator.

\section*{Using {\em methods} in R packages}

S4 methods and classes depend on the availability of metadata.
For this reason, supplying class definitions and methods in R packages
requires a little more programming than what is needed for developing
standard R packages (as described in the Writing R Extensions manual).

We will consider building a small package that will provide the class
definitions defined earlier in this document.
The first thing to do is to collect these definitions into a single
file.
A relatively standard mechanism is to place all class, generic and
method definitions inside of a function.
In the example below the function is called {\tt .initFoo}.
The reason it has a name that starts with a dot is so that the user
will not see it and inadvertently call it.

\begin{verbatim}
.initFoo <- function(where) {
  setClass("foo", representation(a="character", b="numeric"), where=where)
  setClass("bar", representation(d="numeric", c="numeric"), where=where)
  setClass("baz", contains=c("foo", "bar"), where=where)
}
\end{verbatim}

We then create a second file, which contains certain startup code
needed when loading the R package. This file is traditionally named
{\tt zzz.R}.

\begin{verbatim}
.First.lib <- function(libname, pkgname) {
    if( !require(methods) ) stop("we require methods for package Foo")
    ## position of the package on the search list
    where <- match(paste("package:", pkgname, sep=""), search())
    .initFoo(where)
}
\end{verbatim}

When a package is loaded in R, one of the initialization mechanisms is
to search for a function called {\tt .First.lib}.
If such a function is found, it is evaluated. In the case described
above, one of the actions taken is to evaluate the function {\tt
  .initFoo}. That creates the class definitions and stores the
appropriate metadata with the {\tt Foo} package.
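In current versions of R every package has a NAMESPACE file and
{\tt .First.lib} is no longer called, so the arrangement above does not
work as written. Instead, class and method definitions are normally
placed at the top level of a file in the package's {\tt R} directory,
{\tt methods} is listed under {\tt Imports} in the {\tt DESCRIPTION}
file, and the classes are exported from the NAMESPACE. A minimal sketch
for the hypothetical package {\tt Foo}, assuming the three classes
defined earlier (the file name {\tt AllClasses.R} is likewise only an
illustration):

\begin{verbatim}
## Foo/R/AllClasses.R: evaluated when the package is installed
setClass("foo", representation(a="character", b="numeric"))
setClass("bar", representation(d="numeric", c="numeric"))
setClass("baz", contains=c("foo", "bar"))

## Foo/NAMESPACE
import(methods)
exportClasses(foo, bar, baz)
\end{verbatim}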

\bibliographystyle{apalike}
\bibliography{S4Objects}

\end{document}