%
% NOTE -- ONLY EDIT S4Objects.Rnw!!!
% S4Objects.tex file will get overwritten.
%
%\VignetteIndexEntry{S4 Classes and Methods}
%\VignetteDepends{methods}
%\VignetteKeywords{programming, methods}
\def\Rfunction#1{\texttt{#1()}}
\def\Robject#1{\texttt{#1}}
\def\Rpackage#1{\textit{#1}}
\def\Rclass#1{\textit{#1}}
\documentclass[11pt]{article}
\usepackage[authoryear,round]{natbib}
\usepackage{times}
\usepackage{hyperref}

\begin{document}

\title{S4 Classes in 15 pages, more or less}
\maketitle

\section*{Overview}

The preferred mechanism for object--oriented programming in R is described in \citet{Chamb1998}. The actual implementation is slightly different (in some parts influenced by Dylan; \citealp{Shalit96}). This document provides a short introduction to programming using these classes and methods. It is worth noting that R also supports an older class system (as described in \citet{CH92}), but we recommend developing all new programs using the new system and will not refer to the older system further.

The object system is class--based with multiple inheritance and generic functions that can dispatch on any set of arguments (the \textit{signature}). In some sense the concepts of classes and methods are distinct and we will deal with them separately. Later sections of this document deal with documenting classes and methods and with using them in R packages.

\section*{Classes}

A class is defined using \verb+setClass+.
\begin{verbatim}
setClass(Class, representation, prototype, contains=character(),
         validity, access, where=1, version=FALSE)
\end{verbatim}
The arguments are:
\begin{description}
\item[Class] character string; the name of the class
\item[representation] the slots that the new class should have and/or other classes that this class extends. Usually a call to the `representation' function. %'
\item[prototype] An object providing default data for the slots. New instances will pick up these values unless explicitly overridden at time of construction.
\item[contains] The classes that this class extends. All slots of parent classes will be propagated to new instances of the extending class.
\item[validity, access, version] Control arguments included for compatibility with S-Plus.
\item[where] The environment to use to store or remove the definition as metadata.
\end{description}

Once a class has been defined, \textit{instances} of that class can be created using the function \verb+new+. The class defines the structure of an object. The instances of that class represent the objects themselves.

Classes can provide an abstraction of complex objects that helps to simplify programming with them. Appropriate design is essential and this is where much of the programming effort (rather than in actual coding) should be focused.

A class can extend one or more other classes. We can think of the class hierarchy as being a directed graph (or tree). The extended classes are considered to be parents of the class that extends them. The graph must be acyclic: no class can be a parent of itself. Extension means that the new class will contain all of the slots that the parent classes have. It is an error to create a class with duplicate slot names (either by inclusion or directly).

<<>>=
library(methods)
setClass("foo", representation(a="character", b="numeric"))
setClass("bar", representation(d="numeric", c="numeric"))
setClass("baz", contains=c("foo", "bar"))
getClass("baz")
@

Now we can create an instance of \Rclass{baz}. You do not need instances of {\em foo} or {\em bar} to do so; \Rclass{baz} objects can be instantiated directly. Access to the values in a slot is through a special operator, the \verb+@+ symbol. There are many advantages to defining and using special methods for accessing the values in slots rather than accessing the slots directly. We will discuss some of these issues in the next section when methods are discussed.
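
Before doing so, the inheritance relationships described above can be inspected directly. The following is a minimal sketch rather than part of the original examples; it repeats the class definitions so that it can be run on its own, and the variable names \verb+bazSlots+ and \verb+bazIsVirtual+ are introduced here only for illustration.
\begin{verbatim}
library(methods)

# Class definitions repeated from the text above.
setClass("foo", representation(a="character", b="numeric"))
setClass("bar", representation(d="numeric", c="numeric"))
setClass("baz", contains=c("foo", "bar"))

# baz declares no slots of its own; it inherits a and b from foo
# and d and c from bar.
bazSlots <- slotNames("baz")
bazSlots

# baz extends both of its parents ...
extends("baz", "foo")
extends("baz", "bar")

# ... and, because it has slots, it is an ordinary (non-virtual) class.
bazIsVirtual <- isVirtualClass("baz")
bazIsVirtual
\end{verbatim}
Both calls to \verb+extends+ should report \verb+TRUE+, which is why an instance of \Rclass{baz} can be used wherever a {\em foo} or a {\em bar} is expected.
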
<<>>=
x <- new("baz", a="xxx", b=5, c=10)
x
x@a
@

\subsection*{Virtual Classes}

Virtual classes are classes for which no instances will be (can be) created. Rather, they are used to link together classes which may have distinct representations (and hence cannot inherit from each other) but for which we want to provide similar functionality. A fairly standard mechanism is to create a virtual class and to then have several other classes extend it.

In practice the virtual classes are then used in different ways:
\begin{enumerate}
\item Methods for the virtual class will apply to any of the classes that extend the virtual class.
\item A slot in a new class can have as its type the virtual class. This allows slots to be polymorphic.
%FIXME:
% Is this also
% permitted by the "ANY" and "or" constructions in slot-typing?
%%Yes, but in the first case you have no control over what goes in there
\item If a virtual class has slots they will be common to all classes that extend the virtual class.
\end{enumerate}

Suppose that we want to define a class that represents a dendrogram. Each node in a dendrogram has three values associated with it: the height, the left node and the right node. Terminal nodes are different: they have a height and a value (possibly an instance of yet another class). One way to implement this structure is to use a virtual class called \verb+dendNode+, say. Then there are two classes that we can define that extend this virtual class: terminal and non-terminal nodes.

<<>>=
setClass("dendNode")
setClass("dnode", representation(left="dendNode", right="dendNode",
                                 height="numeric"), contains="dendNode")
setClass("tnode", representation(height="numeric", value="numeric",
                                 label="character"), contains="dendNode")
@

Now we can create dendrograms whose nodes are either terminal or non--terminal. The virtual class {\em dendNode} has been used to allow two different classes of objects as the \verb+left+ and \verb+right+ nodes of a dendrogram.
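
As an illustration of how the two node classes fit together, the following minimal sketch builds a three-leaf dendrogram and walks it recursively. It repeats the class definitions so that it runs on its own. The generic \verb+countLeaves+ and the object names are hypothetical, introduced only for this example.
\begin{verbatim}
library(methods)

# Class definitions repeated from the text above.
setClass("dendNode")
setClass("dnode", representation(left="dendNode", right="dendNode",
                                 height="numeric"), contains="dendNode")
setClass("tnode", representation(height="numeric", value="numeric",
                                 label="character"), contains="dendNode")

# countLeaves is a hypothetical generic, not part of any package.
setGeneric("countLeaves", function(node) standardGeneric("countLeaves"))

# A terminal node is a single leaf.
setMethod("countLeaves", "tnode", function(node) 1)

# A non-terminal node has as many leaves as its two subtrees together.
setMethod("countLeaves", "dnode", function(node)
          countLeaves(node@left) + countLeaves(node@right))

leaf1 <- new("tnode", height=0, value=1.5, label="a")
leaf2 <- new("tnode", height=0, value=2.5, label="b")
leaf3 <- new("tnode", height=0, value=4.0, label="c")

# The left and right slots accept either kind of node.
inner <- new("dnode", left=leaf1, right=leaf2, height=1)
root  <- new("dnode", left=inner, right=leaf3, height=2)

nLeaves <- countLeaves(root)
nLeaves
\end{verbatim}
Dispatch selects the method from the class of each node, so the recursion needs no explicit test for terminal nodes.
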
This design makes recursive manipulation of dendrograms somewhat simpler.

A situation that seems to arise frequently is the desire to allow a slot in an object to either contain a \Rclass{list} or to be \Robject{NULL}. Since the object \Robject{NULL} is not itself a list these cannot ordinarily {\em share} a slot. To overcome this we could create a new virtual class that extends both the \Rclass{list} class and the \Robject{NULL} class.
%%FIXME: this seems to work but is it really how we want to do this?
As described on page 294 of \citet{Chamb1998} we can operationalize these ideas with the following code:

<<>>=
setClass("listOrNULL")
setIs("list", "listOrNULL")
setIs("NULL", "listOrNULL")
@

Now if we define an object with a slot that is of type {\tt listOrNULL} it should accommodate either.

<<>>=
setClass("c1", representation(value="listOrNULL"))
y <- new("c1", value=NULL)
y
y2 <- new("c1", value=list(a=10))
y2
@

One can create a virtual class in two different ways. First, as shown above, if the call to \verb+setClass+ has no representation then the class will be a virtual class. Otherwise, include the class \Rclass{VIRTUAL} in the representation.

<<>>=
setClass("myVclass", representation(a="character", "VIRTUAL"))
getClass("myVclass")
@

And we see that \verb+myVclass+ is indeed a virtual class with a single slot, \verb+a+.

\subsection*{Initialization and prototyping}

In some situations it is desirable to control the creation of new instances of a class. In some cases we might want to perform some computations or other initialization processes. There are two different mechanisms for doing this.

The first is to use the {\tt prototype} argument of {\tt setClass}. Using this argument any of the slots can be provided with initial values.
<<>>=
setClass("xx", representation(a="numeric", b="character"),
         prototype(a=3, b="hi there"))
new("xx")
@

And we see that new instances of the class {\tt xx} will all have the values {\tt 3} and {\tt "hi there"} associated with the slots {\tt a} and {\tt b}, respectively.

In some cases this is not sufficiently general and we might want more control and flexibility. To achieve this we define an {\tt initialize} method for our class. This method, if defined, will be applied after the {\tt prototype}, if one was defined.

<<>>=
setMethod("initialize", "xx", function(.Object, b) {
    .Object@b <- b
    .Object@a <- nchar(b)
    .Object
})
new("xx", b="yowser")
@

Note that the last statement in the initialize method must return the {\em object}. This is required because of the pass--by--value semantics of R. While {\tt initialize} gives the appearance of changing the values of the slots of its arguments it does not do so. Rather, it creates a whole new object, sets the values of its slots, and returns that new object.

\section*{Generics and Methods}

An equally important aspect of object oriented programming is the use of generic functions and methods. A generic function is essentially a dispatching mechanism. The methods are specialized functions that perform the required task on a specific type of input. The job of the generic function is to determine which of the methods is most applicable for a given set of arguments. Once this decision has been made the method selected is invoked with the appropriate arguments.

Much of the S language already uses generic functions and methods. The new system is a big improvement for several reasons:
\begin{itemize}
\item Dispatching can be done on multiple arguments.
\item Methods should not be called directly but only via the generic.
\item The old style of dispatching made it virtually impossible to distinguish the appropriate dispatching for a function named \verb+foo.bar.baz+.
\end{itemize}

When creating methods you usually need to first determine whether a generic function exists. If there is no generic function you must create one. An issue that may arise is the signature of the generic. The signature of the generic function limits the signatures of all methods defined for that generic function.

Once the signature for the generic has been specified, methods can implement some or all of the arguments of the generic function but they cannot add any new arguments. This implies that the construction of the argument list for the generic is especially important. All potentially relevant arguments should be included.

\subsection*{Accessor Functions}

Slots can always be accessed directly using the \verb+@+ operator. This practice is not recommended (for Bioconductor) since it produces code that depends on the actual class representation. Should that representation change, for example because some slots are dropped in favour of being computed or are renamed, then all of the code that uses the \verb+@+ operator must be identified and modified. Instead we recommend the convention of accessing the slots via methods (while this produces a lot of generic functions the abstraction gained more than offsets this).

Suppose that the class \verb+foo+ has a slot named \verb+a+. To create an accessor function for this slot you can proceed as shown below.
%FIXME: is this really what we want to do?
%it seems that there must be a nicer way....

<<>>=
setClass("foo", representation(a="ANY"))
if( !isGeneric("a") ) {
    ## reuse an existing function named a as the default, if there is one
    if( exists("a", mode="function") )
        fun <- get("a", mode="function")
    else
        fun <- function(object) standardGeneric("a")
    setGeneric("a", fun)
}
setMethod("a", "foo", function(object) object@a)
b <- new("foo", a=10)
a(b)
@

\subsection*{Replacement Methods}

As noted previously, one of the conceptual hurdles (and hence a real one) is dealing with the pass-by-value semantics of the S language.
One of the reasons for object oriented programming is to use the computer representation to mimic real-world objects. Since real-world objects are mutable and their behavior depends on their history we would like to mimic that in the computer representation. However, R has pass-by-value semantics and so every function operates on a copy of its arguments. Thus, change cannot occur through function calls in a straightforward manner.

The replacement methods are one mechanism that can be used to provide the appearance of pass-by-reference semantics. Basically a replacement method \textit{silently} replaces the whole object with a suitably modified copy of itself.

For example, in \verb+x[1]<-10+, it appears to the user that the first element of \verb+x+ has had its value changed to 10. But that is not what happens. First an analysis is done to determine that the symbol being operated on is \Robject{x}. Then a copy of \Robject{x} is made, the first value of the copy is changed, and finally the binding for \Robject{x} is changed to the new value. This process is automatic and most users are not aware of the true underlying semantics.

The procedure is largely conventional. One of the requirements is that the last statement in any replacement method (function) is the whole object. A second requirement is that the last argument must be named \verb+value+. This is to ensure that S can always identify the value that is going to be assigned.

We continue the example given above.

<<>>=
setGeneric("a<-", function(x, value) standardGeneric("a<-"))
setReplaceMethod("a", "foo", function(x, value) {
    x@a <- value
    x
})
a(b) <- 32
@

We also had to define an appropriate generic function. In this case the name of the generic is \verb+a<-+ since that is the traditional form for the names of assignment functions: the name of the function for which an assignment version is being defined concatenated with the assignment operator.
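
The copy semantics behind a replacement method can be made visible by keeping a second binding to the original object. The following minimal sketch repeats the accessor and replacement definitions for the class \verb+foo+ so that it runs on its own; the name \verb+b2+ is introduced here only for illustration.
\begin{verbatim}
library(methods)

# Class, accessor and replacement method repeated from the text.
setClass("foo", representation(a="ANY"))
setGeneric("a", function(object) standardGeneric("a"))
setMethod("a", "foo", function(object) object@a)
setGeneric("a<-", function(x, value) standardGeneric("a<-"))
setReplaceMethod("a", "foo", function(x, value) {
    x@a <- value
    x          # the whole modified object must be returned
})

b  <- new("foo", a=10)
b2 <- b        # a second name for the same value

a(b) <- 32     # rebinds b to a modified copy

a(b)           # 32
a(b2)          # still 10: the original object was never altered
\end{verbatim}
Only the binding for \verb+b+ changes. Any other name that referred to the original object continues to see the old value.
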
Finally, note that the value in the \verb+a+ slot of the object \verb+b+ has been {\em changed} (as noted, really the whole object has been changed).

\section*{Documentation}

Documentation of S4 classes and methods is aided by two functions in the {\em methods} package. They are {\tt promptClass} and {\tt promptMethods}; both create documentation templates in the .Rd markup dialect.

First consider the top of the file {\tt iter-methods.Rd} in the {\em Biobase} package.
\begin{verbatim}
\name{iter-methods}
\docType{methods}
\title{ Methods for the generic iter.}
\alias{iter-methods}
\alias{iter}
\end{verbatim}
Here we have the name of the file, the documentation type (not sure just how it works yet), the title and some {\tt alias}es. The aliases are used by R's help system to find the correct documentation. %'

From the {\tt help} man page,
\begin{verbatim}
help(topic, offline = FALSE, package = .packages(),
     lib.loc = NULL, verbose = getOption("verbose"),
     try.all.packages = getOption("help.try.all.packages"),
     htmlhelp = getOption("htmlhelp"),
     pager = getOption("pager"))
?topic
type?topic
\end{verbatim}
we can see how the \verb+?+ operator works. It is either monadic or dyadic. In the dyadic form it constructs calls to \verb+help+ for the string \verb+topic-type+.

Consider the {\tt aggregator} class in the Biobase package. When R receives the command {\tt promptClass("aggregator")}, it will dump the following text to the file {\tt aggregator-class.Rd}:
\begin{verbatim}
\name{aggregator-class}
\docType{class}
\alias{aggregator-class}
\title{Class aggregator, ~~class for ...
~~ }
\description{ ~~ A concise (1-5 lines) description of what the class is ~~}
\section{Creating Objects}{
\code{ new('aggregator',}\cr
\code{ aggenv = ...., # Object of class environment}\cr
\code{ initfun = ...., # Object of class function}\cr
\code{ aggfun = ...., # Object of class function}\cr
\code{ )}}
\section{Slots}{
  \describe{
    \item{\code{aggenv}:}{Object of class "environment" ~~ }
    \item{\code{initfun}:}{Object of class "function" ~~ }
    \item{\code{aggfun}:}{Object of class "function" ~~ }
  }
}
\section{Prototype}{
  \describe{
    \item{environment aggenv}{ }
    \item{function initfun}{function (name, val) 1 }
    \item{function aggfun}{function (name, current, val) current + 1 }
  }
}
\section{Methods}{
  \describe{
    \item{aggenv}{(aggregator): ... }
    \item{aggfun}{(aggregator): ... }
    \item{initfun}{(aggregator): ... }
  }
}
\keyword{methods}
\end{verbatim}
The developer should add narrative content to this skeleton, and copy the final file to the {\tt man} section of the package implementing the class. See {\tt Biobase/man/aggregator-class.Rd} for an example.

Note that the {\tt xyz-class} naming convention permits use of the binary version of {\tt ?}. Thus {\tt class?aggregator} is the command that retrieves the manual page for class aggregator.

\section*{Using {\em methods} in R packages}

S4 methods and classes depend on the availability of metadata. For this reason supplying class definitions and methods in R packages requires a little more programming than what is needed for developing standard R packages (as described in the R Extensions Manual).

We will consider building a small package that will provide the class definitions defined earlier in this document. The first thing to do is to collect these functions into a single file. A relatively standard mechanism is to place all class, generic and method definitions inside of a function. In the example below the function is called {\tt .initFoo}.
The reason it has a name that starts with a dot is so that the user will not see it and inadvertently call it.
\begin{verbatim}
.initFoo <- function(where) {
    setClass("foo", representation(a="character", b="numeric"),
             where=where)
    setClass("bar", representation(d="numeric", c="numeric"),
             where=where)
    setClass("baz", contains=c("foo", "bar"), where=where)
}
\end{verbatim}

We then create a second file, which contains certain startup code needed when loading the R package. This file is traditionally named {\tt zzz.R}.
\begin{verbatim}
.First.lib <- function(libname, pkgname, where) {
    if( !require(methods) )
        stop("we require methods for package Foo")
    where <- match(paste("package:", pkgname, sep=""), search())
    .initFoo(where)
}
\end{verbatim}

When a package is loaded in R one of the initialization mechanisms is to search for a function called {\tt .First.lib}. If such a function is found it is evaluated. In the case described above one of the actions taken is to evaluate the function {\tt .initFoo}. That creates the class definitions and stores the appropriate metadata with the {\tt Foo} package.

\bibliographystyle{apalike}
\bibliography{S4Objects}

\end{document}