10.2 Data types and data structures

Individual values are either character values (text), numeric values (numbers), or logical values (TRUE or FALSE). R also supports complex values with an imaginary component.
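
As a minimal sketch of these basic types, the typeof() function reports how R stores a value. The variable names and values below are arbitrary illustrations, not part of the original text.

```r
# One value of each basic type (names and values are illustrative only)
text_value    <- "hello"
number_value  <- 3.14
logical_value <- TRUE
complex_value <- 2+3i

typeof(text_value)     # "character"
typeof(number_value)   # "double" (R's name for a real value)
typeof(logical_value)  # "logical"
typeof(complex_value)  # "complex"
```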

There is a distinction within numeric values between integers and real values, but integer values are readily coerced to real values: division, or any arithmetic that involves a real value (including a plain numeric constant such as 2), produces a real result. If an integer is required, it is best to use a function that explicitly generates integer values, such as as.integer().
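
The following sketch shows how easily an integer becomes a real value, and how as.integer() converts it back. The L suffix for writing an integer constant is standard R syntax that the text above does not mention. The variable name count is hypothetical.

```r
count <- 5L                     # the L suffix creates an integer constant
typeof(count)                   # "integer"
typeof(count + 1L)              # "integer": both operands are integers
typeof(count * 2)               # "double": the constant 2 is a real value
typeof(count / 1L)              # "double": division always gives a real result
doubled <- as.integer(count * 2)
typeof(doubled)                 # "integer" again
```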

On a 32-bit operating system, in an English locale, a character value uses 1 byte per character; an integer uses 4 bytes, as does a logical value; and a real number uses 8 bytes. The function object.size() returns the approximate number of bytes used by an R data structure in memory. In the examples below, each result includes about 40 bytes of overhead for the vector itself, in addition to the space for its 1000 values.

> object.size(1:1000)
4040 bytes
> object.size(as.numeric(1:1000))
8040 bytes

The simplest data structure in R is a vector. All elements of a vector must have the same basic type. Most operators and many functions accept vector arguments and return a vector result.
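
A short sketch of both points: operators and functions work on whole vectors, and mixing types in one vector forces a single common type. The names lengths_cm and mixed are hypothetical.

```r
lengths_cm <- c(1, 2, 3)        # a numeric vector
scaled <- lengths_cm * 10       # operator applied to every element: 10 20 30
roots  <- sqrt(c(1, 4, 9))      # function applied to every element: 1 2 3

# Mixing types coerces every element to one common type
mixed <- c(1, "a", TRUE)
typeof(mixed)                   # "character"
mixed                           # "1" "a" "TRUE"
```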

Matrices and arrays are multidimensional analogues of the vector. All elements must have the same type.
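
As an illustration, matrix() and array() build these structures from a vector of values. The names m and a are arbitrary.

```r
m <- matrix(1:6, nrow = 2)          # 2 rows, 3 columns, filled column by column
dim(m)                              # 2 3
m[2, 3]                             # 6: row 2, column 3
typeof(m)                           # "integer": one type for all elements

a <- array(1:24, dim = c(2, 3, 4))  # a three-dimensional array
dim(a)                              # 2 3 4
```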

Data frames are collections of vectors where each vector must have the same length, but different vectors can have different types. This data structure is the standard way to represent a data set in R.
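
A small sketch of a data frame with three columns of different types. The data set and the name people are invented for illustration. The argument stringsAsFactors = FALSE is given explicitly because its default differs between R versions.

```r
people <- data.frame(
    name   = c("Ann", "Bob", "Cy"),   # character column
    age    = c(30, 25, 41),           # numeric column
    member = c(TRUE, FALSE, TRUE),    # logical column
    stringsAsFactors = FALSE          # keep name as character in all R versions
)

dim(people)               # 3 3: three rows, three columns
sapply(people, typeof)    # "character" "double" "logical"
people$age                # 30 25 41
```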

Lists are like vectors that can have different types of data structures in each component. In the simplest case, each component of a list may be a vector of values. Like the data frame, each component can be a vector of a different basic type, but for lists there is no requirement that each component has the same size. More generally, the components of a list can be more complex data structures, such as matrices, data frames, or even other lists. Lists can be used to efficiently represent hierarchical data in R.
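
The following sketch builds a list whose components differ in both type and size, including a nested list. The name record and its contents are hypothetical.

```r
record <- list(
    ids    = 1:3,                      # integer vector of length 3
    label  = "sample",                 # character vector of length 1
    nested = list(                     # a list inside a list
        flag = TRUE,
        grid = matrix(1:4, nrow = 2)
    )
)

lengths(record)              # ids 3, label 1, nested 2
record$nested$grid[2, 2]     # 4: reach into the nested matrix
str(record)                  # prints a compact view of the hierarchy
```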

Paul Murrell

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 New Zealand License.