Summary

Simple text data is typically stored using 1 byte per character (for example, in ASCII or another single-byte encoding). Integers are usually stored using 2 or 4 bytes, and real values typically use 4 or 8 bytes.
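As a rough illustration of these sizes, the following Python sketch reports how many bytes a short piece of ASCII text and the common integer and real formats occupy. The sample string is made up for this example, and the "<" prefix asks the struct module for its standard, platform-independent sizes; other encodings and platforms may use different sizes.

```python
import struct

# One byte per character holds for ASCII text in a single-byte encoding.
text = "data"                                 # hypothetical sample value
text_size = len(text.encode("ascii"))         # 4 characters occupy 4 bytes

# Standard sizes for common numeric formats ("<" = standard, little-endian).
short_int_size = struct.calcsize("<h")        # 2-byte integer
long_int_size = struct.calcsize("<i")         # 4-byte integer
single_real_size = struct.calcsize("<f")      # 4-byte real
double_real_size = struct.calcsize("<d")      # 8-byte real

print(text_size, short_int_size, long_int_size,
      single_real_size, double_real_size)     # prints: 4 2 4 4 8
```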

There is a limit to the size of numbers that can be stored digitally, and for real values there is also a limit on the precision with which values can be stored.
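Both limits can be seen in a small Python sketch. It assumes a signed 4-byte integer and 4-byte and 8-byte reals in the usual IEEE 754 formats; the variable names are chosen only for this illustration.

```python
import struct

# Largest value that fits in a signed 4-byte integer.
max_int32 = 2**31 - 1
struct.pack("<i", max_int32)                  # this value fits

# One more than the maximum cannot be stored in 4 bytes.
try:
    struct.pack("<i", max_int32 + 1)
    overflowed = False
except struct.error:
    overflowed = True

# Storing 0.1 as a 4-byte real and reading it back loses precision.
single = struct.unpack("<f", struct.pack("<f", 0.1))[0]
precision_lost = single != 0.1

# Even 8-byte reals are approximate: this comparison is False.
sum_is_exact = (0.1 + 0.2 == 0.3)

print(overflowed, precision_lost, sum_is_exact)   # prints: True True False
```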

Plain text files are the simplest data storage solution: they are simple to use, work across different computer platforms, and can be read by virtually any software. Their main disadvantage is the lack of a standard structure, which means that software usually requires human input to determine where data values reside within the file. Plain text files are also generally larger and slower to access than other data storage options.

CSV (comma-separated values) files offer the most standardized plain text format.
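Because the CSV layout is so widely understood, most languages can read it without being told where each value sits. The Python sketch below uses the standard csv module; the station and temp columns and their values are invented for this example.

```python
import csv
import io

# A small, made-up CSV file held in memory: a header line, then one
# record per line, with fields separated by commas.
csv_text = "station,temp\nA,12.5\nB,14.0\n"

# DictReader uses the header line to name the fields in each record.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Values arrive as text, so numeric fields must be converted explicitly.
temps = [float(row["temp"]) for row in rows]

print(temps)                                  # prints: [12.5, 14.0]
```

Note that the reader still returns every value as text; the format says where a value is, not what type it has.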

Binary formats tend to provide smaller files and faster access speeds. The disadvantage is that data stored in a binary format can only be accessed using specific software.
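The size difference can be sketched in Python by storing the same three integers as comma-separated text and as 4-byte binary integers. The values and the little-endian 4-byte layout are assumptions made for this example, not a description of any particular binary file format.

```python
import struct

# Three made-up integers, each small enough for a signed 4-byte integer.
values = [123456789, 987654321, 2147483647]

# Plain text: one byte per digit, plus one byte per separating comma.
as_text = ",".join(str(v) for v in values).encode("ascii")

# Binary: exactly 4 bytes per integer, regardless of the number of digits.
as_binary = struct.pack("<3i", *values)

print(len(as_text), len(as_binary))           # prints: 30 12

# Reading the binary form back requires knowing the layout in advance.
decoded = list(struct.unpack("<3i", as_binary))
```

The last line shows the disadvantage as well: without the "<3i" layout, the 12 bytes are not interpretable.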

Spreadsheets are ubiquitous, flexible, and easy to use. However, they impose little structure on the data, so they should be used with caution.

XML is a language that can be used for marking up data. XML files are plain text but provide structure that allows software to automatically determine the location of data values within the file (XML files are self-describing).
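As a minimal sketch of what self-describing means in practice, the Python code below locates values in a small XML document by element and attribute name alone. The readings, reading, and temp elements and the station attribute are hypothetical names invented for this example.

```python
import xml.etree.ElementTree as ET

# A small, made-up XML document; the tags label each data value.
xml_text = """<readings>
  <reading station="A"><temp>12.5</temp></reading>
  <reading station="B"><temp>14.0</temp></reading>
</readings>"""

root = ET.fromstring(xml_text)

# Values are found by name, not by their position in the file.
temps_by_station = {
    reading.get("station"): float(reading.find("temp").text)
    for reading in root.findall("reading")
}

print(temps_by_station)                       # prints: {'A': 12.5, 'B': 14.0}
```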

Databases are sophisticated but relatively complex. They are useful for storing very large or very complex data sets but require specific software and much greater expertise.
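The following Python sketch gives a feel for the extra steps a database involves, using the sqlite3 module from the standard library with a temporary in-memory database. The readings table, its columns, and its rows are invented for this example; a production database would typically run as separate server software.

```python
import sqlite3

# Create a temporary database in memory (nothing is written to disk).
conn = sqlite3.connect(":memory:")

# The structure of the data must be declared before any data is stored.
conn.execute("CREATE TABLE readings (station TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("A", 12.5), ("B", 14.0), ("A", 13.5)],
)

# Data is retrieved with a query language (SQL) rather than by reading a file.
averages = conn.execute(
    "SELECT station, AVG(temp) FROM readings "
    "GROUP BY station ORDER BY station"
).fetchall()
conn.close()

print(averages)                               # prints: [('A', 13.0), ('B', 14.0)]
```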

Paul Murrell

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 New Zealand License.