The Crohn's Disease Data Set

Paul Murrell


Table of Contents

Introduction
Variables
Data format

Introduction

These data are genetic sequences for 387 individuals from a study of Crohn's disease, an inflammatory bowel disease (Daly et al., 2001a, Nature Genetics, 29, 223-228). The data are stored in the text-based LINKAGE format.

Variables

Name: pedigree

Type: character

Description: The pedigree of the individual (which genetic family tree the individual belongs to).

Name: ID

Type: integer (min: 0)

Description: A unique identifier for the individual.

Name: parentA

Type: integer (min: 0)

Description: The identifier for one parent.

Name: parentB

Type: integer (min: 0)

Description: The identifier for the other parent.

Name: gender

Type: categorical ( 1 means male, 2 means female )

Description: The individual's gender.

Name: status

Type: categorical ( 0 means unknown; the other levels are crohns and normal )

Description: Whether the individual has Crohn's disease.

Name: marker1A to marker103B

Type: integer (min: 0)

Description: The genotype for the individual at 103 different loci. There are two variables per locus: marker1A gives the individual's first allele at locus 1, marker1B gives the second allele at locus 1, and so on up to marker103A and marker103B.
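
The variable list above implies 6 fixed variables followed by 2 variables for each of the 103 loci. As a minimal sketch, the full list of names could be generated as follows. The function name column_names is hypothetical, and the order of the names is assumed to follow the order in which the variables are listed here.

```python
# Hypothetical helper: builds the variable names described above.
# Assumption: the columns appear in the same order as the variable list.
def column_names(n_loci=103):
    names = ["pedigree", "ID", "parentA", "parentB", "gender", "status"]
    for locus in range(1, n_loci + 1):
        # Two alleles (A and B) are recorded for each locus.
        names.append(f"marker{locus}A")
        names.append(f"marker{locus}B")
    return names


names = column_names()
print(len(names))   # 6 fixed variables + 2 * 103 marker variables
print(names[6], names[-1])
```

With the default of 103 loci this gives 212 names, starting with pedigree and ending with marker103B.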

Data format

The data set is provided as a tab-delimited ASCII text file called Dalydata.txt. The format is designed for use with the LINKAGE software.
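
One way to read a file in this layout is sketched below, using only the Python standard library. The function read_linkage and the two sample rows are made up for illustration. The sketch assumes that the file has no header row, that each row holds the 6 fixed variables followed by two alleles per locus, and that every value except pedigree is an integer. These assumptions should be checked against Dalydata.txt before relying on the code.

```python
import csv
import io

# Names of the fixed variables, in the order given in the Variables section.
FIXED = ["pedigree", "ID", "parentA", "parentB", "gender", "status"]


def read_linkage(handle, n_loci=103):
    """Yield one dict per individual from tab-delimited LINKAGE-style rows.

    Assumption: no header row; 6 fixed columns, then 2 columns per locus.
    """
    expected = len(FIXED) + 2 * n_loci
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != expected:
            raise ValueError(f"expected {expected} fields, got {len(row)}")
        record = {"pedigree": row[0]}  # pedigree is kept as a character value
        for name, value in zip(FIXED[1:], row[1:6]):
            record[name] = int(value)
        alleles = [int(value) for value in row[6:]]
        # Pair up the A and B alleles for each locus.
        record["genotypes"] = list(zip(alleles[0::2], alleles[1::2]))
        yield record


# Made-up sample with only 2 loci, not taken from the real data.
sample = "FAM1\t1\t0\t0\t1\t0\t1\t2\t3\t3\nFAM1\t2\t0\t0\t2\t1\t2\t2\t0\t0\n"
records = list(read_linkage(io.StringIO(sample), n_loci=2))
print(records[0]["genotypes"])
```

For the real file, the same function could be called on a handle obtained with open("Dalydata.txt", newline=""), leaving n_loci at its default of 103.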

The file CrohnsMeta.xml provides a StatDataML description of the data set.