Program: FLOSS (flexible ordered subset analysis)
Version: 1.4
Author: Brian L. Browning
Email: brian_browning1@yahoo.com
All rights reserved
You have permission to use and develop the FLOSS and COV programs ("the Program"), provided that the following conditions are met:
The FLOSS software package uses input and output files from the MERLIN linkage analysis package to perform an ordered subset analysis using either nonparametric linkage analysis z-scores or linear allele sharing model LOD scores. The FLOSS program is written in Java and requires a Java 1.4 interpreter.
If you use FLOSS in your published work, please cite
Browning, BL. FLOSS: Flexible ordered subsets analysis for linkage analysis of complex traits. Bioinformatics. Submitted.
FLOSS is designed to be used with the MERLIN linkage analysis package (Abecasis et al., 2002). The MERLIN data (.dat) and pedigree (.ped) files can be used provided that the files are white-space delimited and each line begins with a non-white-space character. Thus the "/" character should not separate the two alleles in the MERLIN pedigree file. Also, FLOSS does not permit an "E END-OF-DATA" line at the end of the MERLIN data file.
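The restrictions above can be checked before running FLOSS. The following Python sketch is only an illustration and is not part of the FLOSS distribution. The function name check_floss_inputs is hypothetical, and it assumes that blank lines count as violations and that any "/" in the pedigree file is an allele separator.

```python
def check_floss_inputs(ped_lines, dat_lines):
    """Return a list of problems that would make the files unusable with FLOSS."""
    problems = []

    # Every line must begin with a non-white-space character
    # (assumption: blank lines are reported as violations too).
    for label, lines in (("ped", ped_lines), ("dat", dat_lines)):
        for number, line in enumerate(lines, start=1):
            text = line.rstrip("\r\n")
            if not text or text[0].isspace():
                problems.append(
                    f"{label} line {number}: does not begin with a non-white-space character"
                )

    # Alleles must be separated by white space, not by "/"
    # (assumption: "/" never appears legitimately elsewhere in the file).
    for number, line in enumerate(ped_lines, start=1):
        if "/" in line:
            problems.append(f'ped line {number}: contains "/"')

    # The data file must not contain an "E END-OF-DATA" line.
    for number, line in enumerate(dat_lines, start=1):
        if line.split()[:2] == ["E", "END-OF-DATA"]:
            problems.append(f'dat line {number}: "E END-OF-DATA" is not permitted')

    return problems
```

An empty list means that none of the three problems was found. It does not guarantee that the files are otherwise valid MERLIN input.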
See Download FLOSS for examples of MERLIN pedigree and data files that can be used with FLOSS.
The FLOSS program requires two input files: a linkage score file and a covariate file. The linkage score file is a MERLIN ".lod" output file created using MERLIN with the --perFamily option. The covariate file can be created using the MERLIN pedigree (.ped) and data (.dat) input files. All covariates must be identified with a 'C' in the ".dat" file, and must be numeric (not categorical) data.
To create the covariate file from the MERLIN pedigree and data files, enter the command:
java -jar cov.jar [options]
where [options] are combinations of the following flags and arguments:
A short suffix is appended to the name of each covariate that identifies the subject filter, minimum number of subjects, and statistic used to define the family covariate score. For example, if you defined the family covariate score using the flags "-f all -n2 -s avg" for the "age_of_onset" covariate, then the covariate name in the covariate file would be "age_of_onset.avg2all".
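As a small illustration of the naming rule, the Python sketch below rebuilds the name from its parts. The ordering (statistic, then minimum number of subjects, then subject filter) is inferred from the single example above, and the function name family_covariate_name is hypothetical.

```python
def family_covariate_name(covariate, subject_filter, min_subjects, statistic):
    """Build the covariate-file column name, following the documented example."""
    # Suffix layout inferred from "age_of_onset.avg2all":
    # <statistic><minimum number of subjects><subject filter>
    return f"{covariate}.{statistic}{min_subjects}{subject_filter}"


name = family_covariate_name("age_of_onset", "all", 2, "avg")
```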
The subject filter (the "-f" flag) specifies the subset of family members that will be used to calculate the family covariate value. Three filter options are available: all, aff, and FDU.
Note: if there are multiple affection status variables specified in the MERLIN data file, only the first affection status variable is used to determine the subject's affection status.
The "-n" flag sets the minimum number of subjects required in order to define a family covariate value. First the subject filter specified with the "-f" option is applied. If fewer than the specified number of subjects remain after the subject filter is applied, the family is assigned an unknown covariate value ("NaN").
The "-s" flag sets the statistic used to create the covariate value. Three statistics are available: min, max, and avg.
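The three settings combine as follows: filter the subjects, check the minimum count, then apply the statistic. The Python sketch below shows that order of operations for one family. It is not the COV program's code, and it makes several assumptions: "aff" is taken to mean affected subjects, subjects with a missing covariate value (None) are not counted, and the FDU filter is left out because this document does not define it.

```python
import math

# The three documented statistics.
STATISTICS = {
    "min": min,
    "max": max,
    "avg": lambda values: sum(values) / len(values),
}


def family_covariate_score(members, subject_filter="all", min_subjects=1, statistic="avg"):
    """members: list of (is_affected, covariate_value) pairs for one family."""
    # Step 1: apply the subject filter.
    if subject_filter == "all":
        selected = members
    elif subject_filter == "aff":  # assumption: affected subjects only
        selected = [member for member in members if member[0]]
    else:
        raise ValueError(f"filter not covered by this sketch: {subject_filter}")

    # Assumption: subjects with a missing covariate value are not counted.
    values = [value for _, value in selected if value is not None]

    # Step 2: too few subjects gives an unknown score ("NaN").
    if len(values) < min_subjects:
        return math.nan

    # Step 3: apply the statistic.
    return STATISTICS[statistic](values)
```

For a family with two affected subjects aged 40 and 50 and one unaffected subject aged 70, the settings "aff", 2, "avg" give 45.0, while "aff", 3, "avg" give NaN because only two subjects pass the filter.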
The covariate file is a matrix. The first column contains FamilyID followed by the family identifiers. The first row contains FamilyID followed by the covariate identifiers. All other entries of the matrix give the family covariate score for the family (determined by the row) and the covariate (determined by the column). A NaN in an entry indicates that a family covariate score is undefined.
When creating the covariate file using the COV program, an additional column __asm__ is added. This column gives the maximum value for the allele sharing parameter for each family when using linear allele sharing model LOD scores. The __asm__ column is not used when using nonparametric linkage scores.
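For readers who want to inspect a covariate file, the matrix layout described above can be read with a few lines of Python. This is a sketch only. It assumes the file is white-space delimited, which this document states for the MERLIN input files but not explicitly for the covariate file, and the function name parse_covariate_file is hypothetical.

```python
import math


def parse_covariate_file(lines):
    """Return (covariate_names, {family_id: {covariate_name: score}})."""
    rows = [line.split() for line in lines if line.strip()]
    header, body = rows[0], rows[1:]
    if header[0] != "FamilyID":
        raise ValueError('the first entry of the first row must be "FamilyID"')

    names = header[1:]  # may include the __asm__ column added by COV
    table = {}
    for fields in body:
        family, values = fields[0], fields[1:]
        if len(values) != len(names):
            raise ValueError(f"family {family}: expected {len(names)} scores")
        # float("NaN") yields a NaN, matching the undefined-score marker.
        table[family] = {name: float(value) for name, value in zip(names, values)}
    return names, table
```

Undefined scores come back as NaN and can be detected with math.isnan.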
The ordered subset analysis is run using the "floss.jar" program. Enter
java -jar floss.jar [options]
where [options] are combinations of the following flags and arguments:
--perFamily option. Required.
k smallest or k largest covariate scores.
N families linkage analysis is performed using families i through j for 1 ≤ i ≤ j ≤ N. This option is discouraged since the increased number of subsets makes it more difficult to detect disease loci associated with unusually low or high covariate values and requires substantially more computing time.
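To see why the interval option increases the number of subsets, the Python sketch below enumerates both kinds of ordered subset for a list of families already sorted by covariate score. It illustrates the counting only and is not FLOSS's search code. Both function names are hypothetical, and the sketch assumes that every subset size from 1 to N is allowed.

```python
def low_and_high_subsets(ordered_families):
    """Subsets made of the k smallest or the k largest covariate scores."""
    n = len(ordered_families)
    subsets = [tuple(ordered_families[:k]) for k in range(1, n + 1)]
    # k < n here, so the full family set is not listed twice.
    subsets += [tuple(ordered_families[n - k:]) for k in range(1, n)]
    return subsets


def interval_subsets(ordered_families):
    """Subsets made of families i through j for 1 <= i <= j <= N."""
    n = len(ordered_families)
    return [
        tuple(ordered_families[i:j + 1])
        for i in range(n)
        for j in range(i, n)
    ]


families = [f"F{index}" for index in range(1, 101)]
low_high_count = len(low_and_high_subsets(families))
interval_count = len(interval_subsets(families))
```

With N families there are 2N - 1 low or high subsets but N(N + 1)/2 interval subsets. For N = 100 that is 199 against 5050.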
Ordered subset analysis produces four output files. The output filenames have the format prefix.extension, where the prefix is the filename prefix specified with the "-o" flag when running FLOSS and the extension is ".out", ".fam", ".plt", or ".log".
The summary file (.out) records the analysis options and gives summary information for each covariate analyzed. For each covariate, the file reports the change in linkage score between the entire set of families and the ordered subset with the highest linkage score, the maximum linkage score for this ordered subset, the optimal interval of family covariate scores, and the Monte Carlo p-value with a 95% confidence interval. The summary file is self-documenting: documentation is included at the end of the ".out" file.
The ".fam" file gives the families ordered by the covariate values. The ".fam" file is arranged in sections corresponding to each covariate listed in the Covariate file. The sections are separated by a blank line. Each section contains three columns, and the first row in each section contains labels for the columns. The first column is labeled "Family" and contains the identifiers for families with defined covariate values in order of increasing covariate value. The second column is labeled "Subset" and contains "x" if the family in the first column is included in the ordered subset with the highest linkage score (when maximized over all ordered subsets and all loci). The third column is labeled with the covariate name and gives the covariate value for the families in the first column.
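The section layout of the ".fam" file lends itself to a short parser. The Python sketch below is an illustration only. It assumes the columns are white-space delimited and that a family outside the best subset has either an empty "Subset" entry or some marker other than "x". This document does not specify either detail. The function name parse_fam_file is hypothetical.

```python
import re


def parse_fam_file(text):
    """Return {covariate_name: [(family_id, in_best_subset, covariate_value), ...]}."""
    sections = {}
    # Sections are separated by a blank line.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line for line in block.splitlines() if line.strip()]
        # Header row: "Family", "Subset", then the covariate name.
        covariate = lines[0].split()[-1]
        rows = []
        for line in lines[1:]:
            fields = line.split()
            family = fields[0]
            value = float(fields[-1])
            # Assumption: "x" between the family and the value marks inclusion.
            in_best_subset = "x" in fields[1:-1]
            rows.append((family, in_best_subset, value))
        sections[covariate] = rows
    return sections
```

Because the rows are written in order of increasing covariate value, the families flagged as included can be read off directly as the ordered subset with the highest linkage score.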
The ".plt" file contains data for plots of linkage scores by position. The first column is labeled "Position" and contains the positions of all loci used in the ordered subset analysis. All data in each row are computed at the position specified in the first column. The second column is labeled "Orig Score" and lists the linkage scores obtained using all families at the position specified in the first column. After the first two columns, the columns correspond to the covariates in the ordered subset analysis and are labeled by the covariate names. Each covariate column gives the linkage score at the position specified in the first column for the ordered subset that maximizes the linkage score. The subset of families used to compute linkage scores in each covariate column is obtained by maximizing over all loci and over all ordered subsets (ordered by the specified covariate).
The ".log" file gives details for all ordered subsets considered in the ordered subset analysis. The ".log" file is arranged in sections corresponding to each covariate listed in the Covariate file.
The first line in a section contains the name of the covariate. The next line says "Ordered Families:", and the following line or lines list the identifiers for the families used in the ordered subset analysis. The families are listed in order of increasing covariate values.
Following the ordered family identifiers are the results from each ordered subset considered in the ordered subset analysis. The results are presented in five columns. Each line corresponds to a distinct ordered subset.
The first column (First Fam) lists the family with the smallest covariate value in the ordered subset. The second column (Last Fam) lists the family with the largest covariate value in the ordered subset. The third column (Peak) lists the locus where the highest linkage score was observed for the ordered subset of families specified in the first two columns. The fourth column (Subset Score) lists the highest linkage score observed for the ordered subset of families specified in the first two columns. The fifth column (Orig Score) lists the linkage score for the position specified in column 3 for the set of all families. If the maximum scores were determined by maximizing parameters, the remaining columns (Subset Parameters) will list the parameter names and values that yielded the maximum linkage score for the position specified in column 3 for the set of all families.