FLOSS documentation and program files

Documentation and program files for FLOSS version 1.4

Program: FLOSS (flexible ordered subset analysis)
Version: 1.4
Author: Brian L. Browning
Email: brian_browning1@yahoo.com

License

You have permission to use and develop the FLOSS and COV programs ("the Program"), provided that the following conditions are met:

You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program.
If the FLOSS or COV software is used for analyses which will be reported or published, you specify the version of the software used and cite the article noted in the citation section below.
You acknowledge that Brian Browning, GlaxoSmithKline ("GSK") and the GSK developers may develop modifications to the software that may be substantially similar to your modifications of the software and that Brian Browning, GSK and GSK developers shall not be constrained in any way by you in Brian Browning's, GSK's and GSK developers' use or management of such modifications. You acknowledge the right of Brian Browning, GSK and GSK developers to prepare and publish modifications to the software that may be substantially similar or functionally equivalent to your modifications and improvements, and if you obtain patent protection for any modification or improvement to the software, you agree not to allege or enjoin infringement of your patent by Brian Browning, GSK or GSK developers.
This software is provided "AS IS" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall Brian Browning or GlaxoSmithKline be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.

Introduction
Citation
Using FLOSS with MERLIN
Creating FLOSS input files
Running the FLOSS program
FLOSS output files
Download FLOSS
References

Introduction

The FLOSS software package uses input and output files from the MERLIN linkage analysis package to perform an ordered subset analysis using either nonparametric linkage analysis Z-scores or linear allele sharing model LOD scores. The FLOSS program is written in Java and requires a Java 1.4 (or later) interpreter.

Citation

If you use FLOSS in your published work, please cite

Browning, BL. FLOSS: Flexible ordered subsets analysis for linkage analysis of complex traits. Bioinformatics. Submitted.

Using FLOSS with MERLIN

FLOSS is designed to be used with the MERLIN linkage analysis package (Abecasis et al, 2002). The MERLIN data (.dat) and pedigree (.ped) files can be used provided that the files are white-space delimited and each line begins with a non-white-space character. Thus the "/" character should not separate the two alleles in the MERLIN pedigree file. Also, FLOSS does not permit an "E END-OF-DATA" line at the end of the MERLIN data file.
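
The constraints above can be checked before running FLOSS. The following Python sketch is not part of FLOSS; the function names are hypothetical. It reads the rules strictly, so a blank line counts as a violation, and it flags any "/" in a pedigree line, which assumes that identifiers never contain that character.

```python
def _bad_start(text):
    """True if the line is empty or begins with a white-space character."""
    return not text or text[0].isspace()


def check_ped_lines(lines):
    """Return (line_number, message) pairs for problems in MERLIN .ped lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if _bad_start(text):
            problems.append((number, "line must begin with a non-white-space character"))
        if "/" in text:
            # Assumption: "/" only ever appears as an allele separator.
            problems.append((number, "alleles must be white-space separated, not '/'"))
    return problems


def check_dat_lines(lines):
    """Return (line_number, message) pairs for problems in MERLIN .dat lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if _bad_start(text):
            problems.append((number, "line must begin with a non-white-space character"))
        elif text.split()[:2] == ["E", "END-OF-DATA"]:
            problems.append((number, "'E END-OF-DATA' line is not permitted"))
    return problems
```

Both functions accept any iterable of lines, so an open file object can be passed directly. An empty result means none of the problems described above were found; it does not guarantee that the file is otherwise valid.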

See Download FLOSS for examples of MERLIN pedigree and data files that can be used with FLOSS.

Creating FLOSS input files

The FLOSS program requires two input files: a linkage score file and a covariate file. The linkage score file is a MERLIN ".lod" output file created using MERLIN with the --perFamily option. The covariate file can be created using the MERLIN pedigree (.ped) and data (.dat) input files. All covariates must be identified with a 'C' in the ".dat" file, and must be numeric (not categorical) data.

To create the covariate file from the MERLIN pedigree and data files enter the command:

java -jar cov.jar [options]

where [options] are combinations of the following flags and arguments:

-d [.dat file]: name of MERLIN data file. Required.
-p [.ped file]: name of MERLIN pedigree file. Required.
-o [output file]: name of output file. It is suggested that the output covariate filename end in ".cov". Required.
-f [filter]: subject filter. Optional: defaults to "-f all".
-n [number]: minimum number of subjects required to define a family covariate value. The argument must be an integer. Optional: defaults to "-n 2".
-s [statistic]: statistic used. Optional: defaults to "-s avg".
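
When COV is run from a script, the command line can be assembled and validated first. The sketch below is an illustration only: the function name cov_command and the file names in the usage note are hypothetical, and it uses only the flags listed above together with the filter and statistic values documented in this section.

```python
FILTERS = ("all", "aff", "FDU")
STATISTICS = ("min", "max", "avg")


def cov_command(dat_file, ped_file, out_file,
                subject_filter="all", min_subjects=2, statistic="avg"):
    """Build the argument list for 'java -jar cov.jar' from documented flags."""
    if subject_filter not in FILTERS:
        raise ValueError("subject filter must be one of %s" % (FILTERS,))
    if statistic not in STATISTICS:
        raise ValueError("statistic must be one of %s" % (STATISTICS,))
    # The -n argument must be an integer; bool is excluded because it is
    # a subclass of int in Python.
    if isinstance(min_subjects, bool) or not isinstance(min_subjects, int):
        raise ValueError("minimum number of subjects must be an integer")
    return [
        "java", "-jar", "cov.jar",
        "-d", dat_file,
        "-p", ped_file,
        "-o", out_file,
        "-f", subject_filter,
        "-n", str(min_subjects),
        "-s", statistic,
    ]
```

The returned list can be passed to subprocess.run, for example subprocess.run(cov_command("study.dat", "study.ped", "study.cov"), check=True), assuming cov.jar is in the current directory.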

A short suffix is appended to the name of each covariate that identifies the subject filter, minimum number of subjects, and statistic used to define the family covariate score. For example, if you defined the family covariate score using the flags "-f all -n 2 -s avg" for the "age_of_onset" covariate, then the covariate name in the covariate file would be "age_of_onset.avg2all".

Subject filter: -f

The subject filter specifies the subset of family members that will be used to calculate the family covariate value. Three filter options are available: all, aff, and FDU.

-f all: use all family members who have a covariate value.
-f aff: use all affected family members who have a covariate value.
-f FDU: First Degree Unaffected: use all family members who are not affected (i.e., whose affection status is either unaffected or unknown) and who have a first-degree relative (parent, offspring, or full sibling) who is affected. This filter is useful if you are concerned that the covariate values of affected members will be influenced by treatment for the affection.

Note: if there are multiple affection status variables specified in the MERLIN data file, only the first affection status variable is used to determine the subject's affection status.

Minimum number of subjects: -n

The minimum number of subjects required in order to define a family covariate value. First the subject filter specified with the "-f" option is applied. If there are fewer than the specified number of subjects after the subject filter is applied, the family is assigned an unknown covariate value ("NaN").

Statistic: -s

The statistic used to create the covariate value. Three statistics are available: min, max, and avg.

-s min: minimum covariate value for the family members specified with the subject filter argument.
-s max: maximum covariate value for the family members specified with the subject filter argument.
-s avg: mean covariate value for the family members specified with the subject filter argument.
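
The three options (-f, -n, -s) combine as follows: filter the family members, check the minimum count, then apply the statistic. The Python sketch below illustrates that logic under stated assumptions and is not the COV implementation. The Subject record and all function names are hypothetical, full siblings are taken to be members who share both recorded parents, and the label format is inferred from the single "age_of_onset.avg2all" example.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subject:
    """Hypothetical record for one pedigree member."""
    subject_id: str
    father: Optional[str]   # None for founders
    mother: Optional[str]
    affected: bool          # False covers unaffected and unknown status
    value: Optional[float]  # None when the covariate is missing


def first_degree_relatives(person, family):
    """Parents, offspring, and full siblings of person within family."""
    relatives = []
    for other in family:
        if other.subject_id == person.subject_id:
            continue
        is_parent = other.subject_id in (person.father, person.mother)
        is_child = person.subject_id in (other.father, other.mother)
        is_full_sib = (person.father is not None and person.mother is not None
                       and person.father == other.father
                       and person.mother == other.mother)
        if is_parent or is_child or is_full_sib:
            relatives.append(other)
    return relatives


def apply_filter(family, subject_filter):
    """Members with a covariate value who pass the all / aff / FDU filter."""
    with_value = [s for s in family if s.value is not None]
    if subject_filter == "all":
        return with_value
    if subject_filter == "aff":
        return [s for s in with_value if s.affected]
    if subject_filter == "FDU":
        return [s for s in with_value
                if not s.affected
                and any(r.affected for r in first_degree_relatives(s, family))]
    raise ValueError("unknown subject filter: %r" % subject_filter)


STATISTICS = {"min": min, "max": max, "avg": lambda xs: sum(xs) / len(xs)}


def family_covariate(family, subject_filter="all", min_subjects=2, statistic="avg"):
    """Family covariate value, or NaN if too few subjects pass the filter."""
    values = [s.value for s in apply_filter(family, subject_filter)]
    if not values or len(values) < min_subjects:
        return math.nan
    return STATISTICS[statistic](values)


def covariate_label(name, subject_filter="all", min_subjects=2, statistic="avg"):
    """Suffix format inferred from the example 'age_of_onset.avg2all'."""
    return "%s.%s%d%s" % (name, statistic, min_subjects, subject_filter)
```

Note that in this sketch the affected relative required by the FDU filter does not itself need a covariate value; only the unaffected subject whose value is used does.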

Covariate File Format

The covariate file is a white-space delimited matrix of entries. The first column contains FamilyID followed by the family identifiers. The first row contains FamilyID followed by the covariate identifiers. All other entries of the matrix give the family covariate scores for the family (determined by the row) and the covariate (determined by the column). A NaN in an entry indicates that a family covariate score is undefined. When creating the covariate file using the COV program, an additional column __asm__ is added. This column gives the maximum value for the allele sharing parameter for each family when using linear allele sharing model LOD scores. The __asm__ column is not used when using nonparametric linkage scores.
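
A covariate file in this layout can be read with a few lines of code. The sketch below is a hypothetical reader, not part of FLOSS: the function name is an assumption, and the family identifiers and values in the sample are made up for illustration.

```python
import math


def parse_covariate_text(text):
    """Parse a white-space delimited covariate matrix.

    Returns (covariate_names, table), where table maps each family
    identifier to a dict of covariate name -> float (NaN if undefined).
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows or rows[0][0] != "FamilyID":
        raise ValueError("first entry of the first row must be FamilyID")
    header, body = rows[0], rows[1:]
    names = header[1:]
    table = {}
    for fields in body:
        if len(fields) != len(header):
            raise ValueError("row for family %r has the wrong number of entries"
                             % fields[0])
        # float("NaN") yields a NaN value, so undefined scores need no special case.
        table[fields[0]] = {name: float(value)
                            for name, value in zip(names, fields[1:])}
    return names, table


# Made-up sample in the layout described above.
SAMPLE = """FamilyID age_of_onset.avg2all __asm__
fam1 41.5 1.2
fam2 NaN 0.9
"""

names, table = parse_covariate_text(SAMPLE)
undefined = [fam for fam, scores in table.items()
             if math.isnan(scores["age_of_onset.avg2all"])]
```

Here undefined collects the families whose covariate score is NaN; how FLOSS itself treats such families during the analysis is not shown by this sketch.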

Running the FLOSS program

The ordered subset analysis program is run using the "floss.jar" program. Enter

java -jar floss.jar [options]

where [options] are combinations of the following flags and arguments:

-c [.cov file]: name of covariate file. Multiple covariate files can be analyzed in a single run by including a separate "-c" flag before each covariate filename. Required (see Creating FLOSS input files).
-merlin [.lod file]: name of MERLIN ".lod" file. Created by MERLIN when using the --perFamily option. Required.
-o [output prefix]: prefix of output files. Required.
-seed [integer]: seed for random number generator. Optional: defaults to "-seed 0".
-subsets [type]: Type of ordered subsets used. Type must be "extreme" or "slice". Optional: defaults to "-subsets extreme".
-minperm [integer]: minimum number of permutations for the permutation test. Optional: defaults to "-minperm 100".
-maxperm [integer]: maximum number of permutations for the permutation test. Optional: defaults to "-maxperm 10000".
--npl: Compute a nonparametric linkage (NPL) Z-score statistic for each subset of families. Note that two hyphens "--" are required. Optional: linear allele sharing model LOD scores are used if the "--npl" option is absent.
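
The flags above can be assembled programmatically, which makes it easy to repeat the "-c" flag for several covariate files. The following Python sketch is illustrative only: floss_command is a hypothetical helper, optional flags are emitted only when a value is supplied so that the FLOSS defaults apply otherwise, and the file names in the usage note are placeholders.

```python
def floss_command(cov_files, lod_file, prefix, seed=None, subsets=None,
                  minperm=None, maxperm=None, npl=False):
    """Build the argument list for 'java -jar floss.jar' from documented flags."""
    if not cov_files:
        raise ValueError("at least one covariate file is required")
    if subsets not in (None, "extreme", "slice"):
        raise ValueError('subsets must be "extreme" or "slice"')
    cmd = ["java", "-jar", "floss.jar"]
    for cov_file in cov_files:
        # One "-c" flag precedes each covariate filename.
        cmd += ["-c", cov_file]
    cmd += ["-merlin", lod_file, "-o", prefix]
    optional = (("-seed", seed), ("-subsets", subsets),
                ("-minperm", minperm), ("-maxperm", maxperm))
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    if npl:
        # Two hyphens are required for this flag.
        cmd.append("--npl")
    return cmd
```

For example, floss_command(["a.cov", "b.cov"], "study.lod", "results", seed=17, npl=True) produces an argument list that can be passed to subprocess.run, assuming floss.jar is in the current directory.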

Type of ordered subsets: -subsets

-subsets extreme: Rank the families in order of increasing family covariate score and perform linkage on all subsets of families with the k smallest or k largest covariate scores.
-subsets slice: Rank the families in order of increasing family covariate score and perform linkage on all subsets of families with consecutive covariate scores. For example, if there are N families, linkage analysis is performed using families i through j for 1 ≤ i ≤ j ≤ N. This option is discouraged since the increased number of subsets makes it more difficult to detect disease loci associated with unusually low or high covariate values and requires substantially more computing time.
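
The difference between the two subset types is easiest to see by enumerating them. The Python sketch below illustrates the definitions above and is not the FLOSS implementation: the function names are hypothetical, families with undefined (NaN) scores are dropped before ranking, ties keep their input order, and the full set of families is listed only once. How FLOSS handles ties or any minimum subset size is not specified here.

```python
import math


def order_families(scores):
    """Family identifiers ranked by increasing covariate score, NaN excluded."""
    defined = {fam: value for fam, value in scores.items()
               if not math.isnan(value)}
    return sorted(defined, key=defined.get)


def extreme_subsets(ordered):
    """Subsets with the k smallest or the k largest covariate scores."""
    n = len(ordered)
    smallest = [ordered[:k] for k in range(1, n + 1)]
    # k < n here so that the full set is not listed a second time.
    largest = [ordered[n - k:] for k in range(1, n)]
    return smallest + largest


def slice_subsets(ordered):
    """Subsets of families i through j for 1 <= i <= j <= N."""
    n = len(ordered)
    return [ordered[i:j + 1] for i in range(n) for j in range(i, n)]


ranked = order_families({"A": 3.0, "B": 1.0, "C": math.nan, "D": 2.0, "E": 5.0})
n_extreme = len(extreme_subsets(ranked))
n_slice = len(slice_subsets(ranked))
```

With N ranked families, this enumeration gives 2N - 1 distinct extreme subsets and N(N + 1)/2 slice subsets, which shows how much faster the slice option grows with the number of families.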

FLOSS output files

Ordered subset analysis produces four output files. The output filenames have the format prefix.extension, where the prefix is the filename prefix specified with the "-o" flag when running FLOSS and the extension is ".out", ".fam", ".plt", or ".log".

Summary file (.out)

The summary file (.out) records the analysis options and gives summary information for each covariate analyzed. The file reports the change in linkage score between the entire set of families and the ordered subset with the highest linkage score, the maximum linkage score for this ordered subset, the optimal interval of family covariate scores, and the Monte Carlo p-value with a 95% confidence interval. The summary file is self-documenting: documentation is included at the end of the ".out" file.

Families file (.fam)

The ".fam" file gives the families ordered by the covariate values. The ".fam" file is arranged in sections corresponding to each covariate listed in the Covariate file. The sections are separated by a blank line. Each section contains three columns, and the first row in each section contains labels for the columns. The first column is labeled "Family" and contains the identifiers for families with defined covariate values in order of increasing covariate value. The second column is labeled "Subset" and contains "x" if the family in the first column is included in the ordered subset with the highest linkage score (when maximized over all ordered subsets and all loci). The third column is labeled with the covariate name and gives the covariate value for the families in the first column.

Plotting file (.plt)

The ".plt" file contains data for plots of linkage scores by position. The first column is labeled "Position" and contains the position of all loci used in the ordered subset analysis. All data in each row are computed at the position specified in the first column. The second column is labeled "Orig Score" and lists the linkage scores, obtained using all families, at the position specified in the first column. After the first two columns, the columns correspond to the covariates in the ordered subset analysis and are labeled by the covariate names. Each covariate column gives the linkage score at the position specified in the first column for the ordered subset that maximizes the linkage score. The subset of families used to compute linkage scores in each covariate column is obtained by maximizing over all loci and over all ordered subsets (ordered by the specified covariate).

Log file (.log)

The ".log" file gives details for all ordered subsets considered in the ordered subset analysis. The ".log" file is arranged in sections corresponding to each covariate listed in the Covariate file.

The first line in a section contains the name of the covariate. The next line says "Ordered Families:", and the following line or lines list the identifiers for the families used in the ordered subset analysis. The families are listed in order of increasing covariate values.

Following the ordered family identifiers are the results from each ordered subset considered in the ordered subset analysis. The results are presented in five columns. Each line corresponds to a distinct ordered subset.

The first column (First Fam) lists the family with the smallest covariate value in the ordered subset. The second column (Last Fam) lists the family with the largest covariate value in the ordered subset. The third column (Peak) lists the locus where the highest linkage score was observed for the ordered subset of families specified in the first two columns. The fourth column (Subset Score) lists the highest linkage score observed for the ordered subset of families specified in the first two columns. The fifth column (Orig Score) lists the linkage score for the position specified in column 3 for the set of all families. If the maximum scores were determined by maximizing parameters, the remaining columns (Subset Parameters) will list the parameter names and values that yielded the maximum linkage score for the position specified in column 3 for the set of all families.

Download FLOSS

The executable files for COV and FLOSS can be run using a Java 1.4 (or later) interpreter with the "-jar" flag. See Creating FLOSS input files and Running the FLOSS program for details.

References

Abecasis GR, Cherny SS, Cookson WO, Cardon LR (2002) MERLIN--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97-101.
Browning BL. FLOSS: Flexible ordered subsets analysis for linkage analysis of complex traits. Bioinformatics. Submitted.
Hauser ER, Watanabe RM, Duren WL, Bass MP, Langefeld CD, Boehnke M (2004) Ordered subset analysis in genetic linkage mapping of complex traits. Genet Epidemiol 27:53-63.
Kong A, Cox NJ (1997) Allele-sharing models: LOD scores and accurate linkage tests. Am J Hum Genet 61:1179-1188.
Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347-1363.