Filters:
Unfiltered - all conserved regions (the conservation criteria are described separately below for the Berkeley, Penn State, and UCSC sets)
No RefSeq - CNCs (conserved non-coding sequences) that overlap RefSeq genes (CDS and UTR) are excluded
No RNA - CNCs that overlap RefSeq genes or any GenBank RNA (human or not) are excluded
No GenScan - CNCs that overlap RefSeq genes or GenScan predictions are excluded
Filtered - CNCs that overlap RefSeq genes, any GenBank RNA, or GenScan predictions are excluded
Filtered, No EST - CNCs that overlap RefSeq genes, any GenBank RNA, GenScan predictions, or ESTs (including unspliced ESTs) are excluded
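The filters above amount to removing every CNC that overlaps at least one of a growing list of annotation sets. The following is a minimal sketch of that idea, not the code used to build the tracks. The names `FILTERS`, `overlaps`, and `apply_filter`, the annotation keys, the half-open single-chromosome coordinates, and the rule that any overlap of one base or more excludes a CNC are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome (assumed)

# Annotation sets excluded by each filter, following the list above.
# The keys ("refseq", "rna", "genscan", "est") are hypothetical labels.
FILTERS: Dict[str, List[str]] = {
    "Unfiltered": [],
    "No RefSeq": ["refseq"],
    "No RNA": ["refseq", "rna"],
    "No GenScan": ["refseq", "genscan"],
    "Filtered": ["refseq", "rna", "genscan"],
    "Filtered, No EST": ["refseq", "rna", "genscan", "est"],
}


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def apply_filter(
    cncs: List[Interval],
    annotations: Dict[str, List[Interval]],
    name: str,
) -> List[Interval]:
    """Keep only the CNCs that overlap none of the sets excluded by `name`."""
    excluded = [iv for key in FILTERS[name] for iv in annotations.get(key, [])]
    return [c for c in cncs if not any(overlaps(c, iv) for iv in excluded)]


# Toy data: each CNC overlaps exactly one annotation set.
cncs = [(100, 250), (400, 520), (900, 1050), (2000, 2150)]
annotations = {
    "refseq": [(200, 300)],
    "rna": [(500, 600)],
    "genscan": [(1000, 1100)],
    "est": [(2100, 2200)],
}

for name in FILTERS:
    print(name, apply_filter(cncs, annotations, name))
```

The linear scan over all excluded intervals is fine for a toy example; a genome-scale version would more likely sort the intervals or use an interval tree.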
CNC Definitions:
Berkeley
The Berkeley set is based on whole-genome anchored AVID alignments of the Dec 2001 human assembly and the Arachne Feb 2002 mouse assembly. The set contains all regions that are at least 100 bp long with at least 70% identity. Gaps are allowed within the conserved regions.
Penn State
The conserved non-coding sequences in the Penn State collection are based on alignments of the Dec 2001 human assembly and the MGSCv3 mouse assembly generated with the blastz program. The set contains gap-free alignments of at least 70% identity and at least 100 bp in length.
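As an illustration of the gap-free criterion described above (at least 100 bp and at least 70% identity), here is a minimal sketch that tests one aligned block. The function names, the use of `-` as the gap character, and the definition of identity as matching columns divided by block length are assumptions, not details taken from the Penn State pipeline.

```python
def percent_identity(human: str, mouse: str) -> float:
    """Percentage of alignment columns where both sequences have the same base."""
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have the same length")
    if not human:
        return 0.0
    matches = sum(h == m for h, m in zip(human.upper(), mouse.upper()))
    return 100.0 * matches / len(human)


def passes_gap_free_criteria(
    human: str,
    mouse: str,
    min_len: int = 100,
    min_identity: float = 70.0,
) -> bool:
    """True if the block is gap-free, long enough, and similar enough."""
    if "-" in human or "-" in mouse:  # '-' as the gap symbol is an assumption
        return False
    return len(human) >= min_len and percent_identity(human, mouse) >= min_identity


# 120 columns, 90 of them identical: 75% identity, so the block qualifies.
print(passes_gap_free_criteria("A" * 120, "A" * 90 + "C" * 30))
```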
UCSC
The UCSC conserved non-coding set is based on rescoring blastz alignments of the Dec 2001 human assembly and the MGSCv3 mouse assembly with the following matrix, gap open cost, and gap extension cost, and then selecting maximal scoring contiguous subsets that score at least 4000.

|   | A | C | G | T |
|---|---|---|---|---|
| A | 100 | -200 | -150 | -100 |
| C | -200 | 150 | -150 | -100 |
| G | -100 | -150 | 150 | -200 |
| T | -150 | -100 | -200 | 100 |

Gap open cost O = 1200, gap extension cost E = 20
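The rescoring step described above can be sketched roughly as follows; this is an illustration under stated assumptions, not the UCSC implementation. The matrix values are copied from the text, but reading the rows as A, C, G, T in the same order as the columns, indexing rows by the human base and columns by the mouse base, and charging a gap of length k a cost of O + k·E are all assumptions. The segment search is a simple Kadane-style scan, which may differ from the exact procedure used to select "maximal scoring contiguous subsets". The function names are hypothetical.

```python
from typing import List, Optional, Tuple

BASES = "ACGT"
# Values from the matrix above. Row order A, C, G, T and row = human base,
# column = mouse base are assumptions.
MATRIX = [
    [100, -200, -150, -100],
    [-200, 150, -150, -100],
    [-100, -150, 150, -200],
    [-150, -100, -200, 100],
]
GAP_OPEN = 1200   # O
GAP_EXTEND = 20   # E


def column_scores(human: str, mouse: str) -> List[int]:
    """Score each alignment column; only A, C, G, T and '-' are handled."""
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have the same length")
    scores: List[int] = []
    prev_gap: Optional[str] = None  # sequence gapped in the previous column
    for h, m in zip(human.upper(), mouse.upper()):
        if h == "-" or m == "-":
            gap_in = "human" if h == "-" else "mouse"
            # Assumed affine cost: O is charged once per gap, E per gap column.
            cost = GAP_EXTEND + (GAP_OPEN if gap_in != prev_gap else 0)
            scores.append(-cost)
            prev_gap = gap_in
        else:
            scores.append(MATRIX[BASES.index(h)][BASES.index(m)])
            prev_gap = None
    return scores


def high_scoring_segments(
    scores: List[int], threshold: int = 4000
) -> List[Tuple[int, int, int]]:
    """Return (start, end, score) for half-open runs scoring >= threshold.

    The running sum restarts whenever it drops to zero or below; each run is
    reported up to the column where its running sum peaked.
    """
    segments: List[Tuple[int, int, int]] = []
    running, start, best, best_end = 0, 0, 0, 0
    for i, s in enumerate(scores):
        if running <= 0:
            if best >= threshold:
                segments.append((start, best_end, best))
            running, start, best, best_end = 0, i, 0, i
        running += s
        if running > best:
            best, best_end = running, i + 1
    if best >= threshold:
        segments.append((start, best_end, best))
    return segments


# Two well-conserved stretches separated by 25 A/C mismatches.
human = "A" * 108
mouse = "A" * 41 + "C" * 25 + "A" * 42
print(high_scoring_segments(column_scores(human, mouse)))
```

With O = 1200 against match scores of 100 to 150, a single gap cancels roughly ten matching columns, so high-scoring segments under this scheme are nearly gap-free.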