
Mouse SNP and Genotype Data Download

Overview

Perlegen's SNP, genotype (empirical and imputed), haplotype, trace, and PCR primer data has been compiled with NCBI Mouse Build information to produce data files for public use. This data, grouped by chromosome, is available here as flat files for download. SNP and genotype positions have been mapped from their original reference coordinates to NCBI Mouse Build 37 coordinates (see Data Release History).

Note that the C57BL/6J strain was not selected for re-sequencing, as this data would have been almost entirely redundant with the NCBI reference sequence. Since we did not actually determine genotypes for C57BL/6J, we did not submit genotypes for this strain to dbSNP. However, implicit genotypes for C57BL/6J can be obtained from the reference sequence at each SNP position (the reference allele is the first allele in the ALLELES column).
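The rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the Perlegen release: the function name is hypothetical, and it assumes the implied genotype is homozygous for the reference allele and is written in the same two-letter form used in the genotype files.

```python
def implicit_reference_genotype(alleles: str) -> str:
    """Derive the implied reference-strain genotype from an ALLELES value.

    Assumption: the value looks like "G/A", with the reference allele first,
    and the implied genotype is homozygous for that allele (e.g. "GG").
    """
    reference_allele = alleles.strip().split("/")[0]
    return reference_allele * 2
```

For an ALLELES value of G/A, the function returns the two-letter genotype built from the first allele.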

The data is available for download in two compressed file formats: PC ".zip" files and Unix gzip-compressed ".gz" files. Tools to uncompress both formats are available on many platforms (PC, Mac, Unix-like, etc.), but PC users typically use the .zip format and users of Unix-like operating systems typically use the .gz format.

Once uncompressed, the files are quite large: there are typically over 100,000 rows of data per chromosome. This is too large for desktop applications such as Excel or Notepad, so the data is probably better loaded into a database application such as FileMaker, Oracle, MySQL, or Microsoft Access. For specific instructions on loading data into a database, please consult the application's documentation. The files are stored as plain text: each column is delimited by a tab character, each row is delimited by a newline character, and the first row contains the column names.
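Because the files are tab-delimited text with a header row, they can also be streamed one row at a time with a short script instead of being opened in a desktop application. The sketch below uses only the Python standard library; the function name read_rows is hypothetical, and it assumes a ".gz" download is a single gzip-compressed text file.

```python
import csv
import gzip
from typing import Dict, Iterator


def read_rows(path: str) -> Iterator[Dict[str, str]]:
    """Yield each data row as a dict keyed by the column names in the first row.

    Files ending in ".gz" are decompressed on the fly; anything else is read
    as plain text. Rows are streamed, so the whole file is never held in memory.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        # Tab-delimited, no quoting: every field is taken literally.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield row
```

A caller would loop over read_rows with the path of a downloaded file and process or count rows as they arrive. All values are returned as strings, so numeric columns such as position need an explicit int() conversion.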

Genotype & SNP Data

To download genotype, imputed genotype, or SNP data, right-click on each appropriate link and choose "Save As". Flanking sequence of 200 base pairs (100 base pairs on each of the 5' and 3' sides of each SNP) is included in the SNP download file. Depending on your connection speed, some downloads may take minutes or hours to complete. For example, with a 56K modem it may take as much as 8 hours to download the entire data set. The "Genotype Data" files contain empirically derived genotypes for the 15 original strains used for SNP discovery (Phase 1 of the project), while the "Imputed Genotype Data" files contain imputed genotypes for an additional 40 strains (Phase 2 of the project).


Chromosome | Number of SNPs | Genotype Data | File Size | Imputed Genotype Data | File Size | SNP Data | File Size
Chr01 | 725002 | .gz, .zip | 11MB | .gz, .zip | 9MB | .gz, .zip | 38MB
Chr02 | 537210 | .gz, .zip | 8MB | .gz, .zip | 7MB | .gz, .zip | 29MB
Chr03 | 528538 | .gz, .zip | 8MB | .gz, .zip | 7MB | .gz, .zip | 28MB
Chr04 | 498014 | .gz, .zip | 7MB | .gz, .zip | 6MB | .gz, .zip | 27MB
Chr05 | 517475 | .gz, .zip | 8MB | .gz, .zip | 7MB | .gz, .zip | 28MB
Chr06 | 543829 | .gz, .zip | 8MB | .gz, .zip | 7MB | .gz, .zip | 29MB
Chr07 | 446851 | .gz, .zip | 7MB | .gz, .zip | 6MB | .gz, .zip | 24MB
Chr08 | 459246 | .gz, .zip | 7MB | .gz, .zip | 6MB | .gz, .zip | 24MB
Chr09 | 372032 | .gz, .zip | 5MB | .gz, .zip | 5MB | .gz, .zip | 20MB
Chr10 | 414362 | .gz, .zip | 6MB | .gz, .zip | 5MB | .gz, .zip | 22MB
Chr11 | 263444 | .gz, .zip | 4MB | .gz, .zip | 4MB | .gz, .zip | 15MB
Chr12 | 416902 | .gz, .zip | 6MB | .gz, .zip | 5MB | .gz, .zip | 22MB
Chr13 | 421364 | .gz, .zip | 6MB | .gz, .zip | 5MB | .gz, .zip | 22MB
Chr14 | 364963 | .gz, .zip | 5MB | .gz, .zip | 5MB | .gz, .zip | 20MB
Chr15 | 346575 | .gz, .zip | 5MB | .gz, .zip | 5MB | .gz, .zip | 18MB
Chr16 | 314057 | .gz, .zip | 4MB | .gz, .zip | 4MB | .gz, .zip | 17MB
Chr17 | 288229 | .gz, .zip | 4MB | .gz, .zip | 4MB | .gz, .zip | 15MB
Chr18 | 298171 | .gz, .zip | 4MB | .gz, .zip | 4MB | .gz, .zip | 16MB
Chr19 | 232933 | .gz, .zip | 3MB | .gz, .zip | 3MB | .gz, .zip | 12MB
ChrM | 201 | .gz, .zip | 2KB | - | - | .gz, .zip | 8KB
ChrX | 246933 | .gz, .zip | 3MB | .gz, .zip | 3MB | .gz, .zip | 15MB
ChrY | 2433 | .gz, .zip | 38KB | - | - | .gz, .zip | 158KB
ChrUn | 610 | .gz, .zip | 10KB | - | - | .gz, .zip | 31KB
unmapped | 83169 | .gz, .zip | 1MB | - | - | .gz, .zip | 4MB

Primer Data

To download primer data, right-click on the appropriate link and choose "Save As". Unmapped primers are the 3776 primers that do not map to NCBI genome Build 37.


Type | Primer Data | File Size
Mapped | .gz, .zip | 8MB
Unmapped | .gz, .zip | 85KB


Trace Data

To download sequence trace mappings, right-click on each appropriate link and choose "Save As".


Chromosome | Number of Traces | Data | File Size
Chr01 | 2223240 | .gz, .zip | 16MB
Chr02 | 2097923 | .gz, .zip | 14MB
Chr03 | 1718970 | .gz, .zip | 12MB
Chr04 | 1749501 | .gz, .zip | 12MB
Chr05 | 1834920 | .gz, .zip | 13MB
Chr06 | 1706025 | .gz, .zip | 12MB
Chr07 | 1617414 | .gz, .zip | 11MB
Chr08 | 1501170 | .gz, .zip | 10MB
Chr09 | 1526805 | .gz, .zip | 11MB
Chr10 | 1507365 | .gz, .zip | 10MB
Chr11 | 1474191 | .gz, .zip | 10MB
Chr12 | 1337265 | .gz, .zip | 9MB
Chr13 | 1325340 | .gz, .zip | 9MB
Chr14 | 1322745 | .gz, .zip | 9MB
Chr15 | 1161555 | .gz, .zip | 8MB
Chr16 | 1108305 | .gz, .zip | 7MB
Chr17 | 1043280 | .gz, .zip | 7MB
Chr18 | 1007655 | .gz, .zip | 7MB
Chr19 | 711570 | .gz, .zip | 5MB
ChrM | 15 | .gz, .zip | 0KB
ChrX | 1537860 | .gz, .zip | 10MB
ChrY | 37845 | .gz, .zip | 316KB
ChrUn | 2985 | .gz, .zip | 22KB

Haplotype Block Data

To download the chromosomal locations of each haplotype block, right-click on the appropriate link and choose "Save As".

Haplotype Block | Data | File Size
All Haplotype Blocks | .gz, .zip | 316KB
Haplotype Blocks 5KB | .gz, .zip | 22KB

Data File Descriptions

Genotype Data File Description

The files b04_ChrXX_genotype.dat contain the diploid genotypes of each SNP for each of the 15 individual strains. Each file holds the genotypes of every strain for one chromosome.

Column Name | Description
local_identifier | Perlegen internal SNP identifier. Matches the submitter_ID in dbSNP.
SS_ID | NCBI Assay ID (ss#). The ID assigned to the SNP by dbSNP at submission time.
chromosome | Chromosome of the NCBI Build 37 contig on which the best alignment was found. The "ChrM" files contain data from the mitochondrion, and the "ChrUn" files contain data mapped to NCBI contigs that are not assigned to a chromosome. Data that could not be mapped to any NCBI Build 37 contig can be found in the files labeled "unmapped."
accession_num | The accession number from NCBI Build 37 of the contig to which the SNP aligns.
position | The nucleotide position in the NCBI Build 37 contig of the reference base in the alignment.
strand | The orientation of the reported SNP flanking sequences, alleles, and genotypes against the NCBI Build 37 sequence.
alleles | The nucleotide code for the alleles of this SNP. The first allele is the reference allele of the C57BL/6J strain and the second allele is the alternate allele discovered. For example, G/A.
129S1/SvImJ, CAST/EiJ, BTBR T+ tf/J, A/J, MOLF/EiJ, KK/HlJ, AKR/J, PWD/PhJ, NZW/LacJ, BALB/cByJ, WSB/EiJ, C3H/HeJ, DBA/2J, FVB/NJ, NOD/LtJ (one column per strain) | The nucleotide code for the two alleles found at this position for each strain. Nucleotide codes can be A, G, T, C, N for unknown, and "-" for strains for which genotypes were not attempted. Expect to see "AA", "GG", "TT", "CC", "NN", or "--" in each column.
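As an illustration of how the genotype columns relate to the alleles column, the following Python sketch lists the strains whose genotype differs from the reference at one SNP. It is a hedged example rather than a documented procedure: the function name is hypothetical, and it assumes each row has already been parsed into a dict whose keys are the column names exactly as listed above.

```python
# The 15 strain columns described above.
STRAINS = [
    "129S1/SvImJ", "CAST/EiJ", "BTBR T+ tf/J", "A/J", "MOLF/EiJ",
    "KK/HlJ", "AKR/J", "PWD/PhJ", "NZW/LacJ", "BALB/cByJ",
    "WSB/EiJ", "C3H/HeJ", "DBA/2J", "FVB/NJ", "NOD/LtJ",
]


def strains_differing_from_reference(row):
    """Return the strains whose called genotype is not homozygous reference.

    Assumptions: `row` maps column names to strings, the "alleles" value has
    the reference allele first (e.g. "G/A"), and "NN" (unknown) and "--"
    (not attempted) are skipped rather than counted as differences.
    """
    reference_genotype = row["alleles"].split("/")[0] * 2
    differing = []
    for strain in STRAINS:
        genotype = row.get(strain, "--")  # treat a missing column as not attempted
        if genotype in ("NN", "--"):
            continue
        if genotype != reference_genotype:
            differing.append(strain)
    return differing
```

Skipping "NN" and "--" is a design choice: an unknown or unattempted call says nothing about whether the strain carries the alternate allele.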

SNP Data File Description

The files b04_ChrXX_snp.dat have the following information for SNPs that were identified as being polymorphic in the 15 strains genotyped. Each file contains the SNPs discovered across all strains for one chromosome.

Column Name | Description
local_identifier | Perlegen internal SNP identifier. Matches the submitter_ID in dbSNP.
SS_ID | NCBI Assay ID (ss#). The ID assigned to the SNP by dbSNP at submission time.
chromosome | Chromosome of the NCBI Build 37 contig on which the best alignment was found. The "ChrM" files contain data from the mitochondrion, and the "ChrUn" files contain data mapped to NCBI contigs that are not assigned to a chromosome. Data that could not be mapped to any NCBI Build 37 contig can be found in the files labeled "unmapped."
accession_num | The accession number from NCBI Build 37 of the contig to which the SNP aligns.
position | The nucleotide position in the NCBI Build 37 contig of the reference base in the alignment.
strand | The orientation of the reported SNP flanking sequences, alleles, and genotypes against the NCBI Build 37 sequence.
alleles | The nucleotide code for the alleles of this SNP. The first allele is the reference allele of the C57BL/6J strain and the second allele is the alternate allele discovered. For example, G/A.
five_prime_flank | The 100 base pairs from the original reference sequence that flank the SNP on the 5' end.
three_prime_flank | The 100 base pairs from the original reference sequence that flank the SNP on the 3' end.
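The flank and alleles columns can be combined into a single context string for each SNP, which is convenient for assay design or a quick visual check. The sketch below is an assumption-laden example: the function name is hypothetical, the bracketed [ref/alt] notation is a common convention and not something the data files prescribe, and each row is assumed to be a dict keyed by the column names above.

```python
def snp_context(row):
    """Join the flanks and alleles as 5'-flank[ref/alt]3'-flank.

    Assumes the "alleles" value holds exactly two alleles separated by "/",
    reference first, as in the example "G/A".
    """
    reference, alternate = row["alleles"].split("/")
    return "{}[{}/{}]{}".format(
        row["five_prime_flank"], reference, alternate, row["three_prime_flank"]
    )
```

With full-length flanks, the bases outside the brackets total the 200 base pairs of flanking sequence described earlier.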

Primer Data File Description

The b04_primer_pair.dat file has the following information for primer pairs.

Column Name | Description
primer_pair_id | Perlegen internal primer identifier (PSMP = Perlegen Sciences Mouse Primer).
chromosome | Chromosome of the NCBI Build 37 contig on which the best alignment was found. Data that could not be mapped to any NCBI Build 37 contig can be found in the file labeled "b04_primer_pairs_unmapped."
accession_num | The accession number from NCBI Build 37 of the contig to which the primers align.
amplicon_start | The nucleotide position in NCBI Build 37 contig of the start of the primer pair in the alignment.
amplicon_end | The nucleotide position in NCBI Build 37 contig of the end of the primer pair in the alignment.
forward_sequence | The forward primer sequence.
reverse_sequence | The reverse primer sequence.
strand | The orientation of the primer pair against the NCBI Build 37 sequence.
working_status | One if the primer pair amplified successfully, zero if it failed.
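A typical first step with the primer file is to keep only the pairs that amplified and work out the size of each amplicon. The following Python sketch shows one way to do that. The function name is hypothetical, and two assumptions are baked in and flagged in the comments: working_status is stored as the digits "1" and "0", and amplicon_start and amplicon_end are inclusive coordinates.

```python
def working_amplicon_lengths(rows):
    """Yield (primer_pair_id, amplicon_length) for primer pairs that worked.

    Assumptions: `rows` is an iterable of dicts keyed by the column names
    above; working_status is the string "1" for success; start and end
    positions are both inclusive, so length = |end - start| + 1.
    """
    for row in rows:
        if row["working_status"].strip() != "1":
            continue
        start = int(row["amplicon_start"])
        end = int(row["amplicon_end"])
        yield row["primer_pair_id"], abs(end - start) + 1
```

If the coordinates turn out to be half-open rather than inclusive, the "+ 1" should be dropped.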

Trace Mapping File Description

The b04_ChrXX_frag.dat file has the following information for each fragment.

Column Name | Description
frag_id | Perlegen internal fragment identifier.
chromosome | Chromosome of the NCBI Build 37 contig on which the best alignment was found.
accession_num | The accession number from NCBI Build 37 of the contig to which the trace aligns.
min_pos | The nucleotide position in NCBI Build 37 contig of the start of the trace in the alignment.
max_pos | The nucleotide position in NCBI Build 37 contig of the end of the trace in the alignment.
strand | The orientation of the trace against the NCBI Build 37 sequence.
sample_name | The mouse strain used to generate the trace.
a_strand | The name of the forward sequence trace.
z_strand | The name of the reverse sequence trace.
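One use of the trace mappings is to see which strains have sequence traces covering a given contig position. The sketch below counts covering traces per strain. It is illustrative only: the function name is hypothetical, rows are assumed to be dicts keyed by the column names above, and min_pos and max_pos are assumed to be inclusive bounds.

```python
from collections import Counter


def trace_depth_by_strain(rows, accession_num, position):
    """Count, per strain, the traces on one contig that span `position`.

    Assumes min_pos <= max_pos and that both bounds are inclusive.
    """
    depth = Counter()
    for row in rows:
        if row["accession_num"] != accession_num:
            continue
        if int(row["min_pos"]) <= position <= int(row["max_pos"]):
            depth[row["sample_name"]] += 1
    return depth
```

Strains absent from the returned counter have no trace covering the position, which can help distinguish an unknown genotype from a region that was never sequenced in that strain.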



Haplotype Block File Description

The block_summary.dat file contains the following columns.

Column Name | Description
CHROMOSOME | The chromosome that contains the haplotype block.
BLOCK_START_BP | The absolute position of the start of the haplotype block on the chromosome.
BLOCK_END_BP | The absolute position of the end of the haplotype block on the chromosome.
BLOCK_LENGTH_BP | The length of the haplotype block, in bases.
NUM_SNPS_IN_BLOCK | The number of SNPs contained within the haplotype block.
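The block summary columns lend themselves to simple per-chromosome totals. The Python sketch below aggregates the number of blocks, the bases they cover, and the SNPs they contain. The function name is hypothetical, and rows are assumed to be dicts keyed by the upper-case column names above, with numeric fields stored as digit strings.

```python
from collections import defaultdict


def summarize_blocks(rows):
    """Aggregate haplotype blocks per chromosome.

    Returns {chromosome: {"blocks": n, "bases": total_length, "snps": total_snps}}.
    """
    totals = defaultdict(lambda: {"blocks": 0, "bases": 0, "snps": 0})
    for row in rows:
        entry = totals[row["CHROMOSOME"]]
        entry["blocks"] += 1
        entry["bases"] += int(row["BLOCK_LENGTH_BP"])
        entry["snps"] += int(row["NUM_SNPS_IN_BLOCK"])
    return dict(totals)
```

Dividing "bases" by "blocks" for a chromosome gives its mean block length, and "snps" divided by "blocks" gives the mean number of SNPs per block.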






