
Phase 1 Overview

Introduction

The National Institute of Environmental Health Sciences (NIEHS) entered into a contract with Perlegen Sciences, Inc. to conduct a study of genome-wide DNA variation in 15 commonly used, genetically diverse strains of inbred laboratory mice. The study identified single nucleotide polymorphisms (SNPs) and other genetic differences between these strains and a reference strain of known sequence (C57BL/6J). Perlegen scientists used the same high-density oligonucleotide array technology that they had previously used to discover DNA variation in over 20 human genomes [1]. See the Array Technology page for a detailed description of this technology. All data generated by this project are publicly available through this website and the National Center for Biotechnology Information (NCBI) website.

Specifically, the aims of Phase 1 of the project were as follows:

  1. Resequence the nuclear genomes of 15 inbred laboratory mouse strains, using the publicly available sequence of strain C57BL/6J as a standard, and organize the sequence data by chromosome and chromosomal location.

  2. Use the resequencing data to identify genetic variations between the strains.

  3. Make the sequence and variation data publicly available.

  4. Make the long-range PCR primers and PCR conditions, developed for amplification of the mouse genome, publicly available.

  5. Construct a website providing easy access to all the resources generated by the project, including data in the form of downloadable files, a Mouse Genome Browser, and descriptions of the methods used.

The following resources were generated by the NIEHS and Perlegen collaborative project:

  1. In NCBI databases:

    A total of 8,322,543 SNPs have been deposited in dbSNP under the submitter handle "PERLEGEN". SNP details can be retrieved using Entrez SNP. Note that incorporation of the SNP details into dbSNP depends on the schedule of new dbSNP build releases and may lag the time of submission by weeks or months. Trace files for all sequence data generated have been deposited at the NCBI Trace Archive. See the Trace File Submissions page for an explanation of how these trace files are generated and how they compare to trace files generated from dideoxy sequencing data.

    A summary of all the data released to the public databases to date is presented in the strain and chromosome reports.

  2. On this website:

    The SNPs and genotypes, as well as the positions and sequences of the 240,814 long-range PCR primer pairs used for amplification of the genomic DNA, are available on the Download Data page of this website. There, the SNPs and primers are mapped to NCBI Mouse Build 37 and can be downloaded by chromosome. Questions about the format and interpretation of Perlegen's array-based data are answered on the FAQ page.

    A Mouse Genome Browser is also provided. The browser lets users visualize the SNPs and long-range PCR (LR-PCR) primer pairs by chromosome, chromosomal region, gene name, transcript accession number, or SNP identifier. The browser can also be used to access the SNP genotypes for the 15 strains, the trace files at the NCBI Trace Archive, and the positions and sequences of the LR-PCR primer pairs.
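
For readers who prefer to work with the downloaded per-chromosome files rather than the browser, the following Python sketch shows one way such data might be queried by chromosomal region and compared between two strains. It is only an illustration: the tab-delimited column layout (snp_id, chromosome, position, strain, genotype), the SNP identifiers, and the strain labels are all hypothetical and are not the project's actual file format, which is described on the FAQ page.

```python
import csv
import io

# HYPOTHETICAL sample rows: the column layout, SNP identifiers, and strain
# labels are invented for illustration and are not the project's real format.
ROWS = [
    ("snp_id", "chromosome", "position", "strain", "genotype"),
    ("example_snp_1", "1", "3000100", "STRAIN_A", "A"),
    ("example_snp_1", "1", "3000100", "STRAIN_B", "G"),
    ("example_snp_2", "1", "3500200", "STRAIN_A", "C"),
    ("example_snp_2", "1", "3500200", "STRAIN_B", "C"),
    ("example_snp_3", "2", "4000300", "STRAIN_A", "T"),  # no call for STRAIN_B
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS) + "\n"


def load_genotypes(handle):
    """Group one-row-per-genotype records into one entry per SNP."""
    snps = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        entry = snps.setdefault(
            row["snp_id"],
            {
                "chromosome": row["chromosome"],
                "position": int(row["position"]),
                "genotypes": {},
            },
        )
        entry["genotypes"][row["strain"]] = row["genotype"]
    return snps


def snps_in_region(snps, chromosome, start, end):
    """Return SNP ids on `chromosome` with start <= position <= end, by position."""
    hits = [
        snp_id
        for snp_id, snp in snps.items()
        if snp["chromosome"] == chromosome and start <= snp["position"] <= end
    ]
    return sorted(hits, key=lambda snp_id: snps[snp_id]["position"])


def differing_snps(snps, strain_a, strain_b):
    """Return SNP ids where both strains have a genotype and the genotypes differ."""
    hits = []
    for snp_id, snp in snps.items():
        calls = snp["genotypes"]
        if strain_a in calls and strain_b in calls and calls[strain_a] != calls[strain_b]:
            hits.append(snp_id)
    return sorted(hits, key=lambda s: (snps[s]["chromosome"], snps[s]["position"]))


if __name__ == "__main__":
    snps = load_genotypes(io.StringIO(SAMPLE))
    print(snps_in_region(snps, "1", 3000000, 3600000))
    print(differing_snps(snps, "STRAIN_A", "STRAIN_B"))
```

SNPs for which either strain lacks a genotype are skipped in the comparison rather than counted as differences; whether that is appropriate depends on how missing calls are encoded in the real files.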


References

[1] Patil, N., et al. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294, 1719-23 (2001).

[2] Frazer, K.A., et al. A sequence-based variation map of 8.27 million SNPs in inbred mouse strains. Nature 448, 1050-3 (2007).









