newNote: July 25, 2024
Description
These tracks contain mappings of single nucleotide variants and small insertions and deletions (indels) from the European Variation Archive (EVA) for the mouse mm10 genome. The dbSNP database at NCBI no longer hosts non-human variants.
Interpreting and Configuring the Graphical Display
Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP variant corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. The display is set to automatically collapse to dense visibility when there are more than 100k variants in the window. When the window size is more than 250k bp, the display is switched to density graph mode.
Searching, details, and filtering
Navigation to an individual variant can be accomplished by typing or copying the variant identifier (rsID) or the genomic coordinates into the Position/Search box on the Browser.
A click on an item in the graphical display displays a page with data about that variant. Data fields include the Reference and Alternate Alleles, the class of the variant as reported by EVA, the source of the data, the amino acid change, if any, and the functional class as determined by UCSC's Variant Annotation Integrator.
Variants can be filtered using the track controls to show subsets of the data by either EVA Sequence Ontology (SO) term, UCSC-generated functional effect, or by color, which bins the UCSC functional effects into general classes.
Mouse-over
Mousing over an item shows the ucscClass, which is the consequence according to the Variant Annotation Integrator, and the aaChange when one is available, which is the change in amino acid in HGVS.p terms. Items may have multiple ucscClasses, which will all be shown in the mouse-over in a comma-separated list. Likewise, multiple HGVS.p terms may be shown for each rsID, separated by spaces, describing all possible AA changes.
Multiple items may appear due to different variant predictions on multiple gene transcripts. For all organisms, the gene models used were NCBI RefSeq curated when available; if not, Ensembl genes; or finally UCSC mappings of RefSeq if neither of the previous models was available.
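As a minimal sketch of how the two mouse-over fields could be split for downstream use, the function below separates a comma-separated ucscClass string and a space-separated aaChange string into lists. The function name and the example values are hypothetical and chosen only for illustration; they are not taken from the track data.

```python
def parse_mouseover(ucsc_class: str, aa_change: str = "") -> dict:
    """Split the mouse-over fields into lists.

    ucsc_class: comma-separated consequence terms.
    aa_change:  space-separated HGVS.p terms (may be empty).
    """
    classes = [c.strip() for c in ucsc_class.split(",") if c.strip()]
    changes = aa_change.split()
    return {"ucscClass": classes, "aaChange": changes}


# Hypothetical example values, for illustration only.
example = parse_mouseover(
    "missense_variant,intron_variant",
    "p.Pro12Arg p.Pro45Arg",
)
print(example)
```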
Track colors
Variants are colored according to the most potentially deleterious functional effect prediction according to the Variant Annotation Integrator. Specific bins can be seen in the Methods section below.
| Color | Variant Type |
|---|---|
| | Protein-altering variants and splice site variants |
| | Synonymous codon variants |
| | Non-coding transcript or Untranslated Region (UTR) variants |
| | Intergenic and intronic variants |
Sequence ontology (SO)
Variants are classified by EVA into one of the following sequence ontology terms:
- substitution — A single nucleotide in the reference is replaced by another, alternate allele
- deletion — One or more nucleotides are deleted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g. a variant that is a deletion of an A may be represented as Ref = GA and Alt = G.
- insertion — One or more nucleotides are inserted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g. a variant that is an insertion of a T may be represented as Ref = G and Alt = GT.
- delins — Similar to tandemRepeat, in that the runs of Ref and Alt Alleles are of different length, except that there is more than one type of nucleotide, e.g., Ref = CCAAAAACAAAAACA, Alt = ACAAAAAC.
- multipleNucleotideVariant — More than one nucleotide is substituted by an equal number of different nucleotides, e.g., Ref = AA, Alt = GC.
- sequence alteration — A parent term meant to signify a deviation from another sequence. Can be assigned to variants that have not been characterized yet.
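The class definitions above can be illustrated with a small sketch that guesses a class from a Ref/Alt pair. This is a simplified heuristic written for illustration, not EVA's actual classification logic: it assumes indels carry the one extra leading nucleotide described above, and it falls back to "sequence alteration" for empty or identical alleles.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Guess the variant class from Ref and Alt alleles (simplified heuristic)."""
    ref, alt = ref.upper(), alt.upper()
    # Nothing to compare, or no change: use the generic parent term.
    if not ref or not alt or ref == alt:
        return "sequence alteration"
    # Same length: one base is a substitution, several bases an MNV.
    if len(ref) == len(alt):
        return "substitution" if len(ref) == 1 else "multipleNucleotideVariant"
    # Alt is a leading part of Ref: bases after the shared prefix were deleted.
    if ref.startswith(alt):
        return "deletion"
    # Ref is a leading part of Alt: bases after the shared prefix were inserted.
    if alt.startswith(ref):
        return "insertion"
    # Different lengths with no simple prefix relationship.
    return "delins"


# Examples taken from the class descriptions above.
for ref, alt in [("GA", "G"), ("G", "GT"), ("AA", "GC"),
                 ("CCAAAAACAAAAACA", "ACAAAAAC")]:
    print(ref, alt, classify_variant(ref, alt))
```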
Methods
Data were downloaded from the European Variation Archive (EVA) current_ids.vcf.gz files corresponding to the proper assembly.
Chromosome names were converted to UCSC-style and the variants passed through the Variant Annotation Integrator to predict consequence. For every organism, the NCBI RefSeq curated models were used when available, followed by Ensembl genes, and finally UCSC mapping of RefSeq when neither of the previous models was available.
Variants were then colored according to their predicted consequence in the following fashion:
- Protein-altering variants and splice site variants - exon_loss_variant, frameshift_variant, inframe_deletion, inframe_insertion, initiator_codon_variant, missense_variant, splice_acceptor_variant, splice_donor_variant, splice_region_variant, stop_gained, stop_lost, coding_sequence_variant, transcript_ablation
- Synonymous codon variants - synonymous_variant, stop_retained_variant
- Non-coding transcript or Untranslated Region (UTR) variants - 5_prime_UTR_variant, 3_prime_UTR_variant, complex_transcript_variant, non_coding_transcript_exon_variant
- Intergenic and intronic variants - upstream_gene_variant, downstream_gene_variant, intron_variant, intergenic_variant, NMD_transcript_variant, no_sequence_alteration
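The binning above can be sketched as a lookup that returns the bin of the most deleterious consequence among those predicted for a variant. The sketch assumes the four bins are listed from most to least deleterious, which the text implies but does not state outright; the names `BINS` and `bin_for` are hypothetical.

```python
# Bins in assumed order of decreasing severity, with the terms listed above.
BINS = [
    ("Protein-altering variants and splice site variants", {
        "exon_loss_variant", "frameshift_variant", "inframe_deletion",
        "inframe_insertion", "initiator_codon_variant", "missense_variant",
        "splice_acceptor_variant", "splice_donor_variant",
        "splice_region_variant", "stop_gained", "stop_lost",
        "coding_sequence_variant", "transcript_ablation",
    }),
    ("Synonymous codon variants", {
        "synonymous_variant", "stop_retained_variant",
    }),
    ("Non-coding transcript or Untranslated Region (UTR) variants", {
        "5_prime_UTR_variant", "3_prime_UTR_variant",
        "complex_transcript_variant", "non_coding_transcript_exon_variant",
    }),
    ("Intergenic and intronic variants", {
        "upstream_gene_variant", "downstream_gene_variant", "intron_variant",
        "intergenic_variant", "NMD_transcript_variant",
        "no_sequence_alteration",
    }),
]


def bin_for(consequences):
    """Return the first (most severe) bin matching any consequence, else None."""
    for name, terms in BINS:
        if any(c in terms for c in consequences):
            return name
    return None


print(bin_for(["intron_variant", "missense_variant"]))
```

A variant predicted as both intron_variant and missense_variant would therefore land in the protein-altering bin under this assumption.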
Sequence Ontology ("SO:") terms were converted to the variant classes, then the files were converted to BED, and then bigBed format.
No functional annotations were provided by the EVA (e.g., missense, nonsense, etc). These were computed using UCSC's Variant Annotation Integrator (Hinrichs, et al., 2016). Amino-acid substitutions for missense variants are based on RefSeq alignments of mRNA transcripts, which do not always match the amino acids predicted from translating the genomic sequence. Therefore, in some instances, the variant and the genomic nucleotide and associated amino acid may be reversed. E.g., a Pro > Arg change from the perspective of the mRNA would be Arg > Pro from the perspective of the genomic sequence. Also, in bosTau9, galGal5, rheMac8, danRer10 and danRer11 the mitochondrial sequence was removed or renamed to match UCSC. For complete documentation of the processing of these tracks, see the makedoc corresponding to the version of interest. For example, the EVA Release 6 MakeDoc.
Data Access
Note: It is not recommended to use LiftOver to convert SNPs between assemblies. More information about how to convert SNPs between assemblies can be found in the following FAQ entry.
The data can be explored interactively with the Table Browser, or the Data Integrator. For automated analysis, the data may be queried from our REST API. Please refer to our mailing list archives for questions, or our Data Access FAQ for more information.
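As a rough sketch of a REST API query, the snippet below builds a request URL for a region of this track. It assumes the `getData/track` endpoint at api.genome.ucsc.edu with semicolon-separated parameters, and it assumes the track is named `evaSnp6` (inferred from the bigBed file name, not confirmed here); check the REST API documentation before relying on either. The helper name `build_track_url` is hypothetical.

```python
# Assumed endpoint; verify against the UCSC REST API documentation.
API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(genome: str, track: str, chrom: str,
                    start: int, end: int) -> str:
    """Build a track-data URL for one region (0-based start, exclusive end)."""
    params = {
        "genome": genome,
        "track": track,   # "evaSnp6" below is an assumed track name
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    query = ";".join(f"{key}={value}" for key, value in params.items())
    return f"{API_BASE}?{query}"


url = build_track_url("mm10", "evaSnp6", "chr1", 0, 100000)
print(url)

# To fetch the data (requires network access):
#   import json, urllib.request
#   with urllib.request.urlopen(url) as response:
#       data = json.load(response)
```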
For automated download and analysis, this annotation is stored in a bigBed file that can be downloaded from our download server. Use the corresponding version number for the track of interest, e.g. evaSnp6.bb. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, e.g.
bigBedToBed https://hgdownload.soe.ucsc.edu/gbdb/mm10/bbi/evaSnp6.bb -chrom=chr1 -start=0 -end=100000000 stdout
Credits
This track was produced from the European Variation Archive release data. Consequences were predicted using UCSC's Variant Annotation Integrator and NCBI's RefSeq as well as Ensembl gene models.
References
Cezard T, Cunningham F, Hunt SE, Koylass B, Kumar N, Saunders G, Shen A, Silva AF, Tsukanov K, Venkataraman S et al. The European Variation Archive: a FAIR resource of genomic variation for all species. Nucleic Acids Res. 2021 Oct 28:gkab960. doi:10.1093/nar/gkab960. Epub ahead of print. PMID: 34718739; PMC: PMC8728205
Hinrichs AS, Raney BJ, Speir ML, Rhead B, Casper J, Karolchik D, Kuhn RM, Rosenbloom KR, Zweig AS, Haussler D, Kent WJ. UCSC Data Integrator and Variant Annotation Integrator. Bioinformatics. 2016 May 1;32(9):1430-2. PMID: 26740527; PMC: PMC4848401