Before you can download your personal set of sequences, you have to fill the List by selecting sequences or groups in the Browser or by using the search functionality of SILVA.
To select sequences in the Browser:
, to remove a complete group (subnode) click on


To select sequences using the extended Search functionalities:
With the extended Search functionalities you can perform complex queries by adding constraints to your query (AND) or by combining results from several queries (OR).
Example: You would like to get all Gammaproteobacteria with a minimal length of 1400 bases and an alignment quality better than 90. The respective query is this: Search for Gammaproteobacteria in taxonomy; >1400 in sequence length; >90 in alignment quality; Match AND.
will be shown) or add all sequences to the List by clicking on "Add complete result to List".
Tricks: Using < and > lets you search for sequences above or below a certain value (length or quality). To get all sequences from a specific publication, you can use the DOI or PubMed ID in the field "publication"; try 9572969 as an example. Remember: Complex queries might take some time, so please be patient.
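The AND/OR logic of such a query can be illustrated with a small sketch. It does not use any SILVA interface; the record type, field names, and the four records are hypothetical and only mirror the example query above (Gammaproteobacteria in taxonomy, length >1400, alignment quality >90).

```python
from dataclasses import dataclass


@dataclass
class SeqRecord:
    accession: str        # hypothetical identifier, not a real entry
    taxonomy: str
    length: int
    align_quality: float


def matches_all(rec: SeqRecord, taxon: str, min_length: int, min_quality: float) -> bool:
    """Match AND: every constraint has to hold for the record."""
    return (
        taxon in rec.taxonomy
        and rec.length > min_length
        and rec.align_quality > min_quality
    )


# Made-up records for illustration only.
records = [
    SeqRecord("X1", "Bacteria;Proteobacteria;Gammaproteobacteria", 1502, 95.0),
    SeqRecord("X2", "Bacteria;Proteobacteria;Gammaproteobacteria", 1200, 97.0),
    SeqRecord("X3", "Bacteria;Proteobacteria;Alphaproteobacteria", 1510, 99.0),
    SeqRecord("X4", "Bacteria;Proteobacteria;Gammaproteobacteria", 1450, 90.0),
]

hits = [r.accession for r in records
        if matches_all(r, "Gammaproteobacteria", 1400, 90)]
print(hits)
```

Because > is a strict comparison, record X4 with an alignment quality of exactly 90 is not part of the result.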
or "generate download"
To download sequences, click on Download. After the file generation process is finished, the file can be retrieved by clicking on "Click here to download". The files remain available for download for up to 24 h.
The colored bars on the search page and in the short and detailed sequence views of the browser give a fast overview of the different quality aspects assigned to every sequence. The length of the bars is a graphical representation of the respective quality value.
The colors classify the information into four categories: A green bar represents a value equal to or greater than 75. Yellow bars stand for values equal to or greater than 50 but less than 75. Values less than 50 are expressed by an orange bar. Red bars are only used for scores of 0. Since “problematic” sequences, sequences of inadequate quality, and insufficiently aligned sequences were discarded from the databases, only the Pintail scores can be 0.
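The four color categories can be written down as a small lookup. This is only a sketch of the thresholds stated above; the function name bar_color is hypothetical and not part of SILVA.

```python
def bar_color(score: float) -> str:
    """Map a quality value (0 to 100) to the color of its bar."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score == 0:
        return "red"      # only Pintail scores can be 0
    if score >= 75:
        return "green"
    if score >= 50:
        return "yellow"
    return "orange"       # greater than 0 but less than 50


for value in (100, 75, 74, 50, 49, 0):
    print(value, bar_color(value))
```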
The sequence quality score is a combination of the percentages of ambiguities, homopolymers longer than 4 bases, and possible vector contaminations. The overall score was normalized to fit into our unified scoring system ranging from 0 to 100, where 100 is best. The alignment quality is currently represented by the identity of a sequence, normalized between 0 and 100, to its closest relatives in the SEED. The color of the Pintail bar represents the probability that the rRNA sequence contains anomalies or is a chimera, where 100 means that the probability of being anomalous or chimeric is low. If you would like to know more about Pintail, please have a look at the Pintail website.
rRNA sequences with fewer than 300 nucleotides and more than 2% of ambiguities and homopolymers or more than 5% of vector contamination have been rejected by the initial quality check procedure.
To get all type strains search for [T] in the strain field of the search page. To get all cultivated strains (with type strains) search for s[C] or s[T] in the strain field. Searching for e[G] provides all ribosomal RNAs from genome sequences. More information can be found in the corresponding release background section.
The reason is that on the webpage the number of EMBL sequence entries (accession numbers) is shown. Since some sequence entries can have more than one "rRNA region" - just think about genome sequences with multiple rRNA operons - the real number of "rRNA regions" is much higher. In the export you will get all the rRNA sequences from any sequence entry.
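The difference between the two numbers can be shown with a toy count. The accession numbers and region indices below are made up for illustration and do not refer to real entries.

```python
# Hypothetical (accession number, rRNA region index) pairs.
regions = [
    ("ACC0001", 1),
    ("ACC0002", 1),  # e.g. a genome entry with three rRNA operons
    ("ACC0002", 2),
    ("ACC0002", 3),
    ("ACC0003", 1),
]

entry_count = len({accession for accession, _ in regions})  # number shown on the webpage
region_count = len(regions)                                 # number of sequences in the export

print(entry_count, region_count)
```

Here three sequence entries yield five exported rRNA sequences.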
Yes, for sure!
Every sequence in the SILVA databases carries the EBI-EMBL taxonomy assignment. Where available, the greengenes and RDP taxonomies are added for comparison. The EMBL taxonomy is retrieved simultaneously with the sequences, whereas the other taxonomies are assigned to the sequences based on accession numbers. For LSU rRNA sequences, no additional up-to-date taxonomies are available.
For the SSU and LSU Ref(erence) databases guide trees are reconstructed. The trees are incrementally built using the ARB parsimony tool with filters to remove highly variable positions. Based on the guide trees, all phylogenetic assignments are manually curated, taking into account taxonomic information provided by Bergey’s Taxonomic Outline of the Prokaryotes (Garrity et al. 2004), the taxonomic outlines for Volumes 3, 4 and 5 of Bergey’s Manual and the List of Prokaryotic names with Standing in Nomenclature (Euzeby 1997) to supplement the Bergey’s taxonomic outlines with the latest information of validly described bacterial and archaeal taxa. Currently, all sequences in the SILVA Ref datasets are associated with a full taxonomic path.
Furthermore, extensive effort is spent on representing prominent uncultured and not validly published environmental clades, groups, and taxa. The majority of these clades and groups are annotated in the guide tree for the SSU Ref dataset based on literature surveys and personal communications. Taxonomic groups consisting only of sequences from uncultured organisms are named after the clone sequence submitted earliest. Due to this exhaustive manual approach, SILVA currently contains the most up-to-date and detailed bacterial and archaeal taxonomic classification.
Scanning for unknown fields is necessary when you open your custom ARB database for the first time. The reason is that the SILVA database contains much more information assigned to each sequence than the original ARB databases. Please do the following steps:
Show differences: Combining "Search species that don't match the query" with an empty search string in the search field "name" shows all sequences in the Hitlist that differ between DB I and DB II.
Preserve Alignment: No. Tick this box only if the sequences in the two databases have different alignments and ARB should try to adjust the alignments while transferring the sequences, according to a reference species that must be part of both databases.
Quick phylogenetic classification of your sequences can be obtained with the "add nearest neighbor to output" function of the SINA Webaligner. The nearest neighbors are provided based on the SSURef or LSURef datasets, which contain all nearly full-length SSU or LSU sequences, respectively.
If you have further questions related to ARB itself, have a closer look at our ARB Support section.
Here, the following issues are addressed: