SILVA FAQs

How to "Select and download sequences"?

Before you are able to download your personal set of sequences, you have to fill the List by selecting sequences or groups in the browser or using the search functionality of SILVA.
To select sequences in the Browser:

  • Click on Browser
  • Navigate to the group (subnode) or sequences you are interested in by clicking on the names
  • Mouse-over shows more information, e.g. how many sequences are present in a certain group (subnode)
  • To add a complete group (subnode) to your List, click on the add icon; to remove a complete group (subnode), click on the remove icon
  • Adding a single sequence to your List works in the same way. Single sequences are marked by a separate icon
  • All selected groups and sequences are shown in the List field below the browser
  • To generate an ARB or FASTA file for downloading, click on the download icon

To select sequences using the extended Search functionalities:

With the extended Search functionalities you can perform complex queries by adding constraints to your query (AND) or by combining results from several queries (OR).

  • Click on Search
  • Type in your keyword(s) or value(s) in the "Search for" field(s) (wildcards will be added automatically).
  • Select Match AND or OR for complex queries
  • Have a look at the Search Tutorial to get an overview of additional search functionalities.

Example: You would like to get all Gammaproteobacteria that are longer than 1400 bases and have an alignment quality better than 90. The corresponding query is: Search for Gammaproteobacteria in taxonomy; >1400 in sequence length; >90 in alignment quality; Match AND.

  • Press return or click on Search
  • A list with search results will be shown
  • The list can be sorted by clicking on the blue headers of the respective columns (like Organism name)
  • Select your sequences of interest by ticking the box in front of each sequence (a marker will be shown) or add all sequences to the List by clicking on "Add complete result to List"

Tricks: Using < and > lets you search for sequences below or above a certain value (length or quality). To get all sequences from a specific publication, you can use the DOI or PubMed ID in the field "publication"; try 9572969 as an example. Remember: Complex queries might take some time, so please be patient.
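The way constraints combine under Match AND and Match OR can be illustrated with a small local sketch. This is not SILVA's search code: the Record class, the field names, the search function and the sample records are all hypothetical, and plain keywords are treated as case-insensitive substring matches to mimic the automatically added wildcards.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical stand-in for one search result row.
    taxonomy: str
    sequence_length: int
    alignment_quality: float


def parse_constraint(text):
    """Turn '>1400' or '<90' into a numeric test; anything else is a substring match."""
    text = text.strip()
    if text.startswith(">"):
        limit = float(text[1:])
        return lambda value: value > limit
    if text.startswith("<"):
        limit = float(text[1:])
        return lambda value: value < limit
    needle = text.lower()
    return lambda value: needle in str(value).lower()


def search(records, constraints, match="AND"):
    """Return the records that satisfy all (AND) or at least one (OR) of the constraints."""
    if match not in ("AND", "OR"):
        raise ValueError("match must be 'AND' or 'OR'")
    tests = [(field, parse_constraint(text)) for field, text in constraints.items()]
    combine = all if match == "AND" else any
    return [r for r in records
            if combine(test(getattr(r, field)) for field, test in tests)]


# Made-up example records, for illustration only.
records = [
    Record("Bacteria;Proteobacteria;Gammaproteobacteria", 1502, 95.0),
    Record("Bacteria;Proteobacteria;Gammaproteobacteria", 1200, 97.0),
    Record("Bacteria;Firmicutes;Bacilli", 1510, 92.0),
]
query = {
    "taxonomy": "Gammaproteobacteria",
    "sequence_length": ">1400",
    "alignment_quality": ">90",
}
and_hits = search(records, query, match="AND")
or_hits = search(records, query, match="OR")
```

With Match AND only the first record passes all three constraints; with Match OR every record that satisfies at least one constraint is returned.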

  • Click on List
  • All selected sequences and groups (subnodes) (from the Browser and from Search) are listed
  • Mouse-over shows more information, e.g. how many sequences are present in a certain group (subnode)
  • To generate an ARB or FASTA file for downloading, click on the download icon or on "generate download"

To download sequences, click on Download. After the file generation process has finished, the file can be retrieved by clicking on "Click here to download". The files will be available for download for up to 24 hours.

What do the green, yellow and orange quality bars tell me?

The colored bars on the search page and in the short and detailed sequence views of the browser give a fast overview of the different quality aspects assigned to every sequence. The length of the bars is a graphical representation of the respective quality value.

The colors classify the information into four categories: A green bar represents a value equal to or greater than 75. Yellow bars stand for values equal to or greater than 50 but less than 75. Values less than 50 are expressed by an orange bar. Red bars are only used for scores of 0. Since “problematic” sequences, sequences of inadequate quality, as well as insufficiently aligned sequences were discarded from the databases, only the Pintail scores can be 0.
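The thresholds above can be written down as a small function. This is only an illustrative sketch of the stated rules; the function name bar_color is hypothetical and not part of SILVA.

```python
def bar_color(score):
    """Map a quality value between 0 and 100 to the bar color described above."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score == 0:
        return "red"      # only used for scores of exactly 0
    if score < 50:
        return "orange"   # less than 50
    if score < 75:
        return "yellow"   # at least 50 but less than 75
    return "green"        # 75 or greater
```

For example, a value of 50 falls into the yellow category and a value of 75 into the green one.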

The sequence quality score is a combination of the percentages of ambiguities, homopolymers longer than 4 bases, and possible vector contaminations. The overall score is normalized to fit into our unified scoring system ranging from 0 to 100, where 100 is the best. The alignment quality is currently represented by the identity of a sequence to its closest relatives in the SEED, normalized between 0 and 100. The color of the Pintail bar represents the probability that the rRNA sequence contains anomalies or is a chimera, where 100 means that the probability of being anomalous or chimeric is low. If you would like to know more about Pintail, please have a look at the Pintail website.

rRNA sequences with fewer than 300 nucleotides, with more than 2% of ambiguities or homopolymers, or with more than 5% of vector contamination have been rejected by the initial quality check procedure.
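As a rough illustration, the rejection rule can be sketched as a filter. The sketch assumes that each criterion alone is enough to reject a sequence and that the 2% limit applies to ambiguities and homopolymers separately; neither reading is confirmed here, and the function and parameter names are hypothetical.

```python
def passes_initial_quality_check(length, pct_ambiguities, pct_homopolymers, pct_vector):
    """Return True if a sequence would be kept under the assumed reading of the rule."""
    if length < 300:
        return False  # too short
    if pct_ambiguities > 2.0 or pct_homopolymers > 2.0:
        return False  # assumption: the 2% limit is applied to each value separately
    if pct_vector > 5.0:
        return False  # too much possible vector contamination
    return True
```

Under this reading, a 1500-base sequence with 0.5% ambiguities, 1% homopolymers and no vector contamination is kept, whereas a 250-base sequence is rejected regardless of its other values.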

How can I get all type strains? How can I get all cultivated strains?

To get all type strains search for [T] in the strain field of the search page. To get all cultivated strains (with type strains) search for s[C] or s[T] in the strain field. Searching for e[G] provides all ribosomal RNAs from genome sequences. More information can be found in the corresponding release background section.

Why is there a difference between the number of sequences shown in the Popups, the List view or Download page and the number of exported sequences?

The reason is that the webpage shows the number of EMBL sequence entries (accession numbers). Since some sequence entries can contain more than one "rRNA region" (just think of genome sequences with multiple rRNA operons), the real number of "rRNA regions" can be much higher. In the export you will get all the rRNA sequences from every sequence entry.
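To see the difference in an exported FASTA file, you can count both numbers yourself. The sketch below assumes that each FASTA header starts with an identifier of the form accession.start.stop, so that several regions from one entry share the same accession; please check this against your own export. The function name and the example headers are made up for illustration.

```python
from collections import Counter


def count_entries_and_regions(fasta_lines):
    """Return (number of sequence entries, number of rRNA regions) for FASTA lines."""
    regions_per_entry = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        tokens = line[1:].split()
        if not tokens:
            continue  # skip empty headers
        # Assumed identifier layout: accession.start.stop
        accession = tokens[0].rsplit(".", 2)[0]
        regions_per_entry[accession] += 1
    return len(regions_per_entry), sum(regions_per_entry.values())


# Made-up headers: one entry with a single region, one genome entry with two.
example = [
    ">AB000001.1.1500 Bacteria;Example",
    "ACGU",
    ">CP000002.100.1600 Bacteria;Example",
    "ACGU",
    ">CP000002.50000.51500 Bacteria;Example",
    "ACGU",
]
entries, regions = count_entries_and_regions(example)
```

In this example there are two sequence entries but three rRNA regions, which is the kind of difference you see between the webpage and the export.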

What kind of information is not available for the LSU dataset?

  • Pintail quality - currently Pintail cannot be applied to LSU sequences
  • Greengenes and RDP taxonomy

ARB/SILVA FAQs

How can I integrate SILVA in my daily work with ARB? Is there a suggested workflow?

Yes, for sure!

What do the database fields in the ARB databases mean and how are they related to EMBL?

SILVA Taxonomy and Classifications

Every sequence in the SILVA databases carries the EBI-EMBL taxonomy assignment. Where available, the Greengenes and RDP taxonomies are added for comparison. The EMBL taxonomy is retrieved simultaneously with the sequences, whereas the other taxonomies are assigned to the sequences based on accession numbers. For LSU rRNA sequences, no additional up-to-date taxonomies are available.

For the SSU and LSU Ref(erence) databases guide trees are reconstructed. The trees are incrementally built using the ARB parsimony tool with filters to remove highly variable positions. Based on the guide trees, all phylogenetic assignments are manually curated, taking into account taxonomic information provided by Bergey’s Taxonomic Outline of the Prokaryotes (Garrity et al. 2004), the taxonomic outlines for Volumes 3, 4 and 5 of Bergey's Manual and the List of Prokaryotic names with Standing in Nomenclature (Euzeby 1997) to supplement the Bergey’s taxonomic outlines with the latest information of validly described bacterial and archaeal taxa. Currently, all sequences in the SILVA Ref datasets are associated with a full taxonomic path.

Furthermore, extensive effort is spent on representing prominent uncultured and not validly published environmental clades, groups, and taxa. The majority of these clades and groups are annotated in the guide tree for the SSU Ref dataset based on literature surveys and personal communications. Taxonomic groups consisting only of sequences from uncultured organisms are named after the clone sequence submitted earliest. Due to this exhaustive manual approach, SILVA currently contains the most up-to-date and detailed bacterial and archaeal taxonomic classification.

How to "Scan for unknown fields" in ARB?

Scanning for unknown fields is necessary when you open your custom ARB database for the first time. The reason is that the SILVA database contains much more information assigned to each sequence than the original ARB databases. Please do the following steps:

  • Start ARB with your database
  • Go to Species -> Search and Query
  • When the Search and Query window pops up click on Search and select one sequence in the Hitlist
  • The Species Information window should pop up. Go to Fields and click on Scan unknown fields. An extended set of database fields should now be visible.

How to "Merge two ARB Databases"?

  • Start ARB and click on Merge Two ARB Databases in the ARB Intro window
  • Select Database I (source db) and Database II (target db) in the directories fields
  • Click on Go
  • The ARB Merge window pops up
  • To make sure that the name fields are unique and synchronized in both databases click on Check Names ...
  • When the Synchronize Names window appears, Rename Database I and II!!
  • Click on Transfer Species ...
  • Search for entries in the left database (I) you would like to transfer to the right database (II) using the Query menu
  • All db entries you would like to transfer should now be shown in the Hitlist
  • Click on Transfer Listed Species - Delete Duplicates in DB II
  • Click on Close
  • Click on Save Whole DB II ... or Save Changes of DB II as...
  • To see your new sequences in the guide tree, you have to add them first using the Parsimony 'Quick add' procedure in ARB

Show differences: Combining "Search species that don't match the query" with an empty search string for the field name lists in the Hitlist all sequences that differ between DB I and DB II.

Preserve Alignment: No. Tick this box only if the sequences in the two databases have different alignments and ARB should try to adjust the alignments while transferring the sequences. The adjustment is based on a reference species, which must be part of both databases.

How to do a "Quick Phylogenetic Classification" of your sequences?

Quick phylogenetic classification of your sequences can be obtained with the "add nearest neighbor to output" function of the SINA Webaligner. The nearest neighbors are provided based on the SSURef or LSURef datasets which contain all nearly full length SSU or LSU sequences, respectively.

  • paste or upload your sequence
  • activate 'add nearest neighbors to output' and select the maximal number of relatives the aligner should report
  • select ARB or FASTA with metadata as output format
  • in the FASTA output look at the 'tax_embl' field of the additional sequences
  • to show the information in ARB, open the ARB output file and activate the 'tax_embl' field in Menu -> Tree -> Select visible info (NDS). Please do not forget to 'scan for unknown fields' first.

How to quickly get an unaligned FASTA sequence for an entry?

  • Browse or Search for your sequence entry of interest
  • If you are in the Browser click on Link to EMBL and copy the Fasta sequence from the EMBL entry
  • If you are on the Search or List page, click on the corresponding icon to jump to the Browser and then click on Link to EMBL

What else?

If you have further questions related to ARB itself, have a closer look at our ARB Support section.

Here, the following issues are addressed:

  • ARB on Mac OS X
  • ARB FAQs
  • ARB Mailing List
  • ARB Bug Tracker
  • Professional ARB/SILVA support