
Release Information

Version 90 of the SSU and LSU databases was released on 21.05.2007

 

Content of version 90 (difference to 89)


        SSU                  LSU
Ref     175,911 (+38,123)    6342 (+682)
Parc    422,987 (+69,621)    73,855 (+26,876)

 

Click here for SILVA 89 statistics.

 
Sequence Retrieval and Initial Quality Check: 847,421 SSU and 402,603 LSU sequences were retrieved from EMBL Release 90 (March 2007) using a complex keyword search procedure. Cross-checks with RDP II indicated no loss of primary data. For the SSU databases, the subsequent initial quality check removed 342,167 short rRNA sequences below 300 bases, and 7680, 18,318, and 14,312 rRNA sequences due to excessive ambiguities (> 2%), homopolymers (> 2%), and vector contamination (> 5%), respectively. For the LSU databases, 297,246 short rRNA sequences below 300 bases were removed, and 2064, 4487, and 2575 rRNA sequences were removed due to excessive ambiguities (> 2%), homopolymers (> 2%), and vector contamination (> 5%), respectively. In the alignment process, 46,952 SSU and 12,661 LSU rRNA sequences were rejected due to a lack of relatives. Most of these sequences were classified as non-ribosomal RNA sequences by manual inspection.
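The thresholds above can be illustrated with a minimal Python sketch. It is not the SILVA pipeline: the text does not define how the homopolymer percentage is computed, so the sketch assumes it is the share of bases inside runs of five or more identical bases. The vector contamination percentage is assumed to come from an external screening step and is simply passed in, and the order of the checks is also an assumption. All function and constant names are hypothetical.

```python
import re

# Thresholds taken from the release notes.
MIN_LENGTH = 300            # bases
MAX_AMBIGUITY_PCT = 2.0     # percent of ambiguous bases
MAX_HOMOPOLYMER_PCT = 2.0   # percent of bases in homopolymer runs
MAX_VECTOR_PCT = 5.0        # percent of vector contamination

# Assumption: a homopolymer is a run of 5 or more identical bases.
HOMOPOLYMER_RUN = re.compile(r"([ACGTU])\1{4,}")
UNAMBIGUOUS = set("ACGTU")


def ambiguity_pct(seq):
    """Percentage of bases that are not A, C, G, T, or U."""
    seq = seq.upper()
    ambiguous = sum(1 for base in seq if base not in UNAMBIGUOUS)
    return 100.0 * ambiguous / len(seq)


def homopolymer_pct(seq):
    """Percentage of bases that lie inside homopolymer runs."""
    seq = seq.upper()
    in_runs = sum(len(m.group(0)) for m in HOMOPOLYMER_RUN.finditer(seq))
    return 100.0 * in_runs / len(seq)


def rejection_reason(seq, vector_pct=0.0):
    """Return why a sequence fails the check, or None if it passes.

    vector_pct is assumed to be computed elsewhere.
    """
    if len(seq) < MIN_LENGTH:
        return "short"
    if ambiguity_pct(seq) > MAX_AMBIGUITY_PCT:
        return "ambiguities"
    if homopolymer_pct(seq) > MAX_HOMOPOLYMER_PCT:
        return "homopolymers"
    if vector_pct > MAX_VECTOR_PCT:
        return "vector"
    return None
```

A call such as `rejection_reason(sequence, vector_pct=1.5)` returns a short label for the first failed criterion, which makes it easy to tally removals per category as reported above.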

 
Databases: SSUParc contains all aligned sequences with an alignment quality value and a basepair score above 30. No further curation has been applied.

To create SSURef, all sequences below 1,200 bases for Bacteria and Eukarya and below 900 bases for Archaea, as well as all sequences with an alignment quality value below 50, have additionally been removed from SSUParc. A guide tree was calculated by adding all sequences to the tree_1200 of SILVA release 89, which is based on tree_1000 from the ssujan04 release. For tree calculation, highly variable positions were removed for Bacteria, Archaea, and Eukarya with the respective position variability filters. Phyla and most of the classes for Bacteria and Archaea have been organized according to Bergey's taxonomic outline. After manual inspection of the tree, around 450 sequences were removed due to long branches. Position variability filters for Bacteria, Archaea, and Eukarya have been calculated and added to the dataset. Please note that sequences with an alignment quality value below 70 also need further attention. All sequences with a Pintail value < 50 or an alignment quality value < 70 have been assigned to color group 1 in ARB (red). Before the alignment is used for extensive phylogenetic reconstructions, all sequences should be checked carefully.
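The SSURef selection rule and the color group 1 rule can be sketched as follows. This is a minimal illustration, not the code used to build the release: the `Entry` record and its attribute names are hypothetical and are not the ARB field names. The text does not say how sequences without a Pintail value are handled, so the sketch assumes that a missing value alone does not flag a sequence.

```python
from dataclasses import dataclass
from typing import Optional

# Minimum lengths for SSURef per domain, as stated in the text.
MIN_LENGTH = {"Bacteria": 1200, "Eukarya": 1200, "Archaea": 900}
MIN_ALIGN_QUALITY_REF = 50    # below this, removed from SSURef
ATTENTION_ALIGN_QUALITY = 70  # below this, color group 1 (red)
ATTENTION_PINTAIL = 50        # below this, color group 1 (red)


@dataclass
class Entry:
    # Hypothetical record; attribute names are illustrative only.
    domain: str
    length: int
    align_quality: float
    pintail: Optional[float] = None  # None when no Pintail value exists


def in_ssuref(entry):
    """True if the entry passes the SSURef length and quality cutoffs."""
    long_enough = entry.length >= MIN_LENGTH[entry.domain]
    return long_enough and entry.align_quality >= MIN_ALIGN_QUALITY_REF


def needs_attention(entry):
    """True if the entry would fall into color group 1 (red)."""
    # Assumption: a missing Pintail value does not flag the entry.
    low_pintail = entry.pintail is not None and entry.pintail < ATTENTION_PINTAIL
    return low_pintail or entry.align_quality < ATTENTION_ALIGN_QUALITY
```

Note that an entry can be part of SSURef and still need attention, for example with an alignment quality value between 50 and 70.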

LSUParc contains all aligned sequences with an alignment quality value and a basepair score above 30. No further curation has been applied; only a guide tree has been added, by the most parsimonious addition of 67,513 sequences to the LSURef guide tree.

Additionally, for LSURef all sequences below 1,900 bases have been removed, a guide tree was calculated based on the tree_1900 of SILVA release 89, and basic filters have been added. All sequences with an alignment quality value < 70 have been assigned to color group 1 in ARB (red). Please note that the SEED consisted of only around 2,800 sequences, and there is no guarantee that well-aligned close relatives were always available. We recommend additional manual curation before using LSURef for extensive phylogenetic reconstructions.

Updates: For all four datasets, ARB change files are available in the download section. These datasets contain only sequences that are either new or whose accession number has changed between SILVA release 89 and 90.

 
Quality values: The flashlight system gives a first indication of the sequence and alignment quality as well as the risk of sequence anomalies based on Pintail analysis. After downloading the sequences as an ARB file, sequences that need attention can be selected by searching for low quality (alignment, sequence) or Pintail values in the corresponding ARB database fields. A full description of all database fields available in the ARB files can be found in the FAQ section. Given the rich set of sequence-associated information that comes with every SILVA sequence, user-designed ARB databases can be generated easily.

 
Alternative Names: All names of validly described species in the SSU and LSU databases have been checked for changes (basonyms, synonyms, and orthographical corrections) against the DSMZ "Nomenclature up to date" catalogue (http://www.dsmz.de/download/bactnom/names.txt) released in April 2007.

 
Alternative Taxonomies and Type strain information (NEW):

Besides the EMBL Taxonomy, alternative classifications taken from the greengenes and the RDP II project are now available in SILVA. On the webpage, the user can switch between them using the Taxonomy menu. In ARB, the different taxonomies can be found in the fields embl_tax, gg_tax, and rdp_tax for EMBL, greengenes, and RDP II, respectively. The corresponding *_name fields show the respective sequence name for each entry. Please note that both greengenes and RDP II provide only a subset of the sequences hosted by SILVA. If no taxonomic mapping to greengenes or RDP II is available, the sequence is assigned as "unclassified" and the respective sequence name equals the EMBL name. For the LSU datasets, no alternative taxonomies are available. Type strain information has been added to the field strain and is indicated by [T]. Mapping was done based on the April 2007 RDP II dataset and is therefore only available for SSU Bacteria.
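The fallback rule for missing mappings can be sketched in Python. This is an illustration only: it assumes an entry is available as a plain dictionary keyed by field name, and it assumes the `*_name` pattern expands to `embl_name`, `gg_name`, and `rdp_name`. The helper `alternative_taxonomy` is hypothetical and not part of ARB or SILVA.

```python
def alternative_taxonomy(entry, prefix):
    """Return (taxonomy, sequence name) for prefix 'gg' or 'rdp'.

    If the entry has no mapping for that taxonomy, fall back to
    "unclassified" and the EMBL sequence name, as described in the text.
    """
    tax = entry.get(f"{prefix}_tax")
    if not tax:
        return "unclassified", entry["embl_name"]
    return tax, entry.get(f"{prefix}_name", entry["embl_name"])


# Made-up example entry, for illustration only.
example = {
    "embl_tax": "Bacteria;Proteobacteria",
    "embl_name": "Escherichia coli",
    "gg_tax": "Bacteria;Proteobacteria;Gammaproteobacteria",
    "gg_name": "Escherichia coli str. K12",
}

print(alternative_taxonomy(example, "gg"))
print(alternative_taxonomy(example, "rdp"))
```

In the example, the greengenes lookup returns the stored classification, while the RDP II lookup falls back to "unclassified" with the EMBL name because no `rdp_tax` value is present.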

 
SEED: All rRNA sequences have been aligned based on a completely manually re-checked SEED alignment of 49,697 rRNA sequences for SSU and 2,868 rRNA sequences for LSU. The SSU alignment is based on the official ssu_jan04 release of the ARB Project. The SSU SEED alignment has been considerably improved for Archaea by manual addition of more than 1,000 sequences by Katrin Knittel. All SSU Eukaryotic sequences (18S) have been cross-checked by Wolfgang Ludwig before their addition to the SEED. Most of the bacterial sequences have also undergone a curation process carried out by the SILVA Team. We would rate our SSU SEED alignment for all Bacteria and Archaea as good and for Eukarya as reasonable.

The LSU alignment was provided by Wolfgang Ludwig and has not been released before. It was cross-checked by the SILVA Team before it was used as the SEED for automatic alignment. Bacteria and Archaea can be rated as good. The Eukaryotes definitely need further attention.

 
Known bugs: The term "sequences" needs to be replaced by "accession numbers" in all pop-ups since it is misleading. Around 2000 sequences on the webpage and around 13,000 sequences in the SSU ARB databases have no Pintail values due to reverse-complement issues. Sequence length for genomes might be wrong.

 
Future: Similarity-based search and aligner functionalities are planned for mid-2007. The SEED and Ref databases require further extension and curation.

Statistics

1. Growth of the ribosomal RNA databases since 1992


Blue: RDP II, orange: SILVA SSUParc based on the EMBL release 90

2. Length distribution in the SILVA 90 databases

 


Red: raw data, black: the quality checked & aligned SSUParc sequences

 


Red: raw data, black: the quality checked & aligned LSUParc sequences

3. Sequence quality in relation to length in SSUParc 90


Basic statistics for the SILVA databases, release 90

              SSUParc    SSURef     LSUParc   LSURef
Version       90         90         90        90
Total         422,987    175,911    73,855    6342
Bacteria      337,144    140,504    6126      3462
Archaea       19,390     7520       139       132
Eukarya       64,139     27,887     67,480    2748
Cultured #    139,823    79,167     67,657    6269
Uncultured    283,164    96,744     6198      73
Type strain   5052       5011       na        na

# Searched for entries not matching *uncult*, *unident*, or *clone* in the full name. Contains false positives.
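The name filter behind the "Cultured" row can be sketched with Python's standard `fnmatch` module. This is a minimal illustration, not the query used for the statistics; in particular, case-insensitive matching is an assumption, and `is_cultured` is a hypothetical helper.

```python
from fnmatch import fnmatch

# Wildcard patterns from the table footnote.
UNCULTURED_PATTERNS = ("*uncult*", "*unident*", "*clone*")


def is_cultured(full_name):
    """True if the full name matches none of the 'uncultured' patterns.

    Assumption: matching is case-insensitive, so the name is lowercased.
    """
    name = full_name.lower()
    return not any(fnmatch(name, pattern) for pattern in UNCULTURED_PATTERNS)
```

A name-based filter like this counts every entry whose name lacks the three keywords as cultured, whether or not the organism was actually cultured, which is why the row contains false positives.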