Release Information

Version 89 of the SSU and LSU databases was released on 6 and 11 February 2007.

SSURef was updated on 25 February 2007.

LSUParc was updated on 26 February 2007.

Content of version 89:

        SSU        LSU
Ref     137,788    5,660
Parc    353,366    46,979

Sequence Retrieval and Initial Quality Check: 490,549 SSU and 116,307 LSU sequences were retrieved from EMBL Release 89 (December 2006) using a complex keyword search procedure. Cross-checks with RDP II did not indicate a significant loss of primary data. For the SSU db, the subsequent initial quality check removed 72,449 short rRNA sequences (below 300 bases) and 6,268, 14,119 and 8,365 rRNA sequences because of excessive ambiguities (>2%), homopolymers (>2%) and vector contamination (>5%), respectively. For the LSU db, it removed 53,834 short rRNA sequences (below 300 bases) and 1,114, 1,274 and 1,298 rRNA sequences for the same three reasons. During alignment, a further 33,557 SSU and 4,134 LSU rRNA sequences were rejected due to a lack of relatives. Manual inspection classified most of these as non-ribosomal RNA sequences.
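The initial quality check can be sketched as a per-sequence filter. This is a minimal illustration, not SILVA's pipeline: the length, ambiguity, homopolymer and vector thresholds come from the text above, but the definition of a homopolymer (here a hypothetical run of at least 5 identical bases) is an assumption, and the vector-contaminated fraction is taken as an input, assuming it was computed beforehand by a separate vector-screening step. The checks run in the order listed above, and each sequence is reported under its first failing test.

```python
from typing import Optional

# Thresholds taken from the release notes.
MIN_LENGTH = 300          # sequences below 300 bases are removed
MAX_AMBIGUITY = 0.02      # > 2% ambiguous bases
MAX_HOMOPOLYMER = 0.02    # > 2% of bases in homopolymer runs
MAX_VECTOR = 0.05         # > 5% vector contamination

# Assumption: a homopolymer is a run of at least this many identical bases.
# The release notes do not define it; 5 is a hypothetical choice.
HOMOPOLYMER_RUN = 5

UNAMBIGUOUS = set("ACGTU")


def ambiguity_fraction(seq: str) -> float:
    """Share of positions that are not A, C, G, T or U (e.g. IUPAC N, R, Y)."""
    seq = seq.upper()
    return sum(base not in UNAMBIGUOUS for base in seq) / len(seq)


def homopolymer_fraction(seq: str, min_run: int = HOMOPOLYMER_RUN) -> float:
    """Share of positions lying inside runs of >= min_run identical bases."""
    seq = seq.upper()
    in_runs = 0
    run_length = 1
    for i in range(1, len(seq) + 1):
        if i < len(seq) and seq[i] == seq[i - 1]:
            run_length += 1
        else:
            # The current run ended at position i - 1.
            if run_length >= min_run:
                in_runs += run_length
            run_length = 1
    return in_runs / len(seq)


def initial_check(seq: str, vector_fraction: float) -> Optional[str]:
    """Return the first failing check, or None if the sequence is kept.

    vector_fraction is assumed to come from a separate vector screen.
    """
    if len(seq) < MIN_LENGTH:
        return "too short"
    if ambiguity_fraction(seq) > MAX_AMBIGUITY:
        return "ambiguities"
    if homopolymer_fraction(seq) > MAX_HOMOPOLYMER:
        return "homopolymers"
    if vector_fraction > MAX_VECTOR:
        return "vector contamination"
    return None
```

For example, `initial_check("ACGU" * 100, 0.0)` returns `None`, while a 400-base sequence ending in 20 Ns is rejected for ambiguities (20/400 = 5%).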

SEED: All remaining rRNA sequences were aligned against a completely, manually re-checked SEED alignment of 49,697 rRNA sequences for SSU and 2,868 rRNA sequences for LSU. The SSU alignment is based on the official ssu_jan04 release of the ARB Project. For Archaea, the SSU SEED alignment was considerably improved by Katrin Knittel, who manually added more than 1,000 sequences. All eukaryotic SSU sequences (18S) were cross-checked by Wolfgang Ludwig before being added to the SEED. Most of the bacterial sequences have also undergone a curation process carried out by the SILVA Team. We would rate our SSU SEED alignment as good for all Bacteria and Archaea and as reasonable for Eukarya.

The LSU alignment was provided by Wolfgang Ludwig and had not been released before. The SILVA Team cross-checked it before using it as the SEED for automatic alignment. Bacteria and Archaea can be rated as good; the Eukarya definitely need further attention.

Databases: SSU: Sequences with an alignment quality value below 30 were removed from the SSUParc database. For SSURef, all sequences shorter than 1,200 bases (Bacteria and Eukarya) or 900 bases (Archaea), or with an alignment quality value below 50, were removed. A guide tree was calculated by adding all sequences to the tree_1000 from the ssu_jan04 release. For tree calculation, highly variable positions were removed for Bacteria, Archaea and Eukarya using the respective position variability filters. Phyla and most of the classes of Bacteria and Archaea have been organized according to Bergey's Taxonomic Outline. Around 400 sequences were removed after manually inspecting the tree for long branches. Position variability filters for Bacteria, Archaea and Eukarya have been calculated and added to the dataset. Please note that sequences with an alignment quality value below 70 may also need further attention: all sequences with a Pintail value < 50 or an alignment quality value < 70 have been assigned to color group 1 in ARB. All sequences should be checked carefully before the alignment is used for extensive phylogenetic reconstructions.
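The SSUParc and SSURef selection rules reduce to a few comparisons per sequence. The sketch below encodes only the thresholds stated above; it assumes SSURef is drawn from the sequences that passed the SSUParc cut, and it leaves out the manual removal of long-branch sequences, which cannot be expressed as a threshold. The function and parameter names are hypothetical.

```python
from typing import Optional

# Thresholds quoted in the release notes for version 89.
PARC_MIN_ALIGN_QUALITY = 30
REF_MIN_ALIGN_QUALITY = 50
REF_MIN_LENGTH = {"Bacteria": 1200, "Eukarya": 1200, "Archaea": 900}


def ssu_destination(domain: str, length: int, align_quality: float) -> Optional[str]:
    """Return "Ref" (in SSURef and SSUParc), "Parc" (SSUParc only) or None (removed).

    Assumption: SSURef is a subset of SSUParc.
    """
    if align_quality < PARC_MIN_ALIGN_QUALITY:
        return None
    if length >= REF_MIN_LENGTH[domain] and align_quality >= REF_MIN_ALIGN_QUALITY:
        return "Ref"
    return "Parc"


print(ssu_destination("Archaea", 950, 60))   # Ref: 950 >= 900 for Archaea
print(ssu_destination("Bacteria", 950, 60))  # Parc: below 1,200 bases
print(ssu_destination("Eukarya", 1500, 25))  # None: alignment quality below 30
```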

LSU: Please note that the SEED consisted of only around 2,800 sequences, so there is no guarantee that well-aligned close relatives were always available. We recommend additional manual curation before using the alignment for extensive phylogenetic reconstructions. All sequences with a quality value below 30 (7,352 sequences) were removed from LSUParc. In addition, all sequences shorter than 1,900 bases were removed from LSURef. A guide tree was calculated for both dbs, and basic filters have been added.
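The position variability filters mentioned above mask alignment columns that are too variable for tree calculation. How ARB computes its filters is not described here, so the sketch below substitutes a simple stand-in: a column is kept when its most frequent base makes up at least a chosen share (a hypothetical `min_conservation`) of the non-gap characters in that column. The result is a 0/1 string with one character per column; treat it as an illustration of column masking, not as ARB's algorithm.

```python
from collections import Counter
from typing import List

# Characters treated as gaps in the aligned sequences (an assumption).
GAP_CHARS = set("-.")


def column_mask(alignment: List[str], min_conservation: float = 0.5) -> str:
    """Return a mask in which '1' keeps a column and '0' drops it as too variable.

    Conservation is the share of the most common base among the non-gap
    characters of a column. Columns consisting only of gaps are dropped.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    mask = []
    for col in range(length):
        bases = [row[col].upper() for row in alignment if row[col] not in GAP_CHARS]
        if not bases:
            mask.append("0")
            continue
        top_count = Counter(bases).most_common(1)[0][1]
        mask.append("1" if top_count / len(bases) >= min_conservation else "0")
    return "".join(mask)


def apply_mask(seq: str, mask: str) -> str:
    """Keep only the columns of an aligned sequence that the mask marks with '1'."""
    return "".join(base for base, keep in zip(seq, mask) if keep == "1")


alignment = ["AACG-", "AAGG-", "ACUG-", "AGCG-"]
mask = column_mask(alignment, min_conservation=0.75)
print(mask)                              # 10010
print([apply_mask(s, mask) for s in alignment])
```

Raising `min_conservation` removes more columns; with the default of 0.5 the same toy alignment keeps every column except the all-gap one.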

Quality values: The flashlight system gives a first indication of sequence and alignment quality, as well as of the risk of sequence anomalies based on Pintail analysis. After downloading the sequences as an ARB file, sequences that need attention can be selected by searching the corresponding ARB db fields for low quality (alignment, sequence) or Pintail values. A full description of all db fields available in the ARB files can be found in the FAQ section. Thanks to the rich annotation that comes with every SILVA sequence, user-designed ARB databases can easily be generated.
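Selecting the sequences that need attention amounts to filtering on the quality fields. The sketch below assumes the relevant fields have been exported to a tab-separated table; the column names `acc`, `align_quality` and `pintail` are hypothetical (the real ARB field names are listed in the FAQ). The default thresholds reuse the color group 1 rule given for SSU (Pintail < 50 or alignment quality < 70); a sequence quality column could be checked in the same way.

```python
import csv
import io
from typing import Dict, Iterable, List

# Hypothetical column names in a tab-separated export of the ARB fields.
ACC_FIELD = "acc"
ALIGN_FIELD = "align_quality"
PINTAIL_FIELD = "pintail"


def needs_attention(rows: Iterable[Dict[str, str]],
                    min_pintail: float = 50,
                    min_align_quality: float = 70) -> List[str]:
    """Return the accessions whose Pintail or alignment quality is below threshold."""
    flagged = []
    for row in rows:
        pintail = float(row[PINTAIL_FIELD])
        align_quality = float(row[ALIGN_FIELD])
        if pintail < min_pintail or align_quality < min_align_quality:
            flagged.append(row[ACC_FIELD])
    return flagged


# Made-up example rows, not real SILVA entries.
export = io.StringIO(
    "acc\talign_quality\tpintail\n"
    "ACC0001\t85\t90\n"
    "ACC0002\t65\t95\n"
    "ACC0003\t88\t30\n"
)
print(needs_attention(csv.DictReader(export, delimiter="\t")))
# prints ['ACC0002', 'ACC0003']
```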

Alternative Names: All names of validly described species in the SSU and LSU databases have been checked for changes (basonyms, synonyms and orthographic corrections) against the DSMZ "Nomenclature up to date" catalogue (http://www.dsmz.de/download/bactnom/names.txt) released in December 2006.
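Checking names against a catalogue of nomenclatural changes can be sketched as following each name through a mapping from an outdated name to its replacement until no further change applies. The sketch assumes the catalogue has already been read into such a dictionary; it does not parse the DSMZ file, whose format is not described here. The example mapping is illustrative (the well-known reclassification of Pseudomonas maltophilia, via Xanthomonas maltophilia, to Stenotrophomonas maltophilia) and is not taken from the December 2006 release.

```python
from typing import Dict


def current_name(name: str, changes: Dict[str, str]) -> str:
    """Follow name changes (basonyms, synonyms, corrections) to the current name.

    `changes` maps an outdated name to its replacement; names without an
    entry are returned unchanged. A cycle in the mapping raises ValueError.
    """
    seen = {name}
    while name in changes:
        name = changes[name]
        if name in seen:
            raise ValueError(f"cyclic name change involving {name!r}")
        seen.add(name)
    return name


# Illustrative mapping only.
changes = {
    "Pseudomonas maltophilia": "Xanthomonas maltophilia",
    "Xanthomonas maltophilia": "Stenotrophomonas maltophilia",
}
print(current_name("Pseudomonas maltophilia", changes))  # Stenotrophomonas maltophilia
print(current_name("Escherichia coli", changes))         # Escherichia coli (unchanged)
```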

Known bugs: The "select complete results" button in the list does not work properly. Do not sort the search list when searching within the LSU db, as this leads to strange effects. The cutoff head and tail values in the exported ARB dbs are wrong and have been removed from SSUParc. The term "Sequences" needs to be replaced by "accession numbers" in all pop-ups, since it is misleading. On the Search page, the page has to be reloaded after each request; this can be done by clicking Search again in the menu bar.

Future: Extended search functionality, including a similarity-based search, is planned for the beginning of 2007. The known bugs have to be fixed to improve the speed and reliability of the web page. The thresholds of the quality value system are still under discussion. The SEED and Ref dbs require further extension and curation.

Statistics

1. Growth of the ribosomal RNA Databases since 1992

Blue: RDP II, Orange: SILVA SSUParc based on the EMBL Release 89


2. Length distribution in the databases

Red: raw data, Black: the quality checked & aligned SSUParc sequences

Red: raw data, Black: the quality checked & aligned LSUParc sequences

Basic statistics for the SILVA databases

              SSUParc    SSURef     LSUParc    LSURef
Version       89         89.1       89.1       89
Total         353,366    137,788    46,979     5,562
Bacteria      272,450    105,354    5,371      3,156
Archaea       17,270     6,517      126        120
Eukarya       61,290     25,926     41,482     2,384
Cultured #    129,257    73,683     44,313     5,562
Uncultured    224,109    65,073     2,988      98

# Cultured: counted by searching for entries not matching *uncult*, *unident* or *clone*. Contains false positives!
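The Cultured row counts entries whose name matches none of the three wildcard patterns in the footnote. A minimal sketch of that test, assuming the patterns were applied to the organism name and matched case-insensitively (neither detail is stated), shows why false positives arise: an environmental sequence whose name happens to lack all three keywords is still counted as cultured. The example names are made up.

```python
from fnmatch import fnmatchcase

# Wildcard patterns from the footnote of the statistics table.
UNCULTURED_PATTERNS = ("*uncult*", "*unident*", "*clone*")


def is_cultured(name: str) -> bool:
    """True when the name matches none of the patterns.

    Assumption: matching is case-insensitive, so the name is lower-cased first.
    """
    lowered = name.lower()
    return not any(fnmatchcase(lowered, pattern) for pattern in UNCULTURED_PATTERNS)


for name in ["Escherichia coli", "uncultured bacterium",
             "bacterium clone ABC12", "marine sediment bacterium"]:
    print(f"{name!r}: {'cultured' if is_cultured(name) else 'uncultured'}")
```

The last name is classified as cultured even if it came from an environmental clone library, which is exactly the kind of false positive the footnote warns about.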