
Release information: SILVA 94

Version 94 of the SSU and LSU databases, released on 14.04.2008

 

        SSU                   LSU
Parc    606,879 (+40,832)     108,803 (+6782)
Ref     237,960 (+12,993)     10,647 (+519)

 

Former statistics:

SILVA 89, SILVA 90, SILVA 91, SILVA 92, SILVA 93

Small Subunit rRNA Database

SSU Parc (Web database & ARB file) contains all aligned sequences with an alignment quality value and a base-pair score equal to or above 30. All sequences with a Pintail value < 50 or an alignment quality value < 75 have been assigned to color group 1 in ARB (red). All Living Tree Project type strains have been assigned to color group 2 in ARB (light blue). No further sequence curation has been applied.
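The inclusion rule and the colour-group rules above can be written down as two small predicates. This is a minimal sketch, not SILVA's pipeline code: the `SeqRecord` container and the function names are hypothetical, and two points the text leaves open are assumptions here (a missing Pintail value is not treated as low, and red takes precedence over light blue when a type strain is also of low quality).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SeqRecord:
    # Hypothetical container for the per-sequence values named in the text.
    name: str
    align_quality: float           # alignment quality value
    bp_score: float                # base-pair score
    pintail: Optional[float]       # Pintail value; may be missing for a few sequences
    ltp_type_strain: bool = False  # Living Tree Project type strain?


def in_ssu_parc(rec: SeqRecord) -> bool:
    """Parc keeps sequences whose alignment quality and base-pair score are both >= 30."""
    return rec.align_quality >= 30 and rec.bp_score >= 30


def arb_color_group(rec: SeqRecord) -> int:
    """1 = red (needs attention), 2 = light blue (LTP type strain), 0 = no group.

    Assumption: a missing Pintail value is not treated as low, and the
    red group takes precedence when a type strain is also of low quality.
    """
    low_pintail = rec.pintail is not None and rec.pintail < 50
    if low_pintail or rec.align_quality < 75:
        return 1
    if rec.ltp_type_strain:
        return 2
    return 0
```

A sequence with alignment quality 80, base-pair score 80 and Pintail 40 is therefore kept in Parc but shown in red.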

To create SSU Ref (ARB file), all sequences shorter than 1,200 bases for Bacteria and Eukarya and shorter than 900 bases for Archaea, as well as all sequences with an alignment quality value below 50, have additionally been removed from SSU Parc. A guide tree was calculated by adding all sequences to the tree_1200 of SILVA release 91, which is based on tree_1000 from the ssujan04 release. For tree calculation, highly variable positions were removed for Bacteria, Archaea, and Eukarya with the respective position variability filters. Phyla and most of the classes for Bacteria and Archaea have been organized according to Bergey's taxonomic outline. After manual inspection of the tree, around 190 sequences have been removed due to long branches. Position variability filters for Bacteria, Archaea and Eukarya have been calculated and added to the dataset. Please note that sequences with an alignment quality value below 75 also need further attention. All sequences with a Pintail value < 50 or an alignment quality value < 75 have been assigned to color group 1 in ARB (red). All Living Tree Project type strains have been assigned to color group 2 in ARB (light blue). All sequences should be checked carefully before the alignment is used for extensive phylogenetic reconstructions.
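The length and quality cut that turns SSU Parc into SSU Ref can be sketched as a per-domain threshold lookup. This assumes `length` is simply the number of bases in the sequence; the function name is hypothetical, and the manual removal of around 190 long-branch sequences is not modelled.

```python
# Minimum length per domain for SSU Ref, taken from the text above.
SSU_REF_MIN_LENGTH = {"Bacteria": 1200, "Eukarya": 1200, "Archaea": 900}


def keep_in_ssu_ref(domain: str, length: int, align_quality: float) -> bool:
    """Return True if an SSU Parc sequence would pass the SSU Ref length/quality cut.

    Assumption: `length` is the number of bases in the sequence. The manual
    removal of long-branch sequences is not modelled here.
    """
    if align_quality < 50:
        return False
    return length >= SSU_REF_MIN_LENGTH[domain]
```

For example, an archaeal sequence of 950 bases passes, while a bacterial sequence of the same length does not.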

Large Subunit rRNA Databases

LSU Parc (Web database & ARB file) contains all aligned sequences with an alignment quality value and a base-pair score equal to or above 30. No further curation has been applied; only a guide tree has been added, by most parsimonious addition of around 10,000 sequences to the LSU Parc guide tree from SILVA 92.

Additionally, for LSU Ref (ARB file) all sequences shorter than 1,900 bases have been removed, a guide tree was calculated based on the tree_1900 of SILVA release 92, and basic filters have been added. All sequences with an alignment quality value < 75 have been assigned to color group 1 in ARB (red). Please note that the SEED consisted of only around 2,800 sequences, so there is no guarantee that well-aligned close relatives were always available. We recommend additional manual curation before using it for extensive phylogenetic reconstructions.
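Combining the LSU Parc criteria with the LSU Ref length cut gives a single rule per sequence. The sketch below (hypothetical function name) returns `None` for sequences that would not end up in LSU Ref and otherwise the ARB colour group; it assumes `length` is the number of bases and ignores the tree-based steps.

```python
from typing import Optional


def lsu_ref_color_group(length: int, align_quality: float, bp_score: float) -> Optional[int]:
    """Return None if the sequence would not be in LSU Ref, else its ARB colour group.

    LSU Ref is built from LSU Parc (both scores >= 30) by dropping sequences
    shorter than 1,900 bases; alignment quality < 75 gives colour group 1 (red).
    """
    if align_quality < 30 or bp_score < 30:  # not in LSU Parc at all
        return None
    if length < 1900:                        # removed for LSU Ref
        return None
    return 1 if align_quality < 75 else 0
```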

Quality Values

The length and colours of the bars give a first indication of the sequence and alignment quality as well as of the risk of sequence anomalies based on Pintail analysis. After downloading the sequences as an ARB file, sequences that need attention can be selected by searching for low quality (alignment, sequence) or Pintail values in the corresponding ARB database fields. A full description of the colour code and of all database fields available in the ARB files can be found in the FAQ section. Thanks to the rich set of sequence-associated information that comes with every SILVA sequence, user-designed subdatabases can be easily generated.

Alternative Names

All names of validly described species in the SSU and LSU databases have been checked for changes (basonyms, synonyms and orthographical corrections) against the DSMZ "Nomenclature up to date" catalogue (http://www.dsmz.de/download/bactnom/names.txt) released in March 2008.

Alternative Taxonomies and Type Strain & Genome Information

Alternative Taxonomies

Besides the EMBL taxonomy, alternative classifications taken from the greengenes and RDP II projects are also available in SILVA. On the webpage, the user can switch between them using the Taxonomy menu. In ARB, the different taxonomies can be found in the fields embl_tax, gg_tax and rdp_tax for EMBL, greengenes and RDP II, respectively. The corresponding *_name fields show the respective sequence name for each entry. Please note that both greengenes and RDP II provide only a subset of the sequences hosted by SILVA. If no taxonomic mapping to greengenes or RDP II was available, the sequence is assigned as "unclassified" and the respective sequence name equals the EMBL name. For the LSU datasets, no alternative taxonomies are available.
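Because unmapped entries carry the placeholder "unclassified", it can be useful to check how much of an exported dataset actually has a greengenes or RDP II mapping. The sketch below assumes records have been exported as dictionaries keyed by ARB field name; the `<prefix>_name` pattern for the name fields (e.g. `gg_name`) is an assumption based on the "*_name" wording, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Field prefixes for the ARB taxonomy fields embl_tax, gg_tax and rdp_tax.
TAX_PREFIX = {"EMBL": "embl", "greengenes": "gg", "RDP II": "rdp"}


def taxonomy_view(record: Dict[str, str], source: str) -> Tuple[str, str]:
    """Return (taxonomy path, sequence name) for one classification source.

    Assumption: the name fields follow the pattern <prefix>_name
    (e.g. gg_name); check the actual field names in your ARB file.
    """
    prefix = TAX_PREFIX[source]
    return record.get(f"{prefix}_tax", "unclassified"), record.get(f"{prefix}_name", "")


def mapped_fraction(records: List[Dict[str, str]], source: str) -> float:
    """Fraction of records with a real mapping, i.e. not marked "unclassified"."""
    if not records:
        return 0.0
    mapped = sum(
        1 for r in records
        if taxonomy_view(r, source)[0].strip().lower() != "unclassified"
    )
    return mapped / len(records)
```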

Type strain and culture information has been added to the strain field and is indicated by [T] and [C], respectively. Several sources have been used to compile the information: the Straininfo.net bioportal, the Ribosomal Database Project II (9.58) and the Living Tree Project, which provides manually curated information compliant with Euzéby's "List of Prokaryotic names with Standing in Nomenclature".

Genome information is provided by the "Genomic Standards Consortium" in cooperation with Peter Sterk from EMBL.

Detailed information about the corresponding identifiers and target databases can be found in the Strain Identifiers table.

The identifiers can be used for data retrieval by searching the strain field (see FAQ).

RNAmmer

RNAmmer is a computational predictor for the major rRNA species (SSU, LSU) from all three domains of life. The program uses hidden Markov models trained on data from the European ribosomal RNA database project. SILVA runs RNAmmer on all whole (meta)genome shotgun data of the EMBL archive to complement the existing predictions. All predictions are marked with RNAmmer in the ann_src field. More information about RNAmmer can be found in the paper.

Thanks to Felix Schlesinger for RNAmmer extensions and adaptations.

SEED

All rRNA sequences have been aligned based on a completely manually re-checked SEED alignment of 51,601 rRNA sequences for SSU and 2,868 rRNA sequences for LSU. The SSU alignment is based on the official ssu_jan04 release of the ARB Project. The SSU SEED alignment has been considerably improved for Archaea by the manual addition of more than 1,000 sequences by Katrin Knittel. All eukaryotic SSU sequences (18S) were cross-checked by Wolfgang Ludwig before their addition to the SEED. Most of the bacterial sequences have also undergone a curation process carried out by the SILVA Team. We rate our SSU SEED alignment as good for all Bacteria and Archaea and as reasonable for Eukarya.

The LSU alignment was provided by Wolfgang Ludwig and has not been released before. It was cross-checked by the SILVA Team before being used as the SEED for automatic alignment. The alignment for Bacteria and Archaea can be rated as good; the eukaryotic part definitely needs further attention.

Update Files

Update files are no longer provided. Because we constantly improve the SILVA pipeline, we recommend always taking the latest version of SILVA and updating it with your personal sequences. The difference between SILVA and your own database can be easily determined using the ARB Merge Tool.
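Conceptually, "the difference between SILVA and your own database" is a set comparison on sequence identifiers. The sketch below only illustrates that idea; it is not the ARB Merge Tool, and using accession numbers as the identifier is an assumption.

```python
from typing import Dict, Set


def diff_databases(silva_ids: Set[str], own_ids: Set[str]) -> Dict[str, Set[str]]:
    """Compare two databases by sequence identifier (e.g. accession numbers)."""
    return {
        "only_in_own": own_ids - silva_ids,    # personal sequences to add to the new release
        "only_in_silva": silva_ids - own_ids,  # present in SILVA, absent locally
        "shared": silva_ids & own_ids,         # present in both
    }
```

The `only_in_own` set is the part that would have to be re-imported into each new SILVA release.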

Statistics

Sequence Retrieval and Processing

                                      SSU          LSU
candidates (total)                    1,257,554    477,411
RNAmmer                               6169         3864
< 300 bases                           504,838      326,485
> 2% ambiguities                      9623         2563
> 2% homopolymers                     22,637       5735
> 5% vector contamination             10,041       7616
overlap RNAmmer & EMBL467361
rejected by SINA                      93,761       21,423
alignment quality and bp score < 30   8754         4207

Sequences have been retrieved from EMBL Release 94 (March 2008) using a complex keyword search procedure and, for all whole (meta)genome shotgun (wgs) sequences, a sequence-based search with RNAmmer. Cross-checks with RDP II indicated no loss of primary data. Most of the sequences rejected by the new SINAligner were classified as non-ribosomal RNA sequences by manual inspection, or their remaining aligned fragments were shorter than 300 bases.
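The pre-alignment rejection criteria in the table above (length, ambiguities, homopolymers, vector contamination) can be sketched as a chain of checks. The thresholds come from the table; everything else is an assumption: ambiguities are counted as characters other than A, C, G, T or U, homopolymers as bases lying in runs of at least `HOMOPOLYMER_MIN_RUN` (a hypothetical value of 5), the vector fraction is supplied by a separate vector screen, and the checks are applied in table order.

```python
from itertools import groupby
from typing import Optional

HOMOPOLYMER_MIN_RUN = 5  # hypothetical: shortest run counted as a homopolymer


def ambiguity_fraction(seq: str) -> float:
    """Fraction of characters that are not A, C, G, T or U (e.g. N, R, Y)."""
    if not seq:
        return 0.0
    return sum(1 for c in seq.upper() if c not in "ACGTU") / len(seq)


def homopolymer_fraction(seq: str) -> float:
    """Fraction of bases lying in runs of identical characters >= HOMOPOLYMER_MIN_RUN."""
    if not seq:
        return 0.0
    run_lengths = (len(list(group)) for _, group in groupby(seq.upper()))
    return sum(n for n in run_lengths if n >= HOMOPOLYMER_MIN_RUN) / len(seq)


def rejection_reason(seq: str, vector_fraction: float = 0.0) -> Optional[str]:
    """Return the first failed criterion from the statistics table, or None.

    `vector_fraction` is assumed to come from a separate vector screen.
    """
    if len(seq) < 300:
        return "< 300 bases"
    if ambiguity_fraction(seq) > 0.02:
        return "> 2% ambiguities"
    if homopolymer_fraction(seq) > 0.02:
        return "> 2% homopolymers"
    if vector_fraction > 0.05:
        return "> 5% vector contamination"
    return None
```

Sequences that pass these checks would then go on to SINA alignment and the alignment quality and base-pair score cut.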

1. Growth of the ribosomal RNA databases since 1992

Blue: RDP II, orange: SILVA SSUParc based on the EMBL release 94

2. Length Distribution (SSU & LSU)


Red: raw data, black: the quality checked & aligned SSUParc sequences


Red: raw data, black: the quality checked & aligned LSUParc sequences

3. Sequence quality in relation to length in SSUParc 90


Basic statistics for the SILVA databases, release 94

                 SSUParc    SSURef     LSUParc    LSURef
Version          94         94         94         94
Total            606,879    237,960    108,803    10,647
Bacteria         483,112    190,128    10,117     6069
Archaea          28,700     9519       187        176
Eukarya          89,355     38,313     98,499     4402
Cultured #       20,706     16,778     8139       1177
Typestrains #    9759       9698       319        309

# according to straininfo.net and the Living Tree Project

Strain Identifiers

Source                             Information            Tag   Datasets
EMBL                               Typestrains            (t)   SSU, LSU
Straininfo.net                     Cultured               s[C]  SSU, LSU
Straininfo.net                     Typestrains            s[T]  SSU, LSU
Genomic Standards Consortium/EMBL  Genomes (curated)      e[G]  SSU, LSU
Living Tree Project                Typestrains (curated)  l[T]  SSU
RDP II                             Typestrains            r[T]  SSU
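Since the identifiers in this table are meant for searching the strain field, a small parser shows how the tags could be pulled out of an exported field value. The tag spellings come from the table; how the tags are embedded in the field (here: anywhere in the text, separated from preceding letters) is an assumption, and the function names are hypothetical.

```python
import re
from typing import Set

# Bracketed tags from the table: s[C], s[T], e[G], l[T], r[T].
# The lookbehind avoids matching a letter that is part of a longer word.
BRACKET_TAG = re.compile(r"(?<![A-Za-z])([slre])\[([TCG])\]")
EMBL_TYPE_TAG = "(t)"
TYPE_STRAIN_TAGS = {"(t)", "s[T]", "l[T]", "r[T]"}


def strain_tags(strain_field: str) -> Set[str]:
    """Collect the source tags found anywhere in a strain field value."""
    tags = {f"{src}[{kind}]" for src, kind in BRACKET_TAG.findall(strain_field)}
    if EMBL_TYPE_TAG in strain_field:
        tags.add(EMBL_TYPE_TAG)
    return tags


def is_type_strain(strain_field: str) -> bool:
    """True if any source flags the entry as a type strain."""
    return bool(strain_tags(strain_field) & TYPE_STRAIN_TAGS)
```

Restricting the check to `l[T]` alone would select only the manually curated Living Tree Project type strains (SSU only).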

New in Release 94

  • improved RNAmmer system
  • new Vector database to determine vector contaminations
  • the Living "All Species" Tree has been launched
  • a Search Tutorial has been added
  • the File Archive has been added

Known Bugs

  • The Pintail value is missing for about 31 sequences

Future Developments

Similarity-based search functionalities are planned for mid-2008.