News

03.09.2019 20:00

Update on SILVA Release 138

We are sorry to inform you that SILVA Release 138 is delayed further. To compensate for the delay, we have decided to release the SSU and LSU datasets separately. We estimate that the SSU datasets will be available in October and the LSU datasets by the end of the year. This news article gives you some background information on the release.

After the release of SILVA 132, we decided to change how we generate the SILVA Ref NR 99 datasets. Previously, the order of the sequences for clustering was based only on the length of the sequences. For identity-based clustering tools, the order of the sequences is important because the first sequence of a cluster becomes its reference. While the length of a sequence is an important quality criterion for phylogenetic reconstruction, it is not the only one. Therefore, we now also consider additional sequence quality values for the sorting order. Additionally, to keep the SILVA Ref NR 99 more stable in future releases, we put all members of the SILVA Ref NR 99 from the last releases at the top of the order. In this process, we also changed the clustering tool from the original uclust to vsearch. Changing the clustering tool, of course, made the reference sequences for this release less stable in comparison to previous releases. But it was a necessary step to be able to provide better and more stable results in the future.
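The change in sorting order described above can be illustrated with a minimal Python sketch. It is not the SILVA pipeline: the `Seq` record, its `quality` score, the `prev_ref` flag, and the relative priority of quality over length are all assumptions made for illustration, since the article does not specify how the quality values are combined or weighted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Seq:
    acc: str        # hypothetical accession label
    length: int     # sequence length in bases
    quality: float  # hypothetical combined quality score (higher is better)
    prev_ref: bool  # True if the sequence was a Ref NR 99 member before


def length_only_order(seqs):
    """Old approach: longest sequences first (ties broken by accession)."""
    return sorted(seqs, key=lambda s: (-s.length, s.acc))


def stable_quality_order(seqs):
    """New approach (sketch): previous Ref NR 99 members first,
    then higher quality, then longer sequences.
    The quality-before-length priority is an assumption."""
    return sorted(
        seqs,
        key=lambda s: (not s.prev_ref, -s.quality, -s.length, s.acc),
    )


# Made-up example records, not real SILVA data.
candidates = [
    Seq("A", 1500, 0.80, False),
    Seq("B", 1450, 0.95, False),
    Seq("C", 1300, 0.70, True),
    Seq("D", 1500, 0.95, False),
]

old_order = [s.acc for s in length_only_order(candidates)]
new_order = [s.acc for s in stable_quality_order(candidates)]
print(old_order)  # ['A', 'D', 'B', 'C']
print(new_order)  # ['C', 'D', 'B', 'A']
```

Because an identity-based clustering tool makes the first sequence of each cluster its reference, feeding it the second ordering favors the previous reference `C` over longer newcomers, which is the stabilizing effect the new approach aims for.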

When we started to analyze the EMBL/ENA release 138, we faced a new challenge. For the SSU dataset, more than 2.9 million RNA sequences predicted by RNAmmer were not included in any of the other RNA sequence databases and are, therefore, unique to the SILVA database. For the LSU dataset, the total number was lower, at about 350,000 sequences. Proportionally, for both datasets, these predicted candidates constitute about 20% of all candidates. In combination with the changed clustering approach and tool, the large number of predicted candidates led to a large number of sequences changing in the Ref NR datasets for both subunits. Having to add many new sequences to the guide trees made the process more time-consuming and also increased the effort for our taxonomic curator(s).

For the SILVA 138 release, Pablo Yarza from Ribocon GmbH is, for the first time, the main curator of the SILVA taxonomy, replacing Pelin Yilmaz, who stepped down from her role in the SILVA team to work as a consultant in industry. Pablo is also a member of the LTP team and recently started a collaboration with the LPSN.

SILVA 138 is also the first release for which the taxonomies from the Genome Taxonomy Database (GTDB) and UniEuk were used for the classification and taxonomic curation of sequences. These new taxonomic information sources further increased the curation burden and led to changes at the higher taxonomic levels in two-thirds of the sequence classifications. The manual curation is still ongoing, and the adoption of the new taxonomic sources will continue in the next SILVA releases.

Judging from our experience with the SSU Ref NR 99, we expect the curation of the LSU Ref NR 99 (which has not started yet) to take some additional months. For this reason, we decided - for the first time in SILVA history - to do a split release and release the SSU datasets before the LSU datasets.

Estimated release dates:

  • SSU datasets: October 2019
  • LSU datasets: December 2019

You can access the preliminary release information here.