Organization & Storage of Contextual Data
For maximum usability, we prepared an Excel-based solution to organize and store your contextual data.
+++ Download version 1.03 of the rRNA Contextual Data Spreadsheet +++
Detailed documentation is included, and a pre-filled example file is also available.
The spreadsheet is under constant development, and new fields will be added in the future. Feedback is welcome! Please send an email to contact (at) arb-silva.de.
Integration of Contextual Data
Primary sequence information and the corresponding contextual data are recorded independently and should be merged as early as possible in the process of sequence preparation and analysis. Both kinds of data are then available as a single file throughout the whole workflow (see the overview figure).
This requires extended FASTA files, called "Metadata-FASTA" files.
Full files for ARB/SILVA import contain all information from the rRNA Contextual Data Spreadsheet (example 1). INSDC-compliant files contain only the INSDC fields, for direct submission to the INSDC databases (example 2).
How can such a Metadata-FASTA file be produced from the rRNA Contextual Data Spreadsheet and standard FASTA files?
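To illustrate the merge step itself, here is a minimal Python sketch, assuming the spreadsheet has been exported as a tab-separated file whose first column, `sequence_id`, matches the FASTA identifiers. The `[field=value]` header layout, the column name `sequence_id` and the field names `lat_lon` and `depth` are hypothetical choices for illustration; they are not the actual Metadata-FASTA format written by the SILVA tools or the ARB filters.

```python
import csv
import io
from typing import Dict, Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from a plain FASTA stream.

    Only the first whitespace-delimited token of each header is kept as the
    identifier; any remaining description text is discarded.
    """
    ident: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if ident is not None:
                yield ident, "".join(chunks)
            ident, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if ident is not None:
        yield ident, "".join(chunks)


def read_contextual(handle: TextIO, key: str = "sequence_id") -> Dict[str, Dict[str, str]]:
    """Map each identifier in a tab-separated export to its non-empty fields."""
    table: Dict[str, Dict[str, str]] = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        ident = row.pop(key)
        # Skip empty cells and any surplus columns without a header name.
        table[ident] = {k: v for k, v in row.items() if k and v}
    return table


def merge(fasta: TextIO, contextual: TextIO, out: TextIO) -> None:
    """Write FASTA records whose headers carry the matching contextual data
    as hypothetical [field=value] tags."""
    metadata = read_contextual(contextual)
    for ident, seq in read_fasta(fasta):
        tags = " ".join(f"[{k}={v}]" for k, v in metadata.get(ident, {}).items())
        out.write(f">{ident} {tags}".rstrip() + "\n")
        out.write(seq + "\n")


fasta = io.StringIO(">AB000001 uncultured bacterium\nACGU\nACGU\n>AB000002\nGGCC\n")
contextual = io.StringIO("sequence_id\tlat_lon\tdepth\nAB000001\t47.94 N 28.12 W\t200 m\n")
out = io.StringIO()
merge(fasta, contextual, out)
print(out.getvalue(), end="")
```

Records without a matching row keep a bare header. Values containing `]` would break this simple tag syntax, so a real format needs explicit escaping rules.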
ARB/SILVA import & export filters
New extended FASTA import and export filters for ARB have been set up.
+++ The respective filters (version 1.03) can be downloaded from the SILVA Archive +++
For installation in ARB, please have a look at the README file.
Last update: November 3, 2008
What are contextual data?
Contextual data (also called "metadata") are secondary data attached to primary sequence data; simply put, "data about data".
Why are contextual data of outstanding importance?
Because only these additional data allow primary sequence information to be turned into sound biological knowledge. An example:
A 16S rRNA sequence deposited in the public databases, annotated as "uncultured bacterium" but lacking any additional information (contextual data), is of limited use.
In contrast, if it were supplemented with just the sample location (latitude, longitude, time, depth), it could already be used to:
The INSDC databases already offer a number of specific fields for storing contextual data.
Examples are ...
More information on the INSDC fields currently available and the standards for completing them can be found in the INSDC Feature Table Document.
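As a concrete case of such a standard, the INSDC /lat_lon qualifier expects decimal degrees followed by hemisphere letters, for example "47.94 N 28.12 W" (our reading of the Feature Table Document; please check the current version there). A minimal Python sketch that converts signed decimal coordinates into that form; the function name `to_insdc_lat_lon` and the choice of at most four decimal places are assumptions for illustration.

```python
def to_insdc_lat_lon(lat: float, lon: float) -> str:
    """Convert signed decimal degrees (north/east positive) into a
    lat_lon string such as "47.94 N 28.12 W"."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")

    def fmt(value: float) -> str:
        # Keep at most four decimal places and drop trailing zeros.
        return f"{abs(value):.4f}".rstrip("0").rstrip(".")

    hemi_ns = "N" if lat >= 0 else "S"
    hemi_ew = "E" if lon >= 0 else "W"
    return f"{fmt(lat)} {hemi_ns} {fmt(lon)} {hemi_ew}"


print(to_insdc_lat_lon(47.94, -28.12))     # 47.94 N 28.12 W
print(to_insdc_lat_lon(-45.0123, 4.1234))  # 45.0123 S 4.1234 E
```

Validating the range before formatting catches swapped or mistyped coordinates early, before they reach a submission.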
However, if you search SILVA 96 SSURef (based on EMBL release 96), for example, for the contextual data available, you will find the following:
Only a very small portion of the 324,342 entries contains this kind of information.
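A similar check on your own data amounts to counting, for each field, the share of records that carry a non-empty value. A minimal Python sketch, assuming the records have already been parsed into dictionaries mapping field names to values; the function name `field_coverage`, the field names and the toy records are hypothetical and do not reproduce the SILVA 96 figures.

```python
from collections import Counter
from typing import Dict, List


def field_coverage(records: List[Dict[str, str]]) -> Dict[str, float]:
    """Return, for each field, the fraction of records with a non-empty value."""
    if not records:
        return {}
    counts = Counter(field for record in records for field, value in record.items() if value)
    return {field: n / len(records) for field, n in counts.items()}


# Toy records for illustration only; these are not SILVA statistics.
records = [
    {"lat_lon": "47.94 N 28.12 W", "country": "Germany"},
    {"country": "Norway"},
    {"country": ""},
    {},
]
for field, share in sorted(field_coverage(records).items()):
    print(f"{field}: {share:.0%}")
```

Empty strings are treated as missing, so a field that is present but blank does not count towards coverage.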
What is the reason for this unsatisfying situation?
Researchers simply do not submit their contextual data together with the primary sequence data to the INSDC databases.
This is mainly due to the lack of software solutions for integrating contextual data with primary sequence information.
To resolve this limitation, we have prepared solutions for
Besides the limitations caused by missing solutions for integrating contextual data and for storing and organizing them locally, many contextual data fields cannot yet be submitted to the INSDC databases.
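In practice this means partitioning each record: fields with a matching INSDC qualifier can go into a submission, while everything else has to be kept locally, for example in a full Metadata-FASTA file for ARB/SILVA import. A minimal Python sketch; the qualifier set below is a small illustrative subset of INSDC source-feature qualifiers, not the complete list, and both the field names `depth` and `salinity` and the one-to-one match between field names and qualifier names are hypothetical.

```python
from typing import Dict, Tuple

# Small illustrative subset of INSDC source-feature qualifiers; NOT the
# complete list (see the INSDC Feature Table Document for that).
INSDC_QUALIFIERS = frozenset({"collection_date", "country", "isolation_source", "lat_lon"})


def split_fields(record: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Separate fields that map onto INSDC qualifiers from those that do not."""
    insdc = {k: v for k, v in record.items() if k in INSDC_QUALIFIERS}
    other = {k: v for k, v in record.items() if k not in INSDC_QUALIFIERS}
    return insdc, other


record = {
    "lat_lon": "47.94 N 28.12 W",
    "collection_date": "21-Oct-2007",
    "depth": "200 m",      # hypothetical field, not in the subset above
    "salinity": "35 psu",  # hypothetical field, not in the subset above
}
insdc, other = split_fields(record)
print(insdc)
print(other)
```

Keeping the qualifier list in one constant makes it easy to extend once further fields become submittable.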

In October 2008, the MIENS (Minimum Information about an ENvironmental Sequence) working group was formed within the Genomic Standards Consortium to work on two main issues:
Survey results, the current list of fields and additional information are available on the Wiki page of the Genomic Standards Consortium.
Your contribution is always welcome; just contact the SILVA team at contact (at) arb-silva.de.