In October 2008 the MIMARKS (Minimum Information about a MARKer gene Sequence) working group was formed within the Genomic Standards Consortium to work on three main issues:
Status of MIMARKS/MIxS:
In November 2010 the first version of the contextual (meta)data standard MIMARKS was released, after two years of discussion with almost 100 experts.
In December 2010 the manuscript describing MIMARKS (formerly MIENS) was made available for community voting at Nature Precedings.
In May 2011 the MIMARKS/MIxS paper was published in Nature Biotechnology.
Access:
Further information about MIMARKS is available on the Wiki page of the Genomic Standards Consortium.
MIMARKS is a living standard, and changes can be requested on the MIxS Trac page.
You can also subscribe to the MIMARKS mailing list to actively participate in the discussions.
CDinFusion (Contextual Data and FASTA infusion) is a submission preparation tool for integrating contextual data (CD) with sequence data. The software enriches uploaded multi-FASTA files with contextual data in compliance with the Genomic Standards Consortium (GSC) specifications MIGS/MIMS/MIMARKS (MIxS). The resulting files, enriched with contextual data, can be used for submission to the databases of the International Nucleotide Sequence Database Collaboration (INSDC). The tool aims to give scientists in all life science disciplines a way to increase the quantity and quality of contextual data in the INSDC databases.
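The idea of enriching a multi-FASTA file with contextual data can be illustrated with a minimal Python sketch. It is not CDinFusion's implementation, and the `[key=value]` header layout is an assumption chosen for readability, not the format CDinFusion or the INSDC databases actually use. The function names (`read_fasta`, `enrich_header`, `enrich_fasta`) are hypothetical, and the field names and values in the example are purely illustrative.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a multi-FASTA file."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def enrich_header(header: str, contextual: Dict[str, str]) -> str:
    """Append contextual data as [key=value] tags (assumed, illustrative layout)."""
    tags = " ".join(f"[{key}={value}]" for key, value in contextual.items())
    return f"{header} {tags}" if tags else header


def enrich_fasta(lines: Iterable[str], contextual: Dict[str, str]) -> str:
    """Return the FASTA text with the same contextual data added to every record."""
    out: List[str] = []
    for header, sequence in read_fasta(lines):
        out.append(">" + enrich_header(header, contextual))
        out.append(sequence)
    return "\n".join(out) + "\n"


# Illustrative input: two records and made-up contextual data values.
example = [
    ">seq1 uncultured bacterium",
    "ACGT",
    "TTGA",
    ">seq2 uncultured bacterium",
    "GGCC",
]
context = {"lat_lon": "54.18 N 7.90 E", "depth": "1 m", "collection_date": "2010-11-01"}
enriched = enrich_fasta(example, context)
print(enriched)
```

Here every record receives the same contextual data, which suits a file of sequences from a single sample. Per-sequence values would need a mapping from sequence identifier to contextual data instead.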
CDinFusion can be accessed at http://www.megx.net/cdinfusion. Have a look at the Video Tutorial.
What are contextual data?
Contextual data (also called "metadata") are secondary data (information) attached to primary sequence data. Simply put, they are "data about data".
They describe aspects such as:
Why are contextual data of outstanding importance?
Because only these additional data make it possible to turn primary sequence information into sound biological knowledge. An example:
A 16S rRNA sequence deposited in the public databases and annotated as "uncultured bacterium", but without any additional information (contextual data), is of limited use. In contrast, if it is supplemented with just the sample location (lat, lon, time, depth), it can already be used to: