ARB-specific fields and entries

| ARB field name | owned by | description |
| --- | --- | --- |
| aligned | user | user-defined entry, e.g. the name of the person who aligned the sequence and the date |
| ambig | ARB | ambiguities calculated in ARB using ‘count ambiguities’ |
| ARB_color | ARB | stores the information about sequence colors |
| name | ARB | internal ARB database ID, do not change! |
| nuc | ARB | number of nucleotides; calculated by ARB using ‘count nucleotides’ |
| nuc_term | ARB | number of nucleotides coding for the respective rRNA gene; calculated by ‘count nucleotides gene’ |
| remark | user | field for remarks |
| tmp | ARB | used by various ARB modules |
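The ownership column can be used to guard scripted edits. The following is a minimal sketch, assuming that records have been exported into plain Python dictionaries keyed by ARB field name and that "owned by ARB" should be read as "do not edit by hand". The helper names `is_user_editable` and `apply_user_edits` are hypothetical and not part of ARB.

```python
# Ownership of the ARB-specific fields, copied from the field list above.
FIELD_OWNER = {
    "aligned": "user",
    "ambig": "ARB",
    "ARB_color": "ARB",
    "name": "ARB",
    "nuc": "ARB",
    "nuc_term": "ARB",
    "remark": "user",
    "tmp": "ARB",
}


def is_user_editable(field: str) -> bool:
    """Return True only for fields listed as owned by the user."""
    return FIELD_OWNER.get(field) == "user"


def apply_user_edits(record: dict, edits: dict) -> dict:
    """Return a copy of record with user-owned fields updated.

    Hypothetical helper: edits to ARB-owned or unlisted fields are rejected
    (assumption: such fields should never be changed by hand).
    """
    rejected = [field for field in edits if not is_user_editable(field)]
    if rejected:
        raise ValueError(f"refusing to edit non-user fields: {sorted(rejected)}")
    updated = dict(record)
    updated.update(edits)
    return updated
```

With this sketch, updating `remark` or `aligned` succeeds, while an attempt to overwrite `name` raises a `ValueError` before anything is changed.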

 

Fields and entries imported from EMBL

| ARB field name | EMBL field | description |
| --- | --- | --- |
| acc | AC | accession number |
| ali_xx/data | sequence | sequence information |
| author | RA | reference author(s) |
| bio_material | FT /bio_material | identifier for the biological material from which the sequenced nucleic acid was obtained |
| clone | FT /clone | clone from which the sequence was obtained |
| clone_lib | FT /clone_lib | clone library from which the sequence was obtained |
| collected_by | FT /collected_by | name of the person who collected the specimen |
| collection_date | FT /collection_date | date on which the sample/specimen was collected |
| country | FT /country | geographical origin of the sequenced sample |
| culture_collection | FT /culture_collection | institution code and identifier for the culture from which the sequenced nucleic acid was obtained, with optional collection code |
| date | DT | entry creation and update dates, separated by ; |
| description | DE | description |
| env_sample | FT /environmental_sample | identifies sequences derived by direct molecular isolation from a bulk environmental DNA sample (by PCR with or without subsequent cloning of the product, DGGE, or other anonymous methods) with no reliable identification of the source organism. Indicated by ‘yes’ in the ARB files |
| full_name | OS | organism species |
| gene | FT /gene | symbol of the gene corresponding to a sequence region |
| haplotype | FT /haplotype | name for a specific set of alleles that are linked together on the same physical chromosome |
| identified_by | FT /identified_by | name of the taxonomist who identified the specimen |
| insdc | PR | the International Nucleotide Sequence Database Collaboration (INSDC) Project Identifier that has been assigned to the entry |
| isolate | FT /isolate | individual isolate from which the sequence was obtained |
| isolation_source | FT /isolation_source | describes the physical, environmental and/or local geographical source of the biological sample from which the sequence was derived |
| journal | RL | reference location |
| lab_host | FT /lab_host | scientific name of the laboratory host used to propagate the source organism from which the sequenced molecule was obtained |
| lat_lon | FT /lat_lon | geographical coordinates of the location where the specimen was collected |
| nuc_region | FT source | identifies the biological source of the specified span of the sequence |
| nuc_rp | RP | reference positions |
| pcr_primers | FT /PCR_primers | PCR primers that were used to amplify the sequence |
| plasmid | FT /plasmid | name of the naturally occurring plasmid from which the sequence was obtained, where plasmid is defined as an independently replicating genetic unit that cannot be described by /chromosome or /segment |
| product | FT /product | name of the product associated with the feature |
| publication_doi | RX | cross-reference DOI number |
| pubmed_id | RX | cross-reference PubMed ID |
| host | FT /host | natural host from which the sequence was obtained. Formerly specific_host |
| specimen_voucher | FT /specimen_voucher | an identifier of the individual or collection of the source organism and the place where it is currently stored, usually an institution |
| start | FT rRNA | start of the ribosomal RNA gene |
| stop | FT rRNA | stop of the ribosomal RNA gene |
| strain | FT /strain | strain from which the sequence was obtained. (t) or [T]: type strains, [C]: cultivated, [G]: genomes |
| submit_author | RL | submission authors from reference location |
| submit_date | RL | submission date from reference location |
| sub_species | FT /sub_species | name of the sub-species of the organism from which the sequence was obtained |
| tax_embl | OC | organism classification according to EMBL |
| tax_embl_name | OC | organism name taken from the classification field |
| tax_xref_embl | FT /db_xref | database cross-reference: pointer to related information in another database |
| title | RT | reference title |
| version | ID SV | subversion from identification line |
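The `strain` entry above carries status markers, (t) or [T], [C] and [G], alongside the strain name. The following is a minimal sketch of separating the two, assuming the markers appear as literal substrings of the field value; where exactly they are placed within the value is not specified here. The function name `split_strain` and the example strain strings are hypothetical.

```python
import re

# Markers listed for the `strain` field and their meaning.
STRAIN_MARKERS = {
    "(t)": "type strain",
    "[T]": "type strain",
    "[C]": "cultivated",
    "[G]": "genome",
}

# Matches exactly the four markers above (case-sensitive).
_MARKER_RE = re.compile(r"\(t\)|\[[TCG]\]")


def split_strain(value: str):
    """Split a strain entry into (strain name, set of marker meanings)."""
    labels = {STRAIN_MARKERS[m] for m in _MARKER_RE.findall(value)}
    # Remove the markers and collapse the whitespace they leave behind.
    name = " ".join(_MARKER_RE.sub(" ", value).split())
    return name, labels
```

For a hypothetical value such as `"DSM 30083 [T] [C]"`, the sketch yields the name `"DSM 30083"` together with the labels "type strain" and "cultivated". A value without markers is returned unchanged with an empty label set.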

SILVA-specific fields and entries

| ARB field name | description |
| --- | --- |
| align_bp_score_slv | number of bases in helices in the aligned sequence, taking into account canonical and non-canonical base pairing. The cost matrix is taken from ARB Probe_Match (2) |
| align_cutoff_head_slv | unaligned bases at the beginning of the sequence |
| align_cutoff_tail_slv | unaligned bases at the end of the sequence |
| align_log_slv | indicates whether the sequence was reversed and/or complemented |
| align_quality_slv | maximal similarity to a reference sequence in the seed |
| aligned_slv | date and time of alignment by SILVA |
| alternative_name_slv | synonyms or basonyms of the species according to the DSMZ ‘nomenclature up to date’ catalogue |
| ambig_slv | calculated percentage of ambiguities in the sequence; a maximum of 2% is allowed |
| ann_src_slv | additional sources of sequence information are indicated in this field. Current identifiers: RNAmmer and RDP |
| homop_slv | calculated percentage of repetitive bases in runs of more than four bases; a maximum of 2% is allowed |
| homop_events_slv | absolute number of repetitive elements with more than four bases |
| nuc_gene_slv | aligned bases within gene boundaries |
| pintail_slv | information about potential sequence anomalies detected by Pintail (1); 100 means no anomalies found |
| project_name_slv | name of the sequencing project |
| seq_quality_slv | summary sequence quality value calculated from the vector, ambiguity and homopolymer values; 100 means very good |
| tax_gg | taxonomy mapped from Greengenes |
| tax_gg_name | organism name in Greengenes |
| tax_rdp | nomenclatural taxonomy mapped from RDP II |
| tax_rdp_name | organism name in RDP II |
| tax_slv | SILVA taxonomy path |
| vector_slv | percent vector contamination; a maximum of 5% is allowed |
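The three maxima stated above (2% for `ambig_slv`, 2% for `homop_slv`, 5% for `vector_slv`) can be checked in a few lines. The following is a minimal sketch, assuming a record is available as a Python dictionary keyed by ARB field name and that the values are percentages stored as numbers or numeric strings. The function name `quality_violations` is hypothetical, and the sketch does not reproduce how SILVA itself applies these limits.

```python
# Maximum allowed percentages stated in the field descriptions above.
MAX_PERCENT = {
    "ambig_slv": 2.0,
    "homop_slv": 2.0,
    "vector_slv": 5.0,
}


def quality_violations(record: dict) -> dict:
    """Return {field: value} for every threshold field that exceeds its maximum.

    Fields missing from the record are skipped rather than treated as failures
    (assumption: a missing value means "not calculated").
    """
    violations = {}
    for field, limit in MAX_PERCENT.items():
        if field not in record:
            continue
        value = float(record[field])
        if value > limit:
            violations[field] = value
    return violations
```

An empty result means that the record stays within all three limits. A value exactly at the maximum is accepted, since the descriptions state the maximum as allowed.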

 

Environmental parameters (introduced with SILVA release 93, extended in SILVA 96)

| ARB field name | description |
| --- | --- |
| altitude_slv | altitude of the sampling location above sea level |
| chlorophyll_slv | chlorophyll concentration in the environment at time of sampling |
| collection_time_slv | time at which the sample was collected, in hours and minutes (formerly sampling_time_slv) |
| depth_slv | depth of the water column or sediment from which the sample was collected (formerly water_depth and sediment_depth) |
| dissolved_oxygen_slv | dissolved oxygen concentration in the environment at time of sampling |
| DOC_slv | dissolved organic carbon concentration in the environment at time of sampling |
| geodetic_datum_slv | geodetic datum, e.g. WGS 84 |
| habitat_slv | description of the habitat, e.g. marine, freshwater etc. |
| lat_lon_details_slv | details of how the geographic coordinates were determined, e.g. whether latitude and longitude were measured by GPS, derived from a map, or retrieved from the literature |
| nitrate_slv | nitrate concentration in the environment at time of sampling |
| pH_slv | pH value in the environment at time of sampling |
| phosphate_slv | phosphate concentration in the environment at time of sampling |
| POC_slv | particulate organic carbon concentration in the environment at time of sampling |
| salinity_slv | salinity of the environment at time of sampling |
| sample_identifier_slv | a unique identifier (ID) given to the sample that allows samples and contextual data to be cross-referenced |
| sample_material_slv | describes the sample material that was collected, e.g. water, sediment, biofilm, vent fluid etc. |
| sample_size_slv | volume of the sample that was collected (renamed from sample volume_slv) |
| silicate_slv | silicate concentration in the environment at time of sampling |
| temperature_slv | temperature in the environment at time of sampling |

 

Green: changes introduced with SILVA release 100

 

Release: 29.01.2010

 

1. Ashelford, K. E., N. A. Chuzhanova, J. C. Fry, A. J. Jones, and A. J. Weightman. 2005. At least 1 in 20 16S rRNA sequence records currently held in public repositories is estimated to contain substantial anomalies. Appl. Environ. Microbiol. 71:7724-7736.

2. Ludwig, W., O. Strunk, R. Westram, L. Richter, H. Meier, Yadhukumar, A. Buchner, T. Lai, S. Steppi, G. Jobb, W. Forster, I. Brettske, S. Gerber, A. W. Ginhart, O. Gross, S. Grumann, S. Hermann, R. Jost, A. Konig, T. Liss, R. Lussmann, M. May, B. Nonhoff, B. Reichel, R. Strehlow, A. Stamatakis, N. Stuckmann, A. Vilbig, M. Lenke, T. Ludwig, A. Bode, and K. H. Schleifer. 2004. ARB: a software environment for sequence data. Nucleic Acids Res. 32:1363-1371.