
SILVA ACT Tutorial

  1. Paste your sequence into the text field or choose a file to upload. Your sequence data must be in FASTA format.
  2. Choose the basic settings  

    Parameters:

    • Gene: Select the appropriate RNA type (SSU/LSU)
    • Output settings:
      • File format: "FASTA with meta-data" will add additional meta-data to the header of each sequence (You can also get the same data in CSV format from the result tables).
      • File compression: Compression reduces the file size and download time. The ZIP format can be used on most systems; the TGZ format has a higher compression rate.
      • Reject sequences below identity (%): Include only sequences in the output that have at least the configured identity with at least one of the SEED sequences. Only affects FASTA output (ARB users can easily filter within ARB).
    • Job Name: Give your job a name to make it easier to identify later on.
  3. After clicking "Run Aligner", the "Aligner Taskmanager" will display an additional job.
  4. Once the job is "Finished", the "Alignment Result Table" appears. By default, basic alignment scores are displayed. If you enabled "search and classify" when submitting your alignment job, the "Display Classifications" button switches the set of visible columns to show the classification results. Click on "Export to CSV" to download a CSV file containing all values generated by SINA. 
    Column Descriptions:
    Sequence Identifier:
    The first word of the respective FASTA header line (this word should be unique for each sequence in a multi-FASTA file).
    Full Name:
    The remainder of the FASTA header.
    Identity:
    The highest identity the aligned sequence has with any sequence in the alignment SEED.
    Score:
    The SINA alignment score.
    Cutoff Head/Tail:
    The number of unaligned bases at the head/tail of the sequence. These are unlikely to be part of the gene. If you selected "remove" when submitting the job, sequences will be automatically truncated and the values will be zero.
    E. coli Pos.:
    The position of the first aligned base of your sequence within the Escherichia coli reference sequence.
    Gene Bps:
    The number of base pairs within the SSU or LSU gene.
    Turn:
    Indicates whether your sequence was complemented, reversed or both.
  5. When you select a finished Aligner job in the Task manager, the "Download File" menu button is shown, and the aligned sequences and the log file can be downloaded. Using the right mouse button's "Save Target as..." option saves the file to disk without showing it in the browser.
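The "Sequence Identifier" and "Full Name" columns described above follow directly from the FASTA header, so it can be worth checking identifiers for uniqueness before uploading. The following is a minimal sketch of that check; the function names and the example records are hypothetical and are not part of SILVA ACT or SINA.

```python
def split_fasta_header(header_line):
    """Split a FASTA header into (identifier, full_name).

    The identifier is the first word after '>', the full name is the rest.
    """
    text = header_line[1:] if header_line.startswith(">") else header_line
    parts = text.strip().split(None, 1)
    identifier = parts[0] if parts else ""
    full_name = parts[1] if len(parts) > 1 else ""
    return identifier, full_name


def find_duplicate_identifiers(fasta_text):
    """Return identifiers that occur more than once, in order of first repeat."""
    seen = set()
    duplicates = []
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # sequence lines are ignored here
        identifier, _ = split_fasta_header(line)
        if identifier in seen and identifier not in duplicates:
            duplicates.append(identifier)
        seen.add(identifier)
    return duplicates


# Hypothetical multi-FASTA input with one repeated identifier.
example = """>seq1 Escherichia coli 16S
ACGU
>seq2 Bacillus subtilis
ACGG
>seq1 duplicate entry
ACGA
"""

print(split_fasta_header(">seq1 Escherichia coli 16S"))
print(find_duplicate_identifiers(example))
```

If the second call reports any identifiers, renaming those records before submission keeps the rows of the result table distinguishable.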

 

 

Search and Classify

  1. Enabling "Search and classify" will also activate the "Basic search parameters". The search stage of SINA will compare your sequences to our Ref NR databases. For each of your sequences, the sequences most similar according to the alignment will be returned. You can control the size of the search result with two parameters: The "Min. identity with query sequence" limits the search result to sequences having at least that identity with the respective query sequence. The "Number of results..." parameter limits the number of results per query sequence.
    SINA uses the search result to derive a classification with the LCA (lowest common ancestor) method. Each query sequence is assigned the shared part of the classifications of the search results. Thus, configuring a "Min identity" of 0.5 and a maximum number of search results of 1 per sequence results in "best match" type classification (with the inherent danger of classifying to a deeper rank than justified). Configuring the identity to 0.9 and the number of results to 10 will base classification on the ten most similar sequences having an identity with the query of at least 90% (which may leave some sequences unclassified or not classified down to genus rank). 

    Parameters:

    • Min. identity with query sequence: The sequence identity is computed as the number of shared bases (common base-column pairs) divided by the length of the query sequence.
    • Number of results per query sequence: The limit of the number of reported results.
  2. The classification of the submitted sequences is shown in the "Alignment Results Table" by clicking the "Display Classifications" button. If no classification could be made (no search results found or results have different domains), the query sequence will be assigned the classification "Unclassified". The submitted sequences are classified in all available taxonomies for the chosen gene type. Different classifications can be shown either by activating additional columns in the table's header or by downloading the data with "Export To CSV".
  3. The result of the neighbor search for the submitted sequences can be added to the cart by choosing the respective aligner job in the "Taskmanager" and clicking on "Add neighbors to cart".
  4. After the neighbor sequences have been added to the cart, they can be shown by opening the "Search" page and choosing "Show".
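The interplay of the two search parameters and the LCA method can be illustrated with a small sketch. It is not SINA's implementation: the function names, the toy taxonomy paths, and the exact reading of "common base-column pairs" (same base in the same alignment column, divided by the number of bases in the query) are assumptions made for illustration.

```python
GAP_CHARS = {"-", "."}


def identity_with_query(query_aligned, reference_aligned):
    """Shared base-column pairs divided by the number of bases in the query.

    Assumes both sequences use the same alignment columns.
    """
    if len(query_aligned) != len(reference_aligned):
        raise ValueError("sequences must share the same alignment columns")
    query_length = sum(1 for base in query_aligned if base not in GAP_CHARS)
    if query_length == 0:
        return 0.0
    shared = sum(
        1
        for q, r in zip(query_aligned.upper(), reference_aligned.upper())
        if q not in GAP_CHARS and q == r
    )
    return shared / query_length


def lca_classify(hits, min_identity, max_results):
    """Return the shared leading part of the taxonomy paths of the accepted hits.

    hits: list of (identity, taxonomy_path), taxonomy_path being a list of names.
    """
    accepted = sorted(
        (hit for hit in hits if hit[0] >= min_identity),
        key=lambda hit: hit[0],
        reverse=True,
    )[:max_results]
    if not accepted:
        return "Unclassified"  # no search results passed the identity limit
    shared = list(accepted[0][1])
    for _, path in accepted[1:]:
        common = 0
        for a, b in zip(shared, path):
            if a != b:
                break
            common += 1
        shared = shared[:common]
    # An empty shared path means the hits disagree already at domain level.
    return ";".join(shared) if shared else "Unclassified"


# Toy search result (hypothetical identities and simplified taxonomy paths).
hits = [
    (0.98, ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Escherichia"]),
    (0.95, ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Salmonella"]),
    (0.91, ["Bacteria", "Proteobacteria", "Alphaproteobacteria", "Rickettsia"]),
    (0.60, ["Archaea", "Euryarchaeota"]),
]

print(identity_with_query("AC-GU-A", "ACCGA-A"))
print(lca_classify(hits, min_identity=0.5, max_results=1))
print(lca_classify(hits, min_identity=0.9, max_results=10))
print(lca_classify(hits, min_identity=0.5, max_results=10))
```

With this toy data, the "best match" setting (0.5, 1) classifies down to the genus of the single top hit, the stricter setting (0.9, 10) stops at the phylum shared by the three accepted hits, and admitting the Archaea hit (0.5, 10) yields "Unclassified" because the results span different domains.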

Compute Tree

  1. Enabling "Compute tree" will also activate the "Advanced tree parameters". Based on the aligned sequences, a maximum likelihood tree will be computed using the specified parameters.

    Parameters:

    • Workflow: The tool offers three workflows. Two de novo workflows compute a tree either from the uploaded sequences alone (user sequences only) or from the uploaded sequences together with all neighbours found by the aligner (including neighbours). A third workflow (Add to neighbours tree) first computes a de novo tree from only the neighbours found by the aligner and then adds the uploaded sequences to that tree. This last workflow is available only for the RAxML tool and is recommended for short sequences.
    • Program to use for tree computation: The tool to be used for computing the tree, currently RAxML or FastTree.
    • Model for tree computation: The substitution model.
    • Rate model for likelihoods: The model/approximation used to compute the likelihoods (note that the CAT approximation is recommended only for datasets with more than 50 sequences).
  2. The computed tree is available in the results archive for download. The archive also contains the log file of the tool's execution and the aligned sequences used to compute the tree (note that, due to positional variability filtering and possible inclusion of neighbours, these can be different from the original aligned sequences - see advanced tree parameters for details). 

Advanced Parameters

Tip: Hovering the cursor over an option will also show its extended description.

This is still in development.

Advanced Alignment Parameters

Advanced Search and Classification Parameters

Advanced Tree Parameters

The tree computation can be fine-tuned with the advanced parameters.

Parameters

  • Automatic filter Newick reserved characters: when enabled (currently it cannot be disabled), characters that conflict with the Newick standard are replaced so that only compatible identifiers are passed to the tree computation. Replacement is performed as follows:
    • : -> &a
    • ; -> &b
    • ( -> &c
    • ) -> &d
    • [ -> &e
    • ] -> &f
    • , -> &g
    • ' -> &h
  • Positional variability filter: the PosVar filter is applied to the aligned sequences before reconstructing the tree. The filter removes the positions according to their variability. Lower numbers represent high variability positions.
  • Domain: Domain-specific filters are computed on the respective subset of the SILVA database at each release and then applied to user-provided sequences by ACT. Choose the appropriate domain for your dataset to ensure the most relevant filter is applied.
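The replacement table for Newick reserved characters can be written as a simple mapping. The sketch below reproduces the substitutions listed above; the function names are hypothetical, and the reverse mapping is only a convenience for restoring original names and assumes the original identifiers did not already contain sequences such as "&a".

```python
# Substitutions as listed in the parameter description above.
NEWICK_REPLACEMENTS = {
    ":": "&a",
    ";": "&b",
    "(": "&c",
    ")": "&d",
    "[": "&e",
    "]": "&f",
    ",": "&g",
    "'": "&h",
}

_ESCAPE_TABLE = str.maketrans(NEWICK_REPLACEMENTS)


def escape_identifier(name):
    """Replace Newick reserved characters with their two-character codes."""
    return name.translate(_ESCAPE_TABLE)


def unescape_identifier(name):
    """Map the codes back to the original characters.

    Ambiguous if the original name already contained a code such as '&a'.
    """
    for char, code in NEWICK_REPLACEMENTS.items():
        name = name.replace(code, char)
    return name


original = "E.coli (K-12); strain[1]"
escaped = escape_identifier(original)
print(escaped)
print(unescape_identifier(escaped))
```

For the example identifier, the escaped form is `E.coli &cK-12&d&b strain&e1&f`, and unescaping restores the original text.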

Positional variability

The positional variability filter feature is provided by the ARB software. A description of the computation of positional variability and the meaning of the digits can be found in the ARB help documentation. The characters "." and "-" are always filtered. The higher the number, the more conserved sites get filtered out.

Results

Computation results

Depending on the tools and functions selected, the results archive contains different files. Each file name starts with a prefix that identifies the computation task and has the form arb-silva.de_YYYY-MM-DD_idNNNNNN; it is referred to simply as prefix in the following list:

  • prefix.fasta: holds the aligned sequences that have been submitted to the task (user sequences). Alignment is provided as standard SILVA alignment with 50000 positions.
  • prefix_merged.fasta: holds the user sequences and, when enabled, the neighbouring sequences from the SILVA database. These sequences are filtered according to the positional variability filter selected and have a variable number of positions since common gaps are also removed.
  • prefix.tree: the phylogenetic tree inferred on the merged and filtered dataset (prefix_merged.fasta) with the selected tool.
  • prefix.tree.log: the execution log file of the tree computation tool, provided for inspection and reproducibility of results.
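The difference in length between prefix.fasta and prefix_merged.fasta comes partly from the removal of common gaps, that is, alignment columns in which every sequence has a gap. The sketch below illustrates only that step on a toy alignment; the function name and data are hypothetical, and the positional variability filter itself (computed by ARB) is not reproduced here.

```python
GAP_CHARS = {"-", "."}


def remove_common_gaps(aligned):
    """Drop alignment columns in which every sequence has a gap character.

    aligned: dict mapping sequence name to aligned sequence (equal lengths).
    """
    sequences = list(aligned.values())
    if not sequences:
        return {}
    width = len(sequences[0])
    if any(len(seq) != width for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")
    # Keep a column if at least one sequence has a base there.
    keep = [
        i for i in range(width)
        if any(seq[i] not in GAP_CHARS for seq in sequences)
    ]
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in aligned.items()
    }


# Toy alignment with two all-gap columns (positions 2 and 5, counting from 1).
toy = {"a": "A--C-G", "b": "A-UC--"}
print(remove_common_gaps(toy))
```

In the toy alignment, the six original columns shrink to four, which is why the number of positions in the merged file depends on the sequences it contains.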