- Enabling "Search and classify" will also activate the "Basic search parameters". The search stage of SINA will compare your sequences to our Ref NR databases. For each of your sequences, the sequences most similar according to the alignment will be returned. You can control the size of the search result with two parameters: The "Min. identity with query sequence" limits the search result to sequences having at least that identity with the respective query sequence. The "Number of results..." parameter limits the number of results per query sequence.
SINA uses the search result to derive a classification with the LCA (lowest common ancestor) method: each query sequence is assigned the part of the classification that all of its search results share. Thus, setting "Min. identity" to 0.5 and the number of results to 1 per sequence yields a "best match" type classification (with the inherent danger of classifying to a deeper rank than justified). Setting the identity to 0.9 and the number of results to 10 bases the classification on the ten most similar sequences with at least 90% identity to the query (which may leave some sequences unclassified or not classified down to genus rank).
Parameters:
- Min. identity with query sequence: The sequence identity is computed as the number of shared bases (common base-column pairs) divided by the length of the query sequence.
- Number of results per query sequence: The maximum number of search results reported for each query sequence.
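To illustrate how the two parameters and the LCA step interact, here is a minimal Python sketch. It assumes that "." and "-" are the gap characters in the aligned rows, that taxonomy paths are semicolon-separated strings, and that search hits arrive as (identity, path) pairs. The functions `identity` and `lca_classify` are hypothetical and not part of SINA, whose actual implementation may differ (for example in how ambiguous bases are counted).

```python
from typing import List, Sequence, Tuple

GAP_CHARS = frozenset(".-")  # assumption: '.' and '-' are the gap characters


def identity(query: str, reference: str) -> float:
    """Shared bases (columns where both rows hold the same base) divided by
    the number of bases in the query (gap columns of the query are not counted)."""
    if len(query) != len(reference):
        raise ValueError("aligned sequences must have the same length")
    query_bases = [i for i, q in enumerate(query) if q not in GAP_CHARS]
    if not query_bases:
        return 0.0
    shared = sum(1 for i in query_bases if query[i].upper() == reference[i].upper())
    return shared / len(query_bases)


def lca_classify(hits: Sequence[Tuple[float, str]], min_identity: float,
                 max_results: int) -> str:
    """hits: (identity, taxonomy path) pairs, paths like 'Bacteria;Firmicutes;'."""
    # Keep hits at or above the identity cut-off, best first, at most max_results.
    selected = sorted((h for h in hits if h[0] >= min_identity),
                      key=lambda h: h[0], reverse=True)[:max_results]
    if not selected:
        return "Unclassified"  # no search results
    paths = [[rank for rank in path.split(";") if rank] for _, path in selected]
    shared: List[str] = []
    # zip(*paths) stops at the shortest path, so only ranks present in all paths are compared.
    for ranks in zip(*paths):
        if any(r != ranks[0] for r in ranks):
            break
        shared.append(ranks[0])
    # An empty prefix means the results already disagree at the domain level.
    return ";".join(shared) if shared else "Unclassified"
```

With hits at 95% (Bacteria;Firmicutes;Bacilli), 92% (Bacteria;Firmicutes;Clostridia) and 80% (an archaeal path), a setting of 0.9 and 10 yields Bacteria;Firmicutes, while 0.5 and 1 returns the full path of the best match, mirroring the trade-off described above.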
- The classification of the submitted sequences is shown in the "Alignment Results Table" after clicking the "Display Classifications" button. If no classification could be made (no search results were found, or the results belong to different domains), the query sequence is assigned the classification "Unclassified". The submitted sequences are classified in all taxonomies available for the chosen gene type. Further classifications can be shown either by activating additional columns in the table header or by downloading the data with "Export To CSV".
- The neighbors found for the submitted sequences can be added to the cart by selecting the respective aligner job in the "Taskmanager" and clicking "Add neighbors to cart".
- Once added to the cart, the neighbor sequences can be shown by opening the "Search" page and choosing "Show".
- Enabling "Compute tree" will also activate the "Advanced tree parameters". Based on the aligned sequences, a maximum likelihood tree is computed using the specified parameters.
Parameters:
- Workflow: The tool offers three workflows. Two de novo workflows compute a tree either from the uploaded sequences alone (user sequences only) or from the uploaded sequences together with all neighbours found by the aligner (including neighbours). The third workflow (Add to neighbours tree) first computes a de novo tree from the neighbours found by the aligner only and then adds the uploaded sequences to that tree. This last workflow is available only for RAxML and is recommended for short sequences.
- Program to use for tree computation: The program used to compute the tree; currently RAxML or FastTree.
- Model for tree computation: The substitution model.
- Rate model for likelihoods: The model/approximation used to compute the likelihoods (note that the CAT approximation is recommended only for datasets with more than 50 sequences).
- The computed tree is available for download in the results archive. The archive also contains the log file of the tool's execution and the aligned sequences used to compute the tree (note that, due to positional variability filtering and the possible inclusion of neighbours, these may differ from the original aligned sequences; see Advanced Tree Parameters for details).
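The constraints between the tree parameters above can be summarised as a small validation routine. This is a hypothetical Python sketch: the enum names, the `check_tree_settings` function and the rate model strings are illustrative only and do not reflect how ACT checks its settings. Applying the 50-sequence recommendation for CAT to the sequences of the de novo tree is also an assumption.

```python
from enum import Enum
from typing import List


class Workflow(Enum):
    USER_ONLY = "user sequences only"
    INCLUDING_NEIGHBOURS = "including neighbours"
    ADD_TO_NEIGHBOURS_TREE = "Add to neighbours tree"


class Program(Enum):
    RAXML = "RAxML"
    FASTTREE = "FastTree"


CAT_MIN_SEQUENCES = 50  # CAT is recommended only for more than 50 sequences


def check_tree_settings(workflow: Workflow, program: Program, rate_model: str,
                        n_user: int, n_neighbours: int) -> List[str]:
    """Raise ValueError for unsupported combinations; return advisory warnings."""
    if workflow is Workflow.ADD_TO_NEIGHBOURS_TREE and program is not Program.RAXML:
        raise ValueError("'Add to neighbours tree' is only available for RAxML")
    # Number of sequences entering the de novo tree computation (assumption for the CAT check).
    if workflow is Workflow.USER_ONLY:
        n_denovo = n_user
    elif workflow is Workflow.INCLUDING_NEIGHBOURS:
        n_denovo = n_user + n_neighbours
    else:
        n_denovo = n_neighbours  # user sequences are added to this tree afterwards
    warnings: List[str] = []
    if rate_model.upper() == "CAT" and n_denovo <= CAT_MIN_SEQUENCES:
        warnings.append(f"CAT is recommended only for more than {CAT_MIN_SEQUENCES} "
                        f"sequences; this tree has {n_denovo}")
    return warnings
```

The sketch raises an error for the workflow/program mismatch but only warns about CAT, since the former combination is unavailable while the latter is merely not recommended.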
Tip: Hovering the cursor over an option also shows its extended description.
This is still in development.
Advanced Alignment Parameters
Advanced Search and Classification Parameters
Advanced Tree Parameters
The tree computation can be fine-tuned with the advanced parameters.
Parameters
- Automatic filter Newick reserved characters: when enabled (currently it cannot be disabled), characters that conflict with the Newick format are substituted so that only compatible identifiers are passed to the tree computation. The replacements are:
- : -> &a
- ; -> &b
- ( -> &c
- ) -> &d
- [ -> &e
- ] -> &f
- , -> &g
- ' -> &h
- Positional variability filter: the PosVar filter is applied to the aligned sequences before the tree is reconstructed. It removes alignment positions according to their variability; lower numbers represent high-variability positions.
- Domain: Domain-specific filters are computed on the respective subset of the SILVA database at each release and then applied to user-provided sequences by ACT. Choose the domain matching your dataset so that the most relevant filter is applied.
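The Newick replacement table above amounts to a per-character mapping applied to each sequence identifier. A minimal Python sketch follows; the names `NEWICK_REPLACEMENTS` and `escape_newick_name` are hypothetical and not ACT internals.

```python
# Replacement table as listed above: reserved Newick character -> substitute.
NEWICK_REPLACEMENTS = {
    ":": "&a", ";": "&b", "(": "&c", ")": "&d",
    "[": "&e", "]": "&f", ",": "&g", "'": "&h",
}


def escape_newick_name(name: str) -> str:
    """Replace every reserved character in an identifier; other characters pass through."""
    return "".join(NEWICK_REPLACEMENTS.get(ch, ch) for ch in name)
```

Because every substitute starts with "&", mapping names back is only unambiguous if the original identifiers contain no "&" followed by one of the letters a to h.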
Positional variability
The positional variability filter is provided by the ARB software. A description of how positional variability is computed and what the digits mean can be found in the ARB help documentation. The characters "." and "-" are always filtered. The higher the selected number, the more sites are filtered out, reaching into progressively more conserved positions.
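To illustrate how a positional variability string can act as a column filter, here is a hedged Python sketch. It assumes one PosVar character per alignment column, with the digits 0 to 9 plus "." or "-" for always-filtered columns, and that selecting threshold N removes every column whose digit is below N. The exact rule used by ARB and ACT may differ, so consult the ARB help documentation; the functions `posvar_mask` and `apply_filter` are hypothetical.

```python
from typing import List, Sequence


def posvar_mask(posvar: str, threshold: int) -> List[bool]:
    """Return one flag per alignment column: True = keep, False = filter out."""
    mask = []
    for ch in posvar:
        if ch in ".-":
            mask.append(False)  # always filtered
        elif ch in "0123456789":
            # assumption: lower digit = more variable; columns below the threshold are removed
            mask.append(int(ch) >= threshold)
        else:
            raise ValueError(f"unexpected PosVar character: {ch!r}")
    return mask


def apply_filter(aligned: Sequence[str], mask: List[bool]) -> List[str]:
    """Drop the filtered columns from every aligned sequence."""
    for seq in aligned:
        if len(seq) != len(mask):
            raise ValueError("sequence length does not match the PosVar string")
    return ["".join(c for c, keep in zip(seq, mask) if keep) for seq in aligned]
```

Raising the threshold in this sketch can only remove more columns, which matches the behaviour described above: a higher number filters out more sites, including more conserved ones.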
Computation results
Depending on the tools and functions selected, the results archive contains different files. Each file name starts with a prefix identifying the job, of the form arb-silva.de_YYYY-MM-DD_idNNNNNN, referred to simply as prefix in the following list: