Aligning sequences

  1. Paste your sequence into the text field or choose a file to upload. Your sequence data must be in FASTA format.
  2. Choose the basic settings

    Parameters:

    • Gene: Select the appropriate RNA type (SSU/LSU)
    • Output settings:
      • File format: "FASTA with meta-data" adds meta-data to the header of each sequence (the same data is also available in CSV format from the result tables).
      • File compression: Compressing the output reduces file size and download time. The ZIP format can be used on most systems; the TGZ format achieves a higher compression ratio.
      • Reject sequences below identity (%): Include only sequences in the output that have at least the configured identity with at least one of the SEED sequences. Only affects FASTA output (ARB users can easily filter within ARB).
    • Job Name: Give your job a name to make it easier to identify later on.
  3. After clicking "Run Aligner", the "Aligner Taskmanager" will display an additional job.
  4. Once the job is "Finished", the "Alignment Result Table" appears. By default, basic alignment scores are displayed. If you enabled "search and classify" when submitting your alignment job, the "Display Classifications" button switches the set of visible columns to show the classification results. Click on "Export to CSV" to download a CSV file containing all values generated by SINA. 
    Column Descriptions:
    Sequence Identifier:
    The first word of the respective FASTA header line (this word should be unique for each sequence in a multi-FASTA file).
    Full Name:
    The remainder of the FASTA header.
    Identity:
    The highest identity the aligned sequence has with any sequence in the alignment SEED.
    Score:
    The SINA alignment score.
    Cutoff Head/Tail:
    The number of unaligned bases at the head/tail of the sequence. These bases are probably not part of the gene. If you selected "remove" when submitting the job, sequences are truncated automatically and both values will be zero.
    E. coli Pos.:
    The position of the first aligned base of your sequence within the Escherichia coli reference sequence.
    Gene Bps:
    The number of base pairs within the SSU or LSU gene.
    Turn:
    Indicates whether your sequence was complemented, reversed or both.
  5. When you select a finished Aligner job in the Task manager, the "Download File" menu button is shown, and the aligned sequences and the log file can be downloaded. Use the "Save Target as..." option of the right mouse button to save the file to disk without displaying it in the browser.
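The "Sequence Identifier" and "Full Name" columns described above follow directly from the FASTA header line. The sketch below is not part of SINA; it is a minimal Python illustration of that split, and of checking a multi-FASTA file for repeated identifiers before uploading. The function names and the example sequences are hypothetical.

```python
from collections import Counter


def split_fasta_header(header_line):
    """Split a FASTA header into (identifier, full_name).

    The identifier is the first word after '>', the full name is the rest.
    """
    text = header_line.lstrip(">").strip()
    parts = text.split(None, 1)
    identifier = parts[0] if parts else ""
    full_name = parts[1] if len(parts) > 1 else ""
    return identifier, full_name


def duplicate_identifiers(fasta_text):
    """Return the identifiers that occur more than once in a multi-FASTA string."""
    counts = Counter(
        split_fasta_header(line)[0]
        for line in fasta_text.splitlines()
        if line.startswith(">")
    )
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical input: the third record reuses the identifier "seq1".
example = """>seq1 Escherichia coli 16S
ACGU
>seq2 uncultured bacterium
ACGG
>seq1 duplicate entry
ACGA
"""

print(split_fasta_header(">seq1 Escherichia coli 16S"))
print(duplicate_identifiers(example))
```

If the second call reports any identifiers, renaming those records before submission keeps the rows of the result table distinguishable.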

Search and Classify

  1. Enabling "Search and classify" will also activate the "Basic search parameters". The search stage of SINA will compare your sequences to our Ref NR databases. For each of your sequences, the sequences most similar according to the alignment will be returned. You can control the size of the search result with two parameters: The "Min. identity with query sequence" limits the search result to sequences having at least that identity with the respective query sequence. The "Number of results..." parameter limits the number of results per query sequence.
    SINA uses the search result to derive a classification with the LCA (lowest common ancestor) method. Each query sequence is assigned the shared part of the classifications of the search results. Thus, configuring a "Min identity" of 0.5 and a maximum number of search results of 1 per sequence results in "best match" type classification (with the inherent danger of classifying to a deeper rank than justified). Configuring the identity to 0.9 and the number of results to 10 will base classification on the ten most similar sequences having an identity with the query of at least 90% (which may leave some sequences unclassified or not classified down to genus rank).

    Parameters:

    • Min. identity with query sequence: The sequence identity is computed as the number of shared bases (common base-column pairs) divided by the length of the query sequence.
    • Number of results per query sequence: The limit of the number of reported results.
  2. The classification of the submitted sequences is shown in the "Alignment Results Table" by clicking the "Display Classifications" button. If no classification could be made (no search results were found or the results belong to different domains), the query sequence is assigned the classification "Unclassified". The submitted sequences are classified in all available taxonomies for the chosen gene type. Different classifications can be shown either by activating additional columns in the table's header or by downloading the data with "Export To CSV".
  3. The result of the neighbor search of the submitted sequences can be added to the cart by choosing the respective aligner job in the "Taskmanager" and clicking on "Add neighbors to cart".
  4. After adding the neighbor sequences to the cart, they can be shown by opening the "Search" page and choosing "Show".
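The interplay of the two search parameters and the LCA step can be illustrated with a small Python sketch. This is not SINA's implementation. It assumes that aligned sequences are equal-length strings with "-" or "." as gap characters, that a "shared base" is a column where query and reference carry the same base, and that taxonomic paths are semicolon-separated strings. All function names and example data are hypothetical.

```python
GAPS = "-."


def identity(query_aligned, reference_aligned):
    """Shared base-column pairs divided by the number of bases in the query."""
    shared = sum(
        1
        for q, r in zip(query_aligned, reference_aligned)
        if q not in GAPS and q == r
    )
    query_len = sum(1 for q in query_aligned if q not in GAPS)
    return shared / query_len if query_len else 0.0


def lca(lineages):
    """Return the shared leading ranks of all lineages, or 'Unclassified'."""
    if not lineages:
        return "Unclassified"
    split = [lineage.strip(";").split(";") for lineage in lineages]
    shared = []
    for ranks in zip(*split):
        if len(set(ranks)) != 1:
            break  # first rank where the search results disagree
        shared.append(ranks[0])
    return ";".join(shared) if shared else "Unclassified"


def classify(query, references, min_identity, max_results):
    """Filter by identity, keep the best max_results hits, then apply LCA.

    references is a list of (aligned_sequence, lineage) tuples.
    """
    scored = [(identity(query, seq), lineage) for seq, lineage in references]
    hits = sorted(
        (hit for hit in scored if hit[0] >= min_identity),
        key=lambda hit: hit[0],
        reverse=True,
    )[:max_results]
    return lca([lineage for _, lineage in hits])


# Hypothetical aligned query and reference set.
query = "ACGU-ACGU"
references = [
    ("ACGU-ACGU", "Bacteria;Proteobacteria;Gammaproteobacteria"),
    ("ACGU-ACGA", "Bacteria;Proteobacteria;Alphaproteobacteria"),
    ("UUUU-UUUU", "Archaea;Euryarchaeota"),
]

print(classify(query, references, 0.5, 1))   # "best match" style
print(classify(query, references, 0.5, 10))  # LCA over all hits >= 0.5
print(classify(query, references, 0.0, 10))  # hits from different domains
```

With a single result per query the classification simply copies the best hit, which is why it can end up deeper than justified. Allowing more results shortens the path to the ranks on which all hits agree, and hits from different domains leave nothing in common.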

Advanced Parameters

Tip: Hovering the cursor over an option also shows its extended description.

This is still in development.

Advanced Alignment Parameters

Advanced Search and Classification Parameters