Why are so many of my reads "unclassified" by WIMP?

In general, reads reported as unclassified are those for which no unique sequence has been found. This usually correlates with low mean Q-scores, but it does not mean that the reads contain no information.

WIMP attempts to assign each read to a taxon in the NCBI taxonomy. The Centrifuge classification results are then filtered and aggregated to calculate and report counts of reads at the species rank.
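
As a rough illustration of the aggregation step, the sketch below tallies reads per species from a list of per-read results. The record layout (read ID, taxon name, rank) and the function name `count_species` are hypothetical and do not reflect the actual WIMP or Centrifuge output format.

```python
from collections import Counter

def count_species(assignments):
    """Count reads whose assignment sits at the species rank.

    `assignments` is an iterable of (read_id, taxon_name, rank) tuples.
    This layout is an assumption made for illustration only.
    """
    counts = Counter()
    for read_id, taxon_name, rank in assignments:
        if rank == "species":
            counts[taxon_name] += 1
    return counts

# Made-up example records, not real data.
example = [
    ("read1", "Escherichia coli", "species"),
    ("read2", "Escherichia coli", "species"),
    ("read3", "Enterobacteriaceae", "family"),
]
print(count_species(example))
```

Here only the two reads placed at the species rank contribute to the species counts; the read placed at the family rank is left out of this tally.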

For reads without a reliable assignment at the species rank, higher ranks of the taxonomy tree are used for the assignment. If no placement is reliable enough (that is, every candidate placement falls below the scoring threshold), the read is labeled as "Unclassified".
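
The fallback described above can be sketched as follows. This is a minimal illustration, not WIMP's actual scoring: the rank list, the per-rank score dictionary, the threshold value, and the function name `assign_read` are all assumptions.

```python
# Ranks ordered from most specific to least specific (assumed ordering).
RANKS = ["species", "genus", "family", "order", "class", "phylum", "superkingdom"]

def assign_read(scores_by_rank, threshold):
    """Return (rank, taxon) for the most specific rank whose score meets
    the threshold, or (None, "Unclassified") if no rank qualifies.

    `scores_by_rank` maps a rank name to a (taxon, score) pair; both the
    structure and the scores are hypothetical.
    """
    for rank in RANKS:
        if rank in scores_by_rank:
            taxon, score = scores_by_rank[rank]
            if score >= threshold:
                return rank, taxon
    return None, "Unclassified"

# Made-up scores: the species call is weak, the genus call is acceptable.
read_scores = {
    "species": ("Escherichia coli", 120),
    "genus": ("Escherichia", 350),
}
print(assign_read(read_scores, threshold=300))
print(assign_read(read_scores, threshold=500))
```

With the lower threshold the read is placed at the genus rank; with the higher threshold no placement qualifies and the read is reported as unclassified.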

More precisely, this means that WIMP cannot confidently assign a taxon ID within the taxonomy of complete whole genomes available in RefSeq (bacteria, fungi, archaea, and viruses).

With other computational methods (e.g., alignment), it might still be possible to identify where such a read comes from.

For more information on scoring and assignment, please see the WIMP description.
