core Module

Find anchors with find_my_anchors()

The module also provides combine() to combine/select/remove anchors and cutout() to cut out subsequences.

anchorna.core._split_cutout_pos(pos, mode, seqs, anchors, defaultB='^')[source]

Split a string, e.g. “A10>+5”, into its three parts A, B, C
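The split described above can be illustrated with a small regular expression. This is a minimal sketch, not the code used by anchorna: the name split_pos, the pattern, and the reading of the parts (A as anchor or keyword, B as a one-character marker out of <, > and ^, C as a signed integer offset) are assumptions made for illustration. Only the example string and the default '^' for B come from the documentation above; see anchorna cutout -h for the actual position syntax.

```python
import re

# Hypothetical pattern (an assumption, not anchorna's own regex):
# A = anchor or keyword, B = optional marker, C = optional signed offset.
POS_PATTERN = re.compile(r'^(?P<A>[^<>^+-]+)(?P<B>[<>^])?(?P<C>[+-]\d+)?$')


def split_pos(pos, default_b='^'):
    """Split a position string into the tuple (A, B, C)."""
    match = POS_PATTERN.match(pos.strip())
    if match is None:
        raise ValueError(f'cannot parse position {pos!r}')
    a = match.group('A')
    b = match.group('B') or default_b   # fall back to the default marker
    c = int(match.group('C') or 0)      # a missing offset counts as 0
    return a, b, c


print(split_pos('A10>+5'))
print(split_pos('A3'))
```

The sketch returns the offset as an integer so that a later step could add it directly to an index.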

anchorna.core._start_parallel_jobs(tasks, do_work, results, njobs=0, pbar=True, threaded=False)[source]
anchorna.core._transform_cutout_index(A, B, C, id_, seq, mode)[source]

Transform a fluke and a position given by A, B, C into an integer index

anchorna.core.anchor_at_pos(i, aas, w, gseqid, search_range, score_add_word, thr_quota_add_anchor, thr_score_add_anchor, scoring)[source]

Find an anchor for a specific position i in the guiding sequence gseqid

Return anchor or None. For a description of the options, see the example configuration file and anchorna go -h.

  1. Add fluke at position i for gseqid to anchor, set word to gword, set words set to {word}

  2. Add all other ids to todo list

  3. Until todo list is empty

     a. find best (score, index j) with word for each sequence in todo and add (score, j, seqid) to heap

     b. pop (score, j, seqid) pair with highest score from heap, until empty

        • if seqid not in todo -> continue 3b)

        • add to anchor, remove seqid from todos

        • if score < thr_score_add_anchor, check if thr_quota_add_anchor can still be fulfilled, otherwise return None (no anchor found)

        • if new word not in words, add it to words and set as new word, break loop 3b)

  4. Recalculate score, create and return anchor
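The steps above can be sketched as a small self-contained function. This is a simplified illustration, not the anchorna implementation: toy_anchor_at_pos, identity_score and best_hit are hypothetical names, similarity is a plain count of identical characters instead of a substitution matrix, the quota is assumed to be the fraction of sequences that must reach the score threshold, and the final recalculation is assumed to score every fluke against the guiding word.

```python
import heapq


def identity_score(a, b):
    """Toy similarity (assumption): number of identical characters."""
    return sum(x == y for x, y in zip(a, b))


def best_hit(seq, word):
    """Return (score, index) of the window in seq most similar to word."""
    w = len(word)
    best = (-1, 0)
    for j in range(len(seq) - w + 1):
        score = identity_score(seq[j:j + w], word)
        if score > best[0]:
            best = (score, j)
    return best


def toy_anchor_at_pos(i, seqs, w, guide_id, thr_score, thr_quota):
    """seqs maps id -> sequence; returns id -> (index, score) or None."""
    guide_word = seqs[guide_id][i:i + w]
    word, words = guide_word, {guide_word}
    positions = {guide_id: i}              # step 1
    todo = set(seqs) - {guide_id}          # step 2
    heap, nlow = [], 0
    while todo:                            # step 3
        for seqid in todo:                 # 3a, scores negated for a max-heap
            score, j = best_hit(seqs[seqid], word)
            heapq.heappush(heap, (-score, j, seqid))
        while heap:                        # 3b
            neg_score, j, seqid = heapq.heappop(heap)
            if seqid not in todo:
                continue                   # stale entry from an earlier word
            positions[seqid] = j
            todo.discard(seqid)
            if -neg_score < thr_score:
                nlow += 1
                # can the quota of well-scoring sequences still be reached?
                if (len(seqs) - nlow) / len(seqs) < thr_quota:
                    return None
            new_word = seqs[seqid][j:j + w]
            if new_word not in words:
                words.add(new_word)
                word = new_word
                break                      # back to 3a with the new word
    # step 4 (assumption): rescore every fluke against the guiding word
    return {sid: (j, identity_score(seqs[sid][j:j + w], guide_word))
            for sid, j in positions.items()}


seqs = {'g': 'MKTAYIAKQR', 's1': 'AAMKTAYIAK', 's2': 'MKTGYIAKQR'}
print(toy_anchor_at_pos(0, seqs, 5, 'g', thr_score=4, thr_quota=1.0))
```

Keeping one heap across word changes is what makes the "seqid not in todo" check necessary: entries scored against an earlier word can still be on the heap after their sequence has been added to the anchor.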

anchorna.core.combine(lot_of_anchors, convert_nt=False)[source]

Combine lists of anchors into a single AnchorList

Deal with possibly different offset values. Is called by anchorna combine.

anchorna.core.cutout(seqs, anchors, pos1, pos2, mode='nt', score_use_fluke=None, gap=None, update_fts=False)[source]

Cut out subsequences from pos1 to pos2 (e.g. between two anchors)

For help about this command and about how to define positions, see anchorna cutout -h.

anchorna.core.find_anchors_winlen(aas, w, gseqid, indexrange=None, anchors=None, njobs=0, pbar=True, threaded=False, **kw)[source]

Find multiple anchors in aa sequences for a specific word length

Calls anchor_at_pos() for each position in the guiding sequence, possibly in parallel.
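The per-position scan can be pictured with the standard library alone. The following is a rough sketch under stated assumptions, not the mechanism used by anchorna (which goes through _start_parallel_jobs()): scan_positions, find_at_pos and max_workers are hypothetical names, and a thread pool stands in for whatever parallel backend is actually used.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_positions(positions, find_at_pos, max_workers=None):
    """Call find_at_pos for every position and keep the non-None results.

    max_workers=None runs serially (an assumption of this sketch);
    otherwise a thread pool with that many workers is used.
    """
    if max_workers is None:
        results = [find_at_pos(i) for i in positions]
    else:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # pool.map returns results in the order of the input positions
            results = list(pool.map(find_at_pos, positions))
    return [r for r in results if r is not None]


def toy_worker(i):
    """Stand-in for a per-position search that finds nothing at odd positions."""
    return i * i if i % 2 == 0 else None


print(scan_positions(range(6), toy_worker))
print(scan_positions(range(6), toy_worker, max_workers=2))
```

Positions without a result are dropped, mirroring the fact that anchor_at_pos() may return None.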

anchorna.core.find_my_anchors(seqs, remove=True, aggressive_remove=True, continue_with=None, no_cds=False, scoring=None, **kw)[source]

Find and return anchors in CDS region of nucleotide sequences

This function is called by the anchorna go command. For a description of arguments see the example configuration file and the CLI help. The function applies three steps:

A Find anchors of predefined word length in all sequences. This is done in the find_anchors_winlen() function; unresolved kwargs are passed on.
B Merge overlapping anchors with AnchorList.merge_overlapping_anchors().
C Remove conflicting anchors with AnchorList.remove_contradicting_anchors().

anchorna.core.maxes(a, key=None, default=None)[source]

Like max function, but returns list of all maximal values
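A function with this behaviour could look like the sketch below. The name maxes_sketch is hypothetical, and returning default for an empty input is an assumption modelled on the built-in max; the actual implementation may differ.

```python
def maxes_sketch(a, key=None, default=None):
    """Return a list of all items of a whose key is maximal."""
    items = list(a)
    if not items:
        return default              # assumption: mirror max(..., default=...)
    if key is None:
        key = lambda x: x           # compare the items themselves
    best = max(key(x) for x in items)
    return [x for x in items if key(x) == best]


print(maxes_sketch([1, 3, 2, 3]))
print(maxes_sketch(['aa', 'b', 'cc'], key=len))
```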

anchorna.core.shift_and_find_best_word(seq, word, starti, w, sm, maxshift=None, maxshift_right=None)[source]

Find position of the most similar word in a sequence

Returns:

tuple with similarity and index
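One way to picture such a search is a sliding window scored with a substitution matrix. The sketch below is illustrative only and makes several assumptions that the documentation above does not confirm: find_best_word is a hypothetical name, sm is taken to be a mapping from character pairs to scores, maxshift limits how far the window may move left of starti, maxshift_right limits the move to the right and defaults to maxshift, and None means no limit.

```python
def find_best_word(seq, word, starti, w, sm, maxshift=None, maxshift_right=None):
    """Return (similarity, index) of the window of length w most similar to word."""
    if maxshift_right is None:
        maxshift_right = maxshift
    last = len(seq) - w                       # last valid window start
    lo = 0 if maxshift is None else max(0, starti - maxshift)
    hi = last if maxshift_right is None else min(last, starti + maxshift_right)
    best = None
    for j in range(lo, hi + 1):
        sim = sum(sm[a, b] for a, b in zip(seq[j:j + w], word))
        if best is None or sim > best[0]:     # first best window wins ties
            best = (sim, j)
    return best


# Toy substitution matrix (assumption): +1 for identity, -1 otherwise
alphabet = 'ACDEFG'
toy_sm = {(a, b): 1 if a == b else -1 for a in alphabet for b in alphabet}

print(find_best_word('GGACDEFGG', 'ACDEF', 0, 5, toy_sm))
print(find_best_word('GGACDEFGG', 'ACDEF', 0, 5, toy_sm, maxshift=1))
```

Restricting the shift keeps the search local to starti; in the second call the exact match at index 2 lies outside the allowed range and is therefore not found.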