core Module
Find anchors with find_my_anchors()
The module also provides combine() to combine/select/remove anchors and
cutout() to cut out subsequences.
- anchorna.core._split_cutout_pos(pos, mode, seqs, anchors, defaultB='^')
Split a position string, e.g. “A10>+5”, into its three parts A, B, C
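The three-part split can be illustrated with a small, self-contained sketch. It is not the anchorna implementation: the helper name split_pos, the regular expression, and the reading of the parts (A as the anchor name, B as one of "<", "^", ">", C as a signed integer offset) are assumptions made for illustration. Only the example string and the default '^' for B come from the signature above.

```python
import re

# Hypothetical grammar (an assumption, not taken from anchorna):
#   A: a name such as "A10"
#   B: optional marker, one of "<", "^", ">"
#   C: optional signed integer offset such as "+5" or "-3"
_POS_PATTERN = re.compile(r"^(?P<A>[A-Za-z]+\d*)(?P<B>[<^>])?(?P<C>[+-]\d+)?$")


def split_pos(pos, default_b="^"):
    """Split a position string into its parts (A, B, C).

    A missing B falls back to default_b, a missing C to 0.
    """
    match = _POS_PATTERN.match(pos.strip())
    if match is None:
        raise ValueError(f"cannot parse position {pos!r}")
    a = match.group("A")
    b = match.group("B") or default_b
    c = int(match.group("C") or 0)
    return a, b, c


if __name__ == "__main__":
    print(split_pos("A10>+5"))
    print(split_pos("A3"))
```

For the authoritative position syntax, consult anchorna cutout -h.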
- anchorna.core._start_parallel_jobs(tasks, do_work, results, njobs=0, pbar=True, threaded=False)
- anchorna.core._transform_cutout_index(A, B, C, id_, seq, mode)
Transform a fluke and position given by A,B,C to an integer index
- anchorna.core.anchor_at_pos(i, aas, w, gseqid, search_range, score_add_word, thr_quota_add_anchor, thr_score_add_anchor, scoring)
Find an anchor for a specific position i in the guiding sequence gseqid
Returns an anchor or None. For a description of the options, see the example configuration file and
anchorna go -h.
1. Add fluke at position i for gseqid to anchor, set word to gword, set words set to {word}
2. Add all other ids to todo list
3. Until todo list is empty
   a) find best (score, index j) with word for each sequence in todo and add (score, j, seqid) to heap
   b) pop (score, j, seqid) pair with highest score from heap, until empty
      - if seqid not in todo -> continue 3b)
      - add to anchor, remove seqid from todos
      - if score < thr_score_add_anchor, check if thr_quota_add_anchor can still be fulfilled, otherwise return None (no anchor found)
      - if new word not in words, add it to words and set as new word, break loop 3b)
4. Recalculate score, create and return anchor
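The greedy procedure described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the anchorna code: sequences are plain strings in a dict, the score is a toy identity count instead of a substitution matrix, the options search_range and score_add_word are ignored, the quota is read as the fraction of sequences that must reach the score threshold, and every sequence is assumed to be at least w residues long. All function names are hypothetical.

```python
import heapq
import math


def toy_score(word_a, word_b):
    """Toy similarity: number of identical positions (stand-in for a real scoring matrix)."""
    return sum(x == y for x, y in zip(word_a, word_b))


def best_hit(seq, word):
    """Return (score, index) of the first best-scoring window of len(word) in seq."""
    w = len(word)
    best = (-1, -1)
    for j in range(len(seq) - w + 1):
        score = toy_score(seq[j:j + w], word)
        if score > best[0]:
            best = (score, j)
    return best


def anchor_at_pos_sketch(i, aas, w, gseqid, thr_score, thr_quota):
    """Return {seqid: (index, score)} or None if no anchor is found."""
    gword = aas[gseqid][i:i + w]
    if len(gword) < w:
        return None
    # Step 1: start the anchor with the guiding sequence
    positions = {gseqid: i}
    word, words = gword, {gword}
    # Step 2: all other sequences still need a fluke
    todo = set(aas) - {gseqid}
    # Assumption: at least ceil(thr_quota * n) sequences must reach thr_score
    max_fails = len(aas) - math.ceil(thr_quota * len(aas))
    nfails = 0
    heap = []  # kept across rounds, so stale entries are possible
    # Step 3
    while todo:
        # 3a: score every open sequence against the current word
        for seqid in todo:
            score, j = best_hit(aas[seqid], word)
            heapq.heappush(heap, (-score, seqid, j))  # negate: heapq is a min-heap
        # 3b: take the best hits first
        while heap:
            neg_score, seqid, j = heapq.heappop(heap)
            if seqid not in todo:
                continue  # stale entry from an earlier round
            positions[seqid] = j
            todo.discard(seqid)
            if -neg_score < thr_score:
                nfails += 1
                if nfails > max_fails:
                    return None  # quota can no longer be fulfilled
            new_word = aas[seqid][j:j + w]
            if new_word not in words:
                words.add(new_word)
                word = new_word
                break  # rescore the remaining sequences with the new word
    # Step 4: recalculate each score against the guiding word
    return {sid: (j, toy_score(aas[sid][j:j + w], gword))
            for sid, j in positions.items()}


if __name__ == "__main__":
    aas = {"s1": "MKTAYIAKQR", "s2": "GGMKTAYLAK", "s3": "MKSAYIAKQQ"}
    print(anchor_at_pos_sketch(0, aas, 5, "s1", thr_score=4, thr_quota=1.0))
```

Because the heap is kept across rounds in this sketch, it can hold entries for sequences that were already assigned, which is why the membership check in 3b is needed.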
- anchorna.core.combine(lot_of_anchors, convert_nt=False)
Combine lists of anchors into a single AnchorList
Deals with possibly differing offset values. This function is called by
anchorna combine.
- anchorna.core.cutout(seqs, anchors, pos1, pos2, mode='nt', score_use_fluke=None, gap=None, update_fts=False)
Cutout subsequences from pos1 to pos2 (i.e. between two anchors)
For help about this command and about how to define positions, see
anchorna cutout -h.
- anchorna.core.find_anchors_winlen(aas, w, gseqid, indexrange=None, anchors=None, njobs=0, pbar=True, threaded=False, **kw)
Find multiple anchors in aa sequences for a specific word length
Calls anchor_at_pos() for each position in the guiding sequence, possibly in parallel.
- anchorna.core.find_my_anchors(seqs, remove=True, aggressive_remove=True, continue_with=None, no_cds=False, scoring=None, **kw)
Find and return anchors in CDS region of nucleotide sequences
This function is called by the anchorna go command. For a description of the arguments, see the example configuration file and the CLI help. The function applies three steps:
A. Find anchors of a predefined word length in all sequences. This is done in the find_anchors_winlen() function; unresolved kwargs are passed on.
B. Merge overlapping anchors with AnchorList.merge_overlapping_anchors().
C. Remove conflicting anchors with AnchorList.remove_contradicting_anchors().
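To give an intuition for step B, the toy sketch below merges overlapping windows on a single sequence. It is only a stand-in under loud assumptions: real anchors span several sequences and AnchorList.merge_overlapping_anchors() may apply further conditions that are not shown here. The function name merge_overlapping and the use of half-open (start, stop) tuples are hypothetical.

```python
def merge_overlapping(intervals):
    """Merge overlapping half-open (start, stop) intervals.

    Toy stand-in for anchor merging on one sequence only.
    Intervals that merely touch (stop == next start) are kept separate.
    """
    merged = []
    for start, stop in sorted(intervals):
        if merged and start < merged[-1][1]:
            # Overlaps the previous interval: extend it
            prev_start, prev_stop = merged[-1]
            merged[-1] = (prev_start, max(prev_stop, stop))
        else:
            merged.append((start, stop))
    return merged


if __name__ == "__main__":
    # Windows of word length 5 found at neighbouring positions 0, 1, 2,
    # plus one unrelated window further downstream
    print(merge_overlapping([(0, 5), (1, 6), (2, 7), (20, 25)]))
```

In this simplified picture, windows found at neighbouring positions in step A collapse into one longer region.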