MultiplePairwiseAlignmentsToOneSubject.Rd
This function plots the alignment positions of multiple patterns on one subject. It calls Biostrings::pairwiseAlignment(), extracts the limits of each individual alignment and converts them into a ggplot object with some optional supplementary information. The function will not work if a gap is introduced in the subject and at least two patterns overlap at that gap.
MultiplePairwiseAlignmentsToOneSubject( subject, patterns, type = "global-local", perfect.matches.only = F, order.patterns = F, pattern.lim.size = 2, subject.lim.lines = F, attach.nt = T, tile.border.color = NA, fix_indels = F )
subject | a named character or named DNAStringSet holding one subject sequence (a DNAStringSet can hold a name, a DNAString cannot) |
---|---|
patterns | a named character vector or named DNAStringSet of patterns to align to the subject sequence |
type | the type of alignment passed to Biostrings::pairwiseAlignment; not every type may work well with this function (if there are overlapping ranges of the alignments to the subject for example) |
perfect.matches.only | before pairwise alignment, keep only those patterns that match the subject without gaps, insertions or substitutions |
order.patterns | order patterns by increasing alignment start position |
pattern.lim.size | size of the printed limits of aligned patterns (the subject nt positions at which each alignment starts and ends); set to 0 to avoid plotting them |
subject.lim.lines | print vertical lines at the outermost subject-nts of all aligned patterns |
attach.nt | add the length of the string to the name on the axis |
tile.border.color | character; border color of the tiles (geom_tile) used to plot nts, e.g. "black"; purely aesthetic and only useful for short alignments |
fix_indels | in case of overlapping indels and shared subject ranges, cut respective patterns to avoid indels |
a list: base.plot is a ggplot object of the alignment with patterns colored by nt; match.plot is a ggplot object of the alignment with patterns colored by match, mismatch, etc.; base.df and match.df are the respective data.frames used for plotting; seq min.max.subject.position indicates the outer limits of all aligned patterns (min = start position of the first aligned pattern, max = end position of the last aligned pattern)
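To make the idea of alignment limits and the min/max subject position concrete, here is a minimal base-R sketch for the exact-match case only. It is not the package's implementation: it swaps Biostrings::pairwiseAlignment() for a plain fixed-string search with regexpr(), and the names limits, limits_ordered and min_max are hypothetical, chosen for illustration.

```r
# Base-R stand-in for the perfect-match case; NOT how the package computes alignments.
subject  <- c(sub = "AAAACCCCTTTTGGGGAACCTTCC")
patterns <- c(pat1 = "TTCC", pat2 = "CCCC", pat3 = "TTTT",
              pat4 = "GGGG", pat5 = "AAAA")

# Start of the first exact occurrence of each pattern in the subject (-1 if absent).
starts <- vapply(
  patterns,
  function(p) as.integer(regexpr(p, subject, fixed = TRUE)),
  integer(1)
)

# Per-pattern limits on the subject (1-based, inclusive).
limits <- data.frame(
  pattern = names(patterns),
  start   = unname(starts),
  end     = unname(starts + nchar(patterns) - 1L),
  stringsAsFactors = FALSE
)

# Keep only patterns that were found (analogous in spirit to perfect.matches.only).
limits <- limits[limits$start > 0, ]

# Order by start position (analogous in spirit to order.patterns).
limits_ordered <- limits[order(limits$start), ]

# Outer limits over all matched patterns (hypothetical name).
min_max <- c(min = min(limits$start), max = max(limits$end))

print(limits_ordered)
print(min_max)
```

A fixed-string search only covers patterns that occur verbatim in the subject; gaps, insertions and substitutions need a real pairwise alignment, which is what the documented function delegates to Biostrings.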
if (FALSE) {
  # one named subject sequence
  s <- stats::setNames("AAAACCCCTTTTGGGGAACCTTCC", "sub")
  s <- Biostrings::DNAStringSet(s)

  # five named patterns to align to the subject
  p <- stats::setNames(c("TTCC", "CCCC", "TTTT", "GGGG", "AAAA"),
                       c("pat1", "pat2", "pat3", "pat4", "pat5"))
  p <- Biostrings::DNAStringSet(p)

  als <- igsc::MultiplePairwiseAlignmentsToOneSubject(subject = s, patterns = p,
                                                      tile.border.color = "black")
  # same alignment, patterns ordered by start position
  als_ordered <- igsc::MultiplePairwiseAlignmentsToOneSubject(subject = s, patterns = p,
                                                              tile.border.color = "black",
                                                              order.patterns = TRUE)
}