This function is useful for plotting the alignment positions of multiple patterns on one subject. It uses Biostrings::pairwiseAlignment(), obtains the individual alignment limits and converts them to a ggplot object with some optional complementary information. The function will not work if a gap is introduced in the subject and at least two patterns overlap at this gap.

MultiplePairwiseAlignmentsToOneSubject(
  subject,
  patterns,
  type = "global-local",
  perfect.matches.only = F,
  order.patterns = F,
  pattern.lim.size = 2,
  subject.lim.lines = F,
  attach.nt = T,
  tile.border.color = NA,
  fix_indels = F
)

Arguments

subject

a named character vector or named DNAStringSet holding one subject (a DNAStringSet, unlike a DNAString, can hold a name)

patterns

a named character vector or named DNAStringSet of patterns to align to the subject sequence

type

the type of alignment passed to Biostrings::pairwiseAlignment; not every type may work well with this function, for example if the aligned ranges on the subject overlap

perfect.matches.only

before pairwise alignment, keep only those patterns which match the subject without gaps, insertions or substitutions

order.patterns

order patterns by increasing alignment position (start)

pattern.lim.size

size of the printed limits of aligned patterns (the nt at which the alignment to the subject starts and ends); set to 0 to avoid plotting

subject.lim.lines

print vertical lines at the outermost subject-nts of all aligned patterns

attach.nt

add the length of the string to the name on the axis

tile.border.color

character; tiles from geom_tile are used to plot nts; this sets their border color, e.g. "black"; purely aesthetic and only useful for short alignments

fix_indels

in case of overlapping indels and shared subject ranges, cut respective patterns to avoid indels
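
The effect of perfect.matches.only and order.patterns can be pictured with a small base R sketch. It assumes that a perfect match means the pattern occurs as an exact substring of the subject and takes the first occurrence as the start position; this is an illustration only and not necessarily how the function implements these arguments. The pattern pat6 is made up for illustration.

```r
# Illustration only: exact substring matching with base R,
# not the pairwise alignment performed by the package.
subject <- c(sub = "AAAACCCCTTTTGGGGAACCTTCC")
patterns <- c(pat1 = "TTCC", pat2 = "CCCC", pat5 = "AAAA", pat6 = "TTAT")

# first exact occurrence of each pattern in the subject (-1 if absent)
starts <- vapply(patterns, function(p) {
  as.integer(regexpr(p, subject, fixed = TRUE))
}, integer(1))

# analogous to perfect.matches.only = TRUE: drop patterns without an exact match
perfect <- patterns[starts > 0]

# analogous to order.patterns = TRUE: sort by increasing start position
ordered <- perfect[order(starts[starts > 0])]
print(names(ordered))
```

Here pat6 is dropped because it does not occur in the subject, and the remaining patterns are listed by their start position on the subject.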

Value

a list: base.plot, a ggplot object of the alignment showing patterns colored by nt; match.plot, a ggplot object of the alignment showing patterns colored by match, mismatch, etc.; base.df and match.df, the respective data.frames used for plotting; seq min.max.subject.position indicates the outer limits of all aligned patterns (min = start position of the first aligned pattern, max = end position of the last aligned pattern)
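
As a rough illustration of what min.max.subject.position describes, the outer limits can be worked out by hand from start and end positions. The sketch below uses exact-match positions found with base R, which is an assumption made for illustration; the function itself derives the positions from pairwise alignments.

```r
# Illustration only: exact-match positions instead of alignment ranges.
subject <- "AAAACCCCTTTTGGGGAACCTTCC"
patterns <- c(pat1 = "TTCC", pat2 = "CCCC", pat3 = "TTTT",
              pat4 = "GGGG", pat5 = "AAAA")

# start = first exact occurrence, end = start + pattern length - 1
starts <- vapply(patterns, function(p) {
  as.integer(regexpr(p, subject, fixed = TRUE))
}, integer(1))
ends <- starts + nchar(patterns) - 1L

# outer limits over all patterns: smallest start and largest end
limits <- c(min = min(starts), max = max(ends))
print(limits)
```

With these sequences the patterns together span the whole subject, so the limits are 1 and 24.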

Examples

if (FALSE) {
s <- stats::setNames("AAAACCCCTTTTGGGGAACCTTCC", "sub")
s <- Biostrings::DNAStringSet(s)
p <- stats::setNames(c("TTCC", "CCCC", "TTTT", "GGGG", "AAAA"), c("pat1", "pat2", "pat3", "pat4", "pat5"))
p <- Biostrings::DNAStringSet(p)
als <- igsc::MultiplePairwiseAlignmentsToOneSubject(subject = s, patterns = p, tile.border.color = "black")
als_ordered <- igsc::MultiplePairwiseAlignmentsToOneSubject(subject = s, patterns = p, tile.border.color = "black", order.patterns = TRUE)
}