vdjdb_hits.Rd
Matching CDR3s from sequencing with CDR3s of known antigen specificity may help to narrow potential antigen specificities of one's sequenced TCRs. Export a tsv of all or selected entries here https://vdjdb.cdr3.net/search. Provide that tsv as data frame (vdjdb). From vdjdb and tcrs distinct rows of tr_cols and cdr3_col are compared by levensthein distance and if selected (PAM30_similarity) by amino acid similarity. Returned data frame may be used to join original columns from vdjdb and tcrs followed by plotting with ggplot2 (see example). Original column names (vdj_tr_col, tcr_tr_col and vdj_cdr3_col, tcr_cdr3_col) are kept in the returned data frame to allow easy subsequent joining of vdjdb and tcrs. It requires though that these columns have different names in vdjdb and tcrs.
vdjdb_hits( vdjdb, tcrs, vdj_tr_col = "Gene", tcr_tr_col = "chain", vdj_cdr3_col = "CDR3", tcr_cdr3_col = "CDR3_aa_cr", TRAxTRB = F, max_lvdist = 3, PAM30_similarity = F, mapply_fun = mapply, lapply_fun = lapply, ... )
vdjdb | data frame of TCR data from vdjdb; vdj_cdr3_col and vdj_tr_col must be present; if not provided system.file("extdata", "vdjdb.tsv.rds", package = "igsc") is used (table downloaded 06/12/2021) |
---|---|
tcrs | data frame of TCR data from sequencing, tcr_cdr3_col and tcr_tr_col must be present; e.g. c_long from clonotype prep |
vdj_tr_col | column which codes the TCR chain in the vdjdb reference data frame (should contain TRA and/or TRB only) |
tcr_tr_col | column which codes the TCR chain in the tcrs data frame (should contain TRA and/or TRB only) |
vdj_cdr3_col | column which codes the CDR3s in the vdjdb reference data frame |
tcr_cdr3_col | column which codes the CDR3s in the tcrs data frame |
TRAxTRB | logical, apart from comparing TRA vs TRA and TRB vs TRB between tcrs and vdjdb also check for TRA vs TRB? |
max_lvdist | maximum levensthein distance between CDR3s for return data frame (also filtering before PAM30 similarity calculation) |
PAM30_similarity | logical whether to calculate the similarity (pairwise alignment) between CDR3s based on PAM30 substitution matrix |
mapply_fun | mapply function for PAM30_similarity, suggested are mapply, pbapply::pbmapply, parallel::mcmapply |
lapply_fun | lapply function for levensthein distance calculation, suggested are lapply, pblapply::pblapply, parallel::mclapply |
... | arguments passed to mapply_fun and lapply_fun, most relevant: mc.cores (e.g parallel::detectCores()) |
data frame of matched CDR3s from tcrs and vdjdb, column lv indicates the levensthein between CDR3 from vdjdb and tcrs
if (FALSE) { # tcrs data frame with sequencing data from TCRs has to be created (cl_long is appropriate) matches <- vdjdb_hits(vdjdb, tcrs, mapply_fun = parallel::mcmapply, mc.cores = parallel::detectCores(), nthread = parallel::detectCores()) matches_append <- matches %>% dplyr::left_join(tcrs) %>% dplyr::left_join(vdjdb) df <- df %>% dplyr::distinct(CDR3, CDR3_aa_cr, lv, Gene, chain, cl_name, patient, Epitope, Epitope.gene, Epitope.species, Score) %>% tidyr::drop_na() %>% dplyr::filter(cl_name %in% c("Madalyn", "Mikaylla", "Morgan")) %>% dplyr::filter(lv < 3) %>% dplyr::filter(Score > 0) # use jitter for many hits (when beeswarms become overlapping) ggplot(df, aes(x = Score, y = lv, color = Epitope.species)) + ggbeeswarm::geom_beeswarm(groupOnX = F, cex = 4, size = 2.5) + #geom_jitter(width = 0.15, height = 0.15, size = 3) + theme_bw() + theme(panel.grid.minor.x = element_blank(), panel.grid.minor.y = element_blank(), strip.background = element_rect(fill = "white"), text = element_text(family = "Courier")) + scale_x_continuous(breaks = fcexpr::int_breaks(n = 3), limits = c(0.8,3.2)) + scale_y_reverse(breaks = fcexpr::int_breaks(n = 3)) + guides(fill = guide_legend(override.aes = list(size = 5)), color = guide_legend(override.aes = list(size = 5))) + facet_grid(rows = vars(chain), cols = vars(patient, cl_name)) }