Calculate dimension reductions and cluster annotations with data from one or more flow frames and add these parameters to a (concatenated) fcs file

Prepare a list of flowframes (ff.list) with fcexpr::wsp_get_ff or fcexpr::inds_get_ff. The objects returned from these function will compatible with fcexpr::dr_to_fcs. The list must contain logicle transformed fluorescence intensities (FI) and optionally may contain corresponding inverse transformed FI (which is the transformation you see in flowjo). For dimension reduction logicle transformed FI are used. Currently, this is obligatory. This transformation, optionally additional scaling operations alogn the way (scale.whole, scale.samples) as well as cluster annotation will be written to the resulting (concatenated) fcs file for manual inspection in flowjo. Certain dimension reduction algorithms and cluster detection algorithm may become slow with a large number of events (e.g. > 1e6, see the details). In order to get a quick impression of what the algorithms can pull out for you, you may use the 'downsample' argument in fcexpr::wsp_get_ff or fcexpr::inds_get_ff to conveniently sample a random subset of events from each fcs files (flowframe).

dr_to_fcs(
  ff.list,
  channels = NULL,
  add.sample.info = NULL,
  scale.whole = c("z.score", "min.max", "none"),
  scale.samples = c("none", "z.score", "min.max"),
  run.harmony = F,
  run.pca = F,
  run.lda = F,
  run.umap = T,
  run.tsne = F,
  run.som = T,
  run.gqtsom = F,
  run.louvain = F,
  run.kmeans = F,
  run.minibatchkmeans = F,
  run.kmeans_arma = F,
  run.kmeans_rcpp = F,
  run.leiden = F,
  run.hclust = F,
  run.flowClust = F,
  run.MUDAN = F,
  n.pca.dims = 0,
  calc.cluster.markers = NULL,
  extra.cols = NULL,
  mc.cores = 1,
  save.to.disk = c("fcs", "rds"),
  save.path = file.path(getwd(), paste0(substr(gsub("\\.", "",
    make.names(as.character(Sys.time()))), 2, 15), "_FCS_dr")),
  exclude.extra.channels = ifelse(length(ff.list) == 1 && names(ff.list) == "logicle",
    "cluster.id", "FSC|SSC|Time|cluster.id"),
  write.scaled.channels.to.FCS = F,
  timeChannel = "Time",
  ...
)

Arguments

ff.list: a list of flowFrames as received from fcexpr::wsp_get_ff (compensated with Compensation Matrix as defined in FlowJo by default) or as received from fcexpr::inds_get_ff (directly from FCS files, not compensated by default)
channels: a named vector of channels to use for dimension reduction. values can be channel names (v-450LP..., b-520..., or so), or channel descriptions (e.g. CD3 or LAG3-PeCy7 for example) names will be used as new description in the fcs file to be written; if no names provided, names of the very first FCS file will be used
add.sample.info: named list of additional channels to identify samples or group them in flowjo; e.g. for 9 fcs files to be concatenated: add.sample.info = list(condition = c(1,2,3,1,2,3,1,2,3), donor = c(1,1,1,2,2,2,3,3,3))
scale.whole: if and how to scale channels after concatenation of flowframes in ff.list
scale.samples: if and how to scale channels of flowframes in ff.list individually before concatenation
run.harmony: attempt batch correction using harmony::HarmonyMatrix; if TRUE, then harmony__meta_data has to be provided in ... indicating the groups to be batch corrected; harmony is conducted before run.pca; to calculate a pca before harmony, pass harmony__do_pca = T and optional a number of pcs with harmony__npcs. Set run.pca = F when a pca is before in harmony.
run.pca: run principle component analysis only, or prior to other dimension reduction (umap, SOM, tSNE, ...)
run.lda: run Linear Discriminant Analysis before dimension reduction; should be F (FALSE) or a clustering calculated before, e.g. louvain_0.8 or leiden_1.1, kmeans_12 etc.; respective clustering calculation has to be provided in ...
run.umap: calculate UMAP dimension reduction with uwot::umap
run.tsne: calculate tsne dimension reduction with Rtsne::Rtsne
run.som: calculate SOM dimension reduction EmbedSOM::SOM followed by EmbedSOM::EmbedSOM
run.gqtsom: calculate GQTSOM dimension reduction EmbedSOM::GQTSOM followed by EmbedSOM::EmbedSOM
run.louvain: detect clusters (communities, groups) of cells with the louvain algorithm, implemented in Seurat::FindClusters (subsequent to snn detection by Seurat::FindNeighbors)
run.kmeans: detect clusters with stats::kmeans
run.minibatchkmeans: detect clusters with ClusterR::MiniBatchKmeans
run.kmeans_arma: detect clusters with ClusterR::KMeans_arma
run.kmeans_rcpp: detect clusters with ClusterR::KMeans_rcpp
run.leiden: detect clusters (communities, groups) of cells with the leiden algorithm, with leiden::leiden (subsequent to snn detection by Seurat::FindNeighbors)
run.hclust: detect clusters with stats::dist, stats::hclust and stats::cutree
run.flowClust: detect clusters with flowClust::flowClust
run.MUDAN: detect clusters with MUDAN::getComMembership
n.pca.dims: number of principle components to calculate
calc.cluster.markers: if NULL nothing is calculated; otherwise marker features (stained markers) are determined by wilcox test using presto::wilcoxauc for the provided clustering(s). each cluster is tested against other events and clusters are compaired pairwise. respective clustering calculation has to be provided in ...; e.g. if louvain__resolution = 0.5 is provided set calc.cluster.markers = louvain_0.5; and if in addition leiden__resolution_parameter = 0.7 then set calc.cluster.markers = c(louvain_0.5, leiden_0.7).
extra.cols: vector of one extra column (or matrix of multiple columns) to add to the final fcs file; has to be numeric; has to be equal to the number of rows in all flowframes provided; colnames will be the channel names in the FCS file; could be a previously calculated dimension reduction or cluster annotation.
mc.cores: mc.cores to calculate clusterings, limited to parallel::detectCores()-1
save.to.disk: what to save to disk: (concatenated) and appended FCS file and/or rds file with several elements in a list
save.path: where to save elements specified in save.to.disk; set to NULL to have nothing written to disk
exclude.extra.channels: when scaled and transform channels are written to FCS file, some channels may be redundant and will only occupy disk space, those are specified here; matched with grepl
write.scaled.channels.to.FCS: do save scaled channels (scale.whole, scale.samples) to FCS file
timeChannel: name of the Time channel to exclude from all analyses and calculation; if NULL will be attempted to be detected automatically
...: additional parameters to calculations of UMAP, tSNE, SOM, GQTSOM, EmbedSOM, louvain, leiden, harmony, flowClust, hclust, MUDAN, kmeans; provide arguments as follows: UMAP__n_neighbors = c(15,20,25), or tsne__theta = 0.3, etc. see respected help files to get to know which arguments can be passed: uwot::umap, Rtsne::Rtsne, EmbedSOM::SOM, EmbedSOM::GQTSOM, EmbedSOM::EmbedSOM, harmony::HarmonyMatrix, flowClust::flowClust, louvain: Seurat::FindNeighbors and Seurat::FindCluster, leiden: Seurat::FindNeighbors and leiden::leiden. hclust: stats::dist and stats::hclust, MUDAN: MUDAN::getComMembership, stats::kmeans, ClusterR::MiniBatchKmeans, ClusterR::KMeans_rcpp, ClusterR::KMeans_arma

Value

A list with 3 elements: (i) The matrix of fluorescence intensities and appended information (dim red, clustering). This is the table which is written into a newly generated fcs file. (ii) A character vector of meaningful column names which may be used for the table in R (rather for convenience). (iii) Tables of marker features (each cluster vs all other events and all clusters pairwise).

Details

Logicle transformation of FI has been described here: Parks, et al., 2006, PMID: 16604519 DOI: 10.1002/cyto.a.20258. Different transformations have been compared for instance here: Transformations. Dimension reduction (low dimension embedding) algorithms are: UMAP, tSNE, SOM, GQTSOM, PCA. tSNE is expected to be too slow for more then 1e4 or 1e5 cells (depending precision set by tsne__theta). UMAP has been developed later and is well-know. SOM and GQTSOM are similar to flowSOM but are much more convenient to use in programming as they accept matrices whereas flowSOM strictly requires flowFrames. Also SOM and GQTSOM may be superior with respect to calculation speed. PCA will be orders of magnitude faster than UMAP especially on large numbers of events (1e6 and above). On the other hand it will not produce as nicely separated clusters as it is a linear algorithm. Cluster (community) detection algorithms are louvain, leiden, kmeans, hclust, flowClust and MUDAN. These assign events with similar properties in the high dimensional space a common discrete label. kmeans will be the quickest. Start with this one and progress to using louvain which is slower but may yield better results. For a low number of events (e.g. below 1e6) louvain will perform in a reasonable amount of time and could used immediately in parallel to kmeans. leiden will require the original python library and is approximately similar to louvain with respect to calculation time, maybe a bit slower. leiden is an enhancement of louvain, though louvain does not produce "wrong" results per se. flowClust, MUDAN, and hclust have not been tested extensively. For kmeans, hclust, flowClust the number of expected cluster can be provided. louvain and leiden take a resolution parameter (lower resolution --> less clusters). MUDAN takes the a number of nearest neighbors (less neighbors --> more clusters). An initial PCA may not be required if the selected channels are chosen thoughtfully (only those with stained markers). If you decide though to simply use all channels (with and and without stained marker) for dimension reduction, PCA will automatically lead to focus on the ones with highest variation (so those channels which contain staining on a subset of events). In this case you may set n.pca.dims roughly equal to the number of channels which contain a staining.

Examples

if (FALSE) {
############################
### Plot cluster markers ###
############################
dr <- fcexpr::dr_to_fcs(ff.list = ffs,
channels = channels,
louvain__resolution = 0.5,
run.lda = "louvain_0.5",
calc.cluster.markers = c("louvain_0.5"),
save.path = NULL)
marker <- dr[[3]][[1]][[1]]
marker$channel_desc2 <- sapply(strsplit(marker$channel_desc, "_"), "[", 1)
marker <-
 marker %>%
 dplyr::mutate(pvalue = ifelse(pvalue == 0, 1e-300, marker$pvalue)) %>%
 dplyr::group_by(channel_desc2) %>%
 dplyr::mutate(mean_scaled = fcexpr:::min.max.normalization(mean))

ggplot(marker, aes(x = as.factor(cluster), y = channel_desc2, fill = -log10(pvalue))) +
 geom_tile(color = "black") +
 theme_bw() +
 geom_text(aes(label = diff_sign)) +
 scale_fill_viridis_c()

ggplot(marker, aes(x = as.factor(cluster), y = channel_desc2, fill = mean_diff)) +
 geom_tile(color = "black") +
 theme_bw() +
 geom_text(aes(label = diff_sign)) +
 scale_fill_viridis_c()

ggplot(marker, aes(x = as.factor(cluster), y = channel_desc2, fill = mean_scaled)) +
 geom_tile(color = "black") +
 theme_bw() +
 scale_fill_viridis_c()

ggplot(marker, aes(x = mean_diff, y = -log10(pvalue), label = channel_desc2)) + #color = channel_desc2,
 geom_point() +
 theme_bw() +
 labs(title = "cluster markers (vs all other cells each)") +
 ggrepel::geom_text_repel(max.overlaps = 20, show.legend = F) +
 theme(panel.grid.minor.x = element_blank(), panel.grid.minor.y = element_blank(), panel.grid.major.x = element_blank(), strip.background = element_rect(fill = "hotpink2")) +
 geom_vline(xintercept = 0, col = "tomato2", linetype = "dashed") +
 geom_hline(yintercept = 100, col = "tomato2", linetype = "dashed") +
 facet_wrap(vars(cluster))

 ##############################################
 ### find clusters which may be subject #######
 ### to bi- or multimodality in any channel ###
 ##############################################
 # make one data frame
 marker_all <- dplyr::bind_rows(lapply(names(dr[["marker"]]), function(x) dplyr::mutate(dr[["marker"]][[x]][["marker_table"]], clustering = x)))

 # sort by diptest p value; or low p indicates bi- or multimodality
 marker_all <- dplyr::arrange(marker_all, diptest_p)
 # see ?diptest::dip.test
 # multimodality indicates heterogeneity within in the cluster
 # and may justify to separate that cluster further into sub-clusters
 # this depends on the interpretation of the scientist though


}