Prepare a list of flowFrames (ff.list) with fcexpr::wsp_get_ff or fcexpr::inds_get_ff. The objects returned from these functions are compatible with fcexpr::dr_to_fcs. The list must contain logicle-transformed fluorescence intensities (FI) and may optionally contain the corresponding inverse-transformed FI (which is the transformation you see in FlowJo). For dimension reduction the logicle-transformed FI are used; currently, this is obligatory. This transformation, optional additional scaling operations along the way (scale.whole, scale.samples) as well as cluster annotations will be written to the resulting (concatenated) fcs file for manual inspection in FlowJo. Certain dimension reduction and cluster detection algorithms may become slow with a large number of events (e.g. > 1e6, see Details). In order to get a quick impression of what the algorithms can pull out for you, you may use the 'downsample' argument in fcexpr::wsp_get_ff or fcexpr::inds_get_ff to conveniently sample a random subset of events from each fcs file (flowFrame).
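A minimal sketch of this preparation step (the workspace path and any argument names other than downsample are illustrative assumptions; see ?fcexpr::wsp_get_ff for the actual interface):

# hypothetical paths and argument values
ffs <- fcexpr::wsp_get_ff(wsp = "path/to/workspace.wsp", # assumed argument name
                          downsample = 1e4)              # random subset of events per fcs file
dr <- fcexpr::dr_to_fcs(ff.list = ffs,
                        channels = channels,             # see the 'channels' argument below
                        run.umap = T,
                        run.louvain = T,
                        louvain__resolution = 0.8)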

dr_to_fcs(
  ff.list,
  channels = NULL,
  add.sample.info = NULL,
  scale.whole = c("z.score", "min.max", "none"),
  scale.samples = c("none", "z.score", "min.max"),
  run.harmony = F,
  run.pca = F,
  run.lda = F,
  run.umap = T,
  run.tsne = F,
  run.som = T,
  run.gqtsom = F,
  run.louvain = F,
  run.kmeans = F,
  run.minibatchkmeans = F,
  run.kmeans_arma = F,
  run.kmeans_rcpp = F,
  run.leiden = F,
  run.hclust = F,
  run.flowClust = F,
  run.MUDAN = F,
  n.pca.dims = 0,
  calc.cluster.markers = NULL,
  extra.cols = NULL,
  mc.cores = 1,
  save.to.disk = c("fcs", "rds"),
  save.path = file.path(getwd(), paste0(substr(gsub("\\.", "",
    make.names(as.character(Sys.time()))), 2, 15), "_FCS_dr")),
  exclude.extra.channels = ifelse(length(ff.list) == 1 && names(ff.list) == "logicle",
    "cluster.id", "FSC|SSC|Time|cluster.id"),
  write.scaled.channels.to.FCS = F,
  timeChannel = "Time",
  ...
)

Arguments

ff.list

a list of flowFrames as received from fcexpr::wsp_get_ff (compensated with the Compensation Matrix defined in FlowJo by default) or from fcexpr::inds_get_ff (read directly from FCS files, not compensated by default)

channels

a named vector of channels to use for dimension reduction; values can be channel names (e.g. v-450LP..., b-520...) or channel descriptions (e.g. CD3 or LAG3-PeCy7). Names will be used as the new descriptions in the fcs file to be written; if no names are provided, the names from the very first FCS file are used
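For example (channel names and descriptions here are made up; use those present in your FCS files):

channels <- c("CD3"  = "v-450LP-A",
              "CD8"  = "b-520-A",
              "LAG3" = "PE-Cy7-A") # names become the new descriptions in the written fcs file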

add.sample.info

named list of additional channels to identify samples or group them in flowjo; e.g. for 9 fcs files to be concatenated: add.sample.info = list(condition = c(1,2,3,1,2,3,1,2,3), donor = c(1,1,1,2,2,2,3,3,3))

scale.whole

if and how to scale channels after concatenation of the flowFrames in ff.list

scale.samples

if and how to scale channels of the flowFrames in ff.list individually before concatenation

run.harmony

attempt batch correction using harmony::HarmonyMatrix; if TRUE, harmony__meta_data has to be provided in ..., indicating the groups to be batch-corrected. harmony is conducted before run.pca; to calculate a PCA within harmony, pass harmony__do_pca = T and optionally the number of PCs with harmony__npcs. Set run.pca = F when a PCA is already run within harmony.
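A sketch of how this might look for six fcs files from two batches (whether harmony__meta_data is passed as a vector or data.frame follows harmony::HarmonyMatrix; a plain vector is shown here as an assumption):

dr <- fcexpr::dr_to_fcs(ff.list = ffs,
                        channels = channels,
                        run.harmony = T,
                        harmony__meta_data = rep(c("batch1", "batch2"), each = 3),
                        harmony__do_pca = T,  # let harmony calculate the PCA
                        harmony__npcs = 10,
                        run.pca = F)          # avoid running a second PCA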

run.pca

run principal component analysis, either on its own or prior to other dimension reductions (UMAP, SOM, tSNE, ...)

run.lda

run Linear Discriminant Analysis before dimension reduction; should be F (FALSE) or the name of a clustering calculated in the same call, e.g. "louvain_0.8", "leiden_1.1", "kmeans_12", etc.; the respective clustering calculation has to be provided in ...

run.umap

calculate UMAP dimension reduction with uwot::umap

run.tsne

calculate tsne dimension reduction with Rtsne::Rtsne

run.som

calculate SOM dimension reduction with EmbedSOM::SOM followed by EmbedSOM::EmbedSOM

run.gqtsom

calculate GQTSOM dimension reduction with EmbedSOM::GQTSOM followed by EmbedSOM::EmbedSOM

run.louvain

detect clusters (communities, groups) of cells with the louvain algorithm, implemented in Seurat::FindClusters (subsequent to snn detection by Seurat::FindNeighbors)

run.kmeans

detect clusters with stats::kmeans

run.minibatchkmeans

detect clusters with ClusterR::MiniBatchKmeans

run.kmeans_arma

detect clusters with ClusterR::KMeans_arma

run.kmeans_rcpp

detect clusters with ClusterR::KMeans_rcpp

run.leiden

detect clusters (communities, groups) of cells with the leiden algorithm, implemented in leiden::leiden (subsequent to snn detection by Seurat::FindNeighbors)

run.hclust

detect clusters with stats::dist, stats::hclust and stats::cutree

run.flowClust

detect clusters with flowClust::flowClust

run.MUDAN

detect clusters with MUDAN::getComMembership

n.pca.dims

number of principal components to calculate

calc.cluster.markers

if NULL, nothing is calculated; otherwise marker features (stained markers) are determined with a Wilcoxon test using presto::wilcoxauc for the provided clustering(s): each cluster is tested against all other events and clusters are compared pairwise. The respective clustering calculation has to be provided in ...; e.g. if louvain__resolution = 0.5 is provided, set calc.cluster.markers = "louvain_0.5"; if in addition leiden__resolution_parameter = 0.7 is provided, set calc.cluster.markers = c("louvain_0.5", "leiden_0.7").
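For instance, requesting markers for two clusterings computed in the same call might look like this (a sketch; the run.* flags and resolution values are placeholders):

dr <- fcexpr::dr_to_fcs(ff.list = ffs,
                        channels = channels,
                        run.louvain = T,
                        run.leiden = T,
                        louvain__resolution = 0.5,
                        leiden__resolution_parameter = 0.7,
                        calc.cluster.markers = c("louvain_0.5", "leiden_0.7"))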

extra.cols

a vector (one extra column) or matrix (multiple columns) to add to the final fcs file; has to be numeric and the number of rows has to equal the total number of rows across all flowFrames provided; column names become the channel names in the FCS file; could be a previously calculated dimension reduction or cluster annotation.
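A sketch of appending a previously calculated cluster annotation (my_clusters is a hypothetical numeric vector with one value per event across all flowFrames, in concatenation order):

extra <- matrix(as.numeric(my_clusters), ncol = 1,
                dimnames = list(NULL, "previous_cluster")) # colname becomes the channel name
dr <- fcexpr::dr_to_fcs(ff.list = ffs,
                        channels = channels,
                        extra.cols = extra)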

mc.cores

number of cores used to calculate clusterings, limited to parallel::detectCores() - 1

save.to.disk

what to save to disk: the concatenated FCS file with appended channels and/or an rds file containing a list with several elements

save.path

where to save elements specified in save.to.disk; set to NULL to have nothing written to disk

exclude.extra.channels

when scaled and transformed channels are written to the FCS file, some channels may be redundant and only occupy disk space; those are specified here and matched with grepl

write.scaled.channels.to.FCS

whether to save scaled channels (scale.whole, scale.samples) to the FCS file

timeChannel

name of the Time channel, which is excluded from all analyses and calculations; if NULL, automatic detection is attempted

...

additional parameters passed to the calculations of UMAP, tSNE, SOM, GQTSOM, EmbedSOM, louvain, leiden, harmony, flowClust, hclust, MUDAN and kmeans; provide arguments as follows: UMAP__n_neighbors = c(15,20,25), tsne__theta = 0.3, etc. See the respective help files to learn which arguments can be passed: uwot::umap, Rtsne::Rtsne, EmbedSOM::SOM, EmbedSOM::GQTSOM, EmbedSOM::EmbedSOM, harmony::HarmonyMatrix, flowClust::flowClust, louvain: Seurat::FindNeighbors and Seurat::FindClusters, leiden: Seurat::FindNeighbors and leiden::leiden, hclust: stats::dist and stats::hclust, MUDAN: MUDAN::getComMembership, stats::kmeans, ClusterR::MiniBatchKmeans, ClusterR::KMeans_rcpp, ClusterR::KMeans_arma
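For example (a sketch; valid argument names are those of the underlying functions listed above):

dr <- fcexpr::dr_to_fcs(ff.list = ffs,
                        channels = channels,
                        run.umap = T,
                        run.tsne = T,
                        run.louvain = T,
                        UMAP__n_neighbors = c(15, 20, 25), # vector of values as in the example above
                        tsne__theta = 0.3,
                        louvain__resolution = 0.8)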

Value

A list with 3 elements: (i) the matrix of fluorescence intensities and appended information (dimension reduction, clustering); this is the table written into the newly generated fcs file. (ii) A character vector of meaningful column names which may be used for the table in R (for convenience). (iii) Tables of marker features (each cluster vs. all other events, and all clusters pairwise).
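For example, the returned elements might be used like this (a sketch following the order described above):

dr <- fcexpr::dr_to_fcs(ff.list = ffs, channels = channels, save.path = NULL)
mat <- dr[[1]]           # fluorescence intensities plus appended dim red / cluster columns
colnames(mat) <- dr[[2]] # convenient column names (assumed to match the matrix columns)
markers <- dr[[3]]       # marker tables, if calc.cluster.markers was set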

Details

Logicle transformation of FI has been described here: Parks et al., 2006, PMID: 16604519, DOI: 10.1002/cyto.a.20258. Different transformations have been compared, for instance, here: Transformations.

Dimension reduction (low-dimensional embedding) algorithms are: UMAP, tSNE, SOM, GQTSOM, PCA. tSNE is expected to be too slow for more than 1e4 or 1e5 cells (depending on the precision set by tsne__theta). UMAP has been developed later and is well known. SOM and GQTSOM are similar to FlowSOM but are much more convenient to use in programming as they accept matrices, whereas FlowSOM strictly requires flowFrames; SOM and GQTSOM may also be superior with respect to calculation speed. PCA will be orders of magnitude faster than UMAP, especially on large numbers of events (1e6 and above); on the other hand, being a linear algorithm, it will not produce as nicely separated clusters.

Cluster (community) detection algorithms are louvain, leiden, kmeans, hclust, flowClust and MUDAN. These assign a common discrete label to events with similar properties in the high-dimensional space. kmeans will be the quickest; start with this one and progress to louvain, which is slower but may yield better results. For a low number of events (e.g. below 1e6), louvain will perform in a reasonable amount of time and could be used immediately in parallel to kmeans. leiden requires the original python library and is approximately similar to louvain with respect to calculation time, maybe a bit slower; leiden is an enhancement of louvain, though louvain does not produce "wrong" results per se. flowClust, MUDAN and hclust have not been tested extensively. For kmeans, hclust and flowClust the number of expected clusters can be provided. louvain and leiden take a resolution parameter (lower resolution --> fewer clusters). MUDAN takes a number of nearest neighbors (fewer neighbors --> more clusters). An initial PCA may not be required if the selected channels are chosen thoughtfully (only those with stained markers). If you decide, though, to simply use all channels (with and without stained markers) for dimension reduction, PCA will automatically shift the focus to the ones with the highest variation (i.e. the channels which contain staining on a subset of events). In this case you may set n.pca.dims roughly equal to the number of channels which contain a staining.
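A sketch of the progression suggested above, first kmeans for a quick impression, then louvain (kmeans__centers follows the prefix scheme from ...; centers is the stats::kmeans argument, and the values are placeholders):

# quick first pass
dr1 <- fcexpr::dr_to_fcs(ff.list = ffs, channels = channels,
                         run.umap = T, run.kmeans = T, kmeans__centers = 8)
# slower, potentially cleaner cluster detection
dr2 <- fcexpr::dr_to_fcs(ff.list = ffs, channels = channels,
                         run.umap = T, run.louvain = T, louvain__resolution = 0.8)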

Examples

if (FALSE) {
############################
### Plot cluster markers ###
############################
library(ggplot2)
library(magrittr) # for the %>% pipe

dr <- fcexpr::dr_to_fcs(ff.list = ffs,
                        channels = channels,
                        louvain__resolution = 0.5,
                        run.lda = "louvain_0.5",
                        calc.cluster.markers = c("louvain_0.5"),
                        save.path = NULL)
marker <- dr[[3]][[1]][[1]]
marker$channel_desc2 <- sapply(strsplit(marker$channel_desc, "_"), "[", 1)
marker <-
 marker %>%
 dplyr::mutate(pvalue = ifelse(pvalue == 0, 1e-300, pvalue)) %>%
 dplyr::group_by(channel_desc2) %>%
 dplyr::mutate(mean_scaled = fcexpr:::min.max.normalization(mean))

ggplot(marker, aes(x = as.factor(cluster), y = channel_desc2, fill = -log10(pvalue))) +
 geom_tile(color = "black") +
 theme_bw() +
 geom_text(aes(label = diff_sign)) +
 scale_fill_viridis_c()

ggplot(marker, aes(x = as.factor(cluster), y = channel_desc2, fill = mean_diff)) +
 geom_tile(color = "black") +
 theme_bw() +
 geom_text(aes(label = diff_sign)) +
 scale_fill_viridis_c()

ggplot(marker, aes(x = as.factor(cluster), y = channel_desc2, fill = mean_scaled)) +
 geom_tile(color = "black") +
 theme_bw() +
 scale_fill_viridis_c()

ggplot(marker, aes(x = mean_diff, y = -log10(pvalue), label = channel_desc2)) + #color = channel_desc2,
 geom_point() +
 theme_bw() +
 labs(title = "cluster markers (vs all other cells each)") +
 ggrepel::geom_text_repel(max.overlaps = 20, show.legend = F) +
 theme(panel.grid.minor.x = element_blank(), panel.grid.minor.y = element_blank(), panel.grid.major.x = element_blank(), strip.background = element_rect(fill = "hotpink2")) +
 geom_vline(xintercept = 0, col = "tomato2", linetype = "dashed") +
 geom_hline(yintercept = 100, col = "tomato2", linetype = "dashed") +
 facet_wrap(vars(cluster))

 ##############################################
 ### find clusters which may be subject #######
 ### to bi- or multimodality in any channel ###
 ##############################################
 # make one data frame
 marker_all <- dplyr::bind_rows(lapply(names(dr[["marker"]]), function(x) dplyr::mutate(dr[["marker"]][[x]][["marker_table"]], clustering = x)))

 # sort by diptest p value; a low p indicates bi- or multimodality
 marker_all <- dplyr::arrange(marker_all, diptest_p)
 # see ?diptest::dip.test
 # multimodality indicates heterogeneity within the cluster
 # and may justify to separate that cluster further into sub-clusters
 # this depends on the interpretation of the scientist though


}