This vignette demonstrates the integration of spectral flow cytometry (SFC) and CyTOF protein expression measurements using cyCombine.

We analyze the healthy donor PBMC SFC and CyTOF data that are also presented in the three-platform vignette.

The SFC data is from Park et al. (2020) available from FlowRepository (ID: FR-FCM-Z2QV). We pre-gated to live single cells in FlowJo version 10 (Tree Star Inc). Singlets and non-debris were identified using forward and side-scatter. Dead cells were excluded using live/dead stains. Data from these gates were then exported in FCS format.

For the CyTOF data, we use a single healthy donor sample processed at the Human Immune Monitoring Center. The sample was obtained from FlowRepository (ID: FR-FCM-ZYAJ) and pre-gated to live intact singlets in FlowJo version 10 (Tree Star Inc).

We start by loading some packages:

library(cyCombine)
library(tidyverse)

# Set a seed for reproducibility
seed <- 9369

Loading data

We start by defining some colors to use.

# Define some nice colors for plotting
color_clusters <- c("#DC050C", "#FB8072", "#1965B0", "#7BAFDE", "#882E72",
                    "#B17BA6", "#FF7F00", "#FDB462", "#E7298A", "#E78AC3",
                    "#33A02C", "#B2DF8A", "#55A1B1", "#8DD3C7", "#A6761D",
                    "#E6AB02", "#7570B3", "#BEAED4", "#666666", "#999999",
                    "#aa8282", "#d4b7b7", "#8600bf", "#ba5ce3", "#808000",
                    "#aeae5c", "#1e90ff", "#00bfff", "#FAE174", "#56ff0d",
                    "#ffff00", "#D4E1C8", "#D470C8", "#64C870", "#0AC8D4",
                    "#64C80C", "#641400", "#000000", "#0C00FA", "#7800FA",
                    "#FA00FA", "#00FAFA", "#707A00", "#0C0C91", "#B5651D")

Spectral flow cytometry data pre-processing

We are now ready to load the spectral flow data into a tibble.

# Directory with raw .fcs files
data_dir <- "Park_et_al_2020_Live+/"

# Panel and reading data
sfc_panel <- read_csv(paste0(data_dir, "/panel_Park2020.csv"))

sfc_markers <- sfc_panel %>%
  dplyr::filter(Type != "none") %>%
  pull(Marker) %>%
  str_remove_all("[ _-]")

spectral <- prepare_data(data_dir = data_dir,
                         metadata = paste0(data_dir, "/Spectral samples cohort.xlsx"),
                         filename_col = "FCS_name",
                         batch_ids = "Batch",
                         condition = "Set",
                         sample_ids = "Patient id",
                         cofactor = 6000,
                         derand = FALSE,
                         markers = sfc_markers,
                         down_sample = FALSE)


# Subset to a single sample for simplicity
spectral <- spectral %>%
  dplyr::filter(sample == "303444")
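The panel markers above are cleaned with `str_remove_all("[ _-]")`, which strips spaces, underscores and hyphens so that marker names are written the same way across panels. As a minimal sketch of what that pattern does, here is a base R equivalent applied to a few made-up marker names (the input names are illustrative and not taken from the panel file):

```r
# Hypothetical raw marker names, for illustration only
raw_markers <- c("HLA-DR", "TCR gd", "PD_1", "CD45RA")

# Base R counterpart of stringr::str_remove_all(x, "[ _-]"):
# drop every space, underscore and hyphen
clean_markers <- gsub("[ _-]", "", raw_markers)

print(clean_markers)  # "HLADR" "TCRgd" "PD1" "CD45RA"
```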

Now, we have a single sample consisting of 582,005 cells. Next, we generate cell labels using the overlapping markers.

# Defining the set of overlapping markers - there are 26
overlap_markers <- c('CD3', 'CD4', 'CD8', 'CD25', 'CCR7', 'CD45RA', 'TCRgd', 'CD27', 'CD19', 'CD20', 'CD24', 'IgD', 'CXCR5', 'CD56', 'CD57', 'CD16', 'CD38', 'CD28', 'CCR6', 'CD127', 'HLADR', 'CD14', 'CD11c', 'CD123', 'CXCR3', 'PD1')
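The labeling that follows works in two steps: each cell is assigned to a SOM node, and the nodes are then merged into meta-clusters. The per-cell meta-cluster is obtained by indexing the node-level assignment with each cell's node. A toy sketch of that lookup, with made-up assignments standing in for `som_$unit.classif` and `mc[[k]]$consensusClass`:

```r
# Toy example: 6 cells assigned to 4 SOM nodes, nodes merged into 2 meta-clusters
cell_to_node <- c(1, 3, 3, 2, 4, 1)   # one entry per cell (node id)
node_to_meta <- c(1, 1, 2, 2)         # one entry per node (meta-cluster id)

# Indexing the node-level vector by each cell's node gives a per-cell meta-cluster
cell_to_meta <- node_to_meta[cell_to_node]

print(cell_to_meta)  # 1 2 2 1 2 1
```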

# Time to label the cells. Here, we will apply a SOM and ConsensusClusterPlus

# Clustering with kohonen 10x10
set.seed(seed)
som_ <- spectral %>%
  dplyr::select(dplyr::all_of(overlap_markers)) %>%
  as.matrix() %>%
  kohonen::som(grid = kohonen::somgrid(xdim = 10, ydim = 10),
               rlen = 10,
               dist.fcts = "euclidean")


cell_clustering_som <- som_$unit.classif
codes <- som_$codes[[1]]

# Meta-clustering with ConsensusClusterPlus
mc <- ConsensusClusterPlus::ConsensusClusterPlus(t(codes), maxK = 35, reps = 100,
                                                 pItem = 0.9, pFeature = 1, plot = F,
                                                 clusterAlg = "hc", innerLinkage = "average", 
                                                 finalLinkage = "average",
                                                 distance = "euclidean", seed = seed)

# Get cluster ids for each cell - here we choose to look at 30 meta-clusters but we will merge some in the next step
code_clustering1 <- mc[[30]]$consensusClass
spectral$label <- as.factor(code_clustering1[cell_clustering_som])

# Down-sampling for plot (20,000 cells)
set.seed(seed)
spectral_sliced <- spectral %>%
  dplyr::slice_sample(n = 20000)

# Let us visualize these clusters on a UMAP
umap <- spectral_sliced %>% plot_dimred(name = 'SFC clusters')
spectral_sliced <- cbind.data.frame(umap$data, spectral_sliced)

ggplot(spectral_sliced, aes(x = UMAP1, y = UMAP2)) +
  geom_point(aes(color = label), alpha = 0.3, size = 0.4) +
  theme_bw() + theme(plot.title = element_text(hjust = 0.5)) +
  scale_color_manual(values = color_clusters) +
  guides(colour = guide_legend(override.aes = list(alpha = 1, size = 1)))

# And now let us use violin plots for the assignment of labels for each population
p <- list()
for (m in overlap_markers) {
  p[[m]] <- ggplot(spectral, aes_string(x = 'label', y = m)) +
    geom_violin(aes(fill = label)) +
    scale_fill_manual(values = color_clusters) +
    theme_bw()
}

p2 <- ggpubr::ggarrange(plotlist = p, common.legend = T)

p2

Based on these plots and the UMAP, it is possible to define labels for many of the clusters, although some are harder to interpret. This includes the very small cluster 2, which could be B-T cell doublets (CD19+CD3+). Cluster 22 was also very small and expressed only CD25, CD127, and CD45RA. Cluster 24 seemed very mixed, with bimodal CD14 and CD11c distributions. Finally, cluster 29 contained only 3 cells and was left unlabeled.

Regarding cluster 10, this was also tricky, but considering both its UMAP location and the intermediate level of CD4, which is comparable to clusters 1, 2, and 5, we are comfortable with labeling these as myeloid cells. The rest of the labels are assigned below.

spectral_cl_names <- c('1' = 'B cells', '2' = 'Unlabeled', '3' = 'Naive CD8+ T cells', '4' = 'Memory CD8+ T cells', '5' = 'gdT cells',
                       '6' = 'Memory CD4+ T cells', '7' = 'Naive CD4+ T cells', '8' = 'B cells (IgDlo)', '9' = 'gdT cells (CD57dim)', '10' = 'Memory CD8+ T cells (CD57+)',
                       '11' = 'NK cells (CD16+CD57+)', '12' = 'NK cells (CD16+CD57-)', '13' = 'NK cells (CD16-CD57-)', '14' = 'NK cells (CD16-CD57+)', '15' = 'Memory CD4+ T cells (CD57+ PD1+)',
                       '16' = 'NK cells (CD16-CD57-)', '17' = 'pDCs', '18' = 'Tregs', '19' = 'pDCs', '20' = 'B cells (CD19lo CD38++)',
                       '21' = 'CD3lo gdT cells' , '22' = 'Unlabeled', '23' = 'pDCs (CD38+)', '24' = 'Unlabeled', '25' = 'Memory CD4+ T cells',
                       '26' = 'mDCs' , '27' = 'Classical monocytes', '28' = 'Non-classical monocytes', '29' = 'Unlabeled', '30' = 'Classical monocytes') 


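# Index by character so that names are matched on the cluster id, not on the factor's integer codes
spectral$label <- unname(spectral_cl_names[as.character(spectral$label)])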

# Removing the unlabeled cells
spectral <- spectral %>%
  dplyr::filter(label != 'Unlabeled')
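The name assignment above relies on indexing a named vector with the cluster ids. One point worth keeping in mind is that R indexes by the integer codes of a factor rather than by its printed values, which only matters if a cluster id is absent from the data. A small sketch with a hypothetical lookup table:

```r
# Hypothetical lookup table of cluster names
cl_names <- c('1' = 'B cells', '2' = 'Unlabeled', '3' = 'NK cells')

# Cluster ids stored as a factor; cluster 2 happens to be absent,
# so the levels are "1" and "3" with integer codes 1 and 2
ids <- as.factor(c(1, 3, 3, 1))

# Indexing by a factor uses the integer codes (1, 2, 2, 1) ...
by_code <- unname(cl_names[ids])
# ... whereas indexing by character matches on the names
by_name <- unname(cl_names[as.character(ids)])

print(by_code)  # "B cells" "Unlabeled" "Unlabeled" "B cells"
print(by_name)  # "B cells" "NK cells" "NK cells" "B cells"
```

When every cluster id is present, both forms give the same result.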

After removal of the unlabeled cells, we have 573,397 cells remaining (98.5 %) in the SFC dataset and this portion of the data is now ready for batch correction. We will now look at the CyTOF data.

CyTOF data pre-processing

Then it is time to read the CyTOF data. We use a single sample (ctrls-001) from FlowRepository: FR-FCM-ZYAJ. We downloaded the version normalized with MATLAB and pre-gated it to live intact singlets using FlowJo.

### Get HIMC cytof data for one sample
# Panel and reading data
cytof_panel <- read_csv("cytof_panel_HIMC.csv")

cytof_markers <- cytof_panel %>%
  dplyr::filter(Type != "none") %>%
  pull(Marker) %>%
  str_remove_all("[ _-]")

# Read the fcs file and make it a dataframe
cytof <- prepare_data(data_dir = '.',
                      cofactor = 5,
                      derand = T,
                      markers = cytof_markers,
                      down_sample = FALSE,
                      batch_ids  = c('CyTOF'),
                      sample_ids = c('Sample1'))
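The two platforms are transformed with very different cofactors (6000 for SFC, 5 for CyTOF). As a rough illustration, assuming the conventional `asinh(x / cofactor)` form of the transformation (the derandomization controlled by `derand` is not shown), the cofactor sets the intensity below which values are treated approximately linearly. The raw values here are made up:

```r
# Made-up raw intensities, for illustration only
raw <- c(0, 5, 50, 6000)

# asinh(x / cofactor): roughly linear near zero, logarithmic for large values
cytof_scale <- asinh(raw / 5)      # cofactor used for the CyTOF data
sfc_scale   <- asinh(raw / 6000)   # cofactor used for the SFC data

print(round(cytof_scale, 3))
print(round(sfc_scale, 3))
```

A raw value equal to the cofactor always maps to `asinh(1)`, about 0.88, on either scale.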

Now, we have a single sample consisting of 174,601 cells. As with the SFC data, we now need to generate some cell labels based on the overlapping markers only. Let us look at this:

# Subset to overlap
cytof <- cytof %>%
  select(all_of(c(overlap_markers, 'sample', 'batch')))

# Time to label the cells. Here, we will again apply a SOM and ConsensusClusterPlus

# Clustering with kohonen
set.seed(seed)
som_ <- cytof %>%
  dplyr::select(dplyr::all_of(overlap_markers)) %>%
  as.matrix() %>%
  kohonen::som(grid = kohonen::somgrid(xdim = 10, ydim = 10),
               rlen = 10,
               dist.fcts = "euclidean")


cell_clustering_som <- som_$unit.classif
codes <- som_$codes[[1]]

# Meta-clustering with ConsensusClusterPlus
mc <- ConsensusClusterPlus::ConsensusClusterPlus(t(codes), maxK = 35, reps = 100,
                                                 pItem = 0.9, pFeature = 1, plot = F,
                                                 clusterAlg = "hc", innerLinkage = "average", 
                                                 finalLinkage = "average",
                                                 distance = "euclidean", seed = seed)

# Get cluster ids for each cell - here we also look at 30 meta-clusters but we will merge some in the next step
code_clustering1 <- mc[[30]]$consensusClass
cytof$label <- as.factor(code_clustering1[cell_clustering_som])

# Down-sampling for plot (20,000 cells)
set.seed(seed)
cytof_sliced <- cytof %>%
  dplyr::slice_sample(n = 20000)

# Let us visualize these clusters on a UMAP
umap <- cytof_sliced %>% plot_dimred(name = 'CyTOF clusters')
cytof_sliced <- cbind.data.frame(umap$data, cytof_sliced)

ggplot(cytof_sliced, aes(x = UMAP1, y = UMAP2)) +
  geom_point(aes(color = label), alpha = 0.3, size = 0.4) +
  theme_bw() + theme(plot.title = element_text(hjust = 0.5)) +
  scale_color_manual(values = color_clusters) +
  guides(colour = guide_legend(override.aes = list(alpha = 1, size = 1)))

# And now let us use violin plots for the assignment of labels for each population
p <- list()
for (m in overlap_markers) {
  p[[m]] <- ggplot(cytof, aes_string(x = 'label', y = m)) +
    geom_violin(aes(fill = label)) +
    scale_fill_manual(values = color_clusters) +
    theme_bw()
}

p2 <- ggpubr::ggarrange(plotlist = p, common.legend = T)

p2

Now we assign labels to each of the clusters. There are some labels which we did not have for the SFC data.

cytof_cl_names <- c('1' = 'B cells', '2' = 'Naive CD4+ T cells', '3' = 'Memory CD4+ T cells (CD38+)', '4' = 'Memory CD4+ T cells', '5' = 'B cells (CD19lo CD38++)',
                       '6' = 'B cells (IgDlo)', '7' = 'Dbl. neg. T cells', '8' = 'Dbl. neg. T cells', '9' = 'Memory CD8+ T cells (CD57+)', '10' = 'Naive CD8+ T cells',
                       '11' = 'Memory CD8+ T cells', '12' = 'Tregs', '13' = 'CD8+ TEMRA cells', '14' = 'gdT cells', '15' = 'Tregs',
                       '16' = 'NK cells (CD16-CD57+)', '17' = 'gdT cells (CD57dim)', '18' = 'NK cells (CD16-CD57-)', '19' = 'Memory CD8+ T cells', '20' = 'Memory CD8+ T cells (CD38+)',
                       '21' = 'NK cells (CD16+CD57+)' , '22' = 'CD8+ TEMRA cells', '23' = 'Memory CD4+ T cells (CD57+)', '24' = 'mDCs', '25' = 'Classical monocytes',
                       '26' = 'NK cells (CD16+CD57-)' , '27' = 'pDCs', '28' = 'Non-classical monocytes', '29' = 'mDCs', '30' = 'mDCs')


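# Index by character so that names are matched on the cluster id, not on the factor's integer codes
cytof$label <- unname(cytof_cl_names[as.character(cytof$label)])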

Batch correction

Now, we are ready to combine the two datasets on the overlapping columns. As discussed above, there are 26 markers which overlap between the sets. In this example, we do not downsample to have the same number of cells from each platform, but instead analyze all available cells.

# Get overlapping column names
overlap_cols <- intersect(colnames(cytof), colnames(spectral))

# Make one tibble containing all cells from both platforms (no downsampling)
uncorrected <- bind_rows(cytof[,overlap_cols], spectral[,overlap_cols]) %>%
  mutate(id = 1:(nrow(cytof)+nrow(spectral)))
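The combination step keeps only the columns shared by both datasets, stacks the rows, and adds a running id. A base R sketch of the same idea on two toy data frames (all column names and values here are hypothetical):

```r
# Two toy data frames with partly different columns
a <- data.frame(CD3 = c(1.2, 0.4), CD4 = c(0.1, 2.0), CD99 = c(0.5, 0.6), batch = "A")
b <- data.frame(CD3 = 0.9, CD4 = 1.1, CD45 = 3.0, batch = "B")

# Keep only the columns present in both, then stack and add a running id
shared <- intersect(colnames(a), colnames(b))
combined <- rbind(a[, shared], b[, shared])
combined$id <- seq_len(nrow(combined))

print(combined)
```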

Now, batch correction can be performed with cyCombine. Because these datasets come from completely different platforms, we use rank as the normalization method, similar to what is done for the three-platform integration including CITE-seq data. We also chose a 3x3 grid for the batch correction, as the density plots looked much better with this setting.

# Run batch correction
corrected <- uncorrected %>%
  batch_correct(seed = seed,
                xdim = 3,
                ydim = 3,
                norm_method = 'rank',
                ties.method = 'average')

# Let's make a meaningful ordering of the labels
label_levels <- c('B cells', 'B cells (CD19lo CD38++)', 'B cells (IgDlo)', 'Naive CD4+ T cells', 'Memory CD4+ T cells', 'Memory CD4+ T cells (CD57+)', 'Memory CD4+ T cells (CD57+ PD1+)', 'Memory CD4+ T cells (CD38+)', 'Tregs', 'Naive CD8+ T cells',  'Memory CD8+ T cells', 'Memory CD8+ T cells (CD57+)', 'Memory CD8+ T cells (CD38+)', 'CD8+ TEMRA cells', 'Dbl. neg. T cells', 'gdT cells', 'gdT cells (CD57dim)', 'CD3lo gdT cells', 'NK cells (CD16-CD57-)', 'NK cells (CD16-CD57+)', 'NK cells (CD16+CD57-)', 'NK cells (CD16+CD57+)', 'Classical monocytes', 'Non-classical monocytes', 'mDCs', 'pDCs', 'pDCs (CD38+)')

# Add labels back from uncorrected
uncorrected$label <- factor(uncorrected$label, levels = label_levels)
corrected$label <- uncorrected$label
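To give some intuition for why rank normalization suits data from different platforms, here is a minimal sketch. It assumes that rank normalization amounts to replacing each value by its within-batch rank scaled by the batch size; the exact scaling used inside cyCombine may differ, and the values are made up:

```r
# One marker measured in two batches on very different scales
df <- data.frame(
  batch = rep(c("CyTOF", "Spectral"), each = 4),
  CD3   = c(0.1, 2.0, 4.0, 2.0, 100, 900, 500, 500)
)

# Rank within each batch and divide by the batch size, so values fall in (0, 1]
df$CD3_rank <- ave(df$CD3, df$batch,
                   FUN = function(x) rank(x, ties.method = "average") / length(x))

print(df)
```

After this step, the lowest and highest cells of each batch receive the same normalized values regardless of the original measurement scale, and tied values share the average of their ranks.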

Evaluating batch correction

We can now evaluate the correction using the EMD reduction - first, we apply clustering and then evaluate each marker in each cluster.

# Generate labels for EMD evaluation
labels <- corrected %>%
  create_som(seed = seed,
             xdim = 8,
             ydim = 8,
             markers = get_markers(corrected))

# Add labels
corrected <- corrected %>%
  dplyr::mutate(som = labels)
celltype_col <- "som"

uncorrected <- corrected %>%
  dplyr::select(id, all_of(celltype_col)) %>%
  dplyr::left_join(uncorrected, by = "id")


# Evaluate Earth Movers Distance
emd <- uncorrected %>%
  evaluate_emd(corrected, markers = get_markers(corrected), cell_col = celltype_col)

# Show plots
cowplot::plot_grid(emd$violin, emd$scatterplot)
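For intuition about the EMD reduction, the sketch below uses a simplified one-dimensional earth mover's distance between two equally sized samples. This is not how `evaluate_emd` computes its values; the function `emd_1d`, the simulated data, and the way the reduction is formed from a single pair of distances are all assumptions made for illustration:

```r
# 1D earth mover's distance between two equally sized samples:
# the mean absolute difference between their sorted values
emd_1d <- function(x, y) {
  stopifnot(length(x) == length(y))
  mean(abs(sort(x) - sort(y)))
}

set.seed(1)
batch_a <- rnorm(1000)               # hypothetical marker values in batch A
batch_b <- batch_a + 2               # batch B: same shape, shifted by 2
batch_b_corrected <- batch_a + 0.5   # batch B after an imperfect correction

emd_before <- emd_1d(batch_a, batch_b)             # close to 2
emd_after  <- emd_1d(batch_a, batch_b_corrected)   # close to 0.5

# Relative reduction in distance between the batches
reduction <- (emd_before - emd_after) / emd_before
print(reduction)  # close to 0.75
```

A reduction near 1 would mean the batches have nearly identical distributions after correction, while a value near 0 would mean little was changed.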

We also use the MAD score for evaluation:

# MAD
mad <- uncorrected %>%
  cyCombine::evaluate_mad(corrected,
                          markers = get_markers(corrected),
                          cell_col = celltype_col,
                          filter_limit = NULL)

cat('The MAD score is: ', mad$score, '\n')

The MAD score is:  0.05

For this integration, the EMD reduction is 0.72 and the MAD score is 0.05, which are very satisfactory values.
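The MAD score is based on the median absolute deviation, which measures the spread of a marker. A low score suggests that the correction has not compressed or inflated the variation in the data. The sketch below shows the underlying comparison for a single simulated marker; how `evaluate_mad` aggregates such differences across clusters, markers and batches is not reproduced here:

```r
set.seed(1)
before <- rnorm(1000, mean = 3, sd = 1)   # hypothetical marker values before correction
after  <- 0.95 * before + 0.4             # after: shifted and slightly compressed

mad_before <- mad(before)
mad_after  <- mad(after)

# Absolute change in spread caused by the correction;
# a pure shift would leave this at zero
abs_diff <- abs(mad_before - mad_after)
print(abs_diff)
```

Because the MAD is unaffected by shifts and scales linearly, `mad_after` here equals 0.95 times `mad_before`.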

However, as one should always do, we will also visualize the correction with plots. First, the marker distributions before and after correction:

plot_density(uncorrected, corrected, ncol = 4, xlims = c(-2, 9))

Let us also see the UMAPs for uncorrected and corrected data colored by batch. We downsample a bit here so it’s easier to see what is going on.

# Down-sampling for plot (20,000 cells from each platform)
set.seed(seed)
uncorrected_sliced <- uncorrected %>%
  dplyr::group_by(batch) %>%
  dplyr::slice_sample(n = 20000) %>%
  dplyr::ungroup() %>%
  dplyr::arrange(id)

corrected_sliced <- corrected %>%
  dplyr::filter(id %in% uncorrected_sliced$id)


# UMAP plot uncorrected
umap1 <- uncorrected_sliced %>%
  plot_dimred(name = "uncorrected", type = "umap")

# UMAP plot corrected
umap2 <- corrected_sliced %>%
  plot_dimred(name = "corrected", type = "umap")

# Show plots
cowplot::plot_grid(umap1, umap2)
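The downsampling above draws the same number of cells from each batch and then reuses the sampled ids to subset the corrected data, so both UMAPs show the same cells. A base R sketch of that pattern on a toy table (the data and the sample size of 3 are illustrative):

```r
set.seed(9369)
toy_uncorrected <- data.frame(id = 1:10,
                              batch = rep(c("CyTOF", "Spectral"), times = c(4, 6)))
toy_corrected <- toy_uncorrected   # stands in for the corrected data with matching ids

# Draw 3 ids per batch (the vignette draws 20,000)
keep <- unlist(lapply(split(toy_uncorrected$id, toy_uncorrected$batch),
                      function(ids) sample(ids, 3)))

# Subset both tables with the same ids
toy_uncorrected_sliced <- toy_uncorrected[toy_uncorrected$id %in% keep, ]
toy_corrected_sliced   <- toy_corrected[toy_corrected$id %in% keep, ]

print(table(toy_uncorrected_sliced$batch))
```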

We also have labels for each of the datasets, which were generated independently of the batch correction. Let us visualize these labels.

Visualization

# For uncorrected data, let's make a combined object
uncor_df <- cbind.data.frame(umap1$data[,1:2], uncorrected_sliced)

umap3 <- ggplot(uncor_df, aes(x = UMAP1, y = UMAP2)) +
                geom_point(aes(color = label), alpha = 0.3, size = 0.4) +
                theme_bw() + theme(plot.title = element_text(hjust = 0.5)) +
                ggtitle('UMAP - uncorrected') +
                scale_color_manual(values = color_clusters) + facet_wrap(~batch) +
                guides(color = guide_legend(override.aes = list(alpha = 1, size = 1)))

# For corrected data, let's make a combined object
cor_df <- cbind.data.frame(umap2$data[,1:2], corrected_sliced)

umap4 <- ggplot(cor_df, aes(x = UMAP1, y = UMAP2)) +
                geom_point(aes(color = label), alpha = 0.3, size = 0.4) +
                theme_bw() + theme(plot.title = element_text(hjust = 0.5)) +
                ggtitle('UMAP - corrected') +
                scale_color_manual(values = color_clusters) + facet_wrap(~batch) +
                guides(color = guide_legend(override.aes = list(alpha = 1, size = 1)))


# Show plots
cowplot::plot_grid(umap3, umap4, nrow = 2)

We can also show the UMAPs faceted by technology, where cells are colored by their expression of the 26 overlapping markers:

# Loop over markers to make the plots for both the uncorrected and corrected data
expr_plots <- list()
for (m in get_markers(corrected)) {

  # Find range for marker to make plots comparable
  range <- summary(c(uncor_df[,m], cor_df[,m]))[c(1,6)]

  expr_plots[[paste0(m, '1')]] <- ggplot(uncor_df, aes(x = UMAP1, y = UMAP2)) +
                geom_point(aes_string(color = m), alpha = 0.3, size = 0.4) +
                theme_bw() + theme(plot.title = element_text(hjust = 0.5)) +
                ggtitle('UMAP - uncorrected') +
                scale_color_gradientn(m, limits  = range, colours = colorRampPalette(rev(RColorBrewer::brewer.pal(n = 11, name = "Spectral")))(50)) +
                facet_wrap(~batch)

  expr_plots[[paste0(m, '2')]] <- ggplot(cor_df, aes(x = UMAP1, y = UMAP2)) +
                geom_point(aes_string(color = m), alpha = 0.3, size = 0.4) +
                theme_bw() + theme(plot.title = element_text(hjust = 0.5)) +
                ggtitle('UMAP - corrected') +
                scale_color_gradientn(m, limits = range, colours = colorRampPalette(rev(RColorBrewer::brewer.pal(n = 11, name = "Spectral")))(50)) +
                facet_wrap(~batch)

}

# Show plots
cowplot::plot_grid(plotlist = expr_plots, ncol = 2)

Relabeling corrected data

To determine if it is possible to obtain a similar cell labeling after correction, we will now re-cluster and label the combined, corrected dataset.

# Clustering with kohonen
set.seed(seed)
som_ <- corrected %>%
  dplyr::select(dplyr::all_of(overlap_markers)) %>%
  as.matrix() %>%
  kohonen::som(grid = kohonen::somgrid(xdim = 10, ydim = 10),
               rlen = 10,
               dist.fcts = "euclidean")


cell_clustering_som <- som_$unit.classif
codes <- som_$codes[[1]]

# Meta-clustering with ConsensusClusterPlus
mc <- ConsensusClusterPlus::ConsensusClusterPlus(t(codes), maxK = 45, reps = 100,
                                                 pItem = 0.9, pFeature = 1, plot = F,
                                                 clusterAlg = "hc", innerLinkage = "average", 
                                                 finalLinkage = "average",
                                                 distance = "euclidean", seed = seed)

# Get cluster ids for each cell - here we will look at 35 meta-clusters to attempt to grasp the full range of cell types from both data sources but we will merge some in the next step
code_clustering1 <- mc[[35]]$consensusClass
corrected$new_label <- as.factor(code_clustering1[cell_clustering_som])

# And now let us use violin plots for the assignment of labels for each population
p <- list()
for (m in overlap_markers) {
  p[[m]] <- ggplot(corrected, aes_string(x = 'new_label', y = m)) +
    geom_violin(aes(fill = new_label)) +
    scale_fill_manual(values = color_clusters) +
    theme_bw()
}

p2 <- ggpubr::ggarrange(plotlist = p, common.legend = T)

p2

Now we assign labels to each of the clusters. There is one label which we did not have for the separate datasets ('Doublets?').

corrected_cl_names <- c('1' = 'NK cells (CD16+CD57+)', '2' = 'NK cells (CD16+CD57-)', '3' = 'Naive CD8+ T cells', '4' = 'CD3lo gdT cells', '5' = 'Naive CD4+ T cells',
                        '6' = 'Memory CD8+ T cells (CD57+)', '7' = 'Memory CD8+ T cells', '8' = 'B cells (IgDlo)', '9' = 'B cells', '10' = 'NK cells (CD16-CD57-)',
                        '11' = 'gdT cells (CD57dim)', '12' = 'Dbl. neg. T cells', '13' = 'pDCs', '14' = 'NK cells (CD16-CD57+)', '15' = 'Memory CD8+ T cells (CD57+)',
                        '16' = 'Memory CD4+ T cells (CD57+)', '17' = 'B cells', '18' = 'pDCs (CD38+)', '19' = 'gdT cells', '20' = 'B cells (CD19lo CD38++)',
                        '21' = 'mDCs' , '22' = 'Memory CD8+ T cells (CD38+)', '23' = 'Tregs', '24' = 'Memory CD4+ T cells', '25' = 'Memory CD4+ T cells (CD38+)',
                        '26' = 'mDCs' , '27' = 'Memory CD8+ T cells (CD38+)', '28' = 'Memory CD4+ T cells', '29' = 'Memory CD4+ T cells', '30' = 'Non-classical monocytes',
                        '31' = 'Non-classical monocytes', '32' = 'Classical monocytes', '33' = 'Classical monocytes', '34' = 'Doublets?', '35' = 'Non-classical monocytes')

# Add the new label to the levels
label_levels <- c(label_levels, 'Doublets?')

# Add labels
corrected$new_label <- factor(corrected_cl_names[as.character(corrected$new_label)], levels = label_levels)

Now that we have these labels, we can plot the corrected UMAP with them.

# For corrected data, let's make a combined object
corrected_sliced <- corrected %>%
  dplyr::filter(id %in% uncorrected_sliced$id)

cor_df <- cbind.data.frame(umap2$data[,1:2], corrected_sliced)

umap5 <- ggplot(cor_df, aes(x = UMAP1, y = UMAP2)) +
                geom_point(aes(color = new_label), alpha = 0.3, size = 0.4) +
                theme_bw() + theme(plot.title = element_text(hjust = 0.5)) +
                ggtitle('UMAP - corrected - reclustered') +
                scale_color_manual(values = setNames(color_clusters[seq_along(label_levels)], label_levels)) + facet_wrap(~batch) +
                guides(color = guide_legend(override.aes = list(alpha = 1, size = 1)))

# Show plot
umap5

Finally, let us compare the fractions of the ‘corrected’ labels with those from the uncorrected, separately labeled sets.

# Split corrected data into original parts
corrected_spectral <- corrected %>% 
  dplyr::filter(batch == 'Spectral')

corrected_cytof <- corrected %>% 
  dplyr::filter(batch == 'CyTOF')


# Compare fractions
Spectral_uncor <- (table(factor(spectral$label, levels = label_levels)) / nrow(spectral) ) * 100
Spectral_cor <- (table(factor(corrected_spectral$new_label, levels = label_levels)) / nrow(spectral) ) * 100

CyTOF_uncor <- (table(factor(cytof$label, levels = label_levels)) / nrow(cytof) ) * 100
CyTOF_cor <- (table(factor(corrected_cytof$new_label, levels = label_levels)) / nrow(cytof) ) * 100



# Barplots
ggdf <- reshape2::melt(cbind(Spectral_uncor, Spectral_cor, CyTOF_uncor, CyTOF_cor))
colnames(ggdf) <- c('Cluster', 'Set', 'Percentage')
ggdf$Cluster <- factor(ggdf$Cluster, levels = label_levels)

ggplot(ggdf, aes(x = Set, y = Percentage, fill = Cluster)) +
  geom_bar(stat = "identity") +
  theme_bw() +
  scale_fill_manual(values = color_clusters) +
  ggtitle('Clustering results') +
  theme(plot.title = element_text(hjust = 0.5)) +
  ylab('Percentage of total cells') + xlab('') + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
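The fractions above are computed with `table()` on a factor with a fixed set of levels, which keeps labels that are absent from one set in the result with a count of zero. That is what allows the four columns to be bound together. A small sketch with hypothetical labels (the helper `pct` is introduced here for illustration only):

```r
levels_all <- c("B cells", "NK cells", "Tregs")
labels_a <- c("B cells", "B cells", "NK cells", "Tregs")      # hypothetical labels
labels_b <- c("B cells", "NK cells", "NK cells", "NK cells")  # no Tregs in this set

# Percentage of cells per label, with absent labels reported as 0
pct <- function(x) as.vector(table(factor(x, levels = levels_all)) / length(x) * 100)

res <- cbind(set_a = pct(labels_a), set_b = pct(labels_b))
rownames(res) <- levels_all

print(res)
```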

This comparison shows that most proportions from the separately labeled, uncorrected sets are well-maintained in the corrected, relabeled set. There are some discrepancies, but several of these involve labels for which the distinction between + and - was not entirely clear-cut (e.g. NK cells and their expression of CD57). We also note that although some populations are not found in the corrected set, those cells might still be identifiable with a higher number of meta-clusters.

In conclusion, even more detailed clusters are well-preserved after batch correction with cyCombine.

Contact

Integrating spectral flow and CyTOF data

Christina Bligaard Pedersen

Last updated on February 10, 2022