This vignette will demonstrate the integration of spectral flow cytometry (SFC) and CyTOF protein expression measurements using cyCombine.



In this vignette, we will analyze the healthy donor PBMC SFC and CyTOF data, which is also presented in the three-platform vignette.

The SFC data are from Park et al. (2020) and are available from FlowRepository (ID: FR-FCM-Z2QV). We pre-gated the data to live single cells in FlowJo version 10 (Tree Star Inc). Singlets and non-debris events were identified using forward and side scatter, and dead cells were excluded using live/dead stains. Events from these gates were then exported in FCS format.

For the CyTOF data, we use data from a single healthy donor processed at the Human Immune Monitoring Center. This sample was also obtained from FlowRepository (ID: FR-FCM-ZYAJ) and pre-gated to live intact singlets in FlowJo version 10 (Tree Star Inc).


We start by loading some packages:




Loading data

We first define some colors to use.


Spectral flow cytometry data pre-processing

We are now ready to load the spectral flow data into a tibble.

We now have a single sample consisting of 582,005 cells. Next, we generate cell labels using only the markers that overlap between the two panels.

Based on these plots and the UMAP, we can define labels for many of the clusters, although some clusters look atypical. These include the very small cluster 2, which could consist of B-T cell doublets (CD19+CD3+). Cluster 22 was also very small and expressed only CD25, CD127, and CD45RA. Cluster 24 appeared very mixed, with bimodal CD14 and CD11c distributions. Finally, cluster 29 contained only 3 cells and was left unlabeled.


Cluster 10 was also tricky to label, but considering both its UMAP location and its intermediate level of CD4, which is comparable to that of clusters 1, 2, and 5, we are comfortable labeling these cells as myeloid cells. The remaining labels are assigned below.

After removal of the unlabeled cells, we have 573,397 cells remaining (98.5 %) in the SFC dataset and this portion of the data is now ready for batch correction. We will now look at the CyTOF data.
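The labeling step amounts to mapping each cluster ID to a manually chosen label and discarding cells in clusters that received no label. The Python sketch below illustrates only this bookkeeping, under simple assumptions: each cell is represented by its cluster ID alone, and the mapping and the function name `label_and_filter` are hypothetical, not the vignette's actual cluster assignments or cyCombine code.

```python
# Hypothetical sketch: map cluster IDs to manually chosen labels,
# drop cells from unlabeled clusters, and report the retained fraction.

def label_and_filter(cell_clusters, cluster_labels):
    """Return (labels of retained cells, fraction of cells retained)."""
    # Clusters absent from the mapping get None, i.e. remain unlabeled
    labels = [cluster_labels.get(c) for c in cell_clusters]
    kept = [lab for lab in labels if lab is not None]
    return kept, len(kept) / len(labels)


# Illustrative mapping: cluster 2 is deliberately left unlabeled
cluster_labels = {1: "Myeloid", 3: "B cells", 4: "CD4 T cells"}
cells = [1, 1, 2, 3, 4, 4]
kept, frac = label_and_filter(cells, cluster_labels)
print(kept)
print(round(frac, 3))  # 5 of 6 cells retained

# The retained fraction reported in the text
print(round(573397 / 582005 * 100, 1))  # 98.5
```

With the counts from the text, 573,397 / 582,005 ≈ 0.985, which matches the stated 98.5 %.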


CyTOF data pre-processing

Next, we read the CyTOF data. We use a single sample (ctrls-001) from FlowRepository (ID: FR-FCM-ZYAJ). We downloaded the version normalized with MATLAB and pre-gated it to live intact singlets using FlowJo.

We now have a single sample consisting of 174,601 cells. As for the SFC data, we need to generate cell labels based on the overlapping markers only. Let us look at the data:


Now we assign labels to each cluster. Some of these labels did not occur in the SFC data.



Batch correction

Now we are ready to combine the two datasets on their overlapping columns. There are 26 markers shared between the two panels. In this example, we do not downsample to equal numbers of cells per platform but instead analyze all available cells.
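Combining on the overlapping columns can be pictured as intersecting the two marker sets and stacking the cells, while recording which platform each cell came from. A minimal Python sketch, assuming each cell is a dictionary of marker values; the function name `combine_on_overlap` and the toy marker values are illustrative only.

```python
# Hypothetical sketch: restrict two panels to their shared markers
# and stack the cells, recording the platform of origin as "batch".

def combine_on_overlap(sfc_cells, cytof_cells):
    """Return (shared marker names, combined list of cell dicts)."""
    # Assumes non-empty inputs where every cell has the same marker keys
    shared = sorted(set(sfc_cells[0]) & set(cytof_cells[0]))
    combined = []
    for batch, cells in (("SFC", sfc_cells), ("CyTOF", cytof_cells)):
        for cell in cells:
            row = {m: cell[m] for m in shared}
            row["batch"] = batch
            combined.append(row)
    return shared, combined


sfc = [{"CD3": 1.0, "CD4": 2.0, "CD16": 0.5}]
cytof = [{"CD3": 3.0, "CD4": 1.0, "CD38": 0.2},
         {"CD3": 0.1, "CD4": 0.0, "CD38": 1.1}]
shared, combined = combine_on_overlap(sfc, cytof)
print(shared)         # ['CD3', 'CD4']
print(len(combined))  # 3
```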


Now batch correction can be performed with cyCombine. Because the two datasets come from completely different platforms, we use rank as the normalization method, as in the three-platform integration that includes CITE-seq data. We also chose a 3x3 grid (self-organizing map) for the batch correction, as this looked much better when we inspected the density plots.
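Rank normalization puts the two platforms on a common scale by replacing each marker value with its percentile rank within its own batch, so both datasets span 0 to 1 regardless of their original intensity units. The sketch below assumes a percentile-rank definition analogous to dplyr's `percent_rank()` (ties share the lowest rank); cyCombine's exact implementation may differ in details such as tie handling, and the function names are hypothetical.

```python
# Sketch of per-batch rank normalization (assumed percent-rank definition).

def percent_rank(values):
    """(min_rank - 1) / (n - 1), with ties sharing the lowest rank."""
    n = len(values)
    if n < 2:
        # A single cell has no spread to rank; return 0.0 by convention here
        return [0.0] * n
    first_pos = {}
    for i, v in enumerate(sorted(values)):
        first_pos.setdefault(v, i)  # 0-based position of first occurrence
    return [first_pos[v] / (n - 1) for v in values]


def rank_normalize(cells, markers):
    """Replace each marker value by its percent rank within the cell's batch."""
    out = [dict(c) for c in cells]  # leave the input untouched
    for batch in {c["batch"] for c in cells}:
        idx = [i for i, c in enumerate(cells) if c["batch"] == batch]
        for m in markers:
            ranks = percent_rank([cells[i][m] for i in idx])
            for i, r in zip(idx, ranks):
                out[i][m] = r
    return out


cells = [
    {"batch": "SFC", "CD3": 5.0}, {"batch": "SFC", "CD3": 1.0},
    {"batch": "SFC", "CD3": 3.0},
    {"batch": "CyTOF", "CD3": 100.0}, {"batch": "CyTOF", "CD3": 10.0},
]
print([c["CD3"] for c in rank_normalize(cells, ["CD3"])])
# [1.0, 0.0, 0.5, 1.0, 0.0]
```

Because only the ordering within each batch matters, the very different value ranges of the two platforms (here 1 to 5 versus 10 to 100) no longer dominate the comparison.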



Evaluating batch correction

We can now evaluate the correction using the reduction in Earth Mover's Distance (EMD): first, we cluster the cells, and then we evaluate the distribution of each marker in each cluster.
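The EMD reduction can be sketched as follows, under stated assumptions: for every cluster and marker, compute the one-dimensional EMD between the batches' expression distributions, sum these distances before and after correction, and report the relative reduction (before - after) / before. This is a simplified illustration, not cyCombine's implementation, which may, for example, bin the distributions or filter marker-cluster pairs. Cells are hypothetical dictionaries with `cluster`, `batch`, and marker keys, and the same cluster assignment is used for the uncorrected and corrected values.

```python
import bisect
from itertools import combinations


def emd_1d(u, v):
    """EMD between two 1D empirical samples: area between their CDFs."""
    u, v = sorted(u), sorted(v)
    points = sorted(set(u) | set(v))
    total = 0.0
    for a, b in zip(points, points[1:]):
        fu = bisect.bisect_right(u, a) / len(u)  # CDF of u on [a, b)
        fv = bisect.bisect_right(v, a) / len(v)  # CDF of v on [a, b)
        total += abs(fu - fv) * (b - a)
    return total


def summed_emd(cells, markers):
    """Sum pairwise between-batch EMDs over every (cluster, marker) pair."""
    total = 0.0
    clusters = sorted({c["cluster"] for c in cells})
    batches = sorted({c["batch"] for c in cells})
    for k in clusters:
        for m in markers:
            for b1, b2 in combinations(batches, 2):
                x = [c[m] for c in cells if c["cluster"] == k and c["batch"] == b1]
                y = [c[m] for c in cells if c["cluster"] == k and c["batch"] == b2]
                if x and y:  # skip clusters missing from one batch
                    total += emd_1d(x, y)
    return total


def emd_reduction(uncorrected, corrected, markers):
    """Relative reduction in summed EMD (assumes a nonzero uncorrected EMD)."""
    before = summed_emd(uncorrected, markers)
    after = summed_emd(corrected, markers)
    return (before - after) / before


unc = [
    {"cluster": 1, "batch": "SFC", "CD3": 0.0},
    {"cluster": 1, "batch": "SFC", "CD3": 1.0},
    {"cluster": 1, "batch": "CyTOF", "CD3": 1.0},
    {"cluster": 1, "batch": "CyTOF", "CD3": 2.0},
]
# Toy "correction": the CyTOF values move halfway toward the SFC values
cor = [dict(c) for c in unc]
cor[2]["CD3"], cor[3]["CD3"] = 0.5, 1.5
print(emd_reduction(unc, cor, ["CD3"]))  # 0.5
```

A value of 1 would mean the batches became indistinguishable within every cluster, while 0 would mean no improvement.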


We also use the median absolute deviation (MAD) score for evaluation:

The MAD score is: 0.05
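The MAD score compares how spread out each population is before and after correction; a score near zero suggests that the correction did not distort the variation within each batch. The sketch below assumes the score is the median, over all (cluster, batch, marker) combinations, of the absolute difference between the uncorrected and corrected MAD values. The function names and data layout are hypothetical, and the unscaled MAD is used here (R's `mad()` scales by 1.4826 by default).

```python
from statistics import median


def mad(values):
    """Unscaled median absolute deviation."""
    med = median(values)
    return median([abs(x - med) for x in values])


def mad_score(uncorrected, corrected, markers):
    """Median over (cluster, batch, marker) of |MAD before - MAD after|."""
    # Assumes both lists hold the same cells with the same cluster/batch keys
    diffs = []
    groups = sorted({(c["cluster"], c["batch"]) for c in uncorrected})
    for cluster, batch in groups:
        for m in markers:
            before = [c[m] for c in uncorrected
                      if c["cluster"] == cluster and c["batch"] == batch]
            after = [c[m] for c in corrected
                     if c["cluster"] == cluster and c["batch"] == batch]
            diffs.append(abs(mad(before) - mad(after)))
    return median(diffs)


def toy_cells(sfc_values, cytof_values):
    """Build hypothetical single-marker cells in one cluster."""
    return ([{"cluster": 1, "batch": "SFC", "CD3": v} for v in sfc_values]
            + [{"cluster": 1, "batch": "CyTOF", "CD3": v} for v in cytof_values])


unc = toy_cells([0.0, 1.0, 2.0], [5.0, 5.0, 5.0])
cor = toy_cells([0.0, 2.0, 4.0], [5.0, 5.0, 5.0])  # SFC spread doubled
print(mad_score(unc, cor, ["CD3"]))  # 0.5
```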


For this integration, the EMD reduction is 0.72 and the MAD score is 0.05, which are very satisfactory values.


However, as one always should, we also inspect the correction visually. First, we plot the marker distributions before and after correction:

Let us also look at UMAPs of the uncorrected and corrected data, colored by batch. We downsample the data here to make the plots easier to read.
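Downsampling for plotting can be done by drawing a random subset of cells from each batch with a fixed seed, so that the plots are reproducible. The sketch assumes per-batch sampling with a cap of `n_per_batch` cells; whether the vignette samples per batch or from the pooled data is not stated here, and the function name is hypothetical.

```python
import random


def downsample_per_batch(cells, n_per_batch, seed=42):
    """Randomly keep at most n_per_batch cells from each batch (reproducible)."""
    rng = random.Random(seed)  # local generator so the global state is untouched
    by_batch = {}
    for c in cells:
        by_batch.setdefault(c["batch"], []).append(c)
    out = []
    for batch in sorted(by_batch):
        group = by_batch[batch]
        # Batches smaller than the cap are kept whole
        out.extend(rng.sample(group, min(n_per_batch, len(group))))
    return out


cells = ([{"batch": "SFC", "id": i} for i in range(10)]
         + [{"batch": "CyTOF", "id": i} for i in range(3)])
sub = downsample_per_batch(cells, n_per_batch=5)
print(len(sub))  # 8: 5 SFC cells plus all 3 CyTOF cells
```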

We also have labels for each dataset, which were generated independently of the batch correction, so let us visualize these as well.

Visualization


We can also show the UMAPs faceted by technology, where cells are colored by their expression of the 26 overlapping markers:


Relabeling corrected data

To determine if it is possible to obtain a similar cell labeling after correction, we will now re-cluster and label the combined, corrected dataset.


Now we assign labels to each cluster. Some of these labels did not occur in the separately labeled datasets.


Now that we have these labels, we can plot the corrected UMAP with them.


Finally, let us compare the fractions of the ‘corrected’ labels with those from the uncorrected, separately labeled sets.
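Comparing label fractions reduces to counting the cells per label in each labeling and dividing by the total. A minimal Python sketch, with hypothetical function names and toy labels; labels that occur in only one labeling are reported with a fraction of 0.0 in the other.

```python
from collections import Counter


def label_fractions(labels):
    """Fraction of cells carrying each label."""
    counts = Counter(labels)
    return {lab: n / len(labels) for lab, n in counts.items()}


def compare_fractions(separate_labels, corrected_labels):
    """Map each label to (fraction in separate labeling, fraction after correction)."""
    sep = label_fractions(separate_labels)
    cor = label_fractions(corrected_labels)
    # Labels missing from one labeling get a fraction of 0.0
    return {lab: (sep.get(lab, 0.0), cor.get(lab, 0.0))
            for lab in sorted(set(sep) | set(cor))}


separate = ["B cells", "B cells", "T cells", "NK cells"]
corrected = ["B cells", "T cells", "T cells", "T cells"]
for label, (before, after) in compare_fractions(separate, corrected).items():
    print(f"{label}: {before:.2f} vs {after:.2f}")
```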


This comparison shows that most label proportions from the uncorrected, separately labeled sets are well maintained in the corrected, jointly labeled set. There are some discrepancies, but several of them concern labels for which the distinction between positive and negative expression was not entirely clear (e.g., NK cells and their expression of CD57). We also note that although some populations were not found in the corrected set, this does not mean that those cells could not be identified with a higher number of meta-clusters.

In conclusion, even fine-grained clusters are well preserved after batch correction with cyCombine.

 

Contact