Authors. Natalie Stanley, Ina Stelzer, Ramin Fallahzadeh, Edward Ganio, Amy S. Tsai, Sajjad Ghaemi, Anthony Culos, Xiaoyuan Han, Kristen Rumer, Laura Peterson, Martin Becker, Joe Phongpreecha, Huda Nassar, Alan Chang, Ivana Maric, Dyani Gaudilliere, Eileen Tsai, Kazuo Ando, Ronald J. Wong, Gerlinde Obermoser, Gary Shaw, David Stevenson, Martin Angst, Brice Gaudilliere, Nima Aghaeepour
Motivation. Single-cell technologies have reached a level of maturity enabling their implementation in clinical studies. Immune profiling with mass cytometry is a particularly useful technology for gaining a holistic understanding of the immune system. Current bioinformatics methods for the analysis of mass cytometry data focus on automated cell-population discovery but are limited because they 1) are mostly not intended for classification tasks in clinical settings and 2) suffer from a lack of robustness. We introduce VoPo, an end-to-end machine learning pipeline for robust classification and comprehensive visualization from mass cytometry data. VoPo can be used to gain insight into the critical cell-populations associated with a particular clinical outcome.
The main idea. Stochastic clustering algorithms that are traditionally used for automated cell-population discovery from mass cytometry data produce different results from run to run. VoPo leverages this variability by generating a family of metaclustering solutions (where each metacluster corresponds to a cell-population). Frequency-based features are engineered for each metaclustering solution and integrated for classification. VoPo then creates a comprehensive immune atlas for visualization by using a large number of cells across all samples and probabilistically mapping statistics about the identified cell-populations onto individual cells.
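The feature-engineering step described above can be illustrated with a minimal sketch. This is not the VoPo implementation: it assumes a tiny pure-Python k-means stands in for the stochastic clustering algorithm, it clusters the pooled cells directly rather than building metaclusters, and it leaves out the classifier and the visualization. The names `kmeans`, `nearest`, and `frequency_features`, and the toy two-marker data, are hypothetical and chosen only for illustration.

```python
import random


def nearest(cell, centroids):
    """Index of the centroid closest to `cell` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(cell, centroids[j])),
    )


def kmeans(cells, k, seed, n_iter=10):
    """Tiny k-means. Different seeds pick different starting centroids,
    so repeated runs can yield different clustering solutions."""
    rng = random.Random(seed)
    centroids = rng.sample(cells, k)
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for cell in cells:
            groups[nearest(cell, centroids)].append(cell)
        # Recompute each centroid as the mean of its cells;
        # keep the old centroid if a cluster ended up empty.
        centroids = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


def frequency_features(samples, k=3, n_solutions=5, base_seed=0):
    """For each clustering solution, compute the fraction of each sample's
    cells that falls in each cluster, then concatenate across solutions."""
    pooled = [cell for sample in samples for cell in sample]
    features = [[] for _ in samples]
    for s in range(n_solutions):
        centroids = kmeans(pooled, k, seed=base_seed + s)
        for i, sample in enumerate(samples):
            counts = [0] * k
            for cell in sample:
                counts[nearest(cell, centroids)] += 1
            features[i].extend(c / len(sample) for c in counts)
    return features


# Toy data: two samples of 50 cells each, two markers per cell.
rng = random.Random(1)
samples = [
    [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)],
    [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(50)],
]
X = frequency_features(samples, k=3, n_solutions=5)
```

Each row of `X` holds one sample's frequency features, with `k` values per clustering solution. In a full pipeline, rows like these would be passed to a classifier together with the clinical labels.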
Results. VoPo was tested on three diverse clinical mass cytometry datasets profiling recovery from hip surgery, normal term pregnancy, and recovery from stroke. VoPo offers increased classification accuracy over current state-of-the-art cell-population discovery methods. Additionally, we show that the choice of clustering algorithm for this task is not as important as how the algorithm is used. For each of the three datasets, we show a comprehensive visualization and discuss the biological relevance of particular cell-populations to the clinical outcome of interest.
Code. Code to reproduce results can be found here. Please see the README for instructions.
Data. Raw FCS files for reproducing the results are available through the following FlowRepository links:
Normal Term Pregnancy (NTP):
Longitudinal Stroke Recovery (LSR):