Integration of Mechanistic Immunological Knowledge into a Machine Learning Pipeline Increases Predictive Power.

Integration of Mechanistic Immunological Knowledge into a Machine Learning Pipeline Increases Clinical Predictive Power


Anthony Culos, Amy S. Tsai, Natalie Stanley, Martin Becker, Mohammad S. Ghaemi, David R. McIlwain, Ramin Fallahzadeh, Athena Tanada, Huda Nassar, Edward Ganio, Laura Peterson, Xiaoyuan Han, Ina Stelzer, Kazuo Ando, Dyani Gaudilliere, Thanaphong Phongpreecha, Gary M. Shaw, David K. Stevenson, Sean Bendall, Kara Davis, Wendy Fantl, Garry P. Nolan, Trevor Hastie, Robert Tibshirani, Martin S. Angst, Brice Gaudilliere, Nima Aghaeepour. Submitted for Review, 2020.

Motivation: The dense network of interconnected cellular signaling responses quantifiable in peripheral immune cells provides a wealth of actionable immunological insight. Although flow cytometry techniques, including mass cytometry, have matured to the point of enabling detailed immune profiling of patients in numerous clinical settings, limited cohort size combined with the high dimensionality of the data increases the risk of false-positive discoveries and model overfitting. We introduce a machine learning platform, the immunological Elastic-Net (iEN), which incorporates immunological knowledge as continuous priors and uses these priors to guide the optimization of predictive models.
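
As a rough illustration of how continuous priors can steer an Elastic-Net fit, the sketch below rescales each feature column by a prior weight in (0, 1] before running ordinary coordinate descent. This rescaling is an assumption chosen for illustration, not necessarily how the iEN encodes its priors; the function names, the toy data, and the values of `lam` and `alpha` are all hypothetical, and the features are assumed to be centered so that no intercept is needed.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used by the L1 part of the penalty."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_elastic_net(X, y, lam, alpha, n_iter=100):
    """Coordinate descent for
    (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||^2).
    Assumes centered data, so no intercept is fitted."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - Xb while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot contribute
            # Covariance of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def fit_prior_weighted(X, y, priors, lam, alpha):
    """Hypothetical prior-guided fit: scale column j by priors[j], fit,
    then map the coefficients back to the original feature scale."""
    X_scaled = [[x * w for x, w in zip(row, priors)] for row in X]
    beta_scaled = fit_elastic_net(X_scaled, y, lam, alpha)
    return [b * w for b, w in zip(beta_scaled, priors)]


# Toy data: two orthogonal features, the second only weakly related to y.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [1.3, -0.7, 0.7, -1.3]  # 1.0 * feature 0 + 0.3 * feature 1

uniform = fit_prior_weighted(X, y, priors=[1.0, 1.0], lam=0.2, alpha=0.9)
weighted = fit_prior_weighted(X, y, priors=[1.0, 0.5], lam=0.2, alpha=0.9)
```

With uniform priors both features keep a nonzero coefficient. Halving the prior weight of the second feature shrinks its coefficient to exactly zero while leaving the first unchanged, which is the kind of bias toward immunologically plausible features that a prior-guided model is meant to provide.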

Results: Across three analyses (two clinical data sets and one synthetic data set), repeated 10-fold cross-validation shows that the iEN improves on the traditional EN as well as other contemporary machine learning algorithms. The first analysis identified biomarkers of fetal development in a Longitudinal Term Pregnancy (LTP) study, which includes a validation cohort. The second was a classification analysis that modeled patient and control populations for Chronic Periodontitis (ChP). The synthetic data were generated to replicate mass cytometry measurements in large clinical settings and to enable algorithm comparison.
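
For readers unfamiliar with the evaluation scheme, a minimal sketch of repeated 10-fold cross-validation follows. It only generates the train/test index splits. The function name, the number of repeats, and the seed are illustrative assumptions, since the abstract does not state how many repetitions were used.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=3, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Each repeat reshuffles the samples, then every sample is held out
    exactly once across the folds of that repeat.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        for fold in range(n_folds):
            test_idx = indices[fold::n_folds]  # every n_folds-th shuffled sample
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            yield repeat, fold, train_idx, test_idx


# Example: 25 samples, 10 folds, 3 repeats gives 30 train/test splits.
splits = list(repeated_kfold(n_samples=25, n_folds=10, n_repeats=3))
```

In practice a model is fitted on `train_idx` and scored on `test_idx` for each split, and the scores are averaged over folds and repeats. Repeating the shuffle reduces the dependence of the estimate on any single partition, which matters when cohorts are small.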

Availability: 

  • Processed mass cytometry data as .csv files: download
  • Raw mass cytometry data for the LTP study’s training set: download
  • Raw mass cytometry data for the LTP study’s validation set: download
  • Raw mass cytometry data for the ChP study: download
  • Aggregated prior knowledge from expert immunologists for Chronic Periodontitis dataset: download
  • Aggregated prior knowledge from expert immunologists for Longitudinal Term Pregnancy dataset: download
  • iEN documentation + scripts + README for use and reproduction of results: download
  • iEN R Software package as .tar.gz: download
  • Public GitHub repository with source code: link
