Synthetic augmentation of cancer cell line multi-omic datasets using unsupervised deep learning
The paper introduces MOSA (Multi-Omic Synthetic Augmentation), an unsupervised deep learning model that integrates and augments multi-omic datasets from over 1,500 cancer cell lines in the Cancer Dependency Map (DepMap). MOSA uses a conditional variational autoencoder architecture to generate synthetic data that fills in missing measurements and corrects experimental errors across diverse omic datasets, including genomics, transcriptomics, proteomics, metabolomics, and drug response. The model outperforms existing methods at reconstructing held-out data and supports downstream analyses, such as increasing the statistical power to identify genetic associations in CRISPR-Cas9 gene essentiality screens. MOSA also offers interpretability through SHAP analysis, which reveals key multi-omic features that contribute to cancer cell states and drug response, including the metabolite 1-methylnicotinamide and its association with epithelial-mesenchymal transition. Overall, MOSA demonstrates the value of unsupervised deep learning for integrating and augmenting large-scale cancer multi-omic datasets, enabling more comprehensive discovery of cancer mechanisms and therapeutic targets.
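
To make the idea of filling in missing measurements concrete, the following is a minimal sketch of the kind of objective a variational autoencoder could use when some omic entries are unobserved. It is not MOSA's implementation. The function names, the binary observation masks, the mean-squared-error reconstruction term, and the `beta` weight are all assumptions made for illustration, and the encoder, decoder, and conditioning inputs are omitted. The sketch shows a reconstruction error computed only over observed entries, a KL term that pulls the latent distribution toward a standard normal, and an augmentation step that keeps measured values and fills gaps with reconstructed ones.

```python
import math
import random


def masked_mse(x, x_hat, mask):
    """Mean squared error over observed entries only (mask value 1)."""
    n_obs = sum(mask)
    if n_obs == 0:
        return 0.0  # nothing observed in this view, so no reconstruction penalty
    return sum(m * (a - b) ** 2 for a, b, m in zip(x, x_hat, mask)) / n_obs


def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I), summed over latent dims."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), the usual VAE sampling step."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def vae_loss(views, recons, masks, mu, log_var, beta=1.0):
    """Sum of per-view masked reconstruction errors plus a beta-weighted KL term.

    `views`, `recons`, and `masks` are dicts keyed by omic layer name
    (hypothetical layout, chosen for this sketch).
    """
    recon = sum(masked_mse(views[k], recons[k], masks[k]) for k in views)
    return recon + beta * kl_to_standard_normal(mu, log_var)


def augment(x, x_hat, mask):
    """Keep measured values and fill unobserved entries with the reconstruction."""
    return [a if m else b for a, b, m in zip(x, x_hat, mask)]


# Toy example for one cell line; all numbers are made up for illustration.
views = {"proteomics": [1.0, 0.0, 2.0], "metabolomics": [0.5, 0.0]}
masks = {"proteomics": [1, 0, 1], "metabolomics": [1, 0]}  # 0 marks a missing entry
recons = {"proteomics": [0.8, 1.5, 2.1], "metabolomics": [0.4, 0.9]}  # stand-in decoder output
mu, log_var = [0.0, 0.0], [0.0, 0.0]  # stand-in encoder output

z = reparameterize(mu, log_var, random.Random(0))
loss = vae_loss(views, recons, masks, mu, log_var)
filled = {k: augment(views[k], recons[k], masks[k]) for k in views}
```

In a real model the reconstructions would come from a trained decoder applied to `z`, and the placeholder zeros at missing positions would never influence the loss because the mask excludes them.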