A community effort to optimize sequence-based deep learning models of gene regulation

POD

Gurukaelaiarasu Tamilarasi Mani

10/14/2024

The article describes a DREAM Challenge in which participants trained models to predict gene expression from DNA sequence. The challenge dataset comprised millions of random promoter sequences and their corresponding expression levels, measured experimentally in yeast. Models were evaluated on a suite of benchmarks covering several sequence types: high- and low-expression sequences, native sequences, random sequences, and sequences designed to test the limits of the models. The top-performing models were all neural networks, though they differed in architecture and training strategy. To isolate the impact of individual design choices, the authors developed the Prix Fixe framework, which enables modular testing of individual model components. The optimized models were then evaluated on Drosophila and human datasets, where they consistently outperformed existing state-of-the-art models. The authors present the challenge dataset and the Prix Fixe framework as valuable resources for the field.
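To make the idea of modular component testing concrete, here is a minimal Python sketch. It is not the Prix Fixe API: the class `ModularModel`, the stage names (`encoder`, `core`, `head`), and the toy components are all hypothetical, chosen only to illustrate how a model split into interchangeable stages lets one stage be swapped while the others are held fixed. The real challenge models are neural networks; the base-frequency "core" below is a stand-in so the example runs without any dependencies.

```python
from dataclasses import dataclass
from typing import Callable, List

BASES = "ACGT"
Encoded = List[List[float]]


def one_hot(seq: str) -> Encoded:
    """Encode a DNA string as rows of 4 values (A, C, G, T).

    Bases outside ACGT (for example N) become an all-zero row.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def base_frequencies(x: Encoded) -> List[float]:
    """Toy 'core' stage: summarize a sequence by its A/C/G/T frequencies."""
    n = max(len(x), 1)  # avoid division by zero on empty input
    return [sum(row[i] for row in x) / n for i in range(4)]


def gc_head(features: List[float]) -> float:
    """Toy 'head' stage: score a sequence by its GC fraction."""
    return features[1] + features[2]


def at_head(features: List[float]) -> float:
    """Alternative toy 'head' stage: score a sequence by its AT fraction."""
    return features[0] + features[3]


@dataclass
class ModularModel:
    """Hypothetical three-stage model; each stage can be replaced independently."""
    encoder: Callable[[str], Encoded]
    core: Callable[[Encoded], List[float]]
    head: Callable[[List[float]], float]

    def predict(self, seq: str) -> float:
        return self.head(self.core(self.encoder(seq)))


# Two variants that differ only in the final stage, so any difference in
# their predictions is attributable to that one component.
model_a = ModularModel(one_hot, base_frequencies, gc_head)
model_b = ModularModel(one_hot, base_frequencies, at_head)

if __name__ == "__main__":
    for name, model in [("gc_head", model_a), ("at_head", model_b)]:
        print(name, model.predict("ACGTACGG"))
```

Holding the encoder and core fixed while changing only the head mirrors, in a very reduced form, the kind of controlled comparison the summary attributes to the framework.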