Exploring the Promise and Pitfalls of DEL + ML for Drug Discovery

Can machine learning (ML) and DNA-encoded libraries (DELs) work together to uncover new drug-like compounds more efficiently? A recent study takes a nuanced look at the potential and limitations of combining DEL screening with ML to identify novel small-molecule binders of protein targets.

DELs are massive collections of small molecules, each attached to a DNA barcode that records its identity, so that compounds binding to a target protein can be identified by sequencing. Because a single screen yields binding data on millions to billions of compounds, DELs are a natural match for ML. This study compared three DELs, one with 1 billion compounds and two smaller ones (~10 million compounds each), and trained five ML models on them to find compounds that bind to two protein kinase targets (CK1α and CK1δ). The surprising takeaway? Bigger isn’t always better.

The most diverse library (HitGen’s 1B-compound set) consistently helped ML models perform better, not only on data from the library they were trained on but also when predicting new binders. A smaller library (DOS-DEL) was also diverse and performed reasonably well, whereas a less diverse 10M-compound library lagged behind.
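
The article does not say how the study quantified chemical diversity. One common way to compare libraries is the mean pairwise Tanimoto distance between molecular fingerprints, where higher values mean a more diverse set. The minimal sketch below assumes that metric and represents each fingerprint as a set of "on" bit indices; the two toy libraries are hypothetical and are not data from the study. In practice the fingerprints would come from a cheminformatics toolkit such as RDKit.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # treat two empty fingerprints as identical
    return len(a & b) / len(a | b)


def mean_pairwise_distance(fps: list) -> float:
    """Average Tanimoto distance (1 - similarity) over all unordered pairs.

    Higher values indicate a more diverse set of molecules.
    """
    pairs = list(combinations(fps, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy fingerprints (on-bit indices), for illustration only.
focused_library = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5}), frozenset({1, 2, 4, 5})]
diverse_library = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12}), frozenset({20, 21, 3})]

print(round(mean_pairwise_distance(focused_library), 3))  # 0.4
print(round(mean_pairwise_distance(diverse_library), 3))  # 0.944
```

Exhaustive pairwise comparison scales quadratically, so for libraries with millions or billions of members the metric would normally be estimated on a random sample of compounds.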

Among the five ML models tested, the deep learning approaches (a multilayer perceptron, or MLP, and the graph neural network ChemProp) outperformed traditional methods such as random forests and support vector machines (SVMs). ChemProp’s predictions tended to stay close to familiar, well-sampled chemical space, while the MLP explored more diverse regions.

Hit confirmation rates were nonetheless modest: about 10% of predicted compounds were confirmed as binders, and most of those bound only weakly. Even so, two hits with nanomolar affinity were discovered, showing that the pipeline can deliver high-quality leads.

This study makes a strong case that chemical diversity matters more than library size when training ML models for drug discovery. It also suggests that deep learning is a better bet than traditional ML approaches for this kind of task.

The team made their best-performing models and training data freely available on GitHub, so others can build on their work.