In a comprehensive benchmarking study, SigProfilerExtractor emerged as the top-performing tool for de novo extraction of mutational signatures, outperforming 13 other tools on both noiseless and noisy datasets. The study, published in Cell Genomics, shows that SigProfilerExtractor identifies more true positive (TP) signatures while producing fewer false positive (FP) signatures than its competitors.
Benchmarking Results
The large-scale benchmark used synthetic datasets that mimic whole-genome sequenced (WGS) and whole-exome sequenced (WES) cancers. Across these scenarios, SigProfilerExtractor proved both robust and precise:
- High TP Signature Detection: SigProfilerExtractor consistently identified more TP signatures than the other tools while reporting almost no FP signatures.
- Factorization: SigProfilerExtractor uses a custom nonnegative matrix factorization (NMF) algorithm based on multiplicative updates, supports three objective functions, and can run the factorization on GPUs via PyTorch, which makes extraction faster on large datasets.
- Noise Resistance: SigProfilerExtractor’s model selection is resilient to noise, maintaining high precision even as noise levels increase.
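The factorization step can be illustrated with a minimal NumPy sketch. It simulates a noisy catalog of mutation counts (96 SBS channels by samples) from known signatures and exposures, loosely mirroring the synthetic benchmark datasets, and then applies the textbook Lee and Seung multiplicative updates for the Kullback-Leibler objective. This is an illustration under stated assumptions, not SigProfilerExtractor's implementation: the function names, the Dirichlet-drawn signatures, the matrix sizes, and the fixed iteration count are all hypothetical, and it runs on the CPU rather than through PyTorch.

```python
import numpy as np


def simulate_catalog(signatures, exposures, rng):
    """Hypothetical helper: draw Poisson-noisy mutation counts around W @ H."""
    expected = signatures @ exposures
    return rng.poisson(expected).astype(float)


def nmf_kl(V, k, n_iter=2000, eps=1e-10, seed=0):
    """Textbook multiplicative updates minimizing KL divergence D(V || W @ H).

    Not SigProfilerExtractor's code; a generic NMF sketch for illustration.
    """
    rng = np.random.default_rng(seed)
    n_channels, n_samples = V.shape
    # Strictly positive random start; multiplicative updates keep entries >= 0.
    W = rng.random((n_channels, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        # H <- H * (W^T (V / WH)) / (column sums of W)
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        # W <- W * ((V / WH) H^T) / (row sums of H)
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    # Rescale so each signature (column of W) sums to 1; W @ H is unchanged.
    scale = W.sum(axis=0)
    W /= scale
    H *= scale[:, None]
    return W, H


# Toy data: 3 random signatures over 96 channels, 50 samples.
rng = np.random.default_rng(42)
true_W = rng.dirichlet(np.ones(96), size=3).T           # shape (96, 3)
true_H = rng.integers(100, 2000, size=(3, 50)).astype(float)
V = simulate_catalog(true_W, true_H, rng)
W, H = nmf_kl(V, k=3)
```

The final rescaling makes each column of `W` a probability distribution over mutation channels, the usual way signatures are reported, while `H` absorbs the scale as per-sample mutation counts.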
Detailed Comparisons
- Correlated and Overlapping Signatures: In scenarios with correlated or overlapping signatures (such as SBS2, SBS7a, and SBS7b), SigProfilerExtractor achieved an average F1 score (the harmonic mean of precision and recall) of 0.96, while the next-best tools scored below 0.90.
- Automatic Model Selection: Whereas several other tools require the user to choose the number of signatures manually, SigProfilerExtractor selects it automatically with a noise-resistant method, making results more reliable and less dependent on user choices.
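Scoring an extraction against known ground truth, as in the synthetic benchmarks, comes down to matching extracted signatures to true ones and counting TP, FP, and false negatives (FN). The pure-Python sketch below pairs signatures greedily by cosine similarity and then computes precision, recall, and F1. The 0.9 cosine threshold, the greedy one-to-one matching, and the function names are assumptions for illustration; the study's exact matching procedure may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_extraction(extracted, ground_truth, threshold=0.9):
    """Hypothetical scorer: greedy one-to-one matching by cosine similarity.

    Returns (tp, fp, fn, precision, recall, f1).
    """
    # All candidate pairs, most similar first.
    pairs = sorted(
        ((cosine(e, g), i, j)
         for i, e in enumerate(extracted)
         for j, g in enumerate(ground_truth)),
        reverse=True,
    )
    used_e, used_g = set(), set()
    for sim, i, j in pairs:
        if sim < threshold:
            break  # remaining pairs are even less similar
        if i in used_e or j in used_g:
            continue  # each signature may be matched at most once
        used_e.add(i)
        used_g.add(j)
    tp = len(used_e)
    fp = len(extracted) - tp      # extracted but unmatched
    fn = len(ground_truth) - tp   # true but never recovered
    precision = tp / (tp + fp) if extracted else 0.0
    recall = tp / (tp + fn) if ground_truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return tp, fp, fn, precision, recall, f1


# Toy 4-channel example: two good extractions and one flat, spurious one.
ground_truth = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
extracted = [[0.95, 0.05, 0, 0], [0, 0, 0.5, 0.5], [0.25, 0.25, 0.25, 0.25]]
result = score_extraction(extracted, ground_truth)
```

In this toy case, two extracted signatures match ground truth and the flat third one matches nothing, so the call yields TP = 2, FP = 1, FN = 1 and an F1 of 2/3.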
Biological Insights
SigProfilerExtractor's effectiveness extends to real data. When applied to 23,827 sequenced cancers (4,643 whole genomes and 19,184 whole exomes), it identified four novel mutational signatures, including SBS92, which is associated with tobacco smoking in bladder cancer. Detecting such a signature shows that the tool can uncover biologically meaningful mutational processes that other methods might miss.
