Somatic CAG Expansion: AI Screens 75 Papers for Drug Candidates
Experiment #4 | April 7, 2026
Experiment Card
Can AI screen 75 research papers and rank drug candidates targeting somatic CAG repeat expansion, the hottest frontier in Huntington's disease (HD) therapeutics?
Why Somatic Expansion?
GWAS identified DNA repair genes as the strongest modifiers of HD onset. Natural variants in these genes shift onset by 6 to 8 years. That discovery changed the field: the disease is not just about the inherited CAG length. It is about ongoing expansion in the brain.
The pathway: MutSbeta (the MSH2-MSH3 complex) recognizes CAG slip-out loops. MutLgamma (the MLH1-MLH3 complex) nicks the DNA. Pol-delta adds extra repeats. LIG1 seals the nick. FAN1 opposes expansion by promoting contraction. Each step is a potential drug target.
Multiple companies are now developing drugs against this pathway: LoQus23 (MSH3), Harness Therapeutics (MSH3), Skyhawk Therapeutics (splicing), and Rgenta Therapeutics (splicing). This experiment asks: what does the published evidence say about which targets and modalities are most promising?
Target Rankings
| Target | Papers | Druggability | Most Advanced | Key Challenge |
|---|---|---|---|---|
| MSH3 | 7 | High (multiple siRNA/shRNA) | Gene knockdown in animal models | CNS delivery, off-target effects |
| PMS1 | 4 | High (splicing modulation) | ASOs / small molecules | Preserving basal splicing function |
| FAN1 | 3 | Medium (antagomir approach) | Antagomir targeting mRNA | Pleiotropic toxicity risk |
| MSH2 | 2 | Medium (knockdown shown) | Gene knockdown in animals | Core MMR component, systemic toxicity |
| MLH1 | 2 | Medium (knockdown shown) | Gene knockdown in cells | Vital for genome stability, narrow window |
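The table can be captured as data so the ordering is reproducible. The sketch below is illustrative only: the source does not state how the targets were ranked, so the `rank_targets` rule (druggability tier first, then paper count) is an assumption that happens to reproduce the order shown. The `Target` class and `TIER` mapping are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    name: str
    papers: int        # papers supporting the target, from the table
    druggability: str  # "High" or "Medium", from the table


# Values transcribed from the Target Rankings table.
TARGETS = [
    Target("MSH3", 7, "High"),
    Target("PMS1", 4, "High"),
    Target("FAN1", 3, "Medium"),
    Target("MSH2", 2, "Medium"),
    Target("MLH1", 2, "Medium"),
]

# Assumed ordering rule, not confirmed by the source.
TIER = {"High": 0, "Medium": 1}


def rank_targets(targets):
    """Sort by druggability tier, then by paper count (descending)."""
    # sorted() is stable, so ties (MSH2, MLH1) keep their table order.
    return sorted(targets, key=lambda t: (TIER[t.druggability], -t.papers))


ranked = [t.name for t in rank_targets(TARGETS)]
total_papers = sum(t.papers for t in TARGETS)
print(ranked, total_papers)
```

The Papers column sums to 18, well short of the 75 papers screened, so most papers were not counted as direct evidence for one of these five targets (assuming a paper is counted under at most one target).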
Drug Candidates Ranked
| Modality | Candidate | Stage | Evidence |
|---|---|---|---|
| Gene Silencing | siRNA/shRNA targeting MSH3 | Preclinical (Animal Model) | Multiple studies show that reducing MSH3 expression robustly prevents somatic CAG repeat expansion. One study demonstrated 78.1% reduction in striatal expansion with di-siRNA delivery. |
| Splicing Modulation | Splice Modulators (PMS1) | Preclinical (Cell Model) | Modulating splicing of PMS1 and HTT alleviates polyQ toxicity. ASOs and small molecules both demonstrated in cell models. |
| Gene Silencing | A4(P10A) shRNA | Preclinical (Animal Model) | Directly targets the CAG repeat tract in HTT/ATXN3 mRNA. Addresses the source of instability rather than the repair machinery. |
| Small Molecule | NA / CFZ / TZD | Preclinical (Cell Model) | Oral small molecules targeting pathway-level intervention. Mechanism varies by compound, but all show ability to alleviate polyQ toxicity in cell models. |
Novel Hypotheses
PCNA-FAN1 Interaction Targeting
Novelty: High
Targeting the PCNA-FAN1 interaction point could simultaneously inhibit both the repair process and the replication stress associated with unstable repeats.
Transcription Factor Inhibition for Cell-Type-Specific MMR
Novelty: Medium-High
Inhibiting transcription factors responsible for cell-type-specific MMR dysregulation could be a more upstream and safer intervention than directly targeting MMR genes.
Epigenetic Silencing of MSH3 Promoter
Novelty: High
Epigenetic modifiers to silence MSH3 promoter regions in a dose-dependent manner could offer better control and reversibility than direct gene knockdown.
The Key Insight
To our knowledge, this is the first AI-driven drug screen specifically targeting somatic CAG expansion in HD. Across 75 papers and 4.97 million characters, Gemma 4 26B identified 42 drug candidates against 5 validated targets. The strongest signal: MSH3 knockdown via siRNA consistently reduces somatic expansion across multiple studies.
The field is converging on two complementary strategies: reducing expansion (MSH3/PMS1 inhibition) and promoting contraction (FAN1 enhancement). Both may be needed.
Updated Drug Pipeline (via ML Intern)
These results were cross-referenced with the Hugging Face ML Intern research agent in April 2026. ML Intern confirmed the Experiment #4 findings and surfaced additional clinical pipeline data.
| Rank | Target | Drug / Modality | Developer | Stage |
|---|---|---|---|---|
| 1 | MSH3 | Oral RNA splicing modulator (exon skipping) | Rgenta Therapeutics | IND-enabling (2025-2026) |
| 2 | MSH3 | ASO (intrathecal, RNase H knockdown) | Ionis / CHDI / MGH | Advanced preclinical (NHP data) |
| 3 | CAG DNA | Naphthyridine-azaquinolone (binds CAG hairpins, promotes contraction) | NIH / Nagoya University | Proof-of-concept (mouse) |
| 4 | FAN1 | Small molecule activator (enhance nuclease activity) | CHDI-funded consortium | High-throughput screening (HTS) |
| 5 | PMS1/PMS2 | MutL inhibitor | None dedicated | Concept only |
Scale Comparison
| Metric | EXP-001 | EXP-002 | EXP-003 | EXP-004 |
|---|---|---|---|---|
| Papers | 22 abstracts | 16 full | 16 full | 75 (55 full + 20 abstract) |
| Characters | ~50K | 1.9M | 1.9M | 4.97M |
| Focus | General HD | General HD | Model comparison | Somatic expansion drug screen |
| Drug candidates | 12 | 12 | 12 | 42 |
| Model | Llama 3.1 8B | Llama 3.1 8B | Gemma 4 26B | Gemma 4 26B |
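To put the scale figures on a common footing, the short sketch below derives characters per paper and candidates per paper for each experiment. The numbers are transcribed from the table (EXP-001's ~50K is approximate); the `EXPERIMENTS` dictionary and `chars_per_paper` helper are hypothetical names used only for illustration.

```python
# Figures transcribed from the Scale Comparison table.
# EXP-001's character count is approximate (~50K).
EXPERIMENTS = {
    "EXP-001": {"papers": 22, "chars": 50_000, "candidates": 12},
    "EXP-002": {"papers": 16, "chars": 1_900_000, "candidates": 12},
    "EXP-003": {"papers": 16, "chars": 1_900_000, "candidates": 12},
    "EXP-004": {"papers": 75, "chars": 4_970_000, "candidates": 42},
}


def chars_per_paper(exp):
    """Average number of characters analyzed per paper."""
    return exp["chars"] / exp["papers"]


def candidates_per_paper(exp):
    """Drug candidates extracted per paper screened."""
    return exp["candidates"] / exp["papers"]


for name, exp in EXPERIMENTS.items():
    print(
        f"{name}: {chars_per_paper(exp):,.0f} chars/paper, "
        f"{candidates_per_paper(exp):.2f} candidates/paper"
    )
```

EXP-004 averages fewer characters per paper than EXP-002 and EXP-003, which is consistent with 20 of its 75 papers being abstract-only.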
Limitations
- Single model, single run. A different prompt or temperature could produce different results.
- Not reviewed by HD domain experts or medicinal chemists.
- Drug rankings reflect AI assessment, not clinical validation.
- Only open-access papers analyzed. Paywalled papers were excluded.
- The model output for some papers failed JSON parsing; these errors were fixed mid-run.
- Abstract-only papers (20 of 75) contribute lower-confidence analysis.
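On the JSON parsing limitation: the source does not say how the errors were fixed. One common defensive pattern, sketched here as an assumption and not as the method used in this experiment, is to strip Markdown code fences from the model output, isolate the outermost JSON object, and return `None` on failure so the paper can be flagged for a retry. The `extract_json` function is a hypothetical helper.

```python
import json
import re


def extract_json(raw: str):
    """Return the JSON object embedded in raw model output, or None."""
    # Remove Markdown code-fence markers (three backticks, optional "json").
    text = re.sub(r"`{3}(?:json)?", "", raw)
    # Isolate the span from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        # Malformed output (for example a trailing comma): flag for retry.
        return None


fence = "`" * 3
sample = fence + 'json\n{"target": "MSH3", "stage": "preclinical"}\n' + fence
print(extract_json(sample))
```

Returning `None` on failure keeps a malformed response from silently entering the rankings; the caller decides whether to re-prompt or skip the paper.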