
Somatic CAG Expansion: AI Screens 75 Papers for Drug Candidates

Experiment #4 | April 7, 2026

Experiment Card

ID: EXP-004-SOMATIC-CAG
Date: 2026-04-07
Type: Somatic CAG Expansion Drug Screen
Status: Complete
Infrastructure
Model: Gemma 4 (26B)
Context: 64K
Hardware: Mac M2 MBP
Cost: $0 (local)
Corpus
Discovered: 123 papers
Relevant: 92 papers
Analyzed: 75 (55 full text, 20 abstract)
Characters: 4,976,515
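
As a quick sanity check on the corpus figures above, the funnel from discovered to analyzed papers and the average paper size can be worked out directly. This is a minimal sketch: the 4-characters-per-token ratio is an assumed rule of thumb, not a measurement from this run, and "64K" is read as 64,000 tokens.

```python
# Corpus figures copied from the experiment card.
DISCOVERED = 123
RELEVANT = 92
FULL_TEXT = 55
ABSTRACT_ONLY = 20
TOTAL_CHARS = 4_976_515
CONTEXT_TOKENS = 64_000  # "64K" on the card, read here as 64,000 tokens

# ASSUMPTION: roughly 4 characters per token for English prose.
# The real ratio depends on the tokenizer and was not reported.
CHARS_PER_TOKEN = 4

analyzed = FULL_TEXT + ABSTRACT_ONLY
relevant_rate = RELEVANT / DISCOVERED      # share of discovered papers judged relevant
analyzed_rate = analyzed / RELEVANT        # share of relevant papers actually analyzed
full_text_share = FULL_TEXT / analyzed     # share of analyzed papers with full text
mean_chars = TOTAL_CHARS / analyzed        # average characters per analyzed paper
mean_tokens = mean_chars / CHARS_PER_TOKEN # rough average tokens per paper

print(f"Analyzed papers:        {analyzed}")
print(f"Relevant / discovered:  {relevant_rate:.1%}")
print(f"Analyzed / relevant:    {analyzed_rate:.1%}")
print(f"Full-text share:        {full_text_share:.1%}")
print(f"Mean chars per paper:   {mean_chars:,.0f}")
print(f"Est. tokens per paper:  {mean_tokens:,.0f} (context: {CONTEXT_TOKENS:,})")
```

The mean hides a wide spread, since the 20 abstract-only papers are far shorter than the 55 full-text ones, so individual full-text papers will sit well above the average.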
The Question

Can AI screen 75 research papers and rank drug candidates targeting somatic CAG repeat expansion, one of the most active frontiers in HD therapeutics?

Why Somatic Expansion?

GWAS identified DNA repair genes as the strongest modifiers of HD onset. Natural variants in these genes shift onset by 6 to 8 years. That discovery changed the field: the disease is not just about the inherited CAG length. It is about ongoing expansion in the brain.

The pathway: MutSbeta (the MSH2-MSH3 heterodimer) recognizes CAG slip-out loops. MutLgamma (the MLH1-MLH3 heterodimer) nicks the DNA. Pol-delta adds extra repeats. LIG1 seals the nick. FAN1 opposes expansion by promoting contraction. Each step is a potential drug target.
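
The pathway described above can be written down as a small ordered data structure, which makes the two intervention directions explicit. This is an illustrative sketch only: the `PathwayStep` type and the `drives_expansion` flag are hypothetical names introduced here, and the rule "inhibit what drives expansion, enhance what opposes it" is a simplification that says nothing about safety or druggability.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PathwayStep:
    factor: str            # complex or enzyme as named in the text
    gene: Optional[str]    # gene named alongside it in the text, if any
    action: str            # what the step does
    drives_expansion: bool # True if the step pushes toward longer repeats


# Steps in the order the text lists them.
PATHWAY = [
    PathwayStep("MutSbeta", "MSH3", "recognizes CAG slip-out loops", True),
    PathwayStep("MutLgamma", "MLH1", "nicks the DNA", True),
    PathwayStep("Pol-delta", None, "adds extra repeats", True),
    PathwayStep("LIG1", "LIG1", "seals the nick", True),
    PathwayStep("FAN1", "FAN1", "opposes expansion by promoting contraction", False),
]


def intervention(step: PathwayStep) -> str:
    """Naive direction of intervention implied by the step's role."""
    return "inhibit" if step.drives_expansion else "enhance"


# Group factors by the direction a drug would need to push them.
strategies = {"inhibit": [], "enhance": []}
for step in PATHWAY:
    strategies[intervention(step)].append(step.factor)

for i, step in enumerate(PATHWAY, start=1):
    print(f"{i}. {step.factor}: {step.action} -> {intervention(step)}")
print(strategies)
```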

Multiple companies are now developing drugs against this pathway: LoQus23 (MSH3), Harness Therapeutics (MSH3), Skyhawk Therapeutics (splicing), and Rgenta Therapeutics (splicing). This experiment asks: what does the published evidence say about which targets and modalities are most promising?

Target Rankings

Target | Papers | Druggability | Most Advanced | Key Challenge
MSH3 | 7 | High (multiple siRNA/shRNA) | Gene knockdown in animal models | CNS delivery, off-target effects
PMS1 | 4 | High (splicing modulation) | ASOs / small molecules | Preserving basal splicing function
FAN1 | 3 | Medium (antagomir approach) | Antagomir targeting mRNA | Pleiotropic toxicity risk
MSH2 | 2 | Medium (knockdown shown) | Gene knockdown in animals | Core MMR component, systemic toxicity
MLH1 | 2 | Medium (knockdown shown) | Gene knockdown in cells | Vital for genome stability, narrow window

Drug Candidates Ranked

#1

Gene Silencing

siRNA/shRNA targeting MSH3

Preclinical (Animal Model)

Confidence: 90/100

Multiple studies show that reducing MSH3 expression robustly prevents somatic CAG repeat expansion. One study demonstrated 78.1% reduction in striatal expansion with di-siRNA delivery.

#2

Splicing Modulation

Splice Modulators (PMS1)

Preclinical (Cell Model)

Confidence: 85/100

Modulating splicing of PMS1 and HTT alleviates polyQ toxicity. Both ASOs and small molecules have been demonstrated in cell models.

#3

Gene Silencing

A4(P10A) shRNA

Preclinical (Animal Model)

Confidence: 80/100

Directly targets the CAG repeat tract in HTT/ATXN3 mRNA. Addresses the source of instability rather than the repair machinery.

#4

Small Molecule

NA / CFZ / TZD

Preclinical (Cell Model)

Confidence: 75/100

Oral small molecules that intervene at the pathway level. The mechanism varies by compound, but all alleviate polyQ toxicity in cell models.
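
The four ranked candidates above can be captured as records and re-sorted or grouped programmatically, for example to separate animal-model evidence from cell-model evidence. This is a sketch of one possible representation, not the experiment's actual output schema: the `Candidate` type and its field names are hypothetical, while the values are copied from the cards.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    modality: str
    stage: str
    confidence: int  # model-assigned score out of 100, not a clinical measure


CANDIDATES = [
    Candidate("NA / CFZ / TZD", "Small Molecule", "Preclinical (Cell Model)", 75),
    Candidate("siRNA/shRNA targeting MSH3", "Gene Silencing", "Preclinical (Animal Model)", 90),
    Candidate("A4(P10A) shRNA", "Gene Silencing", "Preclinical (Animal Model)", 80),
    Candidate("Splice Modulators (PMS1)", "Splicing Modulation", "Preclinical (Cell Model)", 85),
]

# Rank by confidence, highest first; this reproduces the order of the cards.
ranked = sorted(CANDIDATES, key=lambda c: c.confidence, reverse=True)

# Group by evidence stage, keeping the ranked order within each group.
by_stage = defaultdict(list)
for candidate in ranked:
    by_stage[candidate.stage].append(candidate.name)

for rank, candidate in enumerate(ranked, start=1):
    print(f"#{rank} {candidate.name} ({candidate.modality}): {candidate.confidence}/100")
for stage, names in by_stage.items():
    print(f"{stage}: {', '.join(names)}")
```

Grouping by stage is a reminder that a high confidence score on cell-model evidence is not directly comparable to one backed by animal data.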

Novel Hypotheses

PCNA-FAN1 Interaction Targeting

Novelty: High
Score: 88/100

Targeting the PCNA-FAN1 interaction point could simultaneously inhibit both the repair process and the replication stress associated with unstable repeats.

Transcription Factor Inhibition for Cell-Type-Specific MMR

Novelty: Medium-High
Score: 82/100

Inhibiting transcription factors responsible for cell-type-specific MMR dysregulation could be a more upstream and safer intervention than directly targeting MMR genes.

Epigenetic Silencing of MSH3 Promoter

Novelty: High
Score: 78/100

Using epigenetic modifiers to silence MSH3 promoter regions in a dose-dependent manner could offer better control and reversibility than direct gene knockdown.

The Key Insight

To our knowledge, this is the first AI-driven drug screen specifically targeting somatic CAG expansion in HD. Across 75 papers and nearly 5 million characters (4,976,515), Gemma 4 identified 42 drug candidates against 5 validated targets. The strongest signal: MSH3 knockdown via siRNA consistently reduces somatic expansion across multiple studies.

The field is converging on two complementary strategies: reducing expansion (MSH3/PMS1 inhibition) and promoting contraction (FAN1 enhancement). Both may be needed.

Updated Drug Pipeline (via ML Intern)

These findings were cross-referenced with the HuggingFace ML Intern research agent in April 2026. ML Intern's output agreed with our Experiment #4 findings and surfaced additional clinical pipeline data.

Rank | Target | Drug / Modality | Developer | Stage
1 | MSH3 | Oral RNA splicing modulator (exon skipping) | Rgenta Therapeutics | IND-enabling (2025-2026)
2 | MSH3 | ASO (intrathecal, RNase H knockdown) | Ionis / CHDI / MGH | Advanced preclinical (NHP data)
3 | CAG DNA | Naphthyridine-azaquinolone (binds CAG hairpins, promotes contraction) | NIH / Nagoya University | Proof-of-concept (mouse)
4 | FAN1 | Small molecule activator (enhance nuclease activity) | CHDI-funded consortium | HTS screening
5 | PMS1/PMS2 | MutL inhibitor | None dedicated | Concept only
Source: ML Intern synthesis of GeM-HD Consortium papers (Cell 2015, 2019), Lancet Neurology 2017, Genetics 2020, Nature Genetics 2020. AI-generated summary, not validated.

Scale Comparison

Metric | EXP-001 | EXP-002 | EXP-003 | EXP-004
Papers | 22 abstracts | 16 full | 16 full | 75 (55 full + 20 abstract)
Characters | ~50K | 1.9M | 1.9M | 4.97M
Focus | General HD | General HD | Model comparison | Somatic expansion drug screen
Drug candidates | 12 | 12 | 12 | 42
Model | Llama 3.1 8B | Llama 3.1 8B | Gemma 4 26B | Gemma 4 26B

Limitations

  • Single model, single run. A different prompt or temperature could produce different results.
  • Not reviewed by HD domain experts or medicinal chemists.
  • Drug rankings reflect AI assessment, not clinical validation.
  • Only open-access papers analyzed. Paywalled papers were excluded.
  • The model's output for some papers failed JSON parsing; these errors were fixed mid-run.
  • Abstract-only papers (20 of 75) contribute lower-confidence analysis.
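
The JSON parsing problem noted in the limitations is common when a local model wraps its answer in prose or a code fence. How this run fixed it is not documented; the following is a minimal sketch of one tolerant parsing approach, where `parse_model_json` is a hypothetical helper and not part of the experiment's code.

```python
import json
import re
from typing import Any, Optional

# Build the three-backtick fence marker programmatically so this example
# does not itself contain a literal code fence.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_model_json(raw: str) -> Optional[Any]:
    """Best-effort extraction of one JSON value from raw model output.

    Tries, in order: the whole string, the contents of the first fenced
    block, and the span from the first '{' to the last '}'.
    Returns None if nothing parses, so the caller can log and retry.
    """
    candidates = [raw]

    fenced = FENCE_RE.search(raw)
    if fenced:
        candidates.append(fenced.group(1))

    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        candidates.append(raw[start:end + 1])

    for text in candidates:
        try:
            return json.loads(text.strip())
        except json.JSONDecodeError:
            continue
    return None


if __name__ == "__main__":
    noisy = 'Here is the analysis: {"target": "MSH3", "confidence": 90} Hope this helps.'
    print(parse_model_json(noisy))
    print(parse_model_json("The paper does not mention any targets."))
```

Returning `None` instead of raising keeps a long batch run alive; papers that fail can be collected and re-prompted afterwards, and the failure count can be reported alongside the results.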
