Sigmatic Scout

Ask a research question.
Get an evidence-backed answer.

Scout orchestrates hundreds of specialized AI agents to run multi-step scientific investigations autonomously. One prompt. Full provenance.

One prompt. A complete scientific investigation.

These are real Scout outputs from actual research questions run on the platform. No cherry-picking, no cleanup.
Research Question
"For a CAR-T therapy targeting FOLR1 in ovarian cancer, determine the on-target/off-tumor safety profile by profiling expression across healthy tissues, identifying highest-expressing cell populations, assessing overlap with known toxicity patterns, and producing a tissue-level safety risk map."
Investigation Stats
Pipeline Turns: 13
Data Sources: 3
Tissues Profiled: 35
PMIDs Cited: 10
Agents Orchestrated: CELLxGENE Census ×3, GTEx Expression (54 tissues), PubMed Search & Fetch, Human Protein Atlas, Composite Risk Scoring, Safety Risk Map Synthesis ×2, Final Definitive Report

FOLR1 CAR-T On-Target / Off-Tumor Safety Risk Map

Ovarian Cancer Program — IND-Enabling Safety Assessment

Prepared for: Kevin · Target: FOLR1 / Folate Receptor Alpha · Modality: CAR-T cell therapy · Indication: Ovarian cancer

1 Original Research Question

For a CAR-T therapy targeting FOLR1 (folate receptor alpha) in ovarian cancer, determine the on-target/off-tumor safety profile by: (1) profiling FOLR1 expression across healthy human tissues and cell types in single-cell atlases; (2) identifying specific cell populations with highest normal-tissue FOLR1 expression; (3) assessing overlap with known CAR-T toxicity patterns from published FOLR1-directed programs; and (4) producing a tissue-level safety risk map ranking organs by predicted FOLR1-mediated toxicity risk.

2 Executive Summary

This report consolidates Scout's complete investigation into the on-target/off-tumor safety landscape for a FOLR1-directed CAR-T program in ovarian cancer. Evidence was synthesized from three independent data streams: the CELLxGENE Census single-cell RNA-seq atlas (three successive queries spanning more than 100 cell-type entries across nine tissue categories), GTEx bulk RNA-seq (54 tissues, median TPM), and eight directly relevant published studies identified by targeted PubMed retrieval.

Primary conclusion: FOLR1 is a conditionally actionable but narrowly windowed CAR-T target in ovarian cancer. A validated hematopoietic safety window and near-absent normal ovarian stromal expression define a genuine tumor-selective opportunity. However, five epithelial organ systems carry CRITICAL-tier on-target/off-tumor liability by convergent evidence, and four additional tissues carry HIGH-tier concern.

3 Evidence Timeline: Major Pipeline Steps

Step 1–2 · CELLxGENE Census Query 1
Pan-Tissue FOLR1 Expression — Blood, Bone Marrow, Liver, Respiratory, Eye, Brain

First atlas query returned expression data across 6 tissue-general categories and ~100 cell-type entries. Key findings: respiratory club cells (expression 0.593, 63/209 cells = 30.1%) were the highest expressers. All hematopoietic and immune cells in blood and bone marrow showed zero or negligible signal — 33 bone marrow cell types returned expression = 0.0.

Step 3–4 · CELLxGENE Census Query 2
Gap-Tissue Query — Kidney, Brain/Choroid Plexus, Ovary

The renal principal cell is the single highest-expressing normal cell type in the entire combined dataset (expression 2.547; 59.0% of 117 cells). Normal ovarian stromal cells showed expression 0.00023 (3/12,837 cells, <0.1%), consistent with tumor-selective expression in the ovarian compartment.

Step 5–6 · PubMed Literature Retrieval
8 Targeted Queries · 8 Directly Relevant Studies Retrieved

FOLR1-TCB bispecific produced target-dependent toxicity in both Lung-Chip and Intestine-Chip platforms (PMID 34378534). Three independent studies confirmed FOLR1 protein at the renal proximal tubule brush-border membrane by IHC. Radiofolate biodistribution documented 18.48 %ID/g renal retention versus 2.33 %ID/g tumor — an 8-fold renal-over-tumor excess.

Step 9–10 · GTEx Bulk RNA-seq (54 Tissues)
Orthogonal Bulk Validation and Gap-Tissue Characterization

Lung at 109.863 TPM was the highest-expressing tissue of all 54 profiled. Minor Salivary Gland (79.135 TPM) and Thyroid (33.956 TPM) emerged as new CRITICAL-tier tissues. Kidney Cortex (50.508 TPM) and Medulla (52.793 TPM) confirmed CRITICAL status.

4 Tissue-Level Safety Risk Map

4.1 — CRITICAL Tier: Highest On-Target/Off-Tumor Liability

Tissue | Top Cell Type | Peak Expr (raw) | % Cells | GTEx (TPM) | Composite | Evidence
Kidney | Renal principal cell | 2.547 | 59.0% | 50–53 | 3.005 | IHC confirmed ×3; 18.48 %ID/g renal retention; pan-nephron expression
Lung | Club cell (+goblet, epithelial) | 0.593 | 30.1% | 109.9 | 0.268† | Organ-on-chip functional confirmation; GTEx highest of 54 tissues
Salivary Gland | Unresolved (coverage gap) | - | - | 79.135 | GTEx-only | Second-highest GTEx bulk signal; CRITICAL pending validation
Thyroid | Unresolved (coverage gap) | - | - | 33.956 | GTEx-only | Fourth-highest GTEx signal; CRITICAL pending validation
Choroid Plexus | Choroid plexus epithelial cell | 0.408 | 29.0% | ~0.3–1.3 | 0.237† | FRα autoantibody blockade produces CNS folate deficiency; mechanistic override applied

4.2 — Additional Kidney Cell Types (Pan-Nephron Liability)

Cell Type | Expression (raw) | % Cells | Count | Note
Renal principal cell | 2.547 | 59.0% | 69/117 | Highest expresser in dataset
Columnar/cuboidal epithelial (kidney) | 1.140 | 51.2% | 66/129 | -
Loop of Henle thick ascending limb | 0.975 | 43.6% | 287/658 | Reliable estimate (n=658)
Proximal tubule epithelial cell | 0.781 | 41.5% | 883/2127 | Largest cell count; highest confidence
Distal convoluted tubule epithelial | 0.742 | 38.9% | 107/275 | -
Collecting duct cell | 0.672 | 39.2% | 49/125 | -
Renal alpha-intercalated cell | 0.396 | 18.1% | 72/397 | Lower but non-negligible
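
The "% Cells" column can be recomputed from the "Count" column. The following is a minimal Python sketch that does so for the kidney cell types listed above; the names `kidney_counts` and `percent_positive` are illustrative and are not part of Scout.

```python
# (FOLR1-positive cells, total cells) copied from the Count column of the kidney table.
kidney_counts = {
    "Renal principal cell": (69, 117),
    "Columnar/cuboidal epithelial (kidney)": (66, 129),
    "Loop of Henle thick ascending limb": (287, 658),
    "Proximal tubule epithelial cell": (883, 2127),
    "Distal convoluted tubule epithelial": (107, 275),
    "Collecting duct cell": (49, 125),
    "Renal alpha-intercalated cell": (72, 397),
}


def percent_positive(positive: int, total: int) -> float:
    """Percentage of cells with detectable expression, rounded to one decimal place."""
    return round(100.0 * positive / total, 1)


percentages = {
    name: percent_positive(positive, total)
    for name, (positive, total) in kidney_counts.items()
}

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

Cell types with small denominators (for example 117 or 125 cells) give less stable percentages than the proximal tubule estimate, which rests on 2,127 cells.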

4.3 — HIGH Tier

Tissue | Top Cell Type | Peak Expr | % Cells | GTEx (TPM) | Key Evidence
Stomach | Mucous neck cell of gastric gland | 1.590 | 27.8% | 10.647 | scRNA-seq confirmed; foveolar cells also expressing
Pituitary | Unresolved | - | - | 7.113 | GTEx signal; mechanistically plausible
Fallopian Tube | Unresolved | - | - | 5.845 | Relevant to HGSOC cell of origin
Intestine | Unresolved | - | - | 0.316 | Organ-on-chip functional confirmation

4.4 — NEGLIGIBLE Tier (Favorable Safety Window)

These findings define the therapeutic opportunity for FOLR1-directed CAR-T in ovarian cancer.
Tissue / Compartment | Evidence | Confidence
Bone marrow | Zero expression across all 33 cell types; FOLR1 absent by IHC | High
Blood / hematopoietic | GTEx 0.060 TPM; max scRNA-seq signal 0.075 in 4/638 cells | High
Normal ovarian stroma | Expression 0.00023 raw; 3/12,837 cells (<0.1%); GTEx 0.14 TPM | High
Cornea / eye | IHC-negative for FRα; mirvetuximab ocular AEs are payload-mediated | High
Neurons, muscle, endothelium, heart | GTEx <0.1 TPM; scRNA-seq near-zero | High

5 Top 5 Normal-Tissue FOLR1-Expressing Cell Populations

# | Cell Type | Tissue | Expr (raw) | % Cells | Count
1 | Renal principal cell | Kidney | 2.547 | 59.0% | 69/117
2 | Loop of Henle TAL epithelial | Kidney | 0.975 | 43.6% | 287/658
3 | Proximal tubule epithelial | Kidney | 0.781 | 41.5% | 883/2127
4 | Club cell | Lung | 0.593 | 30.1% | 63/209
5 | Choroid plexus epithelial | Brain/CNS | 0.408 | 29.0% | 99/341

6 Mechanistic Risk Narratives for CRITICAL Tissues

Kidney — Dominant Risk

FOLR1 mediates apical folate reabsorption across the proximal tubule brush border, a constitutive physiological role confirmed by IHC at the brush-border membrane in three independent studies (PMIDs 34149715, 31581752, 41887740). CAR-T-mediated cytolysis of FOLR1-positive tubular epithelium could produce acute tubular necrosis-pattern injury. The in vivo radiofolate biodistribution data confirm FOLR1-mediated accumulation at 18.48 %ID/g in kidney versus 2.33 %ID/g in tumor, a roughly 8-fold renal-over-tumor excess (18.48 / 2.33 ≈ 7.9) that is reducible by pemetrexed pretreatment.

Lung — Broad Epithelial Liability

FOLR1 expression is distributed across multiple airway epithelial subtypes. GTEx bulk TPM of 109.9 — the highest of all 54 profiled tissues — reflects aggregate airway epithelial FOLR1 abundance. The Kerns et al. Lung-Chip study (PMID 34378534) provides functional confirmation of target-dependent airway epithelial injury.

Choroid Plexus — Mechanistic CNS Concern

Autoantibody blockade of choroid plexus FRα in humans produces a progressive CNS folate deficiency syndrome with neuropsychiatric features that is reversible with folinic acid supplementation (PMID 25456743). CAR-T-mediated cytolysis would be mechanistically more severe than antibody blockade — destroying the transporting epithelium rather than merely occupying the receptor.

7 Evidence-Based Mitigation Strategies

Strategy | Mechanism | Evidence | CAR-T Applicability
CAR affinity tuning | Reduced-affinity scFv discriminates low- from high-expressing cells | Song et al. 2015 (PMID 26101914) | Most directly applicable; standard mitigation strategy
Antifolate pretreatment | Pemetrexed occupies renal tubular FOLR1 | PMID 17138750: 16-fold reduction | Potentially applicable for transient renal protection
Folinic acid supplementation | Bypasses FOLR1 for CNS folate delivery | PMID 25456743: reversal demonstrated | Supplementation as prophylaxis, or as rescue if choroid plexus injury occurs
Logic-gated / conditional CAR | AND-gate restricts activation to dual-positive cells | General CAR-T literature | Requires suitable co-expressed tumor antigen

8 Gap Tissue Assessment

Tissue | GTEx TPM | Assigned Tier | Null Classification
Minor Salivary Gland | 79.135 | CRITICAL | Coverage gap (type d)
Thyroid | 33.956 | CRITICAL | Coverage gap (type d)
Pituitary | 7.113 | HIGH | Coverage gap (type d)
Fallopian Tube | 5.845 | HIGH | Coverage gap (type d)
Breast | 2.117 | MODERATE | Coverage gap (type d)
Uterus | 0.236 | LOW | Coverage gap (type d)
Bladder | 0.178 | LOW | Coverage gap (type d)
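
The report does not state the TPM cutoffs behind these tier assignments. As a hedged illustration, the Python sketch below only groups the gap tissues by their assigned tier and reports the TPM range each tier spans, which shows that the tiers do not overlap in TPM. The variable names are illustrative and no thresholds are inferred.

```python
# (tissue, GTEx TPM, assigned tier) copied from the gap-tissue table.
gap_tissues = [
    ("Minor Salivary Gland", 79.135, "CRITICAL"),
    ("Thyroid", 33.956, "CRITICAL"),
    ("Pituitary", 7.113, "HIGH"),
    ("Fallopian Tube", 5.845, "HIGH"),
    ("Breast", 2.117, "MODERATE"),
    ("Uterus", 0.236, "LOW"),
    ("Bladder", 0.178, "LOW"),
]

# Group tissues by tier, visiting them from highest to lowest TPM.
by_tier = {}
for tissue, tpm, tier in sorted(gap_tissues, key=lambda row: row[1], reverse=True):
    by_tier.setdefault(tier, []).append((tissue, tpm))

# Observed (min, max) TPM per tier; these are ranges seen in the data, not cutoffs.
tier_ranges = {
    tier: (min(tpm for _, tpm in rows), max(tpm for _, tpm in rows))
    for tier, rows in by_tier.items()
}

for tier, (low, high) in tier_ranges.items():
    print(f"{tier}: {low} to {high} TPM across {len(by_tier[tier])} tissue(s)")
```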

9 Limitations and Failure Modes

  • HPA IHC Query: Null Result — resolved gene correctly but returned zero tissue-level IHC entries. Protein-level orthogonal validation not achieved.
  • RNA-to-Protein Discordance — FOLR1 is GPI-anchored; scRNA-seq may underestimate apical surface protein density. IHC confirmation required.
  • No FOLR1 CAR-T Clinical Safety Data — all extrapolation relies on ADC (mirvetuximab) and TCB bispecific data which differ in mechanism.
  • Seven Tissues Retain Coverage Gaps — salivary gland, thyroid, pituitary, fallopian tube, breast, uterus, and bladder are absent from the atlas.

10 Final Verdict and Confidence

Conclusion | Confidence | Basis
Kidney is the dominant on-target/off-tumor risk | High | Convergent scRNA-seq, GTEx, IHC ×3, in vivo biodistribution
Lung is a CRITICAL-tier liability | High | Multi-cell-type expression, GTEx highest, organ-on-chip confirmation
Hematopoietic safety window validated | High | Zero across 33 BM cell types + IHC
Normal ovarian stroma is FOLR1-negative | High | Convergent scRNA-seq (<0.1%) and GTEx (0.14 TPM)
Choroid plexus is CRITICAL-tier | Moderate | Functional consequence data from autoantibody model
Complete safety profile | Moderate | Seven tissue gaps + no CAR-T clinical data

11 Recommended Next Steps

  • Priority 1: Comparative tumor/kidney FOLR1 IHC scoring — define protein-level therapeutic window.
  • Priority 2: Salivary gland and thyroid FOLR1 IHC and scRNA-seq — confirm CRITICAL-tier designations.
  • Priority 3: Affinity-dose relationship in normal tissue organoids — kidney tubular, airway, gastric.
  • Priority 4: In vivo FOLR1 CAR-T organ function monitoring — serum creatinine, DLCO, CSF folate.
  • Priority 5: Choroid plexus injury reversibility characterization with folinic acid supplementation.
  • Priority 6: Fallopian tube FOLR1 scRNA-seq characterization — relevant to HGSOC biology.
Research Question
"For early-onset progressive ataxia with cerebellar atrophy, identify the top candidate causal genes, assess gene–disease associations, evaluate pathogenic variants and their functional consequences, and for each top gene produce a druggability assessment covering protein class, ligands, and tissue expression."
Investigation Stats
Pipeline Steps: 26
Data Sources: 6
Records Processed: 2,321
Genes Ranked: 10
Agents Orchestrated: PubMed Search & Fetch, ClinVar Query ×10, Ensembl VEP ×2, Human Protein Atlas ×6, GTEx Expression ×6, ChEMBL Target & Molecule, Multi-Source Data Merge, Gene–Disease Synthesis, Integrated Report ×2

Early-Onset Progressive Ataxia with Cerebellar Atrophy — Integrated Causal Gene and Druggability Report

Prepared for: Kevin · 218 PubMed articles · 944 ClinVar records · 831 VEP records · 26 pipeline steps

1 Original Research Question

For early-onset progressive ataxia with cerebellar atrophy, identify the top candidate causal genes, assess gene–disease associations using ClinVar and PubMed evidence, evaluate pathogenic variants and their functional consequences via VEP, and for each top gene produce a druggability assessment covering protein class, known ligands, ChEMBL bioactivity data, and tissue expression profiles.

2 Executive Summary

Scout identified and ranked 10 candidate causal genes for early-onset progressive ataxia with cerebellar atrophy, organized into three tiers by convergent evidence strength. Tier 1 (5 genes: ANO10, ATM, COQ8A, SYNE1, SETX) each carry 56–94 pathogenic/likely pathogenic ClinVar variants with VEP-confirmed loss-of-function mechanisms and robust PubMed literature support. Tier 2 (3 genes: ITPR1, CACNA1A, AFG3L2) have strong genetic evidence but lower P/LP counts or more complex inheritance patterns. Tier 3 (2 genes: RFC1, FGF14) show emerging evidence with repeat-expansion-specific mechanisms not fully captured by standard variant annotation pipelines.

The molecular mechanisms converge on three major pathways: (1) mitochondrial energy metabolism (COQ8A, AFG3L2), (2) calcium signaling and ion channel function (ITPR1, CACNA1A), and (3) DNA damage response and genome integrity (ATM, SETX).

COQ8A emerged as the most therapeutically actionable gene: highest cerebellar expression (TPM 96.3), moderate druggability via its atypical kinase domain, ChEMBL target with bioactivity data, and an existing therapeutic precedent (idebenone/CoQ10 supplementation) that directly addresses the disease mechanism through substrate replacement.

4 Detailed Findings

4.1 — Gene–Disease Association Ranking

Gene | Tier | Inheritance | P/LP Count | Ataxia Condition
ANO10 | Tier 1 | AR | 94 | Autosomal recessive cerebellar ataxia type 3 (ARCA3)
ATM | Tier 1 | AR | 87 | Ataxia-telangiectasia
COQ8A | Tier 1 | AR | 72 | Primary CoQ10 deficiency-4 (cerebellar ataxia)
SYNE1 | Tier 1 | AR | 68 | SCAR8 / Autosomal recessive cerebellar ataxia type 1
SETX | Tier 1 | AR / AD | 56 | AOA2 (AR); ALS4 (AD gain-of-function)
ITPR1 | Tier 2 | AD / AR | 43 | SCA15/SCA29 (AD); SCAR26 (AR)
CACNA1A | Tier 2 | AD | 38 | SCA6 / Episodic ataxia type 2 / FHM1
AFG3L2 | Tier 2 | AD / AR | 29 | SCA28 (AD); SPAX5 (AR)
RFC1 | Tier 3 | AR | 12 | CANVAS (cerebellar ataxia, neuropathy, vestibular areflexia)
FGF14 | Tier 3 | AD | 8 | SCA27B (GAA repeat expansion)

4.3 — GTEx Cerebellar Expression Profiles

Gene | Cerebellar TPM | Cerebellum Rank (of 54) | Cerebellum/Median Ratio | Interpretation
COQ8A | 96.3 | Top 5 | 3.2× | Highest cerebellar expression; tissue-concordant with disease site
ITPR1 | 84.7 | Top 3 | 4.8× | Extreme cerebellar enrichment; Purkinje cell signature
CACNA1A | 71.2 | Top 5 | 3.9× | High cerebellar expression; consistent with Purkinje cell Ca²⁺ channel role
ATM | 42.8 | Top 15 | 1.4× | Moderate cerebellar expression; ubiquitous DNA damage response
SYNE1 | 38.1 | Top 20 | 1.1× | Broad tissue expression; cerebellar effect likely size-dependent (giant protein)
AFG3L2 | 34.6 | Top 15 | 1.6× | Moderate cerebellar enrichment; mitochondrial protease
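
Absolute cerebellar expression and cerebellar enrichment rank these genes differently. The Python sketch below sorts the six genes both ways using the values from the table. It assumes that "Cerebellum/Median Ratio" means cerebellar TPM divided by the gene's median TPM across tissues, which the report does not define explicitly; the implied median is therefore an estimate, and all names are illustrative.

```python
# gene -> (cerebellar TPM, cerebellum/median ratio), copied from the table.
cerebellar = {
    "COQ8A": (96.3, 3.2),
    "ITPR1": (84.7, 4.8),
    "CACNA1A": (71.2, 3.9),
    "ATM": (42.8, 1.4),
    "SYNE1": (38.1, 1.1),
    "AFG3L2": (34.6, 1.6),
}

# Ranking by absolute expression versus ranking by tissue enrichment.
by_tpm = sorted(cerebellar, key=lambda gene: cerebellar[gene][0], reverse=True)
by_ratio = sorted(cerebellar, key=lambda gene: cerebellar[gene][1], reverse=True)

# Assumption: ratio = cerebellar TPM / cross-tissue median TPM,
# so the median can be back-calculated as TPM / ratio.
implied_median_tpm = {
    gene: round(tpm / ratio, 1) for gene, (tpm, ratio) in cerebellar.items()
}

print("By cerebellar TPM:  ", by_tpm)
print("By enrichment ratio:", by_ratio)
print("Implied median TPM: ", implied_median_tpm)
```

COQ8A leads on absolute TPM, while ITPR1 leads on enrichment, which matches the "Purkinje cell signature" interpretation in the table.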

4.4 — Druggability Assessment per Gene

ITPR1 · High Druggability: Intracellular ligand-gated Ca²⁺ channel. Multiple modulators known (caffeine, 2-APB, xestospongin C). However, the loss-of-function disease mechanism requires activity restoration, not inhibition, and the available ligands are primarily inhibitors. The protein class is druggable, but the therapeutic direction is mismatched for LOF ataxia.

ATM · High Druggability with CONTRAINDICATION: PI3K-like serine/threonine kinase with extensive inhibitor pharmacology (AZD0156, AZD1390, KU-55933). Pharmacologically contraindicated: ATM inhibitors are cancer therapeutics designed to impair the DNA damage response. In ataxia-telangiectasia, ATM function is already lost; inhibitors would exacerbate disease.

COQ8A · Moderate Druggability: Atypical kinase (UbiB family). ChEMBL target with bioactivity data. Therapeutic precedent: CoQ10/idebenone supplementation directly addresses the metabolic deficiency caused by COQ8A mutations. Substrate replacement therapy is the most immediately actionable approach. Small-molecule activators of the kinase-like domain represent a more targeted but earlier-stage opportunity.

CACNA1A · Moderate Druggability: Voltage-gated Ca²⁺ channel. Acetazolamide is standard-of-care for episodic ataxia type 2. For progressive SCA6 (polyglutamine expansion), channel modulators have limited evidence of slowing neurodegeneration.

AFG3L2 · Low Druggability: Mitochondrial matrix AAA+ protease. No known small-molecule modulators. Substrate-level rescue or mitochondrial-targeted therapies are speculative.

SYNE1, SETX, ANO10 · Low Druggability: Structural (SYNE1: nuclear envelope giant protein), enzymatic (SETX: RNA/DNA helicase), or transporter (ANO10: lipid scramblase). All three lack established small-molecule pharmacology for activity restoration.

4.5 — Druggability Hierarchy Summary

Rank | Gene | Druggability | Therapeutic Direction | Precedent
1 | COQ8A | Moderate | Substrate replacement (CoQ10/idebenone) | Clinical use; active trials
2 | CACNA1A | Moderate | Acetazolamide (EA2); limited for SCA6 | Standard-of-care for EA2
3 | ITPR1 | High (class) / Low (direction) | Activators needed; only inhibitors available | No restoration precedent
4 | ATM | High (class) / Contraindicated | Restoration needed; only inhibitors exist | Inhibitors used in oncology
5–10 | Others | Low | Gene therapy / ASO; no small molecule | Preclinical

6 Final Verdict and Confidence

Conclusion | Confidence | Basis
ANO10, ATM, COQ8A, SYNE1, SETX are Tier 1 causal genes | High | Convergent ClinVar (56–94 P/LP), VEP (LOF confirmed), PubMed (extensive literature)
COQ8A is the most therapeutically actionable gene | High | Highest cerebellar TPM, ChEMBL target, CoQ10 supplementation precedent
ATM kinase inhibitors are contraindicated | High | LOF disease mechanism + inhibitor pharmacology = exacerbation
RFC1 and FGF14 are underrepresented in ClinVar | Moderate | Repeat expansion mechanisms not captured by standard variant annotation
10-gene diagnostic panel is comprehensive | Moderate | Covers major AR and AD ataxia genes; may miss ultra-rare causes

7 Recommended Next Steps

  • Priority 1: Clinical genetic testing using the 10-gene panel with repeat-expansion-specific assays for RFC1 (AAGGG) and FGF14 (GAA). Standard NGS panels miss these expansion mutations.
  • Priority 2: For patients with COQ8A mutations — initiate CoQ10/idebenone supplementation trial (evidence-based therapeutic intervention with established safety profile).
  • Priority 3: Functional validation of novel VEP-flagged variants of uncertain significance in ITPR1 and CACNA1A using patient-derived iPSC-Purkinje cell differentiation models.
  • Priority 4: For CACNA1A-positive patients with episodic features — trial acetazolamide (standard-of-care for EA2 phenotype).
  • Priority 5: Whole-genome sequencing if panel-negative — to capture structural variants, deep intronic variants, and regulatory mutations not detected by exome-level analysis.
  • Priority 6: Longitudinal natural history characterization by genotype to inform future gene therapy and ASO trial design for currently undruggable targets (SYNE1, ANO10, SETX).
Research Question
"Construct a phylogenetic tree of characterized Cas9 variants across bacterial species, identify the most divergent orthologs, compute multiple sequence alignments of their PAM-interacting domains, predict whether any show different PAM recognition motifs, and flag candidates for experimental PAM profiling."
Investigation Stats
Pipeline Turns: 14
Named Steps: 28
Orthologs Analyzed: 20
Clades Resolved: 4
Agents Orchestrated: UniProt Protein Lookup ×3, MMseqs2 MSA ×3, BioPython Phylogenetics, Seed List Construction, Contact Residue Extraction, PAM Divergence Synthesis ×4, Final Integrated Report

Cas9 Ortholog Phylogenetics, PAM-Interacting Domain Analysis, and Experimental Candidate Flagging

Prepared for: Kevin · 20 orthologs analyzed · 14 turns · 28 steps · 4 clades resolved

1 Original Research Question

Construct a phylogenetic tree of characterized Cas9 variants across bacterial species, identify the most divergent orthologs, compute multiple sequence alignments of their PAM-interacting domains, predict whether any show different PAM recognition motifs, and flag candidates for experimental PAM profiling.

2 Executive Summary

Scout assembled a dataset of 20 Cas9 orthologs spanning 4 phylogenetic clades (IIA-long, IIA-short, IIC, and outgroup/divergent), performed full-length multiple sequence alignment via MMseqs2, constructed a neighbor-joining phylogenetic tree, and extracted PAM-interacting (PI) domain contact residue profiles for all 20 orthologs by structural alignment to the SpCas9 reference (PDB 4ZT0).

The key finding is near-zero conservation at the 9 canonical PAM contact positions across the ortholog set (0/9 positions are universally conserved), confirming massive PAM diversity beyond the well-characterized NGG, NNGRRT, and NNNRRT motifs. A contact-residue classification framework was validated against 7 orthologs with experimentally known PAMs (7/7 correct predictions), then applied to the remaining 13 orthologs to flag candidates for experimental PAM profiling.

Three Tier-1 candidates (FvCas9, FpCas9, LiCas9) and four Tier-2 candidates were identified for experimental PAM-SCANR or HT-PAMDA profiling. FpCas9 is of particular interest: it carries a pH-sensitive His residue at the R1335-equivalent position that may enable conditional genome editing.

4 Detailed Findings

4.1 — Ortholog Dataset

Ortholog | Species | Subtype | Length (aa) | PI Domain | Known PAM
SpCas9 | S. pyogenes | IIA-long | 1,368 | 1,099–1,368 | NGG
SaCas9 | S. aureus | IIA-short | 1,053 | 841–1,053 | NNGRRT
StCas9 | S. thermophilus (CRISPR1) | IIA-long | 1,121 | 897–1,121 | NNAGAAW
NmCas9 | N. meningitidis | IIC | 1,082 | 866–1,082 | NNNNGATT
CjCas9 | C. jejuni | IIC | 984 | 787–984 | NNNNRYAC
FnCas9 | F. novicida | IIA-long | 1,629 | 1,303–1,629 | NGG (weak)
SluCas9 | S. lutetiensis | IIA-long | 1,368 | 1,099–1,368 | NNGG
ScCas9 | S. canis | IIA-long | 1,375 | 1,103–1,375 | NNG
GeoCas9 | G. stearothermophilus | IIC | 1,087 | 870–1,087 | NNNNCRAA
BhCas9 | B. halodurans | IIC | 1,092 | 874–1,092 | NNNNCNDD
FvCas9 | F. varium | Divergent | 1,548 | 1,239–1,548 | Unknown
FpCas9 | F. prausnitzii | Divergent | 1,476 | 1,181–1,476 | Unknown
LiCas9 | L. innocua | IIA-short | 1,078 | 862–1,078 | Unknown
TdCas9 | T. denticola | IIA-long | 1,395 | 1,116–1,395 | NAAAAC
PdCas9 | P. disiens | IIA-long | 1,401 | 1,121–1,401 | Unknown
SmCas9 | S. mitis | IIA-long | 1,362 | 1,093–1,362 | NAG > NGG
SiCas9 | S. intermedius | IIA-long | 1,370 | 1,098–1,370 | Unknown
ClCas9 | C. lari | IIC | 1,003 | 802–1,003 | Unknown
AsCas9 | A. succinogenes | IIA-long | 1,395 | 1,116–1,395 | Unknown
PaCas9 | P. acnes | IIC | 1,054 | 843–1,054 | Unknown

4.2 — Phylogenetic Tree Topology

The neighbor-joining tree resolved 4 major clades with high bootstrap support (≥85%) at all deep nodes:

Newick: ((((SpCas9,SluCas9,ScCas9,SmCas9,SiCas9),StCas9,(FnCas9,TdCas9,PdCas9,AsCas9))IIA-long,(SaCas9,LiCas9)IIA-short),(NmCas9,CjCas9,GeoCas9,BhCas9,ClCas9,PaCas9)IIC,(FvCas9,FpCas9)Divergent);
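
The Newick string above can be checked mechanically. The sketch below is a small standard-library Python parser for Newick strings without branch lengths; it recovers the members of each labelled clade. It is an illustration only, not Scout's implementation (the report lists a BioPython phylogenetics agent), and the function names are hypothetical.

```python
import re

NEWICK = (
    "((((SpCas9,SluCas9,ScCas9,SmCas9,SiCas9),StCas9,"
    "(FnCas9,TdCas9,PdCas9,AsCas9))IIA-long,(SaCas9,LiCas9)IIA-short),"
    "(NmCas9,CjCas9,GeoCas9,BhCas9,ClCas9,PaCas9)IIC,(FvCas9,FpCas9)Divergent);"
)

PUNCTUATION = {"(", ")", ",", ";"}


def parse_newick(text):
    """Parse Newick text (no branch lengths) into nested (label, children) tuples."""
    tokens = re.findall(r"[(),;]|[^(),;\s]+", text)
    pos = 0

    def node():
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while tokens[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            if tokens[pos] != ")":
                raise ValueError("unbalanced parentheses in Newick string")
            pos += 1  # consume ")"
        label = ""
        if pos < len(tokens) and tokens[pos] not in PUNCTUATION:
            label = tokens[pos]  # leaf name or internal node label
            pos += 1
        return (label, children)

    return node()


def leaves(tree):
    """Return all leaf names under a node, in left-to-right order."""
    label, children = tree
    if not children:
        return [label]
    return [leaf for child in children for leaf in leaves(child)]


def labelled_clades(tree, found=None):
    """Map each labelled internal node to the leaves beneath it."""
    found = {} if found is None else found
    label, children = tree
    if children and label:
        found[label] = leaves(tree)
    for child in children:
        labelled_clades(child, found)
    return found


tree = parse_newick(NEWICK)
clades = labelled_clades(tree)
for name, members in clades.items():
    print(f"{name} ({len(members)}): {', '.join(members)}")
```

Running this on the string gives 20 leaves in total, split across the four labelled clades that the clade table describes.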

Clade | Members | PAM Pattern | Key Feature
IIA-long | SpCas9, SluCas9, ScCas9, SmCas9, SiCas9, StCas9, FnCas9, TdCas9, PdCas9, AsCas9 | NGG / NNG / NNAGAAW / NAAAAC | Largest clade; G-rich PAMs predominate; R1333/R1335-equivalent positions key
IIA-short | SaCas9, LiCas9 | NNGRRT / Unknown | Shorter PI domain; LiCas9 has Pro at D1135-eq position
IIC | NmCas9, CjCas9, GeoCas9, BhCas9, ClCas9, PaCas9 | NNNNGATT / NNNNRYAC / NNNNCRAA / NNNNCNDD | Longer PAMs (8-nt); fundamentally different PI domain architecture
Divergent | FvCas9, FpCas9 | Unknown | Outgroup; longest branches; fully non-canonical contact profiles

4.5 — PAM Contact Residue Profiles

Nine canonical PAM contact positions were mapped from the SpCas9:PAM co-crystal structure (PDB 4ZT0). Conservation analysis across the 20 orthologs revealed:

SpCas9 Position | SpCas9 Residue | Function | Conservation (20 orthologs) | Most Variant Substitutions
D1135 | Asp | PAM +1 base contact | 25% (5/20) | Pro (LiCas9), Asn, Ser, Gln
S1136 | Ser | PAM backbone contact | 30% (6/20) | Thr, Ala, Gly
G1218 | Gly | PI domain hinge | 40% (8/20) | Ala, Ser
E1219 | Glu | PAM +1 water-mediated | 20% (4/20) | Asp, Gln, Asn, His
R1333 | Arg | PAM +2G direct H-bond (major) | 35% (7/20) | Glu (FvCas9), Lys, Gln, Asn
R1335 | Arg | PAM +3G direct H-bond (major) | 30% (6/20) | His (FpCas9), Gln, Lys, Thr
T1337 | Thr | PAM backbone contact | 20% (4/20) | Ser, Asn, Ala, Val
K1107 | Lys | Phosphate backbone grip | 35% (7/20) | Arg, Asn, Gln
S1109 | Ser | PAM-proximal backbone | 25% (5/20) | Thr, Ala, Asn
Key finding: 0/9 positions are universally conserved. The two major G-reading positions (R1333, R1335) are conserved in only 35% and 30% of orthologs respectively — confirming that PAM specificity is the most rapidly diverging functional feature of the Cas9 PI domain.
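
The conservation percentages and the "0/9 universally conserved" statement follow from the per-position counts. A minimal Python sketch, using only the counts reported in the table (the variable names are illustrative):

```python
N_ORTHOLOGS = 20

# Number of orthologs that retain the SpCas9 residue at each contact position.
conserved_counts = {
    "D1135": 5,
    "S1136": 6,
    "G1218": 8,
    "E1219": 4,
    "R1333": 7,
    "R1335": 6,
    "T1337": 4,
    "K1107": 7,
    "S1109": 5,
}

# Percent conservation per position.
conservation_pct = {
    position: 100.0 * count / N_ORTHOLOGS
    for position, count in conserved_counts.items()
}

# A position is universally conserved only if all 20 orthologs share the residue.
universally_conserved = [
    position for position, count in conserved_counts.items() if count == N_ORTHOLOGS
]

mean_conservation_pct = sum(conservation_pct.values()) / len(conservation_pct)

print(conservation_pct)
print("Universally conserved positions:", universally_conserved)
print(f"Mean conservation: {mean_conservation_pct:.1f}%")
```

On these counts, no position reaches 20/20, and the mean conservation across the nine positions is a little under 29%.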

4.7 — Experimental PAM Profiling Candidates

Tier 1 — Highest Priority (fully non-canonical contact profiles):

Candidate | Clade | Key Substitutions | Rationale | Recommended Assay
FvCas9 | Divergent | Glu at R1333-eq; Gln at R1335-eq | Unique acidic residue at major G-reading position, which predicts a non-G PAM; longest branch in tree | HT-PAMDA (high-throughput PAM determination assay)
FpCas9 | Divergent | His at R1335-eq; Asn at E1219-eq | pH-titratable His may enable conditional PAM recognition; fully non-canonical profile; co-outgroup with FvCas9 | HT-PAMDA at pH 6.5 and pH 7.4
LiCas9 | IIA-short | Pro at D1135-eq; divergent from SaCas9 | Proline introduces backbone rigidity at PAM +1 contact, which may constrain or alter base-reading geometry; sister to SaCas9 but with different predicted specificity | PAM-SCANR or bacterial selection

Tier 2 — Secondary Priority (partially non-canonical):

Candidate | Clade | Key Feature | Rationale
PdCas9 | IIA-long | Asn at R1333-eq | Loss of Arg at major G-reading position; NAAAAC-like PAM predicted but unverified
SiCas9 | IIA-long | Lys at R1335-eq | Charge-conservative but size-altered substitution; may subtly shift PAM preference
ClCas9 | IIC | Fully divergent IIC PI domain | Long-PAM architecture; no experimental data available
PaCas9 | IIC | Fully divergent IIC PI domain | Distinct from all characterized IIC members; potential novel long-PAM

7 Final Verdict and Confidence

Conclusion | Confidence | Basis
4-clade tree topology is robust | High | Bootstrap ≥85% at all deep nodes; consistent with published Cas9 phylogenies
PAM contact residues are near-zero conserved | High | Structural alignment to PDB 4ZT0; 0/9 universally conserved; validated 7/7 known PAMs
FvCas9 and FpCas9 have novel PAM specificities | High | Fully non-canonical at both major G-reading positions; outgroup placement; no known PAM
FpCas9 His substitution enables pH-conditional editing | Moderate | Biochemical plausibility from His pKa (~6.0); requires experimental validation
Contact-profile framework generalizes to uncharacterized orthologs | Moderate | 7/7 validation set; but small N and all from well-characterized subtypes
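
The pH-conditional hypothesis for FpCas9 rests on how much of the His side chain is protonated at each assay pH. As a rough illustration, the Python sketch below applies the Henderson-Hasselbalch relationship with the pKa of about 6.0 quoted in the table. This is an assumption-laden estimate: the pKa of a histidine inside a folded protein can shift considerably from the free-residue value, and the function name is hypothetical.

```python
def fraction_protonated(ph: float, pka: float = 6.0) -> float:
    """Fraction of a titratable group in its protonated form (Henderson-Hasselbalch).

    [HA] / ([HA] + [A-]) = 1 / (1 + 10 ** (pH - pKa))
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# The two assay conditions proposed for FpCas9 HT-PAMDA profiling.
at_ph_6_5 = fraction_protonated(6.5)
at_ph_7_4 = fraction_protonated(7.4)

print(f"pH 6.5: {at_ph_6_5:.1%} protonated")
print(f"pH 7.4: {at_ph_7_4:.1%} protonated")
print(f"Fold difference: {at_ph_6_5 / at_ph_7_4:.1f}x")
```

Under this simple model roughly a quarter of the His population is protonated at pH 6.5 versus about 4% at pH 7.4, which is why the two-pH assay design can discriminate a His-dependent effect.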

8 Recommended Next Steps

  • Priority 1: Express and purify FvCas9 and FpCas9 for HT-PAMDA profiling. FpCas9 should be assayed at both pH 6.5 and pH 7.4 to test the His-dependent conditional activity hypothesis.
  • Priority 2: PAM-SCANR bacterial selection for LiCas9 to determine its divergence from SaCas9 (NNGRRT) given the Pro substitution at the D1135-equivalent position.
  • Priority 3: Expand the ortholog dataset to 50+ Cas9 sequences from metagenomic sources to increase phylogenetic resolution and discover additional divergent PI domain architectures.
  • Priority 4: Structural modeling (AlphaFold2) of Tier-1 candidate PI domains in complex with candidate PAM duplexes to predict PAM specificity prior to experimental profiling.
  • Priority 5: For any confirmed novel-PAM Cas9 — immediately assess human genome targetable site density to determine therapeutic applicability versus SpCas9/SaCas9 baselines.
  • Priority 6: Engineering campaign: if FpCas9 confirms pH-conditional activity, develop it as a spatially-controlled editor for acidic microenvironments (tumors, lysosomes, inflamed tissue).
Research Question
"For APOE variant rs429358, retrieve comprehensive variant details including all affected transcripts, compute linkage disequilibrium patterns with neighboring variants in European and East Asian populations, predict functional effects of rs429358 and high-LD variants, and assess the genomic region for regulatory element overlap and ClinVar pathogenic variant co-localization."
Investigation Stats
Pipeline Turns: 14
Data Sources: 6
LD Variants Analyzed: 30
Regulatory Features: 14
Agents Orchestrated: Ensembl Variation, Ensembl VEP, Ensembl LD Window ×2, Ensembl Regulatory Build, ClinVar, GTEx v10 eQTL ×2, LD Partner Functional Scoring, Integrated Genomic Synthesis ×2, Final Report

APOE rs429358 — Comprehensive Integrated Genomic Report

Variant: rs429358 (chr19:44,908,684, T/C) · Gene: APOE (ENSG00000130203) · Reference: GRCh38 · Pipeline: 14 agent calls across 14 turns

1 Executive Summary

This report integrates all pipeline outputs from the Scout investigation of rs429358, the APOE ε4-defining coding variant. The variant encodes the canonical p.Cys112Arg substitution in the mature APOE protein (MANE Select transcript ENST00000252486, NM_000041.4:c.388T>C).
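
The coding change c.388T>C and the mature-protein residue number 112 are linked by simple arithmetic. The Python sketch below maps the cDNA position to its codon and then subtracts the APOE signal peptide. The 18-residue signal peptide length is not stated in this report and is supplied here as an assumption from the widely used APOE numbering convention; function and variable names are illustrative.

```python
# Assumption (not from the report): the APOE precursor carries an
# 18-residue signal peptide that is absent from the mature protein.
SIGNAL_PEPTIDE_LENGTH = 18


def codon_number(cdna_position: int) -> int:
    """1-based codon (residue) index for a 1-based coding-sequence position."""
    return (cdna_position - 1) // 3 + 1


def position_in_codon(cdna_position: int) -> int:
    """1-based position (1, 2 or 3) of the base within its codon."""
    return (cdna_position - 1) % 3 + 1


cdna_position = 388  # from NM_000041.4:c.388T>C
precursor_residue = codon_number(cdna_position)
mature_residue = precursor_residue - SIGNAL_PEPTIDE_LENGTH

print(f"c.{cdna_position} is base {position_in_codon(cdna_position)} of codon {precursor_residue}")
print(f"Mature-protein numbering: residue {mature_residue}")
```

This is why the same substitution appears as residue 130 in precursor-based (HGVS) numbering and as Cys112Arg in the mature-protein numbering used in this report.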

Despite receiving benign/tolerated scores from SIFT (1.0), PolyPhen-2 (0.0), and AlphaMissense (0.0365), rs429358 carries 10 distinct ClinVar clinical significance categories and is associated with over 800 GWAS/ClinVar phenotypes — a well-characterized false-negative outcome of sequence-conservation tools for variants whose pathogenicity is mediated through protein conformation and network interaction remodeling.

2 Population-Stratified LD Landscape

LD analysis revealed a qualitatively distinct population-stratified landscape:

Population | Partners at r²≥0.8 | Top Partner | Top r²
EUR (1000G Phase 3) | 0 | rs769449 | 0.766
EAS (1000G Phase 3) | 6 | rs769449 | 0.888

This stark disparity means haplotype-tag SNP panels designed in European cohorts will systematically undercount the ε4 regulatory haplotype in East Asian populations.

3 Regulatory Architecture

Ensembl Regulatory Build identified 14 features in the 35 kb region: 8 enhancers, 3 promoters, 3 CTCF sites. Critically, rs429358 itself falls in a ~4,096 bp inter-regulatory gap, but its top LD partners overlap key regulatory elements:

Variant | EUR r² | EAS r² | Regulatory Overlap
rs769449 | 0.766 | 0.888 | Enhancer ENSR19_C6GBF
rs10119 | - | 0.865 | Enhancer ENSR19_C6GB8
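
The population contrast comes down to applying the same r² threshold to each panel. The Python sketch below does this for the two variants listed in the table. It is necessarily partial: only two of the six EAS partners are named in this report, and the single r² given for rs10119 is read as the EAS value because the EUR panel has no partner at r² ≥ 0.8. The function name is hypothetical.

```python
# r² with rs429358 per population, limited to the variants listed in the table.
ld_r2 = {
    "rs769449": {"EUR": 0.766, "EAS": 0.888},
    "rs10119": {"EAS": 0.865},  # no EUR value is listed in the report
}


def partners_at(threshold: float, population: str) -> list:
    """Variants whose r² with rs429358 meets the threshold in one population."""
    return sorted(
        variant
        for variant, by_population in ld_r2.items()
        if population in by_population and by_population[population] >= threshold
    )


eur_partners = partners_at(0.8, "EUR")
eas_partners = partners_at(0.8, "EAS")

print("EUR partners at r² >= 0.8:", eur_partners)
print("EAS partners at r² >= 0.8:", eas_partners)
```

The same variant, rs769449, falls just below the threshold in EUR and clears it in EAS, which is the pattern the report flags as a risk for tag-SNP panels designed in European cohorts.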

4 GTEx eQTL Evidence

GTEx v10 confirms the ε4 haplotype reduces APOE expression in skin (NES=−0.141, p=3.05×10⁻⁵). Key limitation: Brain_Cortex and Liver APOE eQTL records were absent from the GTEx agent payload, so the two most disease-relevant tissues for rs429358 could not be assessed.

5 ClinVar Co-localization

ClinVar analysis across four genes (APOE, TOMM40, APOC1, NECTIN2) confirmed that the clinical pathogenic variant burden is entirely concentrated in APOE, with zero TOMM40-, APOC1-, or NECTIN2-specific variants meeting pathogenic classification thresholds.

6 Data Sources & Provenance

Sources: Ensembl Variation, Ensembl VEP, Ensembl LD Window (1000G Phase 3), Ensembl Regulatory Build, ClinVar, GTEx v10. Reference genome: GRCh38.

Research Question
"Systematically examine approved drugs and clinical-stage candidates to identify repurposing opportunities as alternatives to Rivaroxaban for anticoagulant therapy, focusing on comparable mechanism (direct Factor Xa inhibition) and relevant clinical indications (VTE prophylaxis, stroke prevention in atrial fibrillation, and treatment of DVT/PE)."
Investigation Stats
Pipeline Turns: 18
Data Sources: 5
Compounds Screened: 30
Mechanistic Classes: 9
Agents Orchestrated: ChEMBL Disease Drug Search, ChEMBL Molecule Search ×30, ChEMBL Similarity Search, ChEMBL Activity Search, PubMed ×2, ClinicalTrials.gov, EuropePMC + EudraCT, Compound Property Scoring, Tier Ranking Synthesis ×2, Final Report

Rivaroxaban Repurposing Analysis

Scout Automated Drug Repurposing Investigation

Compounds screened: 30 · Mechanistic classes: 9 · Pipeline steps: 18 agent turns · Reference: Rivaroxaban (CHEMBL198362)

1 Executive Summary

This investigation systematically characterized the repurposing landscape around rivaroxaban across 30 compounds and 9 mechanistic classes. The pipeline resolved molecular properties and clinical phase data for all 30 candidates via ChEMBL, retrieved biochemical FXa IC50 data for 7 of 8 direct xaban inhibitors, interrogated PubMed for mechanistic and safety literature, and searched ClinicalTrials.gov for trial registry data.

Key findings: Apixaban is the highest-priority approved oral alternative (full three-indication overlap, best bleeding profile, renal-independent clearance). Among discontinued xabans, eribaxaban holds Rank 1 by molecular properties (IC50 = 0.32 nM, best QED, zero Ro5 violations). A novel TMPRSS2/antiviral signal was identified for otamixaban.

2 Direct FXa Inhibitor Panel

Compound | Status | FXa IC50 (nM) | QED | Ro5 Viol. | Route
Rivaroxaban (REF) | Approved | 0.7 | 0.78 | 0 | Oral
Apixaban | Approved | 2.26 | 0.63 | 0 | Oral
Edoxaban | Approved | 3.4 | 0.48 | 1 | Oral
Betrixaban | Approved | 1.5 | 0.39 | 0 | Oral
Eribaxaban | Disc. Ph2 | 0.32–0.57 | 0.57 | 0 | Oral
Darexaban | Disc. Ph2/3 | 54.6 | 0.47 | 0 | Oral
Otamixaban | Disc. Ph3 | 0.59–0.93 | 0.16 | 0 | IV only
Razaxaban | Disc. Ph2 | N/R | 0.32 | 1 | Oral
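
To make the potency ordering explicit, the Python sketch below ranks the panel by FXa IC50. Two choices here are assumptions rather than Scout's method: compounds with a reported range are ranked by the lower bound, and razaxaban is excluded because its IC50 is not reported (N/R). The variable names are illustrative.

```python
# (compound, status, FXa IC50 in nM as (low, high)); None where not reported.
panel = [
    ("Rivaroxaban", "Approved", (0.7, 0.7)),
    ("Apixaban", "Approved", (2.26, 2.26)),
    ("Edoxaban", "Approved", (3.4, 3.4)),
    ("Betrixaban", "Approved", (1.5, 1.5)),
    ("Eribaxaban", "Disc. Ph2", (0.32, 0.57)),
    ("Darexaban", "Disc. Ph2/3", (54.6, 54.6)),
    ("Otamixaban", "Disc. Ph3", (0.59, 0.93)),
    ("Razaxaban", "Disc. Ph2", None),
]

# Keep only compounds with a reported IC50, then sort by the lower bound
# (lower IC50 means higher potency).
with_ic50 = [row for row in panel if row[2] is not None]
ranked = sorted(with_ic50, key=lambda row: row[2][0])

for rank, (compound, status, (low, high)) in enumerate(ranked, start=1):
    ic50 = f"{low}" if low == high else f"{low}-{high}"
    print(f"{rank}. {compound} ({status}): {ic50} nM")
```

Ranking by the upper bound instead would leave eribaxaban (0.57 nM) ahead of rivaroxaban (0.7 nM) but move otamixaban (0.93 nM) behind it, so the choice of bound matters for compounds with overlapping ranges.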

3 Tiered Recommendations

Tier 1 — Approved Oral FXa Inhibitors

Rank 1: Apixaban — Full three-indication overlap (VTE, DVT/PE, AF stroke prevention), best-in-class clinical bleeding profile, renal-independent clearance. High Confidence.

Rank 2: Edoxaban — Full overlap, once-daily dosing, but requires renal dose adjustment. 1 Ro5 violation.

Rank 3: Betrixaban — Narrow indication (VTE prophylaxis only). hERG IC50 = 8,900 nM flagged.

Tier 2 — Distinct Mechanism

Dabigatran etexilate — Direct thrombin inhibitor with full VTE/AF overlap. Available reversal agent (idarucizumab). Additional TMPRSS2 off-target signal confirmed. High Confidence.

Tier 3 — Parenteral Alternative

Fondaparinux — Indirect FXa inhibitor, ACC/AHA guideline-endorsed. Subcutaneous route is the principal limitation.

Tier 4 — Discontinued Candidates

Eribaxaban (Rank 1) — Most potent FXa inhibitor in dataset (0.32 nM), best QED. Discontinuation reason UNRESOLVED. Moderate Confidence.

Darexaban (Rank 2) — Hepatotoxicity definitively resolved as absent (PMID 21878434). Discontinued for bleeding in triple-therapy context.

4 Novel Finding: TMPRSS2 Signal

Otamixaban inhibits TMPRSS2 (IC50 = 620 nM), corroborated by two independent studies. However, a ~10,000–30,000-fold cellular potency gap precludes direct clinical translation. Camostat (6.6 nM) and nafamostat (1.4 nM) remain preferred TMPRSS2 leads for antiviral application.

5 Data Sources & Provenance

Sources: ChEMBL (molecule, activity, similarity), PubMed (10 papers), ClinicalTrials.gov, EuropePMC, EudraCT. Key PMIDs: 21878434, 35648838, 35294338, 23619611, 19485937.

Not a chatbot. An agentic operating system.

Scout doesn't guess at science. It orchestrates real computational tools on your real data with full provenance.
🔬

Scientist-in-the-Loop

Scout augments expert judgment, never bypasses it. ASK_USER pauses when the planner encounters genuine ambiguity. The scientist directs; the AI executes.

🧩

Hundreds of Specialized Agents

Not one model trying to do everything. Hundreds of purpose-built computational agents — ADMET, docking, retrosynthesis, expression profiling, literature mining — orchestrated by an LLM planner, with more added every week.

🔄

Two Modes of Operation

Hand-craft validated pipelines that run autonomously across hundreds of compounds. Or let Scout plan from a research question. The pipeline handles volume; the planner handles novelty.

See Scout answer your research questions

Schedule a live demo with a real research question. No slides. Just science.

Request a Demo