Ask a research question.
Get an evidence-backed answer.
Scout orchestrates hundreds of specialized AI agents to run multi-step scientific investigations autonomously. One prompt. Full provenance.
One prompt. A complete scientific investigation.
FOLR1 CAR-T On-Target / Off-Tumor Safety Risk Map
Ovarian Cancer Program — IND-Enabling Safety Assessment
1 Original Research Question
For a CAR-T therapy targeting FOLR1 (folate receptor alpha) in ovarian cancer, determine the on-target/off-tumor safety profile by: (1) profiling FOLR1 expression across healthy human tissues and cell types in single-cell atlases; (2) identifying specific cell populations with highest normal-tissue FOLR1 expression; (3) assessing overlap with known CAR-T toxicity patterns from published FOLR1-directed programs; and (4) producing a tissue-level safety risk map ranking organs by predicted FOLR1-mediated toxicity risk.
2 Executive Summary
This report consolidates Scout's complete investigation into the on-target/off-tumor safety landscape for a FOLR1-directed CAR-T program in ovarian cancer. Evidence was synthesized from three independent data streams: the CELLxGENE Census single-cell RNA-seq atlas (three successive queries spanning more than 100 cell-type entries across nine tissue categories), GTEx bulk RNA-seq (54 tissues, median TPM), and eight directly relevant published studies identified by targeted PubMed retrieval.
3 Evidence Timeline — Major Pipeline Steps
First atlas query returned expression data across 6 tissue-general categories and ~100 cell-type entries. Key findings: respiratory club cells (expression 0.593, 63/209 cells = 30.1%) were the highest expressers. All hematopoietic and immune cells in blood and bone marrow showed zero or negligible signal — 33 bone marrow cell types returned expression = 0.0.
The renal principal cell is the single highest-expressing normal cell type in the entire combined dataset (expression 2.547; 59.0% of 117 cells). Normal ovarian stromal cells showed expression 0.00023 (3/12,837 cells, <0.1%), consistent with tumor selectivity in the ovarian compartment.
A FOLR1-TCB bispecific produced target-dependent toxicity in both Lung-Chip and Intestine-Chip platforms (PMID 34378534). Three independent studies confirmed FOLR1 protein at the renal proximal tubule brush-border membrane by IHC. Radiofolate biodistribution documented 18.48 %ID/g renal retention versus 2.33 %ID/g in tumor, a roughly 8-fold renal-over-tumor excess (18.48 / 2.33 ≈ 7.9).
Lung at 109.863 TPM was the highest-expressing tissue of all 54 profiled. Minor Salivary Gland (79.135 TPM) and Thyroid (33.956 TPM) emerged as new CRITICAL-tier tissues. Kidney Cortex (50.508 TPM) and Medulla (52.793 TPM) confirmed CRITICAL status.
4 Tissue-Level Safety Risk Map
4.1 — CRITICAL Tier: Highest On-Target/Off-Tumor Liability
| Tissue | Top Cell Type | Peak Expr (raw) | % Cells | GTEx (TPM) | Composite | Evidence |
|---|---|---|---|---|---|---|
| Kidney | Renal principal cell | 2.547 | 59.0% | 50–53 | 3.005 | IHC confirmed ×3; 18.48 %ID/g renal retention; pan-nephron expression |
| Lung | Club cell (+goblet, epithelial) | 0.593 | 30.1% | 109.9 | 0.268† | Organ-on-chip functional confirmation; GTEx highest of 54 tissues |
| Salivary Gland | Unresolved — coverage gap | — | — | 79.135 | GTEx-only | Second-highest GTEx bulk signal; CRITICAL pending validation |
| Thyroid | Unresolved — coverage gap | — | — | 33.956 | GTEx-only | Fourth-highest GTEx organ signal (after lung, salivary gland, and kidney); CRITICAL pending validation |
| Choroid Plexus | Choroid plexus epithelial cell | 0.408 | 29.0% | ~0.3–1.3 | 0.237† | FRα autoantibody blockade produces CNS folate deficiency; mechanistic override applied |
4.2 — Additional Kidney Cell Types (Pan-Nephron Liability)
| Cell Type | Expression (raw) | % Cells | Count | Note |
|---|---|---|---|---|
| Renal principal cell | 2.547 | 59.0% | 69/117 | Highest expresser in dataset |
| Columnar/cuboidal epithelial (kidney) | 1.140 | 51.2% | 66/129 | — |
| Loop of Henle thick ascending limb | 0.975 | 43.6% | 287/658 | Reliable estimate (n=658) |
| Proximal tubule epithelial cell | 0.781 | 41.5% | 883/2127 | Largest cell count — highest confidence |
| Distal convoluted tubule epithelial | 0.742 | 38.9% | 107/275 | — |
| Collecting duct cell | 0.672 | 39.2% | 49/125 | — |
| Renal alpha-intercalated cell | 0.396 | 18.1% | 72/397 | Lower but non-negligible |
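The notes in the table tie confidence to cell count (for example "Reliable estimate (n=658)"). One way to make that concrete is a binomial confidence interval on the fraction of expressing cells. The sketch below uses a Wilson score interval on three rows copied from the table. The choice of the Wilson method and the 95% level are assumptions for illustration and are not described in the report.

```python
import math

def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n (z=1.96 is ~95%)."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# (expressing cells, total cells) copied from the kidney table above
kidney_counts = {
    "Renal principal cell": (69, 117),
    "Loop of Henle thick ascending limb": (287, 658),
    "Proximal tubule epithelial cell": (883, 2127),
}

interval_widths = {}
for cell_type, (k, n) in kidney_counts.items():
    lo, hi = wilson_interval(k, n)
    interval_widths[cell_type] = hi - lo
    print(f"{cell_type}: {100 * k / n:.1f}% "
          f"(95% CI {100 * lo:.1f}-{100 * hi:.1f}%, n={n})")
```

The interval narrows as n grows, which is why the proximal tubule estimate (n=2127) carries more weight than the renal principal cell estimate (n=117) despite its lower point value.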
4.3 — HIGH Tier
| Tissue | Top Cell Type | Peak Expr | % Cells | GTEx (TPM) | Key Evidence |
|---|---|---|---|---|---|
| Stomach | Mucous neck cell of gastric gland | 1.590 | 27.8% | 10.647 | scRNA-seq confirmed; foveolar cells also expressing |
| Pituitary | Unresolved | — | — | 7.113 | GTEx signal; mechanistically plausible |
| Fallopian Tube | Unresolved | — | — | 5.845 | Relevant to HGSOC cell of origin |
| Intestine | Unresolved | — | — | 0.316 | Organ-on-chip functional confirmation |
4.4 — NEGLIGIBLE Tier (Favorable Safety Window)
| Tissue / Compartment | Evidence | Confidence |
|---|---|---|
| Bone marrow | Zero expression across all 33 cell types; FOLR1 absent by IHC | High |
| Blood / hematopoietic | GTEx 0.060 TPM; max scRNA-seq signal 0.075 in 4/638 cells | High |
| Normal ovarian stroma | Expression 0.00023 raw; 3/12,837 cells (<0.1%); GTEx 0.14 TPM | High |
| Cornea / eye | IHC-negative for FRα; mirvetuximab ocular AEs are payload-mediated | High |
| Neurons, muscle, endothelium, heart | GTEx <0.1 TPM; scRNA-seq near-zero | High |
5 Top 5 Normal-Tissue FOLR1-Expressing Cell Populations
| # | Cell Type | Tissue | Expr (raw) | % Cells | Count |
|---|---|---|---|---|---|
| 1 | Renal principal cell | Kidney | 2.547 | 59.0% | 69/117 |
| 2 | Loop of Henle TAL epithelial | Kidney | 0.975 | 43.6% | 287/658 |
| 3 | Proximal tubule epithelial | Kidney | 0.781 | 41.5% | 883/2127 |
| 4 | Club cell | Lung | 0.593 | 30.1% | 63/209 |
| 5 | Choroid plexus epithelial | Brain/CNS | 0.408 | 29.0% | 99/341 |
6 Mechanistic Risk Narratives for CRITICAL Tissues
Kidney — Dominant Risk
FOLR1 mediates apical folate reabsorption across the proximal tubule brush border, a constitutive physiological role confirmed by IHC at the brush-border membrane in three independent studies (PMIDs 34149715, 31581752, 41887740). CAR-T-mediated cytolysis of FOLR1-positive tubular epithelium could produce acute tubular necrosis-pattern injury. The in vivo radiofolate biodistribution data confirm FOLR1-mediated accumulation at 18.48 %ID/g in kidney versus 2.33 %ID/g in tumor, a roughly 8-fold renal-over-tumor excess (18.48 / 2.33 ≈ 7.9) that is reducible by pemetrexed pretreatment.
Lung — Broad Epithelial Liability
FOLR1 expression is distributed across multiple airway epithelial subtypes. GTEx bulk TPM of 109.9 — the highest of all 54 profiled tissues — reflects aggregate airway epithelial FOLR1 abundance. The Kerns et al. Lung-Chip study (PMID 34378534) provides functional confirmation of target-dependent airway epithelial injury.
Choroid Plexus — Mechanistic CNS Concern
Autoantibody blockade of choroid plexus FRα in humans produces a progressive CNS folate deficiency syndrome with neuropsychiatric features that is reversible with folinic acid supplementation (PMID 25456743). CAR-T-mediated cytolysis would be mechanistically more severe than antibody blockade — destroying the transporting epithelium rather than merely occupying the receptor.
7 Evidence-Based Mitigation Strategies
| Strategy | Mechanism | Evidence | CAR-T Applicability |
|---|---|---|---|
| CAR affinity tuning | Reduced-affinity scFv discriminates low- from high-expressing cells | Song et al. 2015 (PMID 26101914) | Most directly applicable — standard mitigation strategy |
| Antifolate pretreatment | Pemetrexed occupies renal tubular FOLR1 | PMID 17138750: 16-fold reduction | Potentially applicable for transient renal protection |
| Folinic acid supplementation | Bypasses FOLR1 for CNS folate delivery | PMID 25456743: reversal demonstrated | Prophylactic supplementation, or rescue if choroid plexus injury occurs |
| Logic-gated / conditional CAR | AND-gate restricts activation to dual-positive cells | General CAR-T literature | Requires suitable co-expressed tumor antigen |
8 Gap Tissue Assessment
| Tissue | GTEx TPM | Assigned Tier | Null Classification |
|---|---|---|---|
| Minor Salivary Gland | 79.135 | CRITICAL | Coverage gap (type d) |
| Thyroid | 33.956 | CRITICAL | Coverage gap (type d) |
| Pituitary | 7.113 | HIGH | Coverage gap (type d) |
| Fallopian Tube | 5.845 | HIGH | Coverage gap (type d) |
| Breast | 2.117 | MODERATE | Coverage gap (type d) |
| Uterus | 0.236 | LOW | Coverage gap (type d) |
| Bladder | 0.178 | LOW | Coverage gap (type d) |
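For the gap tissues, the tier is assigned from GTEx bulk TPM alone. The report does not state the cutoffs it used, so the thresholds below are hypothetical values chosen only because they reproduce the seven assignments in the table. Treat this as a sketch of a TPM-only fallback rule, not as Scout's actual logic.

```python
# Hypothetical TPM cutoffs: NOT taken from the report, chosen for illustration
TIER_CUTOFFS = [("CRITICAL", 20.0), ("HIGH", 5.0), ("MODERATE", 1.0)]

def tier_from_tpm(tpm):
    """Return the first tier whose cutoff the bulk TPM value meets."""
    for tier, cutoff in TIER_CUTOFFS:
        if tpm >= cutoff:
            return tier
    return "LOW"

# GTEx TPM values copied from the gap tissue table above
gap_tissues = {
    "Minor Salivary Gland": 79.135,
    "Thyroid": 33.956,
    "Pituitary": 7.113,
    "Fallopian Tube": 5.845,
    "Breast": 2.117,
    "Uterus": 0.236,
    "Bladder": 0.178,
}

assigned = {tissue: tier_from_tpm(tpm) for tissue, tpm in gap_tissues.items()}
for tissue, tier in assigned.items():
    print(f"{tissue}: {tier}")
```

A TPM-only rule cannot reproduce the full risk map. The report places intestine in the HIGH tier at 0.316 TPM on organ-on-chip evidence and applies a mechanistic override for choroid plexus, so functional evidence clearly takes precedence over bulk expression where it exists.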
9 Limitations and Failure Modes
- HPA IHC Query: Null Result — resolved gene correctly but returned zero tissue-level IHC entries. Protein-level orthogonal validation not achieved.
- RNA-to-Protein Discordance — FOLR1 is GPI-anchored; scRNA-seq may underestimate apical surface protein density. IHC confirmation required.
- No FOLR1 CAR-T Clinical Safety Data — all extrapolation relies on ADC (mirvetuximab) and TCB bispecific data which differ in mechanism.
- Seven Tissues Retain Coverage Gaps — salivary gland, thyroid, pituitary, fallopian tube, breast, uterus, and bladder are absent from the atlas.
10 Final Verdict and Confidence
| Conclusion | Confidence | Basis |
|---|---|---|
| Kidney is the dominant on-target/off-tumor risk | High | Convergent scRNA-seq, GTEx, IHC ×3, in vivo biodistribution |
| Lung is a CRITICAL-tier liability | High | Multi-cell-type expression, GTEx highest, organ-on-chip confirmation |
| Hematopoietic safety window validated | High | Zero across 33 BM cell types + IHC |
| Normal ovarian stroma is FOLR1-negative | High | Convergent scRNA-seq (<0.1%) and GTEx (0.14 TPM) |
| Choroid plexus is CRITICAL-tier | Moderate | Functional consequence data from autoantibody model |
| Complete safety profile | Moderate | Seven tissue gaps + no CAR-T clinical data |
11 Recommended Next Steps
- Priority 1: Comparative tumor/kidney FOLR1 IHC scoring — define protein-level therapeutic window.
- Priority 2: Salivary gland and thyroid FOLR1 IHC and scRNA-seq — confirm CRITICAL-tier designations.
- Priority 3: Affinity-dose relationship in normal tissue organoids — kidney tubular, airway, gastric.
- Priority 4: In vivo FOLR1 CAR-T organ function monitoring — serum creatinine, DLCO, CSF folate.
- Priority 5: Choroid plexus injury reversibility characterization with folinic acid supplementation.
- Priority 6: Fallopian tube FOLR1 scRNA-seq characterization — relevant to HGSOC biology.
Early-Onset Progressive Ataxia with Cerebellar Atrophy — Integrated Causal Gene and Druggability Report
1 Original Research Question
For early-onset progressive ataxia with cerebellar atrophy, identify the top candidate causal genes, assess gene–disease associations using ClinVar and PubMed evidence, evaluate pathogenic variants and their functional consequences via VEP, and for each top gene produce a druggability assessment covering protein class, known ligands, ChEMBL bioactivity data, and tissue expression profiles.
2 Executive Summary
Scout identified and ranked 10 candidate causal genes for early-onset progressive ataxia with cerebellar atrophy, organized into three tiers by convergent evidence strength. Tier 1 (5 genes: ANO10, ATM, COQ8A, SYNE1, SETX) each carry 56–94 pathogenic/likely pathogenic ClinVar variants with VEP-confirmed loss-of-function mechanisms and robust PubMed literature support. Tier 2 (3 genes: ITPR1, CACNA1A, AFG3L2) have strong genetic evidence but lower P/LP counts or more complex inheritance patterns. Tier 3 (2 genes: RFC1, FGF14) show emerging evidence with repeat-expansion-specific mechanisms not fully captured by standard variant annotation pipelines.
The molecular mechanisms converge on three major pathways: (1) mitochondrial energy metabolism (COQ8A, AFG3L2), (2) calcium signaling and ion channel function (ITPR1, CACNA1A), and (3) DNA damage response and genome integrity (ATM, SETX).
COQ8A emerged as the most therapeutically actionable gene: highest cerebellar expression (TPM 96.3), moderate druggability via its atypical kinase domain, ChEMBL target with bioactivity data, and an existing therapeutic precedent (idebenone/CoQ10 supplementation) that directly addresses the disease mechanism through substrate replacement.
4 Detailed Findings
4.1 — Gene–Disease Association Ranking
| Gene | Tier | Inheritance | P/LP Count | Ataxia Condition |
|---|---|---|---|---|
| ANO10 | Tier 1 | AR | 94 | Autosomal recessive cerebellar ataxia type 3 (ARCA3) |
| ATM | Tier 1 | AR | 87 | Ataxia-telangiectasia |
| COQ8A | Tier 1 | AR | 72 | Primary CoQ10 deficiency-4 (cerebellar ataxia) |
| SYNE1 | Tier 1 | AR | 68 | SCAR8 / Autosomal recessive cerebellar ataxia type 1 |
| SETX | Tier 1 | AR / AD | 56 | AOA2 (AR); ALS4 (AD gain-of-function) |
| ITPR1 | Tier 2 | AD / AR | 43 | SCA15/SCA29 (AD); Gillespie syndrome (AD/AR) |
| CACNA1A | Tier 2 | AD | 38 | SCA6 / Episodic ataxia type 2 / FHM1 |
| AFG3L2 | Tier 2 | AD / AR | 29 | SCA28 (AD); SPAX5 (AR) |
| RFC1 | Tier 3 | AR | 12 | CANVAS (cerebellar ataxia, neuropathy, vestibular areflexia) |
| FGF14 | Tier 3 | AD | 8 | SCA27B (GAA repeat expansion) |
4.3 — GTEx Cerebellar Expression Profiles
| Gene | Cerebellar TPM | Cerebellum Rank (of 54) | Cerebellum/Median Ratio | Interpretation |
|---|---|---|---|---|
| COQ8A | 96.3 | Top 5 | 3.2× | Highest cerebellar expression — tissue-concordant with disease site |
| ITPR1 | 84.7 | Top 3 | 4.8× | Extreme cerebellar enrichment — Purkinje cell signature |
| CACNA1A | 71.2 | Top 5 | 3.9× | High cerebellar expression — consistent with Purkinje cell Ca2+ channel role |
| ATM | 42.8 | Top 15 | 1.4× | Moderate cerebellar expression — ubiquitous DNA damage response |
| SYNE1 | 38.1 | Top 20 | 1.1× | Broad tissue expression — cerebellar effect likely size-dependent (giant protein) |
| AFG3L2 | 34.6 | Top 15 | 1.6× | Moderate cerebellar enrichment — mitochondrial protease |
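The table distinguishes absolute cerebellar expression from cerebellar enrichment, and the two rankings do not agree. The sketch below sorts the six genes both ways and back-calculates the cross-tissue median implied by each row. It assumes the ratio column is cerebellar TPM divided by the median TPM across GTEx tissues, which is how the column header reads but is not spelled out in the report.

```python
# (cerebellar TPM, cerebellum/median ratio) copied from the table above
expression = {
    "COQ8A": (96.3, 3.2),
    "ITPR1": (84.7, 4.8),
    "CACNA1A": (71.2, 3.9),
    "ATM": (42.8, 1.4),
    "SYNE1": (38.1, 1.1),
    "AFG3L2": (34.6, 1.6),
}

# Rank by absolute abundance versus by tissue enrichment
by_tpm = sorted(expression, key=lambda g: expression[g][0], reverse=True)
by_ratio = sorted(expression, key=lambda g: expression[g][1], reverse=True)

# Assumption: ratio = cerebellar TPM / median TPM, so median = TPM / ratio
implied_median = {g: tpm / ratio for g, (tpm, ratio) in expression.items()}

print("Highest absolute cerebellar TPM:", by_tpm[0])
print("Highest cerebellar enrichment:", by_ratio[0])
for gene in by_tpm:
    print(f"{gene}: implied cross-tissue median ~{implied_median[gene]:.1f} TPM")
```

COQ8A leads on absolute TPM while ITPR1 leads on enrichment, which matches the two different interpretations given in the table.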
4.4 — Druggability Assessment per Gene
ITPR1 — High Druggability: Intracellular ligand-gated Ca2+ channel. Multiple modulators known (caffeine, 2-APB, xestospongin C). However, loss-of-function disease mechanism requires activity restoration, not inhibition — available ligands are primarily inhibitors. Druggable protein class but therapeutic direction is mismatched for LOF ataxia.
ATM — High Druggability with CONTRAINDICATION: PI3K-like serine/threonine kinase with extensive inhibitor pharmacology (AZD0156, AZD1390, KU-55933). Pharmacologically contraindicated — ATM inhibitors are cancer therapeutics designed to impair DNA damage response. In ataxia-telangiectasia, ATM is already lost; inhibitors would exacerbate disease.
COQ8A — Moderate Druggability: Atypical kinase (UbiB family). ChEMBL target with bioactivity data. Therapeutic precedent: CoQ10/idebenone supplementation directly addresses the metabolic deficiency caused by COQ8A mutations. Substrate replacement therapy is the most immediately actionable approach. Small-molecule activators of the kinase-like domain represent a more targeted but earlier-stage opportunity.
CACNA1A — Moderate Druggability: Voltage-gated Ca2+ channel. Acetazolamide is standard-of-care for episodic ataxia type 2. For progressive SCA6 (polyglutamine expansion), channel modulators have limited evidence of slowing neurodegeneration.
AFG3L2 — Low Druggability: Mitochondrial matrix AAA+ protease. No known small-molecule modulators. Substrate-level rescue or mitochondrial-targeted therapies are speculative.
SYNE1, SETX, ANO10 — Low Druggability: Structural (SYNE1: nuclear envelope giant protein), enzymatic (SETX: RNA/DNA helicase), or transporter (ANO10: lipid scramblase) — all lack established small-molecule pharmacology for activity restoration.
4.5 — Druggability Hierarchy Summary
| Rank | Gene | Druggability | Therapeutic Direction | Precedent |
|---|---|---|---|---|
| 1 | COQ8A | Moderate | Substrate replacement (CoQ10/idebenone) | Clinical use; active trials |
| 2 | CACNA1A | Moderate | Acetazolamide (EA2); limited for SCA6 | Standard-of-care for EA2 |
| 3 | ITPR1 | High (class) / Low (direction) | Activators needed; only inhibitors available | No restoration precedent |
| 4 | ATM | High (class) / Contraindicated | Restoration needed; only inhibitors exist | Inhibitors used in oncology |
| 5–10 | Others | Low | Gene therapy / ASO; no small molecule | Preclinical |
6 Final Verdict and Confidence
| Conclusion | Confidence | Basis |
|---|---|---|
| ANO10, ATM, COQ8A, SYNE1, SETX are Tier 1 causal genes | High | Convergent ClinVar (56–94 P/LP), VEP (LOF confirmed), PubMed (extensive literature) |
| COQ8A is the most therapeutically actionable gene | High | Highest cerebellar TPM, ChEMBL target, CoQ10 supplementation precedent |
| ATM kinase inhibitors are contraindicated | High | LOF disease mechanism + inhibitor pharmacology = exacerbation |
| RFC1 and FGF14 are underrepresented in ClinVar | Moderate | Repeat expansion mechanisms not captured by standard variant annotation |
| 10-gene diagnostic panel is comprehensive | Moderate | Covers major AR and AD ataxia genes; may miss ultra-rare causes |
7 Recommended Next Steps
- Priority 1: Clinical genetic testing using the 10-gene panel with repeat-expansion-specific assays for RFC1 (AAGGG) and FGF14 (GAA). Standard NGS panels miss these expansion mutations.
- Priority 2: For patients with COQ8A mutations — initiate CoQ10/idebenone supplementation trial (evidence-based therapeutic intervention with established safety profile).
- Priority 3: Functional validation of novel VEP-flagged variants of uncertain significance in ITPR1 and CACNA1A using patient-derived iPSC-Purkinje cell differentiation models.
- Priority 4: For CACNA1A-positive patients with episodic features — trial acetazolamide (standard-of-care for EA2 phenotype).
- Priority 5: Whole-genome sequencing if panel-negative — to capture structural variants, deep intronic variants, and regulatory mutations not detected by exome-level analysis.
- Priority 6: Longitudinal natural history characterization by genotype to inform future gene therapy and ASO trial design for currently undruggable targets (SYNE1, ANO10, SETX).
Cas9 Ortholog Phylogenetics, PAM-Interacting Domain Analysis, and Experimental Candidate Flagging
1 Original Research Question
Construct a phylogenetic tree of characterized Cas9 variants across bacterial species, identify the most divergent orthologs, compute multiple sequence alignments of their PAM-interacting domains, predict whether any show different PAM recognition motifs, and flag candidates for experimental PAM profiling.
2 Executive Summary
Scout assembled a dataset of 20 Cas9 orthologs spanning 4 phylogenetic clades (IIA-long, IIA-short, IIC, and outgroup/divergent), performed full-length multiple sequence alignment via MMseqs2, constructed a neighbor-joining phylogenetic tree, and extracted PAM-interacting (PI) domain contact residue profiles for all 20 orthologs by structural alignment to the SpCas9 reference (PDB 4ZT0).
The key finding is near-zero conservation at the 9 canonical PAM contact positions across the ortholog set (0/9 positions are universally conserved), confirming massive PAM diversity beyond the well-characterized NGG, NNGRRT, and NNNRRT motifs. A contact-residue classification framework was validated against 7 orthologs with experimentally known PAMs (7/7 correct predictions), then applied to the remaining 13 orthologs to flag candidates for experimental PAM profiling.
Three Tier-1 candidates (FvCas9, FpCas9, LiCas9) and four Tier-2 candidates were flagged for experimental PAM-SCANR or HT-PAMDA profiling. FpCas9 carries a pH-titratable His residue at the R1335-equivalent position that may enable conditional genome editing.
4 Detailed Findings
4.1 — Ortholog Dataset
| Ortholog | Species | Subtype | Length (aa) | PI Domain | Known PAM |
|---|---|---|---|---|---|
| SpCas9 | S. pyogenes | IIA-long | 1,368 | 1,099–1,368 | NGG |
| SaCas9 | S. aureus | IIA-short | 1,053 | 841–1,053 | NNGRRT |
| StCas9 | S. thermophilus (CRISPR1) | IIA-long | 1,121 | 897–1,121 | NNAGAAW |
| NmCas9 | N. meningitidis | IIC | 1,082 | 866–1,082 | NNNNGATT |
| CjCas9 | C. jejuni | IIC | 984 | 787–984 | NNNNRYAC |
| FnCas9 | F. novicida | IIA-long | 1,629 | 1,303–1,629 | NGG (weak) |
| SluCas9 | S. lutetiensis | IIA-long | 1,368 | 1,099–1,368 | NNGG |
| ScCas9 | S. canis | IIA-long | 1,375 | 1,103–1,375 | NNG |
| GeoCas9 | G. stearothermophilus | IIC | 1,087 | 870–1,087 | NNNNCRAA |
| BhCas9 | B. halodurans | IIC | 1,092 | 874–1,092 | NNNNCNDD |
| FvCas9 | F. varium | Divergent | 1,548 | 1,239–1,548 | Unknown |
| FpCas9 | F. prausnitzii | Divergent | 1,476 | 1,181–1,476 | Unknown |
| LiCas9 | L. innocua | IIA-short | 1,078 | 862–1,078 | Unknown |
| TdCas9 | T. denticola | IIA-long | 1,395 | 1,116–1,395 | NAAAAC |
| PdCas9 | P. disiens | IIA-long | 1,401 | 1,121–1,401 | Unknown |
| SmCas9 | S. mitis | IIA-long | 1,362 | 1,093–1,362 | NAG > NGG |
| SiCas9 | S. intermedius | IIA-long | 1,370 | 1,098–1,370 | Unknown |
| ClCas9 | C. lari | IIC | 1,003 | 802–1,003 | Unknown |
| AsCas9 | A. succinogenes | IIA-long | 1,395 | 1,116–1,395 | Unknown |
| PaCas9 | P. acnes | IIC | 1,054 | 843–1,054 | Unknown |
4.2 — Phylogenetic Tree Topology
The neighbor-joining tree resolved 4 major clades with high bootstrap support (≥85%) at all deep nodes:
Newick: ((((SpCas9,SluCas9,ScCas9,SmCas9,SiCas9),StCas9,(FnCas9,TdCas9,PdCas9,AsCas9))IIA-long,(SaCas9,LiCas9)IIA-short),(NmCas9,CjCas9,GeoCas9,BhCas9,ClCas9,PaCas9)IIC,(FvCas9,FpCas9)Divergent);
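The Newick string above can be checked against the clade table with a small parser. The following is a minimal sketch that handles only what this string contains (nested groups and internal node labels, with no branch lengths or quoted names). The function names are illustrative and do not come from any library.

```python
NEWICK = ("((((SpCas9,SluCas9,ScCas9,SmCas9,SiCas9),StCas9,"
          "(FnCas9,TdCas9,PdCas9,AsCas9))IIA-long,(SaCas9,LiCas9)IIA-short),"
          "(NmCas9,CjCas9,GeoCas9,BhCas9,ClCas9,PaCas9)IIC,"
          "(FvCas9,FpCas9)Divergent);")

def parse_newick(text):
    """Recursive-descent parse into {"label": str, "children": [...]} nodes."""
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                          # consume "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1                      # consume ","
                children.append(parse_node())
            pos += 1                          # consume ")"
        start = pos
        while pos < len(text) and text[pos] not in "(),;":
            pos += 1                          # read the (possibly empty) label
        return {"label": text[start:pos], "children": children}

    return parse_node()

def leaves(node):
    """List leaf names under a node, in left-to-right order."""
    if not node["children"]:
        return [node["label"]]
    return [name for child in node["children"] for name in leaves(child)]

def labelled_clades(node, found=None):
    """Map each labelled internal node to the leaves beneath it."""
    found = {} if found is None else found
    if node["children"]:
        if node["label"]:
            found[node["label"]] = leaves(node)
        for child in node["children"]:
            labelled_clades(child, found)
    return found

tree = parse_newick(NEWICK)
clades = labelled_clades(tree)
for name, members in clades.items():
    print(f"{name} ({len(members)}): {', '.join(members)}")
```

For anything beyond a sanity check, an established phylogenetics library would be a better choice than a hand-written parser.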
| Clade | Members | PAM Pattern | Key Feature |
|---|---|---|---|
| IIA-long | SpCas9, SluCas9, ScCas9, SmCas9, SiCas9, StCas9, FnCas9, TdCas9, PdCas9, AsCas9 | NGG / NNG / NNAGAAW / NAAAAC | Largest clade; G-rich PAMs predominate; R1333/R1335-equivalent positions key |
| IIA-short | SaCas9, LiCas9 | NNGRRT / Unknown | Shorter PI domain; LiCas9 has Pro at D1135-eq position |
| IIC | NmCas9, CjCas9, GeoCas9, BhCas9, ClCas9, PaCas9 | NNNNGATT / NNNNRYAC / NNNNCRAA / NNNNCNDD | Longer PAMs (8-nt); fundamentally different PI domain architecture |
| Divergent | FvCas9, FpCas9 | Unknown | Outgroup; longest branches; fully non-canonical contact profiles |
4.5 — PAM Contact Residue Profiles
Nine canonical PAM contact positions were mapped from the SpCas9:PAM co-crystal structure (PDB 4ZT0). Conservation analysis across the 20 orthologs revealed:
| SpCas9 Position | SpCas9 Residue | Function | Conservation (20 orthologs) | Most Variant Substitutions |
|---|---|---|---|---|
| D1135 | Asp | PAM +1 base contact | 25% (5/20) | Pro (LiCas9), Asn, Ser, Gln |
| S1136 | Ser | PAM backbone contact | 30% (6/20) | Thr, Ala, Gly |
| G1218 | Gly | PI domain hinge | 40% (8/20) | Ala, Ser |
| E1219 | Glu | PAM +1 water-mediated | 20% (4/20) | Asp, Gln, Asn, His |
| R1333 | Arg | PAM +2G direct H-bond (major) | 35% (7/20) | Glu (FvCas9), Lys, Gln, Asn |
| R1335 | Arg | PAM +3G direct H-bond (major) | 30% (6/20) | His (FpCas9), Gln, Lys, Thr |
| T1337 | Thr | PAM backbone contact | 20% (4/20) | Ser, Asn, Ala, Val |
| K1107 | Lys | Phosphate backbone grip | 35% (7/20) | Arg, Asn, Gln |
| S1109 | Ser | PAM-proximal backbone | 25% (5/20) | Thr, Ala, Asn |
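The conservation column can be recomputed from the counts in the table. The sketch below assumes "conservation" means the fraction of the 20 orthologs that retain the SpCas9 residue at the aligned position. That reading is consistent with the table but is not defined explicitly in the report.

```python
N_ORTHOLOGS = 20

# Orthologs retaining the SpCas9 residue, copied from the table above
conserved_counts = {
    "D1135": 5, "S1136": 6, "G1218": 8, "E1219": 4, "R1333": 7,
    "R1335": 6, "T1337": 4, "K1107": 7, "S1109": 5,
}

conservation = {pos: n / N_ORTHOLOGS for pos, n in conserved_counts.items()}
universally_conserved = [pos for pos, frac in conservation.items() if frac == 1.0]
most_conserved = max(conservation, key=conservation.get)

for pos, frac in conservation.items():
    print(f"{pos}: {frac:.0%}")
print(f"Universally conserved: {len(universally_conserved)}/{len(conservation)}")
print(f"Most conserved position: {most_conserved} ({conservation[most_conserved]:.0%})")
```

Even the best-conserved position, the G1218 hinge glycine, is retained in fewer than half of the orthologs, and both major G-reading arginines are retained in roughly a third.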
4.7 — Experimental PAM Profiling Candidates
Tier 1 — Highest Priority (fully non-canonical contact profiles):
| Candidate | Clade | Key Substitutions | Rationale | Recommended Assay |
|---|---|---|---|---|
| FvCas9 | Divergent | Glu at R1333-eq; Gln at R1335-eq | Unique acidic residue at major G-reading position — predicts non-G PAM; longest branch in tree | HT-PAMDA (high-throughput PAM determination assay) |
| FpCas9 | Divergent | His at R1335-eq; Asn at E1219-eq | pH-titratable His may enable conditional PAM recognition; fully non-canonical profile; co-outgroup with FvCas9 | HT-PAMDA at pH 6.5 and pH 7.4 |
| LiCas9 | IIA-short | Pro at D1135-eq; divergent from SaCas9 | Proline introduces backbone rigidity at PAM +1 contact — may constrain or alter base-reading geometry; sister to SaCas9 but with different predicted specificity | PAM-SCANR or bacterial selection |
Tier 2 — Secondary Priority (partially non-canonical):
| Candidate | Clade | Key Feature | Rationale |
|---|---|---|---|
| PdCas9 | IIA-long | Asn at R1333-eq | Loss of Arg at major G-reading position; NAAAAC-like PAM predicted but unverified |
| SiCas9 | IIA-long | Lys at R1335-eq | Charge-conservative but size-altered substitution; may subtly shift PAM preference |
| ClCas9 | IIC | Fully divergent IIC PI domain | Long-PAM architecture; no experimental data available |
| PaCas9 | IIC | Fully divergent IIC PI domain | Distinct from all characterized IIC members; potential novel long-PAM |
7 Final Verdict and Confidence
| Conclusion | Confidence | Basis |
|---|---|---|
| 4-clade tree topology is robust | High | Bootstrap ≥85% at all deep nodes; consistent with published Cas9 phylogenies |
| PAM contact residues are near-zero conserved | High | Structural alignment to PDB 4ZT0; 0/9 universally conserved; validated 7/7 known PAMs |
| FvCas9 and FpCas9 are predicted to have novel PAM specificities | High | Fully non-canonical at both major G-reading positions; outgroup placement; no known PAM |
| FpCas9 His substitution enables pH-conditional editing | Moderate | Biochemical plausibility from His pKa (~6.0); requires experimental validation |
| Contact-profile framework generalizes to uncharacterized orthologs | Moderate | 7/7 validation set; but small N and all from well-characterized subtypes |
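The pH-conditional hypothesis for FpCas9 rests on the histidine side chain being partly protonated near its pKa. A rough feel for the size of the effect comes from the Henderson-Hasselbalch relation, sketched below with the pKa of about 6.0 quoted in the table. This treats the residue as free histidine. The pKa of a buried or DNA-contacting His can shift substantially, so the numbers are an order-of-magnitude guide only.

```python
def fraction_protonated(ph, pka=6.0):
    """Henderson-Hasselbalch: fraction of a titratable group in its protonated form."""
    return 1.0 / (1.0 + 10 ** (ph - pka))

# The two assay conditions proposed for FpCas9 HT-PAMDA profiling
for ph in (6.5, 7.4):
    print(f"pH {ph}: {fraction_protonated(ph):.1%} of His protonated")
```

Under this simplified model about a quarter of the His population is protonated at pH 6.5 and only a few percent at pH 7.4, which is the contrast the two-pH assay design is meant to probe.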
8 Recommended Next Steps
- Priority 1: Express and purify FvCas9 and FpCas9 for HT-PAMDA profiling. FpCas9 should be assayed at both pH 6.5 and pH 7.4 to test the His-dependent conditional activity hypothesis.
- Priority 2: PAM-SCANR bacterial selection for LiCas9 to determine its divergence from SaCas9 (NNGRRT) given the Pro substitution at the D1135-equivalent position.
- Priority 3: Expand the ortholog dataset to 50+ Cas9 sequences from metagenomic sources to increase phylogenetic resolution and discover additional divergent PI domain architectures.
- Priority 4: Structural modeling (AlphaFold2) of Tier-1 candidate PI domains in complex with candidate PAM duplexes to predict PAM specificity prior to experimental profiling.
- Priority 5: For any confirmed novel-PAM Cas9 — immediately assess human genome targetable site density to determine therapeutic applicability versus SpCas9/SaCas9 baselines.
- Priority 6: Engineering campaign: if FpCas9 confirms pH-conditional activity, develop it as a spatially-controlled editor for acidic microenvironments (tumors, lysosomes, inflamed tissue).
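Priority 5 calls for comparing targetable site density across PAMs. A minimal sketch of the counting step is shown below. It expands IUPAC codes into a regular expression and counts overlapping matches on the forward strand of a toy sequence. The sequence is made up for illustration, and a real analysis would also scan the reverse complement and account for protospacer placement.

```python
import re

# IUPAC nucleotide codes used by the PAMs listed in the ortholog table
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "R": "[AG]", "Y": "[CT]", "W": "[AT]", "D": "[AGT]",
}

def pam_regex(pam):
    """Compile a PAM into a lookahead regex so overlapping sites are all found."""
    return re.compile("(?=" + "".join(IUPAC[base] for base in pam) + ")")

def count_pam_sites(seq, pam):
    """Count forward-strand occurrences of a PAM, including overlapping ones."""
    return sum(1 for _ in pam_regex(pam).finditer(seq.upper()))

toy_sequence = "ACGGTGGAGAATGGG"  # hypothetical sequence for illustration
for pam in ("NGG", "NNGRRT"):
    print(f"{pam}: {count_pam_sites(toy_sequence, pam)} forward-strand sites")
```

Shorter and less constrained PAMs yield more sites per kilobase, which is the baseline any newly profiled ortholog would be measured against.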
APOE rs429358 — Comprehensive Integrated Genomic Report
1 Executive Summary
This report integrates all pipeline outputs from the Scout investigation of rs429358, the APOE ε4-defining coding variant. The variant encodes the canonical p.Cys112Arg substitution in the mature APOE protein (MANE Select transcript ENST00000252486, NM_000041.4:c.388T>C).
Despite receiving benign/tolerated scores from SIFT (1.0), PolyPhen-2 (0.0), and AlphaMissense (0.0365), rs429358 carries 10 distinct ClinVar clinical significance categories and is associated with over 800 GWAS/ClinVar phenotypes — a well-characterized false-negative outcome of sequence-conservation tools for variants whose pathogenicity is mediated through protein conformation and network interaction remodeling.
2 Population-Stratified LD Landscape
LD analysis revealed a qualitatively distinct population-stratified landscape:
| Population | Partners at r²≥0.8 | Top Partner | Top r² |
|---|---|---|---|
| EUR (1000G Phase 3) | 0 | rs769449 | 0.766 |
| EAS (1000G Phase 3) | 6 | rs769449 | 0.888 |
This stark disparity means haplotype-tag SNP panels designed in European cohorts will systematically undercount the ε4 regulatory haplotype in East Asian populations.
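The r² values in the table are the standard pairwise LD statistic, and the 0.8 cutoff decides which variants can serve as proxies. The sketch below shows the textbook calculation from allele and haplotype frequencies, then applies the cutoff to the two top-partner values from the table. The example frequencies in the first call are hypothetical and are not taken from 1000 Genomes.

```python
def r_squared(p_ab, p_a, p_b):
    """Pairwise LD r^2 from haplotype frequency p_ab and allele frequencies p_a, p_b."""
    d = p_ab - p_a * p_b                      # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Hypothetical frequencies, for illustration only
print(f"Example r^2: {r_squared(p_ab=0.14, p_a=0.15, p_b=0.17):.3f}")

# Top-partner r^2 for rs769449, copied from the table above
TAG_THRESHOLD = 0.8
top_partner_r2 = {"EUR": 0.766, "EAS": 0.888}
usable_as_tag = {pop: r2 >= TAG_THRESHOLD for pop, r2 in top_partner_r2.items()}
for pop, ok in usable_as_tag.items():
    print(f"{pop}: rs769449 {'meets' if ok else 'falls below'} r^2 >= {TAG_THRESHOLD}")
```

The same variant clears the threshold in EAS and misses it in EUR, so a proxy that works in one population cannot be assumed to work in the other.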
3 Regulatory Architecture
Ensembl Regulatory Build identified 14 features in the 35 kb region: 8 enhancers, 3 promoters, 3 CTCF sites. Critically, rs429358 itself falls in a ~4,096 bp inter-regulatory gap, but its top LD partners overlap key regulatory elements:
| Variant | EUR r² | EAS r² | Regulatory Overlap |
|---|---|---|---|
| rs769449 | 0.766 | 0.888 | Enhancer ENSR19_C6GBF |
| rs10119 | — | 0.865 | Enhancer ENSR19_C6GB8 |
4 GTEx eQTL Evidence
GTEx v10 confirms the ε4 haplotype reduces APOE expression in skin (NES=−0.141, p=3.05×10⁻⁵). Key limitation: Brain_Cortex and Liver APOE eQTL records were absent from the GTEx agent payload — the two most disease-relevant tissues for rs429358 could not be assessed.
5 ClinVar Co-localization
ClinVar analysis across four genes (APOE, TOMM40, APOC1, NECTIN2) confirmed that the clinical pathogenic variant burden is entirely concentrated in APOE, with zero TOMM40-, APOC1-, or NECTIN2-specific variants meeting pathogenic classification thresholds.
6 Data Sources & Provenance
Sources: Ensembl Variation, Ensembl VEP, Ensembl LD Window (1000G Phase 3), Ensembl Regulatory Build, ClinVar, GTEx v10. Reference genome: GRCh38.
Rivaroxaban Repurposing Analysis
Scout Automated Drug Repurposing Investigation
1 Executive Summary
This investigation systematically characterized the repurposing landscape around rivaroxaban across 30 compounds and 9 mechanistic classes. The pipeline resolved molecular properties and clinical phase data for all 30 candidates via ChEMBL, retrieved biochemical FXa IC50 data for 7 of 8 direct xaban inhibitors, interrogated PubMed for mechanistic and safety literature, and searched ClinicalTrials.gov for trial registry data.
Key findings: Apixaban is the highest-priority approved oral alternative (full three-indication overlap, best bleeding profile, renal-independent clearance). Among discontinued xabans, eribaxaban holds Rank 1 by molecular properties (IC50 = 0.32 nM, best QED, zero Ro5 violations). A novel TMPRSS2/antiviral signal was identified for otamixaban.
2 Direct FXa Inhibitor Panel
| Compound | Status | FXa IC50 (nM) | QED | Ro5 Viol. | Route |
|---|---|---|---|---|---|
| Rivaroxaban (REF) | Approved | 0.7 | 0.78 | 0 | Oral |
| Apixaban | Approved | 2.26 | 0.63 | 0 | Oral |
| Edoxaban | Approved | 3.4 | 0.48 | 1 | Oral |
| Betrixaban | Approved | 1.5 | 0.39 | 0 | Oral |
| Eribaxaban | Disc. Ph2 | 0.32–0.57 | 0.57 | 0 | Oral |
| Darexaban | Disc. Ph2/3 | 54.6 | 0.47 | 0 | Oral |
| Otamixaban | Disc. Ph3 | 0.59–0.93 | 0.16 | 0 | IV only |
| Razaxaban | Disc. Ph2 | N/R | 0.32 | 1 | Oral |
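The rankings quoted in the summary can be reproduced from this table. The sketch below finds the most potent compound and the best QED among discontinued candidates. For compounds with a reported IC50 range it uses the lower bound, which is an assumption made here for illustration. The report does not say how ranges were handled.

```python
# (name, status, FXa IC50 in nM or None if not reported, QED, Ro5 violations)
panel = [
    ("Rivaroxaban", "Approved", 0.7, 0.78, 0),
    ("Apixaban", "Approved", 2.26, 0.63, 0),
    ("Edoxaban", "Approved", 3.4, 0.48, 1),
    ("Betrixaban", "Approved", 1.5, 0.39, 0),
    ("Eribaxaban", "Disc. Ph2", 0.32, 0.57, 0),    # lower bound of 0.32-0.57
    ("Darexaban", "Disc. Ph2/3", 54.6, 0.47, 0),
    ("Otamixaban", "Disc. Ph3", 0.59, 0.16, 0),    # lower bound of 0.59-0.93
    ("Razaxaban", "Disc. Ph2", None, 0.32, 1),     # IC50 not reported (N/R)
]

with_ic50 = [row for row in panel if row[2] is not None]
most_potent = min(with_ic50, key=lambda row: row[2])

discontinued = [row for row in panel if row[1].startswith("Disc.")]
best_qed_discontinued = max(discontinued, key=lambda row: row[3])
best_qed_overall = max(panel, key=lambda row: row[3])

print(f"Most potent: {most_potent[0]} ({most_potent[2]} nM)")
print(f"Best QED among discontinued: {best_qed_discontinued[0]} ({best_qed_discontinued[3]})")
print(f"Best QED overall: {best_qed_overall[0]} ({best_qed_overall[3]})")
```

Eribaxaban remains the most potent entry even at the upper end of its range (0.57 nM), and its QED lead holds among discontinued compounds only, since rivaroxaban and apixaban both score higher overall.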
3 Tiered Recommendations
Tier 1 — Approved Oral FXa Inhibitors
Rank 1: Apixaban — Full three-indication overlap (VTE prophylaxis, DVT/PE treatment, AF stroke prevention), best-in-class clinical bleeding profile, renal-independent clearance. High Confidence.
Rank 2: Edoxaban — Full overlap, once-daily dosing, but requires renal dose adjustment. 1 Ro5 violation.
Rank 3: Betrixaban — Narrow indication (VTE prophylaxis only). hERG IC50 = 8,900 nM flagged.
Tier 2 — Distinct Mechanism
Dabigatran etexilate — Direct thrombin inhibitor with full VTE/AF overlap. Available reversal agent (idarucizumab). Additional TMPRSS2 off-target signal confirmed. High Confidence.
Tier 3 — Parenteral Alternative
Fondaparinux — Indirect FXa inhibitor, ACC/AHA guideline-endorsed. Subcutaneous route is the principal limitation.
Tier 4 — Discontinued Candidates
Eribaxaban (Rank 1) — Most potent FXa inhibitor in dataset (0.32 nM); best QED among discontinued candidates (0.57). Discontinuation reason UNRESOLVED. Moderate Confidence.
Darexaban (Rank 2) — Hepatotoxicity definitively resolved as absent (PMID 21878434). Discontinued for bleeding in triple-therapy context.
4 Novel Finding: TMPRSS2 Signal
Otamixaban inhibits TMPRSS2 (IC50 = 620 nM), corroborated by two independent studies. However, a ~10,000–30,000-fold cellular potency gap precludes direct clinical translation. Camostat (6.6 nM) and nafamostat (1.4 nM) remain preferred TMPRSS2 leads for antiviral application.
5 Data Sources & Provenance
Sources: ChEMBL (molecule, activity, similarity), PubMed (10 papers), ClinicalTrials.gov, EuropePMC, EudraCT. Key PMIDs: 21878434, 35648838, 35294338, 23619611, 19485937.
Not a chatbot. An agentic operating system.
Scientist-in-the-Loop
Scout augments expert judgment, never bypasses it. ASK_USER pauses when the planner encounters genuine ambiguity. The scientist directs; the AI executes.
Hundreds of Specialized Agents
Not one model trying to do everything. Hundreds of purpose-built computational agents — ADMET, docking, retrosynthesis, expression profiling, literature mining — orchestrated by an LLM planner, with more added every week.
Two Modes of Operation
Hand-craft validated pipelines that run autonomously across hundreds of compounds. Or let Scout plan from a research question. The pipeline handles volume; the planner handles novelty.
See Scout answer your research questions
Schedule a live demo with a real research question. No slides. Just science.
Request a Demo