Interpretable deep learning for structure-based molecular discovery.

We build interpretable models for structure-based drug discovery, connecting deep learning predictions to mechanistic evidence, with calibrated uncertainty and explicit attribution so teams can distinguish model limitations from biological signal.

Protein-ligand complex: structure-based binding site visualization

Bioforager

Enrichment & prioritization workflows

Multi-stage computational funneling from large enumerated libraries to nominated chemotypes; each gate emits scores, rationales, and uncertainty so medicinal chemistry can trace promotions and deprioritizations before synthesis.
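
As a rough illustration of how a gated funnel could record a traceable decision for every compound, the sketch below uses toy scorers and a conservative promotion rule. All names here (`Candidate`, `Gate`, `run_funnel`, the feature keys, and the thresholds) are hypothetical and are not Bioforager's API; the rule "promote only if score minus uncertainty clears the threshold" is likewise an assumption chosen for clarity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class GateRecord:
    """One gate's decision for one compound: score, uncertainty, rationale."""
    gate: str
    score: float
    uncertainty: float
    promoted: bool
    rationale: str


@dataclass
class Candidate:
    compound_id: str
    features: Dict[str, float]
    trail: List[GateRecord] = field(default_factory=list)


# A scorer returns (score, uncertainty); higher score is better (assumption).
Scorer = Callable[[Candidate], Tuple[float, float]]


@dataclass
class Gate:
    name: str
    scorer: Scorer
    threshold: float

    def evaluate(self, cand: Candidate) -> bool:
        score, sigma = self.scorer(cand)
        # Hypothetical conservative rule: the lower bound must clear the threshold.
        promoted = (score - sigma) >= self.threshold
        verdict = "promoted" if promoted else "deprioritized"
        rationale = (f"{self.name}: score {score:.2f} +/- {sigma:.2f} "
                     f"vs threshold {self.threshold:.2f} -> {verdict}")
        cand.trail.append(GateRecord(self.name, score, sigma, promoted, rationale))
        return promoted


def run_funnel(candidates: List[Candidate], gates: List[Gate]):
    """Pass candidates through gates in order; keep the audit trail on each one."""
    survivors, rejected = list(candidates), []
    for gate in gates:
        next_round = []
        for cand in survivors:
            (next_round if gate.evaluate(cand) else rejected).append(cand)
        survivors = next_round
    return survivors, rejected


# Toy scorers standing in for real docking and affinity models.
def toy_docking(cand: Candidate) -> Tuple[float, float]:
    return cand.features["dock"], 0.1


def toy_affinity(cand: Candidate) -> Tuple[float, float]:
    return cand.features["aff"], cand.features["aff_sigma"]


library = [
    Candidate("cmpd-A", {"dock": 0.9, "aff": 0.8, "aff_sigma": 0.05}),
    Candidate("cmpd-B", {"dock": 0.4, "aff": 0.9, "aff_sigma": 0.05}),
    Candidate("cmpd-C", {"dock": 0.8, "aff": 0.7, "aff_sigma": 0.30}),
]
gates = [Gate("docking", toy_docking, 0.5), Gate("affinity", toy_affinity, 0.6)]
nominated, deprioritized = run_funnel(library, gates)

for cand in nominated + deprioritized:
    for record in cand.trail:
        print(cand.compound_id, "|", record.rationale)
```

In this toy run, cmpd-C passes the docking gate but is deprioritized at the affinity gate because its uncertainty is large, and the trail shows exactly which gate and which numbers drove that decision.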

Bioforager: ligand interaction analysis
Bioforager: homology map across related binding sites

Genesys model family

Multimodal foundation models for biomolecular systems

Joint representation of proteins, peptides, small molecules, and analogs under a single architectural prior, with data and compute efficiency suitable for iteration, and inspection hooks aligned to structural and dynamical validation.

Architecture

  • Unified encoding of protein, peptide, and small-molecule chemotypes including close analogs
  • Multi-Unimodal (MUM) training for incremental extension across modalities without full retraining
  • Latent space organised by protein and domain class

Sample & compute efficiency

  • Fewer labelled examples required than in typical benchmark regimes
  • Lower training and inference budgets at matched task fidelity
  • Emphasis on out-of-distribution robustness for novel scaffolds

Inspection & attribution

  • Intermediate representations exposed for audit
  • Attention and layer-wise probes; coupling to conformational ensembles where applicable
  • Confidence and failure-mode decomposition for downstream QA

Genesys overview

Circe

Model introspection for discovery teams

Agentic workflows that interrogate discriminative and generative components: where uncertainty concentrates, which features drive a prediction, and whether evidence is consistent across poses, homologs, and auxiliary readouts, prior to IND-enabling spend.

Circe structure-based output visualization

Scientific & operational differentiation

Design choices aimed at reducing translational risk. The approach is analogous to that of platform companies that pair large-scale inference with measurable feedback loops, but it treats interpretability as a first-class output.

  1. Epistemic vs ontic error. Decompose apparent failures into model misspecification, dataset shift, and scoring artefacts versus biophysical mechanisms (off-target engagement, kinetics, ADME liabilities).
  2. Ensemble and perturbed complexes. Stress-test docked or predicted poses under conformational and side-chain variation; rank highest the compounds whose predictions remain stable across perturbations.
  3. Homology-aware selectivity. Explicit mapping of predicted affinity and confidence across related binding sites to anticipate polypharmacology and anti-target risk.
  4. Traceable funnel decisions. Each enrichment stage exports rationales tied to inputs and model state, reducing undiagnosed false negatives and false positives in the medchem loop.
  5. Retrospective and prospective validation. Benchmark against clinical and preclinical outcomes where labels exist; run prospective studies with partners on novel targets and chemotypes.
  6. Documentation for governance. Structured narratives suitable for internal QA and external scrutiny when predictions and outcomes diverge.
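
To make item 2 concrete, here is a minimal sketch of perturbation-stability ranking: each compound is scored against several perturbed versions of the complex, and the ranking penalises spread across them. The function name `stability_rank`, the `penalty` weight, the "mean minus penalty times standard deviation" criterion, and the toy scores are all assumptions for illustration, not the platform's actual scoring.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def stability_rank(scores_by_compound: Dict[str, List[float]],
                   penalty: float = 1.0) -> List[Tuple[str, float, float, float]]:
    """Rank compounds by a robust score: mean - penalty * std over perturbations.

    Each list holds one score per perturbed complex (higher is better, by
    assumption). Returns (compound_id, mean, std, robust_score) tuples, best first.
    """
    rows = []
    for compound_id, scores in scores_by_compound.items():
        mu = mean(scores)
        sd = pstdev(scores)  # population std over the perturbation ensemble
        rows.append((compound_id, mu, sd, mu - penalty * sd))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Toy scores across four perturbed receptor conformations (made-up numbers).
scores = {
    "cmpd-A": [7.0, 7.2, 6.8, 7.0],    # consistently good
    "cmpd-B": [10.0, 5.0, 10.0, 5.0],  # best mean, but pose-dependent
    "cmpd-C": [6.5, 6.4, 6.6, 6.5],    # modest but very stable
}
ranking = stability_rank(scores)
for compound_id, mu, sd, robust in ranking:
    print(f"{compound_id}: mean={mu:.2f} std={sd:.2f} robust={robust:.2f}")
```

With these numbers cmpd-B has the highest mean score yet ranks last, because its score depends strongly on which perturbed conformation it is evaluated against.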

Computational platform coverage

Custom model and workflow integration for sponsor teams, from target validation through lead optimisation, in line with structure-enabled discovery programs.

Ramachandran ensemble: conformational sampling for target assessment

Target & structure assessment

  • Tractability and druggability framing for nominated targets
  • Mutation and isoform-aware pocket definition
  • Force-field parameterised conformational ensemble sampling
  • QM Hamiltonian-based algorithms to probe MD-relevant conformational change

Protein complex four-view poses in solvent for hit identification

Hit identification & optimisation

  • Affinity and pose prediction with uncertainty propagation
  • Exploration of analog series under SAR constraints
  • Physics-grounded interaction fingerprint scoring
  • Prioritisation for synthesis under interpretable criteria
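
One simple instance of uncertainty propagation in affinity prediction is converting a predicted binding free energy and its standard deviation into pKd units. The sketch below uses the standard relation ΔG = RT ln Kd (1 M standard state); because pKd is linear in ΔG, the standard deviation scales by the same factor. The function name and the example values are hypothetical, and this is not a description of how the platform's models propagate uncertainty.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def pkd_with_uncertainty(delta_g: float, sigma_delta_g: float,
                         temperature: float = 298.15):
    """Convert a predicted binding free energy (kcal/mol) to pKd with its std.

    Uses delta_g = RT * ln(Kd), so pKd = -delta_g / (RT * ln 10).
    The mapping is linear, so the std scales by the same 1 / (RT * ln 10).
    """
    rt_ln10 = R_KCAL * temperature * math.log(10.0)  # about 1.36 kcal/mol at 25 C
    pkd = -delta_g / rt_ln10
    sigma_pkd = sigma_delta_g / rt_ln10
    return pkd, sigma_pkd


# Hypothetical prediction: -9.0 kcal/mol with 1.0 kcal/mol standard deviation.
pkd, sigma = pkd_with_uncertainty(-9.0, 1.0)
print(f"pKd = {pkd:.2f} +/- {sigma:.2f}")
```

A 1 kcal/mol uncertainty in ΔG therefore corresponds to roughly 0.7 log units in Kd at room temperature, which is worth keeping in view when comparing compounds whose predicted affinities differ by less than that.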

Cα displacement correlation map for lead nomination analysis

Lead nomination

  • Structure-aware shortlisting with explainable filters
  • Binding free energy decomposition across candidate series
  • Local structural perturbation analysis (R-group, linker)
  • Explicit handling of false-positive structural hypotheses

Publication: Multi-Unimodal Model (MUM) architecture, ISBI. IEEE Xplore.