Noah Flynn

Druggability, ligandability, and modality choice in the AlphaFold 3 era

2026-05-08T16:20:00+00:00

In 2002, a foundational analysis estimated that roughly 10–15% of the human proteome - the full set of proteins a genome can produce - could be hit by small-molecule drugs. Restated in current terms, 80–90% of the proteome is “undruggable” by conventional small-molecule inhibitors: proteins without the deep, hydrophobic pockets that support high-affinity ligand binding. That set includes flat protein–protein interaction surfaces, where proteins contact each other over broad shallow areas; transcription factors; and intrinsically disordered regions, which do not settle into a stable folded structure. That estimate shaped two decades of target selection. If a protein fell outside the druggable fraction, teams usually moved on to another project. By 2026, the picture is different. AlphaFold 3 can jointly predict a protein’s structure with its ligand, DNA, RNA, and post-translational modifications in a single prediction run. The AlphaFold Database now covers roughly 200 million predicted structures. One PROTAC, a targeted protein degrader, is awaiting a June 5, 2026 PDUFA decision from the FDA - the agency’s target date for acting on a New Drug Application. It is the first regulatory test of a modality class that barely existed in the clinic a decade ago. The operative question in 2026 is: which modality fits this target best?

Concept Translation: “Undruggable” is an artifact of the tool - it means the current inhibitor toolkit (small molecules that fit deep pockets) can’t act on this protein. It does not mean the protein is intrinsically resistant to intervention. The ML analogy: a feature you cannot perturb with your current intervention API. Once the API expands (PROTACs, ASOs, ADCs), the same protein moves from undruggable to druggable without any biological change.

Why druggability assessment matters in 2026

Two shifts have moved the frontier of druggability at roughly the same time, and together they change the analysis.

The first is structural. AlphaFold 2 turned single-chain protein structure prediction from a research problem into routine infrastructure in 2020. AlphaFold 3 in May 2024 extended that capability to biomolecular complexes - proteins with small-molecule ligands, with DNA and RNA, with ions, and with post-translational modifications, all in a single diffusion-based model (Abramson et al., Nature 630:493–500, 2024). The AlphaFold 2 work earned Demis Hassabis and John Jumper a share of the 2024 Nobel Prize in Chemistry, with David Baker taking the other half for computational protein design. AF3 now sits inside a broader field that includes Boltz-1 and Boltz-2 (MIT Jameel Clinic), Chai-1 and Chai-2 (Chai Discovery), Protenix (ByteDance), HelixFold-3 (Baidu), and OpenFold3. Predicted structures are now a default input to virtual screening - the computational ranking of large compound libraries against a target - rather than a special case.

The second shift is modality. Antisense oligonucleotides (ASOs), small interfering RNAs (siRNAs), targeted protein degraders (PROTACs and molecular glues), antibody-drug conjugates (ADCs), and cyclic peptides all have approved products on the market or late-stage assets in development. Each of these expands the “druggable” envelope along a different axis. An RNA-binding ASO can modulate a protein that has no structured binding pocket at all. A PROTAC can take down a scaffolding protein - one that organizes other proteins rather than catalyzing a reaction - even when there is no catalytic activity to inhibit. An ADC can deliver a cytotoxic payload selectively to tumor cells whose surface antigen (a protein displayed on the cell surface) is otherwise non-actionable.

A 2015-era druggability analysis might ask: “Does this target have a small-molecule binding pocket, and is its structure known?” A 2026 analysis has to ask that question across at least five modality classes. It also has to answer the harder follow-up: if several modalities are tractable, which one fits this target’s biology, safety profile, and commercial context best?

What “druggable” actually means

The vocabulary needs care because the field uses these terms loosely.

Druggability refers to the ability of a protein, peptide, or nucleic acid to be modulated by a drug. The 2002 analysis that anchored the field estimated that about 3,000 of the ~20,000 human protein-coding genes - roughly 15% - are druggable in the small-molecule sense, and that only around 27% of those druggable genes actually have an approved drug. Small-molecule drugs target roughly 2–5% of the human genome in total. That figure marks the reach of current chemistry.
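The percentages in the paragraph above chain together, and a few lines of arithmetic make the link explicit. This is only a consistency check on the figures quoted in this post, not a re-derivation of the 2002 analysis; the variable names are illustrative.

```python
# Figures as quoted in the paragraph above (the 2002 estimate, restated).
protein_coding_genes = 20_000
druggable_genes = 3_000
fraction_with_approved_drug = 0.27

# Share of protein-coding genes that are small-molecule druggable.
druggable_fraction = druggable_genes / protein_coding_genes

# Druggable genes that actually have an approved drug, and their share of all genes.
drugged_genes = druggable_genes * fraction_with_approved_drug
drugged_fraction = drugged_genes / protein_coding_genes

print(f"druggable share of genes: {druggable_fraction:.0%}")
print(f"druggable genes with an approved drug: {drugged_genes:.0f}")
print(f"drugged share of genes: {drugged_fraction:.1%}")
```

The drugged share comes out at about 4%, which sits inside the 2–5% range quoted for small-molecule drugs.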

Druggability decomposes into two more specific questions.

Target quality asks whether modulating the target will actually change the disease, and whether doing so is safe. The relevant variables are the target’s role in essential cellular processes, its network centrality (how many pathways and interactions depend on it), its tissue-expression pattern (a target essential for heart-muscle function is a bad target, almost regardless of its affinity profile), its approval-adjacent precedent (whether closely related targets or modalities have already cleared regulatory review), and its cellular localization (an intracellular target requires permeability that a cell-surface target does not). Target quality is what the companion post “How to tell a drug target matters: evidence frameworks for target–disease linkage” (C1) gives you the machinery to evaluate.

Ligandability asks whether a drug molecule can bind the target in a way that modulates it. Ligandability depends on three things: the availability of the target’s structure (experimental or predicted), the presence of closely related proteins with known ligands (if a close cousin of your target has been drugged, the odds are better), and the accessibility of a binding site (a small-molecule pocket for small molecules, a regulatory region for nucleic-acid-targeted drugs, or a surface-exposed patch for antibodies).

Concept Translation: Target quality and ligandability are independent axes. A target can be highly ligandable - it has a deep, well-defined pocket, a structure, and a binder - while being a terrible drug target if hitting it doesn’t change disease or causes serious toxicity. Many of the 80–90% of proteins called “undruggable” in older analyses are really not yet ligandable by small molecules; their target quality may be excellent. New modalities are mostly ligandability-extenders, not target-quality-extenders.

Classical small-molecule druggability tools such as DoGSiteScorer (Volkamer et al., 2012), fpocket (Le Guilloux et al., 2009), and SiteMap (Halgren, 2009) turn the third factor, binding-site accessibility, into a quantitative pocket score. They compute geometric and chemical descriptors of candidate cavities. These remain widely cited workhorses. Newer ML-based approaches (PockDrug, PocketMiner) have emerged and are worth tracking, but the original three are still common reference points in 2026 ligandability analyses. Structure prediction enters the picture at two points in a ligandability assessment: it provides a structure for targets that have never been crystallized, and it increasingly provides one with a candidate ligand already docked - computationally placed into a proposed binding pose.

The classical druggability landscape by protein family

Before structure prediction started moving the boundary, the druggable genome was shaped largely by which protein families had accessible binding sites and tractable chemistry. A handful of families dominate approved drugs.

G protein-coupled receptors (GPCRs) are the single most targeted class. They sit in the cell membrane and relay outside signals inward. Their extracellular ligand-binding domains can be reached by small molecules, peptide-like molecules, antibodies, and allosteric modulators - drugs that bind away from the main ligand site and tune receptor activity.

Ion channels provide a good contrast. The sequenced human genome identified more than 400 putative ion channels, though only a fraction have been cloned and functionally characterized. Their widespread tissue distribution and the physiological consequences of opening or closing them make them both compelling and treacherous targets.

Kinases are the third-most-targeted group. Kinases transfer phosphate groups from ATP to substrate proteins, and the ATP-binding pocket is the canonical small-molecule site. Five inhibitor strategies recur across the literature: ATP mimics (the default), covalent inhibitors (which form a bond with a cysteine in or near the pocket and are usually irreversible), bivalent inhibitors (binding the catalytic site and a second stability or localization site), allosteric inhibitors that bind distal regulatory sites (type III and IV), and degraders or molecular glues that hijack the ubiquitin–proteasome system to remove the kinase rather than inhibit it.

Concept Translation: Kinases are useful to keep in mind because they’re the canonical “well-modeled” target family - thousands of crystal structures, deep mutagenesis data, and a consistent pocket geometry across the family. They’re roughly the ImageNet of structure-based drug design: the family where everything was tuned first and where most ML benchmarks were established. The frontier targets are unlike kinases - flat surfaces, no obvious pocket, sparse training data - and that’s why the modality expansion matters.

Kinase degraders blur the classical taxonomy. A “kinase inhibitor” and a “kinase degrader” act on the same target through different mechanisms, so the druggability criteria differ as well. That overlap sets up the rest of this post.

Structure prediction and the 2020/2024 inflections

For most of drug discovery’s history, structure-based drug design was available only for targets that had been successfully crystallized. The c-Met/crizotinib program is a useful illustration of that earlier workflow. c-Met is a receptor tyrosine kinase with elevated levels and abnormal activity in several cancers. Pfizer’s program started from a series of pyrrole-substituted 2-indolinones (oxindoles), ATP-competitive tyrosine kinase inhibitors first explored in the mid-1990s, that showed activity against c-Met but poor drug-like properties. Iterative co-crystallography with PHA-665752 (an early lead) revealed the c-Met ATP-pocket environment in detail; that structure motivated a switch to a novel 2-amino-5-aryl-3-benzyloxypyridine series, which yielded the 3,5-disubstituted 2-aminopyridine clinical candidate crizotinib (PF-02341066, Cui et al., J. Med. Chem. 54:6342–6363, 2011). Each iteration of the design cycle was gated on a new crystal. Crizotinib, a c-Met/ALK dual inhibitor, was approved by the FDA in 2011.

That workflow still works, and for targets that crystallize well it remains the standard. Crystallography, however, is slow and expensive. For many targets - especially membrane proteins, intrinsically disordered regions, and transient complexes - it either fails or produces low-resolution artefacts that are hard to use for design.

Two inflection points changed the practical calculus.

2020: AlphaFold 2. At CASP14, the community benchmark competition for protein-structure prediction, DeepMind’s AlphaFold 2 produced predictions indistinguishable from experimental structures for a majority of single-chain proteins. By 2022 the AlphaFold Database had released ~800,000 predicted structures. By 2025 it was around 200 million predicted structures, covering the bulk of UniProt, the standard protein-sequence database. Single-chain structure prediction stopped being a bottleneck.

2024: AlphaFold 3. AF3 (Abramson et al., Nature 630:493–500, 2024) moved from single-chain prediction to complex prediction using a diffusion-based architecture built around a Pairformer block - which updates pairwise residue relationships - plus a diffusion module. It jointly predicts proteins with small-molecule ligands, DNA, RNA, ions, and post-translational modifications in a single prediction run. On the PoseBusters ligand-pose benchmark, a test of whether models place ligands in the right binding pose, DeepMind reports AF3 as roughly 50% more accurate than the best traditional physics-based docking tools. That corresponds to protein–ligand pose-prediction success rates of about 76% versus roughly 50% for the docking programs AutoDock Vina and Gold on the same benchmark, and makes AF3 the first AI system to outperform physics-based tools for biomolecular structure prediction. Model weights were released for non-commercial academic use in November 2024, with broader (still non-commercial) availability in February 2025. Commercial use remains gated behind Isomorphic Labs partnerships.

Concept Translation: The AF2-to-AF3 jump is conceptually familiar to anyone who has watched generative-model architectures evolve. AF2 was a transformer-style model that emitted a single deterministic structure. AF3 swaps the structure-emission head for a diffusion module - the same general framework as image diffusion models - that samples from a distribution over plausible structures. The Pairformer is a transformer-style block that operates on pairwise residue relations rather than tokens. The architectural pattern (encoder over pairwise relations + diffusion decoder) is now the template for every AF3-class model in the field.

The competitive field matters because AF3’s non-commercial licensing makes commercially usable alternatives important for industrial drug discovery:

Boltz-1 (MIT, Nov 2024): MIT license, fully open source, matches AF3 accuracy on standard benchmarks. Boltz-1x added physics-based steering to reduce steric clashes; Boltz-2 (2025) added binding-affinity prediction, claiming roughly 2× the precision of standard ML and docking baselines on hit discovery.
Chai-1 and Chai-2 (Chai Discovery): Apache 2.0 license, inference-only weights, can run with or without a multiple-sequence alignment. Chai-2 (2025) introduced zero-shot de novo antibody design with reported experimental hit rates around 16–20% on novel targets.
Protenix (ByteDance, 2025), HelixFold-3 (Baidu), and OpenFold3 (in preview as of late 2025) round out the field.

By 2026, teams treat predicted structures as routine inputs to virtual screening and structure-based design. AF3-class models have substantially reduced the need for experimental crystallographic structures in early-discovery programs, particularly for targets where co-crystal structures are slow or expensive to generate. Virtual screening campaigns at 10K-compound scale now run in days on cloud GPUs.

A worked example: AlphaFold-enabled design for an undrugged target

A 2022 hepatocellular carcinoma (HCC) program provides the first published demonstration of using AlphaFold-predicted structures for hit identification in practice (Ren, Ding, Zheng et al., Chemical Science 14:1443–1452, 2023). The target, CDK20, also known as cell cycle-related kinase (CCRK), had no experimental structure and was effectively undrugged. Traditional structure-based design on CDK20 would have required first solving its crystal structure, a multi-year project in itself.

Instead, the team used the AlphaFold-predicted structure (AF-Q8IZL9-F1-model_v1) of CDK20 directly. The prediction was high-confidence across most of the protein except the C-terminus, which they removed because it occluded the solvent-exposed region of the ATP pocket. Pocket analysis identified a shallow ATP binding pocket of approximately 150 Å³ in the DFG-in conformation - one common active-state arrangement of kinase active-site residues - with MET84 as the hinge residue and PHE81 occupying the gatekeeper position (the hinge is where many kinase inhibitors make key hydrogen bonds; the gatekeeper residue helps control access to the back pocket). They fed the AlphaFold structure into Insilico Medicine’s generative chemistry platform (Chemistry42), which generated 8,918 candidate molecules. After docking, clustering, and pose inspection, seven were synthesized for the first round of testing. The first hit (ISM042-2-001) bound CDK20 with a Kd of 9.2 ± 0.5 µM (lower Kd means tighter binding) and was identified 30 days from target selection. A second optimization round produced ISM042-2-048, with Kd ≈ 567 nM and a CDK20 kinase-inhibition IC50 of 33.4 ± 22.6 nM - the concentration required to cut enzyme activity by half. ISM042-2-048 also showed selective antiproliferation against the CDK20-overexpressing HCC line Huh7 (cellular IC50 ≈ 209 nM) versus the HEK293 counter-screen (~1707 nM), a comparison cell line used to check selectivity.
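The potency numbers above are easier to compare as ratios. The sketch below takes only the values quoted in this section and computes the affinity gain between the two rounds and a simple cellular selectivity ratio; the helper name `fold_change` is illustrative, and the point estimates ignore the reported error bars.

```python
def fold_change(reference_nm: float, improved_nm: float) -> float:
    """How many times lower (tighter or more potent) the improved value is."""
    return reference_nm / improved_nm

# Binding affinity (Kd), both expressed in nM.
kd_first_hit_nm = 9.2 * 1000   # ISM042-2-001: 9.2 µM
kd_second_round_nm = 567.0     # ISM042-2-048: ~567 nM
affinity_gain = fold_change(kd_first_hit_nm, kd_second_round_nm)

# Cellular antiproliferation IC50 for ISM042-2-048, in nM.
ic50_huh7_nm = 209.0     # CDK20-overexpressing HCC line
ic50_hek293_nm = 1707.0  # counter-screen line
selectivity_ratio = fold_change(ic50_hek293_nm, ic50_huh7_nm)

print(f"Kd improvement between rounds: ~{affinity_gain:.0f}-fold")
print(f"Huh7 vs HEK293 selectivity: ~{selectivity_ratio:.1f}-fold")
```

On these figures the second round tightened binding roughly 16-fold, and the compound was about 8-fold more potent in the CDK20-overexpressing line than in the counter-screen.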

AlphaFold converted a target that would previously have been deferred until crystallography succeeded into a target that could be designed against immediately. In programs that use this workflow, the structural-modeling portion of the pipeline has moved from years to weeks. Hit-to-lead optimization, preclinical pharmacology, and everything downstream still take the time they always did.

Compared with the crizotinib workflow described above, the difference is sharp. Crizotinib needed a crystal of each intermediate compound; CDK20’s lead compound was designed against an AI-predicted structure before any experimental structure existed. Structure-based drug design used to be gated on crystallography. In 2026 it is gated on whether the structure-prediction model is accurate enough at the specific pocket you care about. That is a softer gate, and the tooling keeps improving.

Beyond small molecules: modality choice is now a first-class question

Categories of therapeutic molecule that had few or no approved products a decade ago now have market-stage drugs and late-stage candidates. Each reaches targets that small molecules cannot, expanding the druggable envelope.

Targeted protein degradation (PROTACs and molecular glues)

Targeted protein degradation (TPD) shifts pharmacology from occupancy-driven - where you block a protein’s function by occupying its binding site - to event-driven, where you induce the cell’s endogenous waste-disposal machinery to destroy the protein outright. The relevant machinery is the ubiquitin–proteasome system (UPS), the main pathway cells use to tag and remove unwanted intracellular proteins. A cascade of three enzymes - E1 (activating), E2 (conjugating), and E3 (ligase) - tags target proteins with polyubiquitin chains and marks them for destruction by the 26S proteasome. TPD drugs hijack this cascade.

Concept Translation: The occupancy-vs-event shift is closer to a control-flow change than a chemistry change. An occupancy-driven inhibitor is a function call that holds an object in a blocked state for as long as the call is on the stack - release the call, the object returns to normal. An event-driven degrader is a single instruction that triggers garbage collection on the object. The first scales with drug concentration; the second scales with the rate at which the cell remakes the protein. That difference is why PROTACs can be effective at much lower concentrations than equivalent inhibitors and why their pharmacology is often dominated by protein resynthesis rates rather than drug residence time.
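The contrast can be made concrete with two textbook-style toy models. This is a minimal sketch, not a pharmacology model: occupancy uses the standard single-site binding relation, and the degrader uses a one-compartment turnover model in which `k_deg_induced` is a hypothetical lumped rate standing in for everything the degrader does (ternary-complex formation, ubiquitination, proteasomal processing).

```python
def fractional_occupancy(conc_nm: float, kd_nm: float) -> float:
    """Single-site equilibrium binding: fraction of target bound at a given drug concentration."""
    return conc_nm / (conc_nm + kd_nm)

def steady_state_level(k_syn: float, k_deg_basal: float, k_deg_induced: float = 0.0) -> float:
    """Steady state of dP/dt = k_syn - (k_deg_basal + k_deg_induced) * P."""
    return k_syn / (k_deg_basal + k_deg_induced)

def fraction_remaining(k_syn: float, k_deg_basal: float, k_deg_induced: float) -> float:
    """Protein level with the degrader relative to the untreated steady state."""
    untreated = steady_state_level(k_syn, k_deg_basal)
    treated = steady_state_level(k_syn, k_deg_basal, k_deg_induced)
    return treated / untreated

# Occupancy-driven: the effect tracks how much of the target is bound right now.
for conc in (10.0, 100.0, 1000.0):
    print(f"{conc:>6.0f} nM drug, Kd 100 nM -> occupancy {fractional_occupancy(conc, 100.0):.2f}")

# Event-driven: the effect tracks the balance between synthesis and (induced) degradation.
print(f"fraction of protein remaining: {fraction_remaining(1.0, 0.1, 0.9):.2f}")
```

In the toy degrader model the drug concentration does not appear directly; it would enter only through `k_deg_induced`, which is the part a real model has to work hardest to describe.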

A PROTAC (PROteolysis TArgeting Chimera) is a heterobifunctional molecule - it has one binding end for the target and another for an E3 ligase, joined by a linker. By forming a ternary complex (a three-part assembly of target, PROTAC, and ligase), it brings the target into proximity with the ubiquitination machinery. Because of this tripartite structure, PROTACs are large molecules, typically 700–1,000 Da. A PROTAC also acts catalytically. Once the target is ubiquitinated and handed off to the proteasome, the PROTAC dissociates and recruits another target copy, so one degrader molecule can remove many target molecules. And because the target-binding end only has to bring the ligase close, not block function, a PROTAC can hit targets that have no functional small-molecule binding site at all, as long as some binder exists for the target.

Molecular glues are the close cousin: small molecules typically <500 Da, with no linker, that bind directly to an E3 ligase and induce a conformational change in the ligase surface, creating a novel interaction interface that recruits “neosubstrate” proteins (proteins the ligase would not normally bind) for degradation. The original IMiDs (thalidomide, lenalidomide, pomalidomide) are retrospectively understood as molecular glues. In 2026 the line between PROTACs and molecular glues is blurring; newer-generation degraders combine features of both.

The opportunity space is large. The human genome encodes over 600 E3 ligases, of which fewer than 2% have been successfully exploited for TPD to date. Most current PROTACs recruit just a few E3 ligases, especially CRBN or VHL; DCAF15 and MDM2 are emerging; the rest of the ligase repertoire is largely untouched. Tumor-specific E3 ligases are an active frontier for tissue-selective degradation. The motivation is direct: a CRBN-recruiting PROTAC can degrade its target in healthy tissue as well as diseased tissue.

The state of TPD in 2026 is a useful test case for “modality choice as a first-class question”:

Bavdegalutamide (ARV-110), Arvinas’s androgen-receptor PROTAC, was the first PROTAC to enter human clinical trials (Phase 1, 2019). In patients with AR T878X/H875Y mutations in ARDENT, it produced 46% PSA50 response rates, where PSA50 is a prostate-cancer endpoint based on prostate-specific antigen decline. Development was discontinued in 2024 in favor of ARV-766, a next-generation AR degrader with broader mutant coverage. The first clinical entrant did not become the lasting lead program.
Vepdegestrant (ARV-471), Arvinas/Pfizer’s estrogen-receptor PROTAC, reached Phase 3 (VERITAC-2) in estrogen-receptor-positive, HER2-negative breast cancer after prior CDK4/6 inhibitor failure. Topline results in March 2025 met the primary endpoint in the ESR1-mutant subpopulation, the subgroup whose tumors carried ESR1 mutations (median PFS 5.0 vs 2.1 months, HR 0.57, p<0.001). PFS is progression-free survival, and HR is the hazard ratio comparing event rates over time. The study did not reach statistical significance in the intent-to-treat population, meaning all enrolled patients. The New Drug Application (NDA) was filed in June 2025, with a PDUFA action date of June 5, 2026. If approved, vepdegestrant will be the first FDA-approved PROTAC.
In September 2025, Arvinas and Pfizer publicly sought a third-party partner to commercialize vepdegestrant. That move suggests cooling internal commitment despite the positive result in the ESR1-mutant subgroup. The modality has clinical signal, but the commercial case remains unsettled.

The broader TPD space has expanded well beyond ER and AR. CC-94676 (BMS, AR PROTAC), the CELMoDs mezigdomide and iberdomide - cereblon E3-ligase modulators now in Phase 3 in multiple myeloma - and a growing number of Phase 1 molecular-glue degraders for BCL6, BTK, and other targets are all active programs.

For a target-assessment document, this changes the screen. For a scaffolding protein or transcription factor without an inhibitable catalytic site, the relevant question becomes “does it have a surface patch we can bind at all?” If yes, a PROTAC may be tractable.

RNA-targeting drugs (ASOs and siRNAs)

Antisense oligonucleotides (ASOs) are 15–30 nucleotide sequences that bind target RNA via Watson–Crick base pairing - that is, ordinary sequence complementarity. Two mechanisms dominate:

RNase H1 cleavage. The ASO–RNA duplex is a substrate for endogenous RNase H1, an enzyme that cleaves the RNA strand. Most approved ASOs work this way.
Steric blockage / splice switching. The ASO binds the transcript without triggering cleavage. Bound to mature mRNA, it can block ribosome assembly and so prevent translation; bound to pre-mRNA (the RNA transcript before splicing), it can alter splicing. Exon-skipping ASOs (Duchenne muscular dystrophy) and exon-inclusion ASOs (spinal muscular atrophy) are the canonical examples.

siRNAs act through the RNA-induced silencing complex (RISC), a cellular complex that uses the guide RNA to find and silence matching transcripts. Approved examples include patisiran (hereditary transthyretin amyloidosis), givosiran (acute hepatic porphyria), lumasiran (primary hyperoxaluria type 1), and inclisiran/Leqvio - Novartis’s siRNA targeting PCSK9, originally approved by the FDA in December 2021 for adults with ASCVD (atherosclerotic cardiovascular disease) or heterozygous familial hypercholesterolemia, expanded in 2023 to broader primary hyperlipidemia, and updated again in July 2025 to permit first-line monotherapy use in hypercholesterolemia.

Concept Translation: ASOs and siRNAs target messenger RNA rather than the protein itself. The matching is sequence-based - Watson–Crick base pairing is complement matching over a 4-letter alphabet (A pairs with U, G pairs with C), so the drug is the reverse complement of the site it binds. For target assessment this is a different design space entirely: the target is a string, the drug is a string, and the binding rule is a known function of the two strings. Compare this to small-molecule binding, where the binding rule is a learned function of two 3D shapes. Sequence-matching is much easier to design but introduces a new failure mode - off-target hits anywhere in the transcriptome that match closely enough.
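The string view of oligonucleotide design can be sketched in a few lines. This is a toy illustration under loud assumptions: both strands are written in the RNA alphabet, the sequences are made up, and "close enough" is modeled as a simple mismatch count, whereas real off-target prediction also weighs chemistry, position of the mismatch, and RNA accessibility. The function names are hypothetical.

```python
# Watson-Crick pairing over the RNA alphabet: A<->U, G<->C.
COMPLEMENT = str.maketrans("AUGC", "UACG")

def antisense(rna: str) -> str:
    """Reverse complement: the sequence that pairs perfectly with `rna`."""
    return rna.translate(COMPLEMENT)[::-1]

def near_matches(oligo: str, transcript: str, max_mismatches: int = 1):
    """Return (position, mismatches) for every site the oligo could pair with."""
    site = antisense(oligo)  # the RNA stretch the oligo matches perfectly
    n = len(site)
    hits = []
    for start in range(len(transcript) - n + 1):
        window = transcript[start:start + n]
        mismatches = sum(a != b for a, b in zip(site, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits

# Made-up example: design against an intended site, then scan a made-up transcript.
intended_site = "AUGGCUAC"
oligo = antisense(intended_site)
transcript = "CCAUGGCUACGGAUGGCAACUU"
print(oligo)
print(near_matches(oligo, transcript, max_mismatches=1))
```

In this toy transcript the scan finds the intended site with zero mismatches and a second site that differs at one position, which is the off-target failure mode described above in miniature.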

For target assessment, ASOs and siRNAs require an accessible sequence rather than a structured protein binding pocket. A target with no deep pocket, no covalent warhead opportunity, and no PROTAC handle can still be tractable if its RNA is accessible. That rule has caveats: small-molecule splice modulators like risdiplam for SMA exploit RNA secondary structure - the local folded shape of the RNA strand - so structure does come back into play at the RNA level for some programs.

Antibody-drug conjugates (ADCs)

An ADC is a monoclonal antibody covalently linked to a cytotoxic payload via a chemical linker. The antibody provides selectivity; the payload provides potency. An ideal ADC stays stable in circulation, internalizes on binding to the target (the antibody–target complex is pulled into the cell), and releases the payload intracellularly in the vicinity of the target.

Concept Translation: An ADC is a delivery system. The antibody is a routing label that says “deliver to cells expressing this surface marker”; the payload is the actual cytotoxic drug, often too toxic to administer freely. The linker controls when the payload separates - ideally only after the package has been internalized. The pattern is roughly the same as targeted delivery in any system: route on a label, release on arrival. The engineering challenges are also analogous (label specificity, in-transit stability, controlled release).

2026 status: 15 ADCs are FDA-approved, with 19 ADCs approved globally across FDA, EMA, NMPA, and PMDA. Pipeline scale is over 400 ADCs in development, 200+ in clinical trials, and at least 24 in Phase 3. Recent FDA approvals include Datroway (datopotamab deruxtecan, January 2025) and Emrelis (telisotuzumab vedotin-tllv, May 2025, targeting c-Met-overexpressing NSCLC, or non-small cell lung cancer). Topoisomerase I inhibitor payloads (DXd, SN-38) now dominate late-stage development due to strong bystander effects, where the released drug can also kill neighboring cells.

For target assessment, ADCs open up targets where selective expression matters more than pocket tractability. A tumor-surface antigen that cannot be usefully modulated by a naked antibody (an antibody without an attached payload) can still be a good ADC target if it internalizes on binding.

Therapeutic peptides and cyclic peptides

Therapeutic peptides occupy the middle ground between small molecules and biologics. They typically fall in the 500–5,000 Dalton range, often offer high target specificity, are less likely to trigger an immune response than antibodies, and have better membrane permeability than most proteins. Linear peptides suffer from poor stability and short half-life. Cyclic peptides address both by constraining the peptide into a ring, which improves rigidity, stability, and permeability. Cyclic peptides sit within the broader “beyond-Rule-of-Five” (bRo5) chemical space - shorthand for compounds that break the usual small-molecule size and polarity heuristics. Macrocycles (large rigid rings of 12 or more atoms), PROTACs, peptides, and metallodrugs all populate this space. By 2024, the field broadly accepted that strict adherence to Lipinski’s Rule of Five was not a prerequisite for intracellular engagement, and that bRo5 chemistry was central to engaging the flat protein–protein interaction surfaces that dominate the “undruggable” set. Cyclic peptides have shown activity in oncology, antiviral, antibacterial, and antimalarial contexts.

CAR-T, gene therapy, cell therapy

CAR-T and TCR-T therapies genetically modify a patient’s T cells to express a chimeric antigen receptor or T-cell receptor, then infuse the modified cells back. Kymriah and Yescarta were the first two FDA approvals in 2017; Tecartus, Breyanzi, and Abecma followed for mantle cell lymphoma, large B-cell lymphoma, and multiple myeloma respectively. By early 2026, the FDA reported having approved close to 50 cell and gene therapy products over the prior decade across the Center for Biologics Evaluation and Research (CBER) Office of Therapeutic Products, the cumulative result of approvals such as Casgevy and Lyfgenia (sickle cell disease), Beqvez (hemophilia B), Elevidys (Duchenne muscular dystrophy), Vyjuvek (dystrophic epidermolysis bullosa), Aucatzyl (B-ALL, or B-cell acute lymphoblastic leukemia), and an ongoing run of CAR-T approvals.

For target assessment, CAR-T targets are a different class of object from small-molecule targets. A CAR-T target is a surface antigen expressed, ideally exclusively, on disease cells - the “druggability” question is about on-tumor versus off-tumor expression (whether the antigen appears on cancer cells, healthy cells, or both), not pocket geometry. That leads to the other half of the 2026 druggability question.

The druggable genome is expanding, and the ignorome remains large

There are more than 10,000 known human diseases. The original 2002 estimate put 3,000–10,000 disease-related genes in the genome, with roughly 10% of those being disease-modifying on knockout (disabling the gene changed a disease-relevant phenotype in model systems). Only genes that are both disease-modifying and druggable make useful targets, and that overlap is the basis of the 600–1,500 small-molecule drug-target estimate associated with the 2002 analysis.

The Rule of Five (Lipinski et al., 1997) shaped small-molecule design for a generation: no more than 5 hydrogen-bond donors, 10 acceptors, molecular weight under 500, and LogP under 5, with LogP being a measure of lipophilicity. By 2026, approved drugs increasingly exceed Rule-of-Five boundaries. A companion heuristic at the other end of the size range is the Rule of Three for fragment-based discovery, a strategy that starts from very small weak-binding molecules (Congreve et al., Drug Discov. Today 8:876–877, 2003: MW < 300, ≤3 H-bond donors, ≤3 H-bond acceptors, cLogP ≤ 3, where cLogP is the computed version of LogP). The empirical boundaries of “drug-like” are shifting as chemistry expands to PROTACs, macrocycles, ADCs, and covalent warheads. These categories routinely break Lipinski’s original rules and still get approved.

Concept Translation: Rule of Five was a hand-crafted feature filter - a set of property thresholds that empirically separated absorbable drugs from non-absorbable ones, fitted on roughly 2,200 oral drugs in the late 1990s. Like any hand-crafted filter, it was tuned to its training distribution. As the chemistry expanded into larger and more polar molecules, the filter started rejecting things that worked. The 2026 stance is to use Lipinski thresholds as a soft prior for the small-molecule region of chemical space and to ignore them entirely outside it.
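As a filter, the two rules are just threshold checks on four descriptors. The sketch below encodes the thresholds as stated in this post and assumes the descriptors have already been computed elsewhere; the `Descriptors` class and the three example entries are illustrative values, not real compounds.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    mol_weight: float   # Da
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors
    logp: float         # lipophilicity (LogP or cLogP)

def ro5_violations(d: Descriptors) -> int:
    """Count Rule-of-Five violations, using the thresholds as stated in this post."""
    return sum([
        d.h_donors > 5,
        d.h_acceptors > 10,
        d.mol_weight >= 500,
        d.logp >= 5,
    ])

def passes_ro3(d: Descriptors) -> bool:
    """Rule of Three for fragments: MW < 300, <=3 donors, <=3 acceptors, cLogP <= 3."""
    return d.mol_weight < 300 and d.h_donors <= 3 and d.h_acceptors <= 3 and d.logp <= 3

# Illustrative descriptor sets (made-up values chosen to land in each region).
examples = {
    "fragment-like": Descriptors(180.0, 1, 2, 1.2),
    "oral-small-molecule-like": Descriptors(350.0, 2, 5, 2.5),
    "PROTAC-sized": Descriptors(850.0, 4, 14, 5.5),
}

for name, d in examples.items():
    print(f"{name}: Ro5 violations={ro5_violations(d)}, passes Ro3={passes_ro3(d)}")
```

Treating the violation count as a soft score rather than a hard gate matches the stance above: a PROTAC-sized molecule breaks several thresholds by construction, which says the filter is out of distribution, not that the molecule is undruggable.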

Meanwhile, over 75% of protein research still focuses on the 10% of proteins known before the human genome was mapped. The NIH’s Illuminating the Druggable Genome (IDG) initiative, launched in 2014, created a four-level Target Development Level classification to make this visible:

Tclin: targets linked to at least one approved drug by mechanism of action.
Tchem: proteins known to bind small molecules with high potency, but without approved-drug links.
Tbio: proteins with a confirmed Mendelian disease phenotype or meeting certain experimental criteria.
Tdark: proteins meeting none of the above.
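
Read top to bottom, the four levels behave like a priority cascade: a protein gets the highest level whose criterion it meets. The sketch below shows only that cascade logic. The three boolean inputs are hypothetical placeholders for the actual IDG criteria, which involve specific potency cutoffs and annotation thresholds not reproduced here.

```python
def target_development_level(
    has_approved_drug_moa: bool,
    binds_potent_small_molecule: bool,
    has_mendelian_or_experimental_support: bool,
) -> str:
    """Assign a TDL label by checking criteria from most to least characterized.

    The boolean flags are simplified stand-ins for the real IDG criteria.
    """
    if has_approved_drug_moa:
        return "Tclin"
    if binds_potent_small_molecule:
        return "Tchem"
    if has_mendelian_or_experimental_support:
        return "Tbio"
    return "Tdark"


# A protein with potent ligands but no approved drug lands in Tchem,
# even if it also has disease-phenotype support.
example_level = target_development_level(False, True, True)
print(example_level)
```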

IDG found that Tdark proteins receive less research funding than other categories, which perpetuates the knowledge gap. The ignorome - the Tdark set - is both the largest opportunity space for novel target discovery and the hardest place to work, because almost every downstream tool (antibodies, assays, knockout models) has to be built from scratch. The computational tractability gains from AF3-class structure prediction matter most here: a target with no crystal structure and minimal biological characterization now at least comes with a predicted structure to work from.

Concept Translation: The TDL classification is a label hierarchy on the human proteome - closer to a knowledge-graph annotation than a clean training-data split. Tclin proteins have the most data on them (drugs, structures, papers, assays) and Tdark have the least. ML methods trained on protein-level features inherit the same skew as the labels: they perform best on Tclin and worst on Tdark, the long tail. The “ignorome” framing emphasizes that the tail is where novel drug targets actually live.

A 2026 assessment of a novel target should include, at minimum, an AF3-class structure prediction with pocket analysis, a TDL classification, a scan across modality options (small molecule, PROTAC, ASO, ADC, peptide), and an explicit note on whether Rule-of-Five chemistry is even the right design envelope for this target. A companion post, [novelty scoring and the ignorome] → Novel vs repurposed targets: quantifying novelty and extending drug-repurposing methods (C5),¹ takes up how novelty itself is now quantified across the TDL spectrum.

What’s next in druggability assessment

By 2026, the central druggability question is “which of five or six modalities fits this target best, and what is the safety profile of hitting it that way?” Three follow-ups stand out.

Safety. A PROTAC that catalytically degrades its target in tumor tissue may also degrade the same target in healthy tissue where it is essential. An ASO that knocks down a transcript in the liver may behave differently in the kidney. Modality-specific on-target, off-tissue toxicity - where the intended target is hit in the wrong tissue - is the next frontier, and it’s directly continuous with the tissue-specificity question the next post in this series picks up: [tissue specificity] → Tissue specificity as a safety filter: GTEx, Human Protein Atlas, and scRNA-seq for target prioritization (C3).²

Two targets, not one. The entire framing of druggability assumes you’re choosing a single target and asking what modality hits it. For some diseases, most notably cancers with defined synthetic-lethal vulnerabilities (where a cell tolerates either perturbation alone but not both together), the right answer is that you need to hit two targets together, and each of those targets individually is a poor drug candidate. That shifts the druggability question to a two-body problem, with its own computational and experimental methods: [synthetic lethality] → Synthetic lethality and combination targets: ML methods for finding drug pairs that work together (C7).³

Benchmarks for modality-choice models. We don’t yet have a standard benchmark for “given this target, which modality will work best?” The PoseBusters benchmark answers a narrower question (given this target and this ligand, is the predicted pose right?). Benchmarks for PROTAC ternary-complex geometry, ASO off-target profiles, and ADC internalization kinetics exist in pieces but do not yet form a coherent evaluation suite. That gap is a natural target for the next round of infrastructure investment. For the pillar-post view of where this fits in the overall workflow: [the pillar post on drug target discovery] → Target discovery: the front-of-funnel decision behind most Phase II failures (P0).

AlphaFold 3 was an inflection point. The remaining work now sits in modality selection, safety, and benchmark design.

How to tell a drug target matters: evidence frameworks for target–disease linkage

2026-05-08T16:10:00+00:00

A drug candidate can hit its intended protein with favorable potency, selectivity, and pharmacokinetics and still fail in Phase II because the target was never causally linked to the disease in humans. Nelson et al. (2015, Nature Genetics), followed by a revised analysis from King, Davis, and Degner (2019, PLOS Genetics), established that drug programs backed by human genetic evidence are roughly twice as likely to succeed in Phase II and Phase III as programs without it. BIO/Informa industry data over 2011–2020 put biomarker-stratified programs - programs that enroll or analyze patient subgroups defined by a measurable biomarker - at roughly 2× the likelihood of approval versus unstratified ones, a finding that has replicated in subsequent analyses. So evidence for why this target, for this disease, in this patient population carries much of the Phase II burden long before the drug reaches humans.

Concept Translation: Think of a drug target as a feature in a model that you’ve decided to intervene on. Picking a feature that strongly correlates with disease tells you nothing about whether modulating it will change the outcome - that’s the difference between a predictive feature and a causal one. Most of this post is about the evidence techniques the field uses to tell these apart for biological “features” (genes and proteins).

Why target–disease evidence matters in 2026

Industry-wide Phase I likelihood of approval has fallen from about 10.4% to 6.7%. The drivers include a shift toward first-in-class and riskier targets, greater use of biomarkers as surrogate efficacy endpoints (early readouts used as stand-ins for clinical benefit), and tougher regulatory scrutiny. Because the easy targets have mostly been drugged, what remains rests on thinner evidence bases. Against that backdrop, the two multipliers above carry more weight: biomarker-stratified programs run at roughly double the likelihood of approval of unstratified ones, and targets backed by human genetic evidence run at roughly double the success rate of targets without that support.

Two industry-side facts motivate greater evidence rigor. First, the pipeline is crowded around a small number of targets. About 25% of the global R&D pipeline (roughly 13,600 drug-target pairs) concentrates on just 38 unique biological targets, and in oncology specifically the number of assets per target has grown from 1.8 to 9 over the last two decades (Plenge, 2026). When nine companies are pursuing the same protein, evidence becomes the differentiator - which program has the clearest human-genetic case linking target to disease, the most defensible biomarker, and the cleanest safety picture after accounting for off-target effects (unwanted activity at proteins other than the intended target). Second, human genetic evidence now accompanies approximately two-thirds of recently FDA-approved therapeutics (Ochoa et al., 2022, Nat Rev Drug Discov; Trajanoska et al., 2023, Nature). What was a competitive advantage a decade ago is now table stakes for serious target programs.

What target–disease association evidence actually means

Target-disease evidence is a structured argument built from heterogeneous data sources. A credible argument has four parts:

A mechanistic hypothesis. A story about how modulating this protein would alter the disease phenotype. We seek a plausible causal chain, going beyond naive correlation with the disease state.
Human evidence that the hypothesis is correct. Genetic when possible, since human genetic variation imitates the design of a randomized controlled trial without requiring a drug intervention. Multi-omics, patient-tissue, and literature evidence all count, with known caveats.
Experimental evidence that the modulation produces the therapeutic effect. Knockout, knockdown, or pharmacological perturbation - that is, genetic removal, genetic reduction, or chemical modulation of the target - in disease-relevant models, ideally more than one, that reverses the disease phenotype rather than only altering a biomarker.
A defensible position on the competing hypothesis. Why this mechanism rather than the three adjacent ones the literature also supports?

Concept Translation: Knockouts and knockdowns are the biology field’s version of ablation studies. Knockout = the gene is removed entirely (full ablation). Knockdown = expression is reduced but not eliminated (partial ablation, more like dropout). Pharmacological perturbation = a chemical inhibits the protein, which is closer to “intervening at inference time” than “retraining without the feature.” All three answer the same question: what does the system do when this component is missing or weakened?

Teams that skip any one of these four parts risk failure. The most common omission is the fourth: the team has strong data on its favored mechanism and has never seriously engaged with the alternatives.

Two frameworks for organizing this argument are the AstraZeneca 5R framework and the GOT-IT recommendations.

The AstraZeneca 5R framework: “right target” as the first R

The 5R framework defines five criteria a drug program has to satisfy: right target, right tissue, right safety profile, right patient, right commercial potential. The “right target” criterion is defined narrowly as a strong link between target and disease, predictive biomarkers that identify likely responders, and demonstrated differentiated efficacy relative to alternatives. Lack of efficacy has been the most important cause of project failure in clinical trials. The 5R framework forces teams to separate evidence of target-disease linkage from evidence of tissue availability, safety, and patient stratification (the choice of which patient subgroup to treat). A single piece of data often speaks to only one of these categories, and conflating them makes the evidence base look stronger than it is.

The framework originated at AstraZeneca in the mid-2010s with Cook et al. (2014) in Nature Reviews Drug Discovery, followed by Morgan et al. (2018), also in Nature Reviews Drug Discovery, which documented an approximately five-fold improvement in the company’s small-molecule project survival from candidate-selection to Phase III decision after the framework was institutionalized.

GOT-IT: four assessment blocks

The GOT-IT (Guidelines On Target Assessment for Innovative Therapeutics) recommendations, published by Emmerich et al. (2021) in Nature Reviews Drug Discovery, organize target assessment into four blocks. The first is the focus of this post:

Disease linkage: evaluating the relationship between the target and the disease of interest and determining whether the target is involved in the underlying disease biology.
Target-related safety: covered in a later post on tissue-specificity.
Strategic issues: commercial potential, portfolio fit, competition.
Technical feasibility: covered in a later post on druggability.

5R and GOT-IT overlap, but they aren’t redundant. 5R is a pipeline-level framework, like a checklist for whether a program should advance. GOT-IT is a target-level framework, a checklist for whether the target-assessment work has been done thoroughly. Many organizations use both, sometimes customizing one internally and treating the other as the external benchmark, or using them in tandem with additional frameworks.

Four important distinctions within target–disease linkage

Driver vs passenger mutations

In cancer genomics, driver mutations provide selective advantages for cancer initiation and progression. Passenger mutations accumulate alongside them without contributing to disease fitness. A single gene can host both classes. The APC gene, a highly mutated tumor suppressor in colorectal cancer, carries both driver and passenger mutations across its coding region. Distinguishing them is a computational problem: which variants observed in a tumor caused the malignant phenotype, and which reflect background mutation noise?

Concept Translation: Drivers and passengers are the cancer-genomics version of signal vs. correlated noise. A tumor genome contains hundreds to thousands of mutations, only a handful of which actually move the phenotype. The other mutations co-occur with the drivers because they accumulated in the same lineage. Picking drivers out of the noise is the same kind of feature-selection problem as identifying which input variables actually drive a model’s predictions when the inputs are highly correlated.

A list of “genes mutated in disease X” is raw material. A target-discovery workflow has to rank those genes by probability of being a driver, using statistical tests on mutation frequency against background, functional impact scoring, recurrence across patients, pathway enrichment, and so on. Tools like MutSigCV, OncodriveCLUST, and later deep-learning classifiers exist for this. The output of a cancer-genomics pipeline is a ranked driver list. Moving from that list to a target list requires additional evidence.
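
The first ingredient in that list, mutation frequency against background, can be sketched with a one-sided binomial test. This is a deliberately simplified stand-in, not the method used by MutSigCV or OncodriveCLUST: it assumes each tumor independently carries a background mutation in a gene with a fixed, known probability, and it ignores covariates, functional impact, and clustering. The gene names, counts, and background probabilities are hypothetical.

```python
import math


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed exactly by summation."""
    tail = sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(k, n + 1)
    )
    return min(1.0, tail)


def rank_by_excess_mutation(observations: dict, n_tumors: int) -> list:
    """Rank genes by how surprising their mutation count is under background.

    observations maps gene -> (mutated_tumors, background_probability).
    Returns (gene, p_value) pairs sorted from most to least significant.
    """
    scored = [
        (gene, binomial_upper_tail(mutated, n_tumors, bg))
        for gene, (mutated, bg) in observations.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical cohort of 200 tumors; background rates differ by gene
# (for example, longer genes collect more background mutations).
cohort = {
    "GENE_A": (30, 0.02),   # far above the ~4 expected
    "GENE_B": (12, 0.05),   # close to the ~10 expected
    "GENE_C": (8, 0.01),    # modest excess over the ~2 expected
}
ranking = rank_by_excess_mutation(cohort, n_tumors=200)
for gene, p_value in ranking:
    print(f"{gene}: p = {p_value:.3g}")
```

Note that GENE_B has more mutated tumors than GENE_C yet ranks below it, because raw counts mean little without the background model. A real pipeline would also correct these p-values for the number of genes tested.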

A related wrinkle is that some passengers are clinically informative. Weak drivers that fall below detection thresholds, and accumulated “passenger” burden that shifts cell phenotype cumulatively, can matter clinically. Some passengers also provide evolutionary evidence of tumor lineage and can be used for timing inferences.

Causative, supportive, and symptomatic treatment

A second distinction concerns the kind of therapeutic effect a target is meant to produce:

Causative treatment: therapy directed against the cause of disease. Antiviral agents against SARS-CoV-2. Ritonavir inhibiting HIV protease. A PARP inhibitor in a BRCA-mutated tumor. Targets for causative treatment live upstream in the disease mechanism.
Symptomatic treatment: therapy that eases symptoms without addressing the underlying cause. Pain relievers. Cough suppressants. Targets for symptomatic treatment live downstream of the disease cause, typically in host response pathways.
Supportive treatment: care that addresses broader patient needs (physical, existential, palliative) rather than targeting disease biology directly.

For target discovery, this distinction clarifies what “linkage to the disease” has to establish. A target for causative treatment has to be causally upstream of the disease process. A target for symptomatic treatment has to be causally upstream of a symptom, which is a weaker claim. A program’s target-evidence burden depends on which kind of drug it’s trying to develop.

Alzheimer’s provides a standard example of a disease where causative target discovery has resisted easy success. Acetylcholinesterase inhibitors (donepezil, rivastigmine, galantamine) and memantine provide symptomatic therapies that target downstream neurotransmitter biology rather than disease etiology. The amyloid-targeting antibodies aducanumab (accelerated approval 2021, voluntarily withdrawn in 2024), lecanemab (Leqembi, FDA traditional approval July 2023, EMA approval late 2024), and donanemab (Kisunla, FDA approval July 2024) aim at a putative causal mechanism. Phase III data for both lecanemab and donanemab show statistically significant slowing of cognitive decline relative to placebo over 18 months. Debate remains over whether those effect sizes are clinically meaningful. The safety burden of amyloid-related imaging abnormalities (ARIA) has also been a major clinical concern, and the EMA’s reception of the two drugs has been mixed. The target-disease linkage evidence for amyloid is therefore stronger than it was a decade ago - there is now Phase III evidence that targeting amyloid produces a measurable clinical effect - but this does not clarify whether amyloid is the causative driver of cognitive decline, a downstream marker, or one of several parallel drivers.¹ The earlier generation of amyloid-targeting programs remains a cautionary tale, and continuing studies are testing whether the amyloid hypothesis was right but underpowered, or only partly right with diminishing returns.

Oncogene addiction

Oncogene addiction describes cancer cells’ dependence on individual oncogenes - genes whose altered activity drives tumor growth - to sustain the malignant phenotype. A tumor that depends on an activated oncogene is selectively vulnerable to inhibition of that oncogene. Tumor cells need the oncogenic signal to survive; healthy cells do not. This has been a successful organizing concept in targeted oncology. EGFR mutations in a subset of non-small cell lung cancer, BCR-ABL fusion in chronic myeloid leukemia, and BRAF V600E mutations in melanoma all produce tumors with oncogene-addicted phenotypes, and targeted inhibitors in each context have produced durable clinical responses.

For target-discovery methodology, oncogene addiction is useful because it predicts a specific clinical-trial readout: patients whose tumors harbor the addiction should respond, and others should not. That prediction is testable in Phase II with biomarker-stratified cohorts, and it helps explain why biomarker-stratified oncology programs show a higher likelihood of approval (LOA) than unstratified ones (we’ll quantify this in a later post on LOAs by therapeutic area). An evidence package for an oncogene-addiction target should therefore include the candidate biomarker - the measurable feature that marks the likely responder subgroup.

Target-disease correlation vs causal target-disease linkage

The most consequential distinction is also the most abstract. A gene whose expression differs between disease tissue and healthy tissue is associated with the disease. A gene becomes a causal candidate only when more holds: its loss-of-function variants - variants that reduce or abolish a gene’s activity - segregate with disease risk in human populations at a locus that survives multiple-testing correction, there is a plausible protein-level mechanism, and Mendelian-randomization-style evidence indicates that inherited variation in the gene affects disease rather than the reverse. Curating such evidence is hard. Transcriptomic association is relatively cheap; causal genetic evidence is expensive and sparse.

Concept Translation: Mendelian randomization is the genetics field’s instrumental-variable method. Because alleles are randomly assorted at conception, an inherited variant acts like a randomized treatment assignment - it isn’t influenced by lifestyle, environment, or downstream disease state. If carriers of a loss-of-function variant in gene X are healthier on average, that’s strong evidence that reducing X’s activity reduces disease risk, not just that X is correlated with it. This is closer to an A/B test than to an observational dataset, which is why genetic evidence carries more causal weight than transcriptomic evidence.
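
For readers who want the instrumental-variable idea in concrete form, the simplest estimator is the Wald ratio: the variant's effect on the outcome divided by its effect on the exposure. The sketch below is illustrative only. The function name and the effect sizes are hypothetical, the standard error uses a first-order approximation that ignores uncertainty in the exposure effect, and the estimate is only interpretable if the usual instrument assumptions hold (the variant affects the exposure, is not confounded, and acts on the outcome only through the exposure).

```python
import math


def wald_ratio(beta_outcome: float, se_outcome: float, beta_exposure: float):
    """Single-variant Mendelian-randomization estimate.

    beta_exposure: per-allele effect of the variant on the exposure
                   (e.g. protein level, in standard-deviation units).
    beta_outcome:  per-allele effect of the variant on the outcome
                   (e.g. log-odds of disease).
    Returns (estimate, approximate standard error).
    """
    if beta_exposure == 0:
        raise ValueError("variant has no effect on the exposure; not a valid instrument")
    estimate = beta_outcome / beta_exposure
    approx_se = se_outcome / abs(beta_exposure)
    return estimate, approx_se


# Hypothetical numbers: each allele lowers protein level by 0.5 SD
# and lowers disease log-odds by 0.10.
estimate, approx_se = wald_ratio(beta_outcome=-0.10, se_outcome=0.02, beta_exposure=-0.5)

# Implied odds ratio for a 1 SD *reduction* in protein level.
odds_ratio_per_sd_reduction = math.exp(-estimate)
print(estimate, approx_se, odds_ratio_per_sd_reduction)
```

With these made-up inputs the implied odds ratio is below 1, which is the pattern that would support an inhibitor hypothesis for the gene.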

Referring back to the 2× headline from earlier, the Nelson et al. (2015) result and the King et al. (2019) revised analysis both confirmed the population-average ~2× clinical-success multiplier for genetically supported drug programs. The same general 2× factor also enriches for labeled side effects, giving it predictive value for toxicology programs (Minikel & Nelson, 2024; Carss et al., 2023). The multiplier is not constant across evidence types. King et al. found that when the causal gene is unambiguous - Mendelian traits, single-gene disorders with clear inheritance patterns, or GWAS associations driven by coding variants where the variant-to-gene mapping is direct - the approval probability multiplier rises above 2× and into the 3× range. The 2024 update from Minikel and Nelson refined this, showing that the multiplier scales with confidence in the causal gene assignment and is largely independent of genetic effect size, minor allele frequency, or year of discovery. In practice, a low-effect-size GWAS hit with confident causal-gene mapping is worth more than a high-effect-size hit at a locus where the variant-to-gene call is ambiguous. Trajanoska et al. (2023, Nature) place this in longer historical context: they identified 40 germline genetic observations that translated directly into approved therapies for 36 rare and 4 common conditions, with a median 25-year interval between target discovery and drug approval. The genetic-anchor strategy works, but it compounds slowly.

When assembling a target’s evidence package, the quality of the genetic anchor matters more than the quantity of associated data. A coding variant with established protein consequences is worth more than a half-dozen non-coding GWAS signals at adjacent loci with ambiguous fine-mapping.

The omics stack as target–disease evidence

Target-disease linkage evidence comes from an omics stack, and each layer has a characteristic failure mode. Knowing when each layer can mislead you is the core methodological skill.

Genomics and GWAS

Genomics is usually the strongest layer for target-disease linkage. Human genetic variation provides a natural experiment: people who carry a loss-of-function variant in gene X are, on average, similar to people who would be pharmacologically treated with an inhibitor of gene X. If carriers of the variant have lower disease risk, that is a strong argument that inhibiting gene X would lower disease risk too. This is the Mendelian-randomization logic that gives genetic evidence special weight.

Genome-wide association studies (GWAS) are the dominant method for finding these signals. GWAS identifies statistical associations between genetic variants and traits or diseases across large population cohorts. The method has identified thousands of risk loci across hundreds of diseases, now aggregated in resources like the NHGRI-EBI GWAS Catalog.

Concept Translation: GWAS is in some ways a giant feature-importance scan over the genome - for each of millions of variants, test whether allele frequency differs between cases and controls. The genome-wide significance threshold of p < 5×10⁻⁸ is a Bonferroni-style correction for testing roughly a million effective independent variants. The catch is that statistical association picks the neighborhood the causal variant lives in, not the variant itself. Most GWAS hits sit in non-coding DNA - regulatory regions outside protein-coding sequence - which is why follow-up fine-mapping is needed to narrow the signal. Fine-mapping is a feature-attribution problem for a region of the genome: which variant in the correlated cluster is actually doing the work?
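
The per-variant test and the threshold arithmetic can be sketched in a few lines. This is a toy allelic chi-square test on a 2×2 table of allele counts, assuming one million effective independent tests for the Bonferroni step; real GWAS pipelines typically use regression models with covariates such as ancestry principal components. The function name and the allele counts are hypothetical.

```python
import math

# Bonferroni-style threshold: family-wise alpha divided by the assumed
# number of effective independent tests.
GENOME_WIDE_ALPHA = 0.05 / 1_000_000


def allelic_chi_square(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int):
    """Chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Returns (chi-square statistic, p-value).
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square with 1 dof: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical variant: 2,000 cases and 2,000 controls (4,000 alleles each),
# alternate-allele frequency 35% in cases versus 30% in controls.
chi2, p_value = allelic_chi_square(1400, 2600, 1200, 2800)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")
print("nominally significant:", p_value < 0.05)
print("genome-wide significant:", p_value < GENOME_WIDE_ALPHA)
```

With these made-up counts the variant clears p < 0.05 comfortably but falls short of the genome-wide threshold, which illustrates how real signals with modest effects can be missed in cohorts of this size.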

The practical limitations matter. GWAS has to apply heavy multiple-testing correction (the classic p < 5×10⁻⁸ genome-wide significance threshold), which means it misses real signals at smaller effect sizes, especially in underpowered cohorts. Most GWAS hits are in non-coding regions, so the protein-level mechanism is often implicit. Identifying the causal gene from a genome-wide-significant locus often requires additional work - fine-mapping (narrowing the locus to the most plausible causal variants), eQTL analysis (testing whether a variant shifts gene expression), and experimental follow-up.

The populations represented in GWAS are also overwhelmingly European-ancestry, which creates a real equity problem when target discovery built on GWAS evidence is applied clinically to more diverse populations. The June 2021 GWAS Catalog inventory put European-ancestry participants at approximately 86.3% of all GWAS subjects, with East Asian at ~5.9% and African at ~1.1%. By 2023 the GWAS Catalog European share had barely moved, at around 86.5%, while the GWAS Diversity Monitor, which tracks the running participant counts more broadly, recorded European representation of ~94.5% of total participants by September 2024 (Mills & Rahal, Nat Genet, 2019; Corpas et al., Cell Genomics, 2025). The two metrics differ because the GWAS Diversity Monitor weights large biobank cohorts heavily, but the conclusion is the same: target discovery built on this evidence base will systematically under-serve non-European populations unless deliberately corrected with fine-mapping in diverse cohorts and downstream validation across ancestries. This is a target-evidence problem as much as a deployment problem. A target whose causal gene is well mapped only in European cohorts may not support a defensible target-disease linkage claim for patients of other ancestries until cross-ancestry replication is done.

Concept Translation: This is the same dataset-bias problem that ML practitioners deal with constantly. A model trained on a non-representative dataset can be highly accurate on the in-distribution group while failing on out-of-distribution groups. The fix is the same in principle: collect more representative data, evaluate per-subgroup, and don’t ship a model whose validation doesn’t cover the deployment population.

Transcriptomics

Transcriptomics measures RNA levels, sitting between genotype and protein as a sensitive readout of cellular state. Bulk and single-cell RNA sequencing, along with older microarray data, make up the largest public omics archives (GEO, ArrayExpress) and provide differential-expression evidence: which genes change in disease tissue versus healthy tissue?

The main failure mode of transcriptomics for target discovery is simple. Many transcriptomic changes are bystanders. A gene that changes expression in disease is not necessarily causing the disease; it may be responding to the disease, or changing in parallel with it because of upstream regulation. A transcriptomic hit generates a hypothesis; it does not establish causal linkage. Evidence becomes much stronger when transcriptomics and genetics agree: the gene is both differentially expressed in disease tissue and genetically associated with the disease.

Concept Translation: Bystander vs. causal in transcriptomics is the same trap as confounded correlation in observational ML. A feature that strongly correlates with the label can still be downstream of the label rather than upstream - predictive of it without driving it. RNA-level changes pick up both kinds and can’t separate them on their own. This is why GWAS-supported transcriptomic hits are weighted more heavily than transcriptomic hits in isolation.

Proteomics and metabolomics

Proteomics measures protein abundance directly. Since most drugs target proteins, proteomic evidence sits closer to the mechanism-of-action question than transcriptomic evidence. The limitations are technical: mass-spec proteomics is expensive, less sensitive than RNA-seq, and struggles with membrane proteins, which include many of the cell-surface receptor and transporter classes that matter most in drug discovery.

Metabolomics measures small-molecule metabolites. It often points to disease-associated metabolic changes that can be traced back to protein targets - which enzymes are producing or consuming these altered metabolites. Metabolomics is particularly useful for metabolic diseases and for target discovery in contexts where disease biology is about pathway flux more than signaling.

Epigenomics

Epigenomics (DNA methylation, histone modifications, chromatin accessibility) provides a reversible regulatory layer above the genome. Epigenetic enzymes are themselves drug targets (HDAC inhibitors, DNMT inhibitors, both with approved drugs), and epigenetic data can identify non-genetic mechanisms driving disease. The characteristic limitation is interpretability: it is often unclear what functional impact a given epigenetic change has on gene expression in the specific disease context.

Concept Translation: A useful mental model: if the genome is the source code, epigenomics is the runtime configuration. Methylation marks and chromatin states change which parts of the code get executed in which cell types, without altering the code itself. Same source, different behavior - and the configuration is reversible, which is why epigenetic enzymes are drug targets.

Multi-omics integration and its limits

Combining these layers sounds straightforward. In practice, it’s a difficult technical problem. Batch effects across studies, confounding by disease severity or patient demographics, differential dropout in single-cell data, and different noise models across modalities all complicate naive integration. The best-case design, when available, analyzes multi-omics data from the same set of patients rather than integrating across cohorts, so sample-level confounders are controlled. That is rarely possible at GWAS-cohort scales, where multi-omics profiling of every participant is prohibitively expensive. Target-discovery workflows that build evidence from public multi-omics resources need to state these limitations explicitly. Workflows that report “our multi-omics integration identified target X” without addressing them are not making a credible argument.

A useful target-disease evidence package usually includes (a) genetic evidence where it exists, as the anchor, (b) transcriptomic and proteomic evidence in disease tissue that is consistent with the genetic signal, (c) tissue-specificity data (see [tissue-specificity evidence] → Tissue specificity: the safety half of target selection²), and (d) network or pathway context that situates the target in plausible biology. No single layer is enough - the layers have to agree.

Networks and knowledge graphs: evidence beyond single-gene associations

A gene-at-a-time view of the omics stack misses mechanistic context. A gene with moderate genetic support and moderate transcriptomic support may look like a weak target in isolation, yet sit at a critical node in a disease-implicated pathway. In that setting, network position is itself evidence. Target-discovery methodology increasingly builds evidence at the network level as well as the gene level.

The method family is biological graphs and knowledge graphs. Protein-protein interaction networks, gene regulatory networks, gene co-expression networks, metabolic networks, and signaling networks each represent a different axis of biological relationship, and each carries evidence relevant to target linkage. Knowledge graphs integrate these with literature-derived associations, drug-target relationships, disease-gene associations, and related data, producing heterogeneous graphs with multiple types of nodes and relationships that can be queried for evidence patterns around a candidate target.

Concept Translation: A biomedical knowledge graph is a heterogeneous graph in the standard ML sense - multiple node types (genes, proteins, diseases, drugs, pathways) and multiple edge types (binds, expressed-in, treats, associated-with). The same graph-ML toolkit applies: node classification (is this gene a likely target?), link prediction (does this gene-disease edge exist but is undiscovered?), graph neural network embeddings, and metapath-based random walks. The biology adds structure but doesn’t change the modeling problem.

Two network features are especially useful for target-linkage arguments. Hubs are highly connected nodes. They are enriched for essential genes and are often promising targets in diseases driven by loss of function, but they are also more likely to be toxic to modulate pharmacologically. Bridging nodes are high-betweenness nodes - nodes that sit on many shortest paths between modules. They are often more attractive targets because their inhibition tends to be non-lethal while still disrupting specific disease-relevant pathways.

Concept Translation: Hubs are nodes with high degree centrality (think high-PageRank pages); knock them out and a lot of the graph collapses, which is why they’re both attractive and dangerous. Bridging nodes are nodes with high betweenness centrality - they sit on many shortest paths, like a bottleneck router connecting subnets. Inhibiting a bridge can disconnect a specific module without crashing the whole network, which is roughly what a clean drug effect looks like.
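
To make the hub-versus-bridge distinction concrete, here is a small dependency-free sketch that computes degree and betweenness on a toy graph: two tightly connected modules joined through a single node. The betweenness routine follows Brandes' algorithm for unweighted, undirected graphs; the node names and the graph itself are hypothetical, and a production workflow would more likely use a graph library on a real interaction network.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    return {node: len(neighbors) for node, neighbors in adj.items()}


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes, unweighted, undirected)."""
    score = dict.fromkeys(adj, 0.0)
    for source in adj:
        order = []                                   # nodes in BFS visiting order
        preds = {v: [] for v in adj}                 # shortest-path predecessors
        n_paths = dict.fromkeys(adj, 0)
        n_paths[source] = 1
        dist = dict.fromkeys(adj, -1)
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:                      # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:           # v lies on a shortest path to w
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = dict.fromkeys(adj, 0.0)
        for w in reversed(order):                    # back-propagate dependencies
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair was counted from both endpoints.
    return {node: s / 2 for node, s in score.items()}


# Toy network: two 5-node cliques (modules) joined only through BRIDGE.
module_a = [f"a{i}" for i in range(1, 6)]
module_b = [f"b{i}" for i in range(1, 6)]
edges = list(combinations(module_a, 2)) + list(combinations(module_b, 2))
edges += [("a1", "BRIDGE"), ("BRIDGE", "b1")]

graph = build_graph(edges)
deg = degree(graph)
btw = betweenness(graph)
print("lowest degree:", min(deg, key=deg.get), deg["BRIDGE"])
print("highest betweenness:", max(btw, key=btw.get), btw["BRIDGE"])
```

In this toy graph the bridging node has the fewest connections of any node yet sits on every shortest path between the two modules, so a degree-only ranking would place it last while a betweenness ranking places it first.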

A knowledge graph over disease-gene-drug-pathway data also lets teams aggregate and score evidence across heterogeneous sources. The Open Targets Platform is the canonical public example: an open-source platform that integrates evidence from genetics (GWAS, ClinVar, ClinGen), somatic mutation data (Cancer Gene Census, COSMIC), known drug-target relationships, expression data, text mining, and animal models, producing a quantitative association score between each gene and each disease. As of the most recent platform release, the Open Targets evidence layer comprises over 27.8 million timestamped evidence assertions across the genes-and-diseases space.

The platform’s scoring methodology matters in practice. The overall target–disease association score is computed as a harmonic sum of source-weighted evidence scores - a scheme in which each additional piece of similar evidence adds less than the previous one, rather than a raw sum or arithmetic mean. The harmonic-sum construction keeps heavily studied targets from accumulating inflated scores through redundant literature citations. Adding the 50th literature reference does not move the score the way the 1st reference does. Within data types that contain multiple sources, the scaling factor uses an inverse-squares construction so that source contributions saturate gracefully rather than letting any single high-throughput source dominate.

Concept Translation: The harmonic-sum aggregator is a regularizer against redundant evidence. Each additional similar source contributes less than the previous one, much like a saturating activation function. Without it, well-studied targets - those with thousands of papers - would dominate the rankings purely by literature volume, and the score would be a popularity ranking instead of an evidence ranking. This is the same problem that plagues citation-count metrics in academia and is solved with the same general technique: diminishing returns on duplicate signal.
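
A minimal sketch of the diminishing-returns behavior, assuming the inverse-square form described above: evidence scores are sorted from strongest to weakest and the i-th score is divided by i². The function name and the example scores are hypothetical, and the platform's own normalization and source weighting are deliberately left out.

```python
import math

def harmonic_sum(scores):
    """Sort scores descending and weight the i-th strongest by 1 / i**2."""
    ranked = sorted(scores, reverse=True)
    return sum(score / rank ** 2 for rank, score in enumerate(ranked, start=1))

# Hypothetical literature evidence: every paper contributes a score of 0.5.
one_paper = harmonic_sum([0.5])
fifty_papers = harmonic_sum([0.5] * 50)
gain_from_50th = fifty_papers - harmonic_sum([0.5] * 49)

# With identical scores the total can never exceed 0.5 * pi^2 / 6,
# because the series 1 + 1/4 + 1/9 + ... converges to pi^2 / 6.
ceiling = 0.5 * math.pi ** 2 / 6

print("one paper:        ", one_paper)
print("fifty papers:     ", round(fifty_papers, 4))
print("gain from 50th:   ", round(gain_from_50th, 6))
print("raw sum, 50 items:", sum([0.5] * 50))
print("ceiling:          ", round(ceiling, 4))
```

The raw sum grows linearly with the number of papers, while the harmonic sum stays under a fixed ceiling. That bound is what keeps literature volume from swamping the ranking.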

Source weights vary by evidence type. ClinGen clinical-validity curations at the “definitive” tier receive an absolute weight of 1.0, the strongest possible single-source evidence. Somatic mutation evidence from the Cancer Gene Census uses a tiered modifying scheme that adjusts the base score by mutational frequency and disease-specificity context: a mutation observed in only 1 sample in the dataset receives a −0.25 modifier (penalizing weak recurrence), while a mutation that occurs more frequently in a particular disease relative to others receives a +0.25 modifier (rewarding disease-specificity). The platform’s “Associations on the Fly” (AOTF) feature, introduced in recent releases, lets users dynamically reweight the evidence contributions to formulate custom therapeutic hypotheses without forking the underlying scoring schema.

The consortium itself is worth noting for field-guide purposes. Open Targets is a pre-competitive public-private partnership - direct competitors contribute to a shared evidence resource before they diverge into proprietary drug programs. As of 2026 the active partner roster includes EMBL-EBI, the Wellcome Sanger Institute, Genentech (Roche Group), GSK, MSD (Merck & Co.), Pfizer, Sanofi, and Bristol Myers Squibb (which joined in late 2022). Biogen and Takeda were earlier members who exited in 2020. The platform aggregates signals that no single company would assemble alone.

Formalizing evidence aggregation as a graph problem also opens the door to graph machine learning. Target-disease association can be formulated as a link-prediction problem on a heterogeneous biomedical knowledge graph: given a graph where some gene-disease edges are known, predict which missing edges are real. Graph neural networks, network-embedding methods, and random-walk-based algorithms have all been applied to this problem. The [knowledge graphs and the rentosertib case study] → Knowledge graphs, network medicine, and the first end-to-end AI-discovered drug picks up this thread in detail.³
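
To make the link-prediction framing concrete, here is a deliberately simple baseline: a metapath count. It scores an unknown gene-disease pair by counting paths of the form gene → shared pathway → other gene → disease. All gene, pathway, and disease identifiers are placeholders, and this heuristic stands in for the graph neural network and embedding methods mentioned above rather than reproducing any of them.

```python
from collections import defaultdict

# Hypothetical toy knowledge graph (placeholder identifiers, not real biology).
gene_pathway = {
    ("GENE_A", "P1"), ("GENE_A", "P2"),
    ("GENE_B", "P1"),
    ("GENE_C", "P2"),
    ("GENE_D", "P3"),
    ("GENE_E", "P3"),
}
known_gene_disease = {("GENE_B", "D1"), ("GENE_C", "D1"), ("GENE_E", "D2")}

def metapath_scores(gene_pathway, known_gene_disease):
    """Count gene -> pathway -> gene' -> disease paths for each unknown pair."""
    pathway_genes = defaultdict(set)
    gene_pathways = defaultdict(set)
    for gene, pathway in gene_pathway:
        pathway_genes[pathway].add(gene)
        gene_pathways[gene].add(pathway)

    disease_genes = defaultdict(set)
    for gene, disease in known_gene_disease:
        disease_genes[disease].add(gene)

    scores = {}
    for gene, pathways in gene_pathways.items():
        for disease, linked in disease_genes.items():
            if gene in linked:
                continue  # edge already known, nothing to predict
            count = sum(len((pathway_genes[p] & linked) - {gene}) for p in pathways)
            if count:
                scores[(gene, disease)] = count
    return scores

scores = metapath_scores(gene_pathway, known_gene_disease)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, score)
```

Here `GENE_A` shares one pathway with each of two genes already linked to `D1`, so the pair (`GENE_A`, `D1`) gets the highest score. Learned models replace the raw count with a trained scoring function, but the input and output have the same shape.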

A worked example: senolytic target identification in idiopathic pulmonary fibrosis

To make the frameworks concrete, consider a target-identification workflow for a specific disease-hypothesis pair: cellular senescence as a target class in idiopathic pulmonary fibrosis (IPF).

The mechanistic hypothesis. Cellular senescence is a permanent state of cell-cycle arrest with a secretory phenotype (SASP) that promotes inflammation and disrupts tissue homeostasis. Senescent cell accumulation is associated with many age-related diseases, and in IPF specifically, senescent cell programs have been documented in disease tissue. The working hypothesis: compounds that selectively kill senescent cells (senolytics) should slow IPF progression by removing the SASP-producing source of inflammation and fibrotic signaling. This is a causative hypothesis in the sense of the earlier framework - senescent cells are positioned as upstream drivers rather than symptoms.

Concept Translation: Senescent cells are sometimes called “zombie cells” - they’ve stopped dividing but won’t die, and they leak inflammatory signals (the SASP) into surrounding tissue. The senolytic strategy is to selectively kill them. From a target-discovery angle, the question is which proteins keep senescent cells alive that are not equally important for keeping healthy cells alive. Those proteins are the senolytic targets.

Evidence layer 1: tissue pathology. Senescent cells are identifiable in IPF lung tissue by senescence-associated markers; their burden correlates with disease severity. This is differential-expression-style evidence - the disease tissue differs from healthy tissue in a specific, characterizable way. It is necessary but insufficient on its own. Senescence could still be a bystander response to lung injury rather than a driver of ongoing fibrosis.

Evidence layer 2: mechanism. Senescent cells are apoptosis-resistant by design; that is part of what makes them persistent. The resistance depends on anti-apoptotic pathways collectively called SCAPs (senescent cell anti-apoptotic pathways). That yields a specific target-class hypothesis: proteins involved in SCAPs are candidate senolytic targets, because inhibiting them should remove the apoptosis resistance and allow senescent cells to die. This narrows the search from “any gene differentially expressed in IPF” to “genes in a specific functional class whose inhibition would have a mechanistically predicted effect.”

Evidence layer 3: computational target prioritization. A knowledge-graph-based target discovery platform can operationalize the combined evidence. The workflow:

Start from a disease-specific dataset (IPF patient tissue vs healthy controls) and score candidate targets by disease-association strength.
Apply a gene-set filter restricting candidates to pre-specified functional classes (apoptosis-related pathways, SCAPs).
Apply a small-molecule druggability filter. This asks whether a target can plausibly be modulated by a conventional small-molecule drug. For senolytic drug development, the delivered modality is small molecule, so candidates that are only tractable by antibody or biologic are de-prioritized.
Apply an expression filter that focuses on protein classes suitable as drug targets (e.g., kinases, enzymes).
Apply an optional novelty filter that preferentially surfaces targets with fewer published references, for teams pursuing first-in-class programs against new mechanisms rather than fast-follower strategies against established mechanisms.
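
The workflow above can be sketched as a chain of filters followed by a ranking. Everything here is an assumption made for illustration: the `Candidate` fields, the placeholder gene names, the scores, and the publication threshold. The sketch also treats each filter as a hard cut, whereas the text describes de-prioritization for the druggability step.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Candidate:
    gene: str
    association_score: float        # disease-association strength
    pathways: frozenset             # functional classes the gene belongs to
    small_molecule_tractable: bool  # passes the druggability filter
    protein_class: str
    n_publications: int

def prioritize(candidates, allowed_pathways, allowed_classes,
               max_publications: Optional[int] = None):
    """Apply gene-set, druggability, class, and optional novelty filters, then rank."""
    kept = [
        c for c in candidates
        if c.pathways & allowed_pathways
        and c.small_molecule_tractable
        and c.protein_class in allowed_classes
        and (max_publications is None or c.n_publications <= max_publications)
    ]
    return sorted(kept, key=lambda c: c.association_score, reverse=True)

# Hypothetical candidates with placeholder names and made-up numbers.
candidates = [
    Candidate("GENE_1", 0.90, frozenset({"apoptosis"}), True, "kinase", 1200),
    Candidate("GENE_2", 0.95, frozenset({"cell adhesion"}), True, "kinase", 300),
    Candidate("GENE_3", 0.80, frozenset({"apoptosis"}), False, "kinase", 50),
    Candidate("GENE_4", 0.70, frozenset({"apoptosis", "SCAP"}), True, "enzyme", 40),
    Candidate("GENE_5", 0.60, frozenset({"SCAP"}), True, "transcription factor", 10),
]
pathways = frozenset({"apoptosis", "SCAP"})
classes = {"kinase", "enzyme"}

ranked = [c.gene for c in prioritize(candidates, pathways, classes)]
novel = [c.gene for c in prioritize(candidates, pathways, classes, max_publications=100)]
print("all filters:        ", ranked)
print("with novelty filter:", novel)
```

The highest-scoring candidate overall (`GENE_2`) never reaches the ranking because it fails the gene-set filter. That is the intent of the workflow: the mechanistic hypothesis constrains the search before association strength is allowed to rank anything.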

The output, checked against known biology. A representative output of this workflow is PTK2 (also known as FAK, focal adhesion kinase) as a high-ranked senolytic candidate. PTK2 is a tyrosine kinase, druggable by small molecules, and expressed in relevant contexts. The biological cross-check comes from independent experimental literature. Dasatinib, a multi-kinase inhibitor originally approved for chronic myeloid leukemia, has been identified as a senolytic in human dermal fibroblasts, and its senolytic activity has been attributed in part to inhibition of PTK2/FAK among other kinases. This independent wet-lab evidence predates the computational prioritization and converges on the same target class.

What this example illustrates. The evidence package for PTK2/FAK as a senolytic target is an aggregation of four sources: disease-tissue pathology (senescent cell burden in IPF); a mechanistic hypothesis (SCAP dependence); computational target prioritization weighted by disease association, pathway membership, and druggability; and convergent experimental evidence from a repurposed clinical-stage compound. No individual layer would be sufficient; together they make a credible argument.

The claim is narrower than it might sound. PTK2 remains a candidate rather than a validated senolytic target in IPF patients; validation would require dosing studies, biomarker responses, and clinical evidence. On the evidence presented here, PTK2 clears the 5R “right target” and GOT-IT “disease linkage” bars for advancing into target validation. Moving from candidate to validated target requires [druggability assessment] → Druggability, ligandability, and modality choice in the AlphaFold 3 era⁴, and the downstream posts in this series.

Open questions in target–disease evidence work

The benchmarking problem. Target-discovery ML is unusually hard to benchmark. The ground-truth signal - did this target hold up in Phase III? - takes a decade and nine figures per data point. Surrogate benchmarks exist (does the method rediscover known targets for known diseases when given only pre-discovery literature? does it recover genetic signals already in Open Targets? does it predict held-out drug-target relationships?) but all share the same weakness: the evaluation corpus overlaps with the corpus the ML methods were trained on. Truly unbiased benchmarking for target-discovery ML, where the test set is future-clinical rather than past-published, remains a live methodological problem.

The ignorome problem. Research attention across the ~20,000 human protein-coding genes is highly skewed: a small fraction of genes accounts for most of the literature, the annotations, the available assays, and the training data for ML methods. A target-discovery method that only surfaces candidates adjacent to already-well-studied proteins is solving a weaker version of the problem than the one the industry actually needs. The [novelty vs repurposing] → Novelty vs repurposing: when to invent a target and when to reuse one post picks up this thread.⁵

The causality problem. Most of the evidence stack is associational. Genetic evidence - especially Mendelian-randomization-style evidence - carries more causal weight because it approaches causality in a way other omics layers do not. Methods that can extract more causal structure from non-genetic data (perturbation screens, single-cell perturbation readouts like Perturb-seq, large-scale CRISPR screens) are one of the most active areas of methodological development, and they feed directly into target-disease evidence work. The [synthetic lethality] → Synthetic lethality and combination targets: ML methods for finding drug pairs that work together post gets into CRISPR-screen analysis in depth.⁶

Evidence frameworks for target-disease linkage separate disciplined target discovery from assertion. Teams that internalize them are less likely to skip steps, and skipped steps are rarely recoverable later. The framework languages (5R, GOT-IT) and the evidence-stack discipline - genomics as anchor, corroborated by transcriptomics and network context, challenged by alternative hypotheses - are what distinguish a defensible target list from one that merely sounds plausible.

Drug target discovery: the front-of-funnel decision behind most Phase II failures

2026-05-08T16:00:00+00:00

A drug candidate can enter Phase II clinical trials with a nine-figure budget behind it, years of preclinical work completed, and an IND (Investigational New Drug application) cleared by the FDA, yet still carry the wrong answer to a question asked five to ten years earlier: is this the right biological target? For readers new to the pipeline, Phase I is mainly about safety and dosing. Phase II is where efficacy first gets a serious test in patients. Phase III is the larger confirmatory stage. Roughly 72% of drugs entering Phase II do not transition to Phase III, and approximately 90% of investigational drugs fail somewhere in clinical development overall (source: GlobalData clinical analytics, 2024). Lack of efficacy against the intended disease accounts for roughly half of all clinical trial failures; another ~30% are halted for unmanageable safety and toxicity findings. In many cases, the problem starts before dosing or formulation matter. The target was wrong, or the evidence for its role in the disease was too weak. Target discovery is one of the most consequential stages of the drug development pipeline, and this series intends to teach why.

Why drug target discovery matters in 2026

When it comes to understanding chemistry and biology, the models are getting better. Generative models can design and optimize novel small molecules with greater success. AlphaFold 3 and the open-weights field around it, meaning models released with their parameters, have made routine structure prediction a solved problem for many classes of proteins. Docking, virtual screening, and ADMET prediction (absorption, distribution, metabolism, excretion, and toxicity) continue to mature as well.

Yet clinical success rates have worsened. Industry-wide likelihood of approval (LOA) for a Phase I compound fell from about 10.4% for the 2014 single-year cohort to around 6.7% (average LOA for 10-year period from 2014-2023). Phase II remains the dominant attrition step, with only about 28% of drugs that enter Phase II making it to Phase III (phase-by-phase transitions: Phase I = 47%, Phase II = 28%, Phase III = 55%, Filing-to-approval = 92%).
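
The phase-by-phase transition rates and the overall likelihood of approval are linked by simple multiplication, which is worth checking. The sketch below uses only the four rates quoted above; the variable names are illustrative.

```python
from math import prod

# Phase transition probabilities quoted in the text.
transitions = {
    "Phase I": 0.47,
    "Phase II": 0.28,
    "Phase III": 0.55,
    "Filing-to-approval": 0.92,
}

# Likelihood of approval from Phase I is the product of every later transition.
loa_from_phase_1 = prod(transitions.values())

# Likelihood of approval from each starting stage: product of the remaining steps.
stages = list(transitions)
loa_by_stage = {
    stage: prod(transitions[s] for s in stages[i:])
    for i, stage in enumerate(stages)
}

for stage, loa in loa_by_stage.items():
    print(f"LOA starting at {stage}: {loa:.1%}")
```

The product from Phase I comes out at about 6.7%, matching the ten-year average quoted above. The Phase II term is the smallest factor, so it contributes more to the low overall figure than any other single step.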

Front-of-funnel ML now also has a concrete clinical example to teach from. In June 2025, Nature Medicine published Phase IIa results for rentosertib, a TNIK inhibitor for idiopathic pulmonary fibrosis (IPF), whose target (identified from multi-omics analysis of IPF patient tissue, meaning joint analysis of several molecular data types) and molecule (designed with generative chemistry) were both AI-derived, end to end. The study was small and short: 71 patients over 12 weeks, with reported improvements in standard lung-function measures relative to placebo. It was hypothesis-generating rather than designed to support approval. Still, it marked the first time a fully AI-discovered drug, target included, produced a direction-of-effect efficacy signal in humans. Industry trackers count more than 173 AI-discovered drug programs in clinical testing as of early 2026.

This post lays out the series: what target discovery is, why it matters, what makes it hard, and where ML helps. Each subsequent post will take one axis of target discovery and examine it in detail:

How to tell a drug target matters using evidence frameworks
Druggability, ligandability, and modality choice in the AlphaFold 3 era
Tissue specificity: the safety half of target selection
The most crowded and abandoned therapeutic areas
Novelty vs repurposing: when to invent a target and when to reuse one
Synthetic lethality and combination targets: ML methods for finding drug pairs that work together
Knowledge graphs and case studies in AI-driven target discovery
Virtual cells for target discovery, perturbation models, and benchmarks

What a drug target is

A drug target is a biomolecule, usually a protein and sometimes a nucleic acid, whose activity we want to modulate to produce a therapeutic effect. “Modulate” covers a range: inhibit an overactive enzyme, block a receptor from binding its ligand, degrade a disease-driving protein, replace a missing transcript. Binding the target has to produce the therapeutic effect. A molecule that binds to the target without changing disease outcomes is hitting a decoy.

Nearly half of oral drugs on the market target enzymes, with kinases as the single most productive sub-family. About a third target cell-surface receptors, especially G-protein-coupled receptors. Ion channels, transporters, and nuclear hormone receptors round out the major protein classes. Nucleic acids form a smaller but fast-growing non-protein class. In chapter 1 of Machine Learning for Drug Discovery, we derive a “biological search space” on the order of 10⁵ potential human protein targets once splice variants (alternative versions of a protein) and post-translational modifications (chemical changes added after a protein is made) are counted. A drug-like molecule has to find, bind, and modulate the right one.

Target discovery, the front-of-funnel activity this series focuses on, is the work of deciding which of those proteins to go after for a given disease. It splits into two phases that casual usage often folds together:

Target identification: generating candidate target-disease links. “Here are ten proteins that seem to be involved in pancreatic fibrosis; let’s prioritize the top three.”
Target validation: building the evidence that one of those candidates is actually causal. Does knocking it out in a disease model reverse the phenotype? Does a human genetic signal support it? Is there pharmacological precedent?

Both phases happen before a single compound is screened. Get either wrong and the downstream pipeline is solving the wrong problem.

Funnel Economics: The Cost of Being Wrong at Target Selection

The drug-development funnel runs roughly like this:

Disease hypothesis and early target identification: years to decades, often academic.
Target validation: up to ~2.5 years, averaging around $353M when done end-to-end including follow-on studies.
Lead discovery and optimization: up to ~2 years, over half a billion dollars.
Preclinical development: about a year, ~$340M.
Clinical development: Phase I about 1.5 years, Phases II/III about 2.5 years combined, roughly half a billion dollars in total.
Regulatory approval: ~1.5 years, ~$3M, the cheapest transition once you’re there.

Aggregate “cost to develop a new drug” figures vary widely because analysts treat time costs, failure allocation, and capitalization differently. The Tufts Center for the Study of Drug Development’s widely cited $2.8 billion mean per-approved-drug cost includes roughly $1.16 billion of foregone-investor returns over the development window plus $312 million in post-approval R&D. More recent cost estimates from RAND (2025), JAMA (Wouters et al., 2020), and Deloitte (2024) report different distributions, with medians closer to $0.7 billion to $1.0 billion and means weighted by high-cost outliers in the $0.95 billion to $2.23 billion range. Though reported headline figures depend heavily on which costs are included, the point is consistent. This is an expensive process, and being wrong costs a lot of cash!

That makes the target-selection decision unusually consequential. It consumes comparatively little money, yet determines whether everything downstream of it is wasted. Whether it succeeds or fails, completing Phase II for an individual drug costs pharmaceutical companies around 20% of total pipeline spend, calculated on a per-drug basis. Phase III is a larger absolute commitment, but Phase II is where we first get an answer to “is this target right?” and where many disappointments occur.

In a 2024 report, GlobalData attributes roughly half of clinical trial failures across phases to lack of efficacy and another ~30% to safety or toxicity. If efficacy collapses because the chosen target never had enough evidence behind it, then better target selection compounds downstream. Prevent one Phase II failure and the savings can exceed what you spent on target-assessment work by an order of magnitude. Catch the same mistake at target selection instead of in Phase II and the savings are larger still.

The 2014–2023 industry data also points to a second problem. Phase I likelihood of approval fell from 10.4% for the 2014 cohort to 6.7% for the most recent ten-year cohort, partly driven by an industry shift toward first-in-class, riskier targets where the evidence base is thinner. Many easier targets have already been drugged. What’s left is harder, and harder problems put more weight on front-of-funnel methods. Biomarker-stratified programs, which enroll patient subgroups defined by a measurable biomarker, run at roughly double the LOA of unstratified ones. This result follows from stronger target-selection evidence, as the biomarker identifies the patient subgroup in whom the target matters. For this reason, target and biomarker development often move together.

Why drug target discovery is hard

There is no single solution or approach, because every target-hunting strategy is context dependent. Useful target-assessment work has to satisfy at least five criteria simultaneously.

Target–disease linkage

Does the target play a causal role in the disease rather than merely correlate with it? Genetic evidence carries the most weight in most organizations. That includes GWAS hits, population-scale studies that link genomic loci to disease, Mendelian randomization, which uses inherited variants as a quasi-natural experiment, and rare-variant burden tests, which ask whether damaging variants in a gene accumulate more often in cases than controls. Multi-omics integration, which combines several molecular data layers such as RNA, protein, and chromatin measurements, acts as supporting evidence.

Safety

Can humans tolerate modulation of this protein for years? Evolutionary conservation predicts essentiality and toxicity, whereas tissue specificity predicts off-target effects. Knockout phenotypes also provide a useful preview of the safety implications of neutralizing the target.

Commercial strategy

Is there a path to reimbursement? Is this a first-in-class program, meaning a new mechanism, a best-in-class program, meaning a superior drug against an established mechanism, or a repurposing play, meaning an existing drug or target used in a new disease? These questions inform which target to prioritize even when several are scientifically equivalent.

Technical feasibility

Is the target druggable, meaning some therapeutic class can plausibly bind or modulate it? Can the organization develop the right modality, such as a small molecule, antibody, or oligonucleotide? A protein-protein interaction with no obvious pocket may be validated yet undruggable.

Data quality

Is the evidence base solid, or built on a few underpowered studies that never replicated? Manual curation, field-standard evaluation, and in some cases meta-analyses feed into this.

Taken together, a “good target” satisfies a multi-objective optimization problem across all five properties, involving a variety of data modalities. In practice, these criteria produce weighted votes rather than verdicts. Few targets get a clean yes across all five. More often, we see mixed signals and have to decide whether one target candidate’s profile beats the other eight candidate targets on our list. Target-discovery ML tries to improve this decision-making process.

Where ML helps in drug target discovery

The chemistry-side ML playbook of generating candidates, predicting properties, and optimizing molecules does not transfer cleanly to target discovery. There is no explicit accuracy metric to optimize against, training data is fragmented across a dozen modalities, and ground truth, i.e., “was this the right target?”, takes a decade and often a Phase III trial to establish. Despite these limitations, ML does three things well here.

Scale

Datasets for target discovery are enormous and heterogeneous:

GenBank contains over 300 billion base pairs of sequencing data.
UniProt has more than 180 million protein sequences.
The ENCODE multi-omics database exceeds one petabyte.
PubMed has more than 35 million citations.
The USPTO database has hundreds of thousands of patents.
ClinicalTrials.gov has more than 438,000 clinical studies.

No human team can read or integrate all of this. ML can convert it into searchable, queryable representations. Knowledge graphs built from biomedical text, which turn papers into linked entities and relationships, compressed numerical representations of multi-omics datasets, and network-proximity algorithms that rank genes by their topological distance from disease-associated nodes are all methods that scale to the size of the actual evidence base.

Cryptic patterns

A well-known example comes from SARS-CoV-2 repurposing, where we consider existing drugs that might work in a new or unaddressed disease. One study used network-diffusion and network-proximity algorithms on a combined dataset of the human interactome, the network of protein-protein interactions in cells, viral targets, and drug interaction data to rank 6,340 drugs for expected efficacy. The top-ranked drugs, tested experimentally, showed a 62% success rate in reducing viral infection, and 76 of the 77 drugs that worked did not bind to proteins directly targeted by the virus, suggesting network-based mechanisms rather than direct target engagement. That kind of signal, which requires multiple hops of complex reasoning, is hard to recover by manual review. In this case, graph algorithms operating across the interactome were able to surface it.
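
As an illustration of what a network-proximity score computes, the sketch below implements one common variant, the "closest" measure: for each drug target, take the shortest-path distance to the nearest disease-associated node, then average over the drug's targets. This is a simplified assumption, not the pipeline from the study described above. It omits the usual comparison against degree-matched random node sets, and the network, node names, and drug target sets are all hypothetical.

```python
from collections import deque

def build_adjacency(edges):
    """Undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def distance_to_nearest(adj, sources):
    """Multi-source BFS: hops from each reachable node to its nearest source."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closest_proximity(adj, disease_nodes, drug_targets):
    """Mean distance from each drug target to its nearest disease node."""
    dist = distance_to_nearest(adj, disease_nodes)
    reachable = [dist[t] for t in drug_targets if t in dist]
    if not reachable:
        return float("inf")  # no target is connected to the disease module
    return sum(reachable) / len(reachable)

# Hypothetical interactome: a simple chain n1 - n2 - ... - n6.
adj = build_adjacency([("n1", "n2"), ("n2", "n3"), ("n3", "n4"),
                       ("n4", "n5"), ("n5", "n6")])
disease_nodes = {"n1", "n2"}
drug_targets = {
    "drug_X": {"n3"},
    "drug_Y": {"n5", "n6"},
    "drug_Z": {"n2", "n3"},
}

proximity = {d: closest_proximity(adj, disease_nodes, t) for d, t in drug_targets.items()}
ranking = sorted(proximity, key=proximity.get)  # smaller distance = closer to disease
print(proximity)
print(ranking)
```

A drug can score as close to the disease module without any of its targets being disease nodes themselves, as `drug_X` does here. That is the kind of indirect, multi-hop relationship the repurposing result points to.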

Automation

Literature review for a new target can take days to months. Named-entity recognition, which identifies mentions of genes, diseases, drugs, authors, and institutions in text, can be automated with usable quality (state-of-the-art NER reaches roughly 90% F-score). Relation extraction, which asks whether a paper asserts a specific link between those entities, is harder, with F-scores around 50%. A knowledge graph that updates as new papers are published treats the evidence base as a living object, instead of freezing it at the last manual review.

These are front-of-funnel tasks that ML handles well. Final prioritization is a different matter*. In practice, target selection remains heavily human-curated. ML produces ranked candidate lists, aggregated evidence dashboards, and network-scored hypotheses, but the decision to allocate a program’s budget to target #3 rather than #7 still depends on organizational expertise, portfolio balance, competitive intelligence, and the intellectual-property landscape. Those factors are usually absent from training data. For now, target-discovery ML compresses the evidence into a form a human committee can reason about.

*If you’re coming from a background in recommender systems, this is similar to a cascading-system approach: at the start we care about maximizing recall (i.e., we don’t want to preemptively filter out a candidate and cut a potential multibillion-dollar revenue stream) and, by the end, we are mostly trying to maximize precision (i.e., avoid a costly false positive that damages our corporate brand, wallet, and sanity).

Worked Example: Rentosertib

Insilico Medicine’s rentosertib (also known as ISM001-055 / INS018_055) for idiopathic pulmonary fibrosis (IPF) serves as an instructional walkthrough (though I do not go so far as to categorize it as a gold-standard benchmark) of one end-to-end AI-driven target-and-molecule program.

Target identification began with multi-omics analysis of lung tissue from IPF patients versus healthy controls, combined with text mining across IPF literature and knowledge-graph reasoning over protein-protein interactions and pathway data. TNIK, a TRAF2- and NCK-interacting kinase, emerged as a novel regulator of fibrotic pathways. That is, TNIK was not an established IPF target when the program started. The timeline is also notable. Project initiation to preclinical candidate nomination, the point where a program chooses the molecule it will advance into formal preclinical testing, took roughly 12–18 months, and the asset reached human clinical trials in under 30 months total, against an industry average of 4.5–6 years for early-stage discovery. Target validation ran in parallel with early chemistry; preclinical models confirmed that TNIK inhibition reduced fibrotic phenotypes, and generative-chemistry tools designed a series of inhibitors. The knowledge-graph workflow behind this will be the subject of a future article in this series.

Phase I established safety and pharmacokinetics, meaning how the body absorbs, distributes, and clears the drug. The Phase IIa trial (GENESIS-IPF, NCT05938920) enrolled 71 IPF patients across 21 sites in China over 12 weeks. The secondary endpoint, change in forced vital capacity (a standard measure of lung function) from baseline at week 12, showed +98.4 mL for the 60 mg daily dose versus −20.3 mL for placebo. Nature Medicine published the results in 2025, where they were positioned as the first clinical proof-of-concept for an end-to-end AI-discovered drug.

The limitations deserve to stay in view. Seventy-one patients over 12 weeks is a hypothesis-generating dataset, not a study sized or designed to support approval. No pivotal trial is running as of April 2026; Insilico is in regulatory discussions about a Phase IIb pivotal study, and a separate US Phase IIa (NCT05975983) is enrolling, with eight of the planned 60 patients having completed the 12-week treatment as of mid-2025. For the purposes of this series, the key point is that TNIK had to be identified as a target before any chemistry could start. That identification came from a workflow combining multi-omics data, literature mining, and knowledge-graph reasoning. The Phase IIa result matters because it is consistent with the target call having been right. Whether the methodology generalizes is a question the next decade of readouts will answer.

A useful counterweight is Recursion’s REC-994, a lead pre-merger AI-discovered candidate for cerebral cavernous malformation, which was discontinued in May 2025 after long-term Phase II data failed to confirm earlier efficacy trends. High-profile AI-guided programs can fail in the clinic for the same reason other programs do. Through curating better evidence, we hope to lower the failure rate.

Who does drug target discovery

For readers whose mental model of drug discovery centers on Big Pharma, it’s worth keeping in mind that, in the United States, nearly 60% of newly approved drugs were discovered in universities or biotechnology companies, not by Big Pharma’s own R&D. Small, often academic-adjacent biotechs take the early-stage innovation risk. Big Pharma licenses, acquires, or partners to bring the late-stage program through approval and marketing.

The reasons are structural. Big Pharma faces the “better than the Beatles” problem (the bar for new drug approval keeps rising because better drugs already exist), the “low-hanging fruit” problem (the easy targets are mostly drugged), the “cautious regulator” problem (FDA standards ratchet up after each safety scare and rarely relax), and a tendency to industrialize the wrong activities; scaling basic research and brute-force screening has not improved clinical success rates in aggregate. For more information, we discuss these problems and the related Eroom’s Law in detail within chapter 1 of “Machine Learning for Drug Discovery.”

For a practitioner in 2026, this means target-discovery tooling is disproportionately built inside smaller, often AI-native companies, licensed into Big Pharma programs, or run academically against public data. Big Pharma’s target-discovery groups increasingly act as evaluators and integrators rather than primary generators.

The rest of the drug target discovery series

The rest of this series goes deep on what this pillar surveys:

Evidence frameworks → How to tell a drug target matters. Driver vs passenger mutations, oncogene addiction (tumors becoming unusually dependent on one gene), multi-omics integration, GWAS, and practical target-assessment frameworks such as AstraZeneca’s 5R and GOT-IT.
Druggability in the AlphaFold 3 era → Druggability, ligandability, and modality choice in the AlphaFold 3 era. Classical druggability, the expanding druggable-genome concept, therapeutic modalities such as antisense oligonucleotides (ASOs), PROTAC degraders, and antibody-drug conjugates (ADCs), and what AlphaFold 3 changed about structure-based drug design.
Tissue specificity → Tissue specificity: the safety half of target selection. Single-cell RNA-seq and reference atlases such as GTEx and the Human Protein Atlas, and how tissue-level gene-expression data feeds into safety prediction.
Likelihood of approval by therapeutic area → The most crowded and abandoned therapeutic areas. Likelihood of approval by therapeutic area, the economics of repurposing, and why some disease areas are systematically more tractable.
Novelty vs repurposing → Novelty vs repurposing: when to invent a target and when to reuse one. The Illuminating the Druggable Genome program, which focuses on understudied proteins, and research bias toward well-studied proteins.
Synthetic lethality → Synthetic lethality and combination targets: ML methods for finding drug pairs that work together. CRISPR-based screening, synthetic lethality, where dual perturbation of two genes kills a cell even though either single perturbation does not, and ML methods for drug-synergy prediction, including tools such as MAGeCK, CRISPRi, and Perturb-seq.
Knowledge graphs and the rentosertib case study → Knowledge graphs and case studies in AI-driven target discovery. Biomedical named-entity recognition, relation extraction, knowledge-graph embedding models, and walkthroughs of publicly documented AI-discovered-drug programs.
Virtual cells → Virtual cells for target discovery, perturbation models, and benchmarks. Single-cell atlases, perturbation-response models, Perturb-seq benchmarks, and how simulated interventions can prioritize target-validation experiments.

How drug target discovery connects to Machine Learning for Drug Discovery

Machine Learning for Drug Discovery concentrates on methods that begin after a target has been chosen. Molecular property prediction, virtual screening, generative chemistry, protein structure prediction, drug repurposing, and multimodal pipelines are chapter-length topics because they are well-defined ML problems with benchmarks and data.

If you came here from the book and have already built a property predictor, screened a virtual library, or trained a generative model for lead compounds, this series asks a prior question: how did anyone decide that was the right target in the first place? As you read through the series, you will notice that the methods change with the problem. The work may involve graph learning over biomedical knowledge graphs, natural-language processing over biomedical literature, or multi-omics integration with multimodal models. At their core, though, these are variants of the methods we discuss in the book rather than new or alien concepts.

What’s next in drug target discovery: open questions

Near-term readouts to watch. The 15 or so AI-discovered programs expected to enter pivotal Phase III trials in 2026 will provide the first broad test of whether front-of-funnel AI improves late-stage success rates. Rentosertib is the clearest test of AI-led target identification, because the target-selection step itself was AI-derived. Schrödinger/Takeda’s zasocitinib (TYK2 inhibitor) is a corresponding test of physics-based AI design.

Methodological open questions. First, how do you benchmark target-discovery ML without waiting a decade for each label? Proxy benchmarks, such as asking whether a method can rediscover known targets for known diseases using only pre-discovery literature, help but do not close the loop, and they are prone to data leakage and data snooping. Second, how far does the current generation of target-discovery ML extend beyond well-studied indications? The “ignorome,” the large set of human proteins that remain thinly studied, is underrepresented in the literature and datasets these models learn from, so the training data skews toward what has already been studied. A target-discovery method that mostly finds targets adjacent to known ones misses the hardest part of the problem. The ignorome is the subject of a future article on novelty vs repurposing.
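
The rediscovery benchmark described above can be made concrete as a time-split evaluation: rank candidate targets using only evidence dated before a cutoff, then check how many targets validated after the cutoff land near the top. The following is a minimal sketch under stated assumptions, not a real pipeline. The `Evidence` record, the `rank_targets` and `recall_at_k` helpers, and all gene and disease names are hypothetical, and the count-based score is a stand-in for whatever model is being benchmarked.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One hypothetical literature record linking a gene to a disease."""
    gene: str
    disease: str
    year: int


def rank_targets(evidence, disease, cutoff_year):
    """Rank genes for a disease using only records published before cutoff_year.

    The count-based score is a placeholder for a real model; the point is
    the date filter, which keeps post-cutoff knowledge out of the ranking.
    """
    counts = Counter(
        e.gene for e in evidence
        if e.disease == disease and e.year < cutoff_year
    )
    # Sort by descending count, then gene name for a deterministic order.
    return [g for g, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def recall_at_k(ranked, known_targets, k):
    """Fraction of later-validated targets found in the top k of the ranking."""
    if not known_targets:
        return 0.0
    return len(set(ranked[:k]) & set(known_targets)) / len(known_targets)


# Toy data: gene and disease names are placeholders, not real findings.
records = [
    Evidence("GENE_A", "disease_x", 2008),
    Evidence("GENE_A", "disease_x", 2010),
    Evidence("GENE_B", "disease_x", 2009),
    Evidence("GENE_C", "disease_x", 2011),
    Evidence("GENE_C", "disease_x", 2016),  # after cutoff: must be ignored
    Evidence("GENE_C", "disease_x", 2017),  # after cutoff: must be ignored
    Evidence("GENE_D", "disease_y", 2009),  # other disease: must be ignored
]
validated_after_cutoff = {"GENE_A", "GENE_C"}

# Honest evaluation: only pre-2015 evidence is visible to the ranker.
ranking = rank_targets(records, "disease_x", cutoff_year=2015)
score = recall_at_k(ranking, validated_after_cutoff, k=2)

# Leaky evaluation: no effective cutoff, so post-discovery papers inflate the score.
leaky_ranking = rank_targets(records, "disease_x", cutoff_year=2100)
leaky_score = recall_at_k(leaky_ranking, validated_after_cutoff, k=2)
```

In this toy data the date-filtered ranking recovers one of the two later-validated targets at k=2, while the ranking without an effective cutoff recovers both. That gap is the leakage problem in miniature: literature written after a target was validated makes any method look better at “discovering” it.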

Regulatory shift. The FDA’s January 2025 draft guidance on AI in drug development proposed a risk-based credibility framework for when and how AI contributions must be qualified for regulatory submissions. Final guidance is expected within 2026, and it will influence how target-discovery AI can be cited in an IND package (the dossier submitted to request permission for human testing).

For the first time, the ML methods used in target discovery are starting to produce clinical readouts. However, the jury’s still out and the next decade will show how much they change outcomes – anyone telling you differently has a financial interest competing with transparency ;)

What This Site Is For

2026-04-29T17:00:00+00:00

This site is my home base for work that sits between research, production AI systems, drug discovery, and teaching.

The near-term plan is simple: publish notes that make complicated systems easier to reason about. Some posts will unpack ideas from Machine Learning for Drug Discovery. Others will come from teaching graduate cheminformatics and machine learning at UC Berkeley, or from the practical messiness of building agentic systems that have to work outside a benchmark.

I want the writing here to be useful to people who build things: researchers, students, applied scientists, and drug discovery folks trying to make sense of modern ML without losing the thread of the underlying science.

This site is also the canonical archive. I may syndicate posts to Substack, LinkedIn, or X, but the version here is the one I will keep current. You can follow via RSS if you prefer the old, sturdy internet.

For now, the best starting points are the book, the talks page, and the publications list.
