How is Numerate’s approach different from traditional drug discovery?

For over 20 years, researchers have been trying to apply computers to address some of the key cost, time, and attrition problems associated with drug discovery.

Most of these efforts have been aimed at answering a very open-ended (and difficult) question: "What can be predicted about any compound's activity?" This effort has led to technologies like docking, simulation, and QSAR.

In contrast, Numerate focuses on the very practical (and easier) question: "What compounds should I make next given all the data I have?" Our approach is to use what we know, encoding well-understood chemistry, physics, and biology, and then to bridge the knowledge gap with statistics rather than assumptions. Our focus is on modeling assays, not simulating nature.

How do you represent the empirical data?

Much progress has been made by the computational chemistry community in general, and at Numerate in particular, in developing appropriate representations. For example, the features of a small molecule that could affect its binding to a protein are fairly well understood. We build on and extend the standard shape and electrostatic descriptors to capture the most important features of a molecule.

What about the statistics?

The key unsolved problems in making useful predictions about which compounds are likely to work in the laboratory have been statistical.

Specifically, making accurate predictions requires us to address three issues: bias (publication bias and medicinal-chemistry or scaffold bias); the experimental noise typical of assay results; and statistical complexity, meaning the mismatch between the small number of data points and the large number of variables needed to represent the phenomena being modeled.

The result. Our process works in real-world projects where the challenges have included small data sets (a few dozen data points), convoluted data (phenotypic assays), multi-target profiles (including selectivity), and even deceptive data (multiple binding sites).

The predictions are scaffold-independent, generally applicable to any small-molecule target, and extremely fast: fast enough to search through spaces of 100 billion compounds in one month (on 1,000 computers).
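
To put the speed figure in perspective, the short calculation below works out the per-computer throughput implied by the numbers quoted above. It is only a back-of-the-envelope sketch: the 30-day month, the assumption of continuous running, and the function name `per_machine_rate` are illustrative choices, not details of Numerate's actual infrastructure.

```python
# Back-of-the-envelope throughput implied by the quoted figures.
# Assumption (not from the source): a "month" is 30 days of continuous running.
COMPOUNDS = 100_000_000_000   # 100 billion compounds
COMPUTERS = 1_000             # 1,000 computers
DAYS_PER_MONTH = 30           # assumed
SECONDS_PER_DAY = 24 * 60 * 60


def per_machine_rate(compounds, computers, days):
    """Return (compounds per computer per day, compounds per computer per second)."""
    per_day = compounds / computers / days
    return per_day, per_day / SECONDS_PER_DAY


per_day, per_second = per_machine_rate(COMPOUNDS, COMPUTERS, DAYS_PER_MONTH)
print(f"{per_day:,.0f} compounds per computer per day")
print(f"{per_second:.1f} compounds per computer per second")
```

Under these assumptions, each computer evaluates roughly 3.3 million compounds per day, or about 39 per second.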

Numerate
Data-driven drug design