A semi-supervised learning framework for quantitative structure-activity regression modelling.

Watson O.; Cortes-Ciriano I.; Watson JA.

MOTIVATION: Quantitative structure-activity regression (QSAR), a type of supervised learning, is increasingly used to assist preclinical small-molecule drug discovery. Regression models are trained on data consisting of finite-dimensional representations of molecular structures and their corresponding target-specific activities. These models can then be used to predict the activity of novel, previously unmeasured compounds.

RESULTS: This work provides methods that solve three problems in QSAR modelling: (i) a method for comparing the information content of different finite-dimensional representations of molecular structures (fingerprints) with respect to the target of interest; (ii) a method that quantifies how the accuracy of the model prediction degrades as a function of the distance between the testing and training data; and (iii) a method to adjust for the screening-dependent selection bias inherent in many training data sets. In the most extreme cases of such bias, only compounds which pass an activity-dependent screen are reported. A semi-supervised learning framework combines (ii) and (iii) to make predictions which take into account the similarity of the testing compounds to those in the training data and which adjust for the reporting selection bias. We illustrate the three methods using publicly available structure-activity data for a large set of compounds reported by GlaxoSmithKline (the Tres Cantos AntiMalarial Set, TCAMS) to inhibit asexual in vitro P. falciparum growth.

AVAILABILITY: https://github.com/owatson/PenalizedPrediction

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
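To make the idea behind method (ii) concrete, the following is a minimal sketch of how prediction error might be tabulated against the distance of each test compound from the training set. It is not the authors' implementation: the abstract does not name a distance measure, so the Tanimoto (Jaccard) distance on binary fingerprints is an assumption here, as are all function names, the toy fingerprints, and the made-up error values.

```python
from statistics import mean

# Fingerprints are represented as sets of "on" bit indices (an assumption
# for this sketch; real fingerprints would come from a cheminformatics tool).


def tanimoto_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def nearest_training_distance(test_fp, training_fps):
    """Distance from one test fingerprint to its closest training fingerprint."""
    return min(tanimoto_distance(test_fp, fp) for fp in training_fps)


def error_by_distance(distances, abs_errors, edges):
    """Mean absolute error within each half-open distance bin [lo, hi).

    Returns None for a bin that contains no test compounds.
    """
    binned = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [e for d, e in zip(distances, abs_errors) if lo <= d < hi]
        binned.append(mean(in_bin) if in_bin else None)
    return binned


# Toy fingerprints, purely illustrative.
train_fps = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 5})]
test_fps = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 6}), frozenset({7, 8})]

# Made-up absolute prediction errors for the three test compounds.
abs_errors = [0.1, 0.4, 1.2]

distances = [nearest_training_distance(fp, train_fps) for fp in test_fps]
binned = error_by_distance(distances, abs_errors, edges=[0.0, 0.5, 1.01])

print("nearest-training distances:", distances)
print("mean absolute error per distance bin:", binned)
```

Tabulating error per distance bin in this way gives a crude view of how accuracy changes as test compounds move away from the training data; the repository linked under AVAILABILITY is the place to look for the method as actually published.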

Original publication

DOI

10.1093/bioinformatics/btaa711

Type

Journal article

Journal

Bioinformatics (Oxford, England)

Publication Date

10/08/2020

Addresses

Evariste Technologies Ltd, Goring on Thames, RG8 9AL, United Kingdom.