(242d) Sample Imbalance and Its Role in Understanding Drug Network Characteristics | AIChE

Authors 

Shoemaker, J. E. - Presenter, University of Tokyo

Several recent studies have shown that the targets of dangerous or ineffective compounds often occupy distinct topological positions in protein-protein interaction networks. These distinctions suggest that machine learning algorithms could play a valuable role in evaluating the therapeutic potential of candidate compounds. Here, we use compounds listed in the DrugBank database to analyze topological differences between the targets of clinically useful and unsafe compounds. Although some mild distinctions are initially observed, they prove to be of minimal value for predicting clinical utility. By applying permutation-based cross-validation, we find that the imbalance in the number of known targets between useful and unsafe compounds inhibits classifier training. We end by suggesting several methods for resolving target imbalance and discussing how best to optimize classifier training for biological networks.
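The abstract does not specify the classifier, the features, or the exact permutation scheme, so the following is only a minimal sketch of what a permutation-based cross-validation check on imbalanced classes might look like. Everything in it is an assumption made for illustration: the data are synthetic (a single made-up topological feature, 90 "useful" versus 10 "unsafe" targets), the one-feature threshold classifier is a toy stand-in, and the function names are hypothetical rather than taken from the study.

```python
import random

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so the larger class cannot dominate the score."""
    recalls = []
    for cls in sorted(set(y_true)):
        members = [i for i, label in enumerate(y_true) if label == cls]
        hits = sum(1 for i in members if y_pred[i] == cls)
        recalls.append(hits / len(members))
    return sum(recalls) / len(recalls)

def stratified_folds(y, k, rng):
    """Split indices into k folds, spreading each class evenly across folds.
    Assumes every class has at least k members."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(y)):
        members = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds

def fit_threshold(x, y):
    """Toy classifier (an assumption): cut at the midpoint of the two class means."""
    means = {}
    for cls in (0, 1):
        vals = [xi for xi, yi in zip(x, y) if yi == cls]
        means[cls] = sum(vals) / len(vals)
    cut = (means[0] + means[1]) / 2
    high = 1 if means[1] >= means[0] else 0
    return lambda v: high if v > cut else 1 - high

def cv_score(x, y, k, rng):
    """Cross-validated balanced accuracy, pooling held-out predictions."""
    pred = [None] * len(y)
    for fold in stratified_folds(y, k, rng):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit_threshold([x[i] for i in train], [y[i] for i in train])
        for i in fold:
            pred[i] = model(x[i])
    return balanced_accuracy(y, pred)

def permutation_test(x, y, n_perm=200, k=5, seed=0):
    """Compare the observed CV score with scores obtained on shuffled labels."""
    rng = random.Random(seed)
    observed = cv_score(x, y, k, rng)
    null_scores = []
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)  # breaks any feature-label link, keeps class sizes
        null_scores.append(cv_score(x, shuffled, k, rng))
    exceed = sum(1 for s in null_scores if s >= observed)
    return observed, (1 + exceed) / (1 + n_perm)

# Synthetic, deliberately well-separated data: NOT DrugBank data.
data_rng = random.Random(1)
x = [data_rng.gauss(0, 1) for _ in range(90)] + [data_rng.gauss(5, 1) for _ in range(10)]
y = [0] * 90 + [1] * 10  # 0 = "useful", 1 = "unsafe" (labels are illustrative)

observed, p_value = permutation_test(x, y)
print("observed balanced accuracy:", round(observed, 3))
print("permutation p-value:", round(p_value, 4))
```

The sketch scores with balanced accuracy because raw accuracy is misleading under imbalance: on a 90/10 split, a classifier that always predicts the majority class is 90% accurate yet has a balanced accuracy of 0.5. Shuffling the labels keeps the class sizes fixed, so the null distribution reflects what the classifier achieves from imbalance alone.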