(15e) Accelerating Drug Discovery and Development Using an Ontology-Based Machine Learning Framework | AIChE

Authors 

Viswanath, S., Eli Lilly and Company
Vaidyaraman, S., Eli Lilly and Company
Venkatasubramanian, V., Columbia University
In the current era of AI-based drug discovery and development, there is a need for an automated information extraction framework that uncovers rich information and hidden insights from pharmaceutical documents. Such a framework would enable faster, more efficient, and (semi-)automated decision-making, thus accelerating the drug discovery and development cycle. We present a pharmaceutical information extraction framework that automatically extracts important information and relationships from pharmaceutical text. The framework comprises two steps: first, identifying entity mentions that signify specific information of interest, and second, inferring relationships between the identified entities that represent important information. Our approach combines several modules: a custom-built pharmaceutical ontology [1, 2], standard ontologies from the Unified Medical Language System (UMLS) [3], custom algorithms for information contextualization, natural language-inspired custom approaches for relationship extraction, and a BioBERT model [4] for improving the generalizability of the framework. Because it blends AI with domain knowledge-based methods, the framework is a hybrid AI model rather than a purely data-driven approach. We demonstrate the efficacy of our approach on pharmaceutical drug briefing reports and guideline documents.
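
The two-step structure described above (entity identification, then relationship inference) can be illustrated with a deliberately minimal Python sketch. This is not the authors' implementation: the ontology entries, the `Mention` type, and both function names are hypothetical. A plain dictionary lookup stands in for the ontology-based entity step, and same-sentence co-occurrence stands in for the relationship extraction step. Neither UMLS nor BioBERT is used here.

```python
import re
from dataclasses import dataclass
from itertools import combinations

# Hypothetical toy ontology (assumption): surface term -> concept class.
TOY_ONTOLOGY = {
    "ibuprofen": "Drug",
    "wet granulation": "UnitOperation",
    "tablet": "DosageForm",
    "dissolution rate": "QualityAttribute",
}


@dataclass(frozen=True)
class Mention:
    text: str      # matched surface form, as it appears in the sentence
    concept: str   # concept class looked up in the toy ontology
    start: int     # character offset where the match begins
    end: int       # character offset where the match ends


def find_mentions(sentence, ontology=TOY_ONTOLOGY):
    """Step 1: tag ontology terms that occur in the sentence (case-insensitive)."""
    mentions = []
    for term, concept in ontology.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            mentions.append(Mention(m.group(0), concept, m.start(), m.end()))
    return sorted(mentions, key=lambda x: x.start)


def candidate_relations(text):
    """Step 2 (naive stand-in): pair up mentions that share a sentence."""
    relations = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for a, b in combinations(find_mentions(sentence), 2):
            relations.append((a.text, a.concept, b.text, b.concept))
    return relations


if __name__ == "__main__":
    sample = ("The ibuprofen tablet was made by wet granulation. "
              "The dissolution rate was measured.")
    for rel in candidate_relations(sample):
        print(rel)
```

Co-occurrence only proposes candidate pairs. A practical system would still need to classify or filter them, which is where the contextualization and relationship extraction modules mentioned in the abstract would presumably come in.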

References

1. Hailemariam, Leaelaf, and Venkat Venkatasubramanian. "Purdue ontology for pharmaceutical engineering: Part I. Conceptual framework." Journal of Pharmaceutical Innovation 5 (2010): 88-99.

2. Hailemariam, Leaelaf, and Venkat Venkatasubramanian. "Purdue ontology for pharmaceutical engineering: Part II. Applications." Journal of Pharmaceutical Innovation 5 (2010): 139-146.

3. Bodenreider, Olivier. "The Unified Medical Language System (UMLS): integrating biomedical terminology." Nucleic Acids Research 32.suppl_1 (2004): D267-D270.

4. Lee, Jinhyuk, et al. "BioBERT: a pre-trained biomedical language representation model for biomedical text mining." Bioinformatics 36.4 (2020): 1234-1240.