(596d) YODA: A Question-Answering System for Pharmaceutical Informatics

Venkatasubramanian, V., Purdue University
Malik, T., Purdue University

A large body of pharmaceutical information is generated in the form of scientific documents for every pharmaceutical product developed. These documents cover wide-ranging topics such as pre-formulation studies, product formulation, process development, and manufacturing [1]. The information is also represented in a variety of ways: raw data or unstructured laboratory reports; tabular or graphical data describing study designs or experimental results; and mathematical models. Because this information lacks an inherent structure, it cannot be organized and reused efficiently for new design decisions. There is a critical need for cyberinfrastructure services that help users find relevant information quickly.

We are addressing this challenge in an ontology-based pharmaceutical informatics project in collaboration with Eli Lilly. We are developing Your Ontology-driven Answerer (YODA), a question-answering (QA) environment that provides precise answers to specific questions. Given a question such as "which experiments used Riboflextine HCL?", a keyword-based search engine such as Google might present the user with PDF files in which the words "experiments" and "Riboflextine HCL" occur, whereas YODA would attempt to answer the question directly with the name of the experiment. Our ontology-driven QA system is designed to answer questions about experiments done on pharmaceutical product systems. YODA integrates natural language processing (NLP), first-order logic, ontologies, and information retrieval techniques in a uniform framework. Its key feature is the use of the Purdue Ontologies for Pharmaceutical Engineering (POPE) [2] at several stages of the system: in the refinement of the initial query, in document annotation, and in the reasoning process. The ontology is used to reformulate the question intelligently, with the aim of reducing the chance that the question goes unanswered.
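One simple way an ontology can refine a query is by expanding a concept in the question to all of its sub-concepts, so that a question about a general process also matches documents annotated with a more specific one. The following is a minimal sketch of that idea only, not YODA's actual reformulation procedure. The concept names and the `SUBCLASS_OF` table are hypothetical and are not taken from POPE.

```python
# Hypothetical ontology fragment: child concept -> parent concept.
# These names are illustrative assumptions, not actual POPE classes.
SUBCLASS_OF = {
    "WetGranulation": "Granulation",
    "DryGranulation": "Granulation",
    "Granulation": "ManufacturingProcess",
    "Compression": "ManufacturingProcess",
}


def descendants(concept, subclass_of):
    """Return the concept plus every concept that is transitively a subclass of it."""
    found = {concept}
    changed = True
    while changed:
        changed = False
        for child, parent in subclass_of.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def reformulate(concept, subclass_of):
    """Expand one query concept into a sorted list of concepts to search for."""
    return sorted(descendants(concept, subclass_of))
```

Under these assumptions, `reformulate("Granulation", SUBCLASS_OF)` returns `["DryGranulation", "Granulation", "WetGranulation"]`, so a question about granulation experiments would also retrieve those annotated as wet or dry granulation.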

Figure 1 shows the architecture of the YODA system and its components. The POPE ontology serves as an initial semantic vocabulary for annotating pharmaceutical documents with semantic content. This vocabulary is used to extract entities (or concepts) and the relations between them. We use a classification model based on conditional random fields [3] to tag document text with predefined entity types such as TABLET, API, MANUFACTURING_PROCESS and OPERATING_CONDITION. Instances of the entities will be populated using a similarity algorithm as described in [4]. These instances can be written in RDF [5] or RDFS [6], notations that provide a basic framework for expressing metadata on the web. English-language queries will be classified and reformulated, with the help of the POPE ontology, into SPARQL queries that return exact answers to users. In our presentation, we will describe this system in some detail and present examples of its use.
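The last step of this pipeline, turning an English question into a SPARQL query over RDF-style instances, can be illustrated with a small sketch. It assumes a single hand-written question template and a tiny in-memory triple set. The `ex:` namespace, the `ex:usesMaterial` property, the experiment identifiers, and the lexicon are all hypothetical, and the pattern matcher stands in for a real SPARQL engine.

```python
import re

# Hypothetical instance data as (subject, predicate, object) triples.
TRIPLES = {
    ("ex:Experiment12", "rdf:type", "ex:Experiment"),
    ("ex:Experiment12", "ex:usesMaterial", "ex:RiboflextineHCL"),
    ("ex:Experiment47", "rdf:type", "ex:Experiment"),
    ("ex:Experiment47", "ex:usesMaterial", "ex:Lactose"),
}

# Hypothetical lexicon mapping surface strings to ontology instances.
LEXICON = {"riboflextine hcl": "ex:RiboflextineHCL", "lactose": "ex:Lactose"}

# One illustrative question template; a real system would classify many question types.
QUESTION = re.compile(r"^which experiments used (?P<material>.+?)\??$", re.IGNORECASE)

PREFIXES = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX ex: <http://example.org/pope#>\n"  # assumed namespace, for illustration
)


def question_to_patterns(question):
    """Map a question matching the template to a list of triple patterns."""
    match = QUESTION.match(question.strip())
    if match is None:
        raise ValueError("question does not match the known template")
    material = LEXICON.get(match.group("material").strip().lower())
    if material is None:
        raise ValueError("unknown material")
    return [("?exp", "rdf:type", "ex:Experiment"),
            ("?exp", "ex:usesMaterial", material)]


def to_sparql(patterns):
    """Render the triple patterns as the text of a SPARQL SELECT query."""
    body = " .\n  ".join(" ".join(p) for p in patterns)
    return PREFIXES + "SELECT ?exp WHERE {\n  " + body + " .\n}"


def match_patterns(patterns, triples):
    """Naive conjunctive matching: return every variable binding satisfying all patterns."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                candidate = dict(binding)
                ok = True
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        # Bind the variable, or check it against an earlier binding.
                        if candidate.setdefault(term, value) != value:
                            ok = False
                            break
                    elif term != value:
                        ok = False
                        break
                if ok:
                    extended.append(candidate)
        bindings = extended
    return bindings


def answer(question):
    """Return the sorted experiment identifiers that answer the question."""
    patterns = question_to_patterns(question)
    return sorted({b["?exp"] for b in match_patterns(patterns, TRIPLES)})
```

With this toy data, `answer("Which experiments used Riboflextine HCL?")` returns `["ex:Experiment12"]`, the name of the experiment rather than a list of documents. `to_sparql(question_to_patterns(...))` gives the query text that would be sent to a triple store instead of the in-memory matcher.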

[1] P. Beringer, A. DerMarderosian and L. Felton. Remington: The Science and Practice of Pharmacy, 21st Edition, Lippincott, Williams and Wilkins, University of the Sciences, Philadelphia, 2006.
[2] V. Venkatasubramanian, C. Zhao, G. Joglekar, A. Jain, L. Hailemariam, P. Sureshbabu, P. Akkisetti, K. Morris and G.V. Reklaitis. "Ontological Informatics Infrastructure for Chemical Product Design and Process Development", Computers and Chemical Engineering, CPC 7 Special Issue, 30(10-12), 2006, 1482-1496. (Invited paper. Won the Best Paper Prize for 2006 from CACE)
[3] A. McCallum and W. Li. Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons, Seventh Conference on Natural Language Learning (CoNLL), 2003.
[4] M. Vargas-Vera and E. Motta. AQUA – Ontology-Based Question Answering System, Advances in Artificial Intelligence, 2004.
[5] P. Hayes. RDF Model Theory, W3C Working Draft, February 2002. URL: http://www.w3.org/TR/rdf-mt/
[6] D. Brickley and R. Guha. Resource Description Framework (RDF) Schema Specification 1.0, Candidate Recommendation, World Wide Web Consortium, 2000. URL: http://www.w3.org/TR/2000/CR-rdf-schema-20000327