(596d) YODA: A Question-Answering System for Pharmaceutical Informatics

Conference

AIChE Annual Meeting

Year

2010

Proceeding

2010 Annual Meeting

Group

Computing and Systems Technology Division

Session

Innovations in Information Technology

Time

Thursday, November 11, 2010 - 10:00am to 10:30am

Authors

Venkatasubramanian, V. - Presenter, Purdue University

Malik, T. - Presenter, Purdue University

Every pharmaceutical product under development generates a large body of scientific documents. These documents cover a wide range of information, including pre-formulation studies, product formulation, process development, and manufacturing [1]. The information is also represented in a variety of ways: raw data or unstructured laboratory reports; tabular or graphical data describing study designs or experimental results; and mathematical models. Because this information lacks an inherent structure, it cannot be organized and reused efficiently for new design decisions. There is a critical need for cyberinfrastructure services that help users find relevant information quickly.

We are addressing this challenge in an ontology-based pharmaceutical informatics project in collaboration with Eli Lilly. We are developing Your Ontology-driven Answerer (YODA), a question-answering environment that can provide precise answers to specific questions. Given a question such as "Which experiments used Riboflextine HCL?", a keyword-based search engine such as Google might present the user with PDF files in which the words "experiments" and "Riboflextine HCL" occur, whereas YODA would attempt to answer the question directly with the name of the experiment. Our ontology-driven QA system is designed to answer questions about experiments done on pharmaceutical product systems. YODA integrates natural language processing (NLP), first-order logic, ontologies, and information retrieval techniques in a uniform framework. Its key feature is the use of the Purdue Ontologies for Pharmaceutical Engineering (POPE) [2] at several stages of the system: in refining the initial query, in annotating documents, and in the reasoning process. The ontology is also used to reformulate the question intelligently, with the aim of reducing the chance that the question goes unanswered.
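As a rough illustration of ontology-based query refinement, the sketch below maps surface forms in a question to concepts and then expands each matched concept to its other surface forms. It is a minimal sketch under stated assumptions, not the YODA implementation: `TOY_ONTOLOGY`, `refine_question`, and `expand_terms` are hypothetical names, and the concept entries are invented for illustration rather than taken from POPE.

```python
import re

# Hypothetical mini-ontology (concept -> surface forms); NOT taken from POPE.
TOY_ONTOLOGY = {
    "Experiment": {"experiment", "experiments", "study", "studies", "trial", "trials"},
    "API": {"riboflextine hcl"},
}


def refine_question(question, ontology=TOY_ONTOLOGY):
    """Return the ontology concepts whose surface forms occur in the question."""
    text = question.lower()
    matches = {}
    for concept, forms in ontology.items():
        # Whole-word matching, so "experiment" does not match inside "experiments".
        found = sorted(
            form for form in forms
            if re.search(r"\b" + re.escape(form) + r"\b", text)
        )
        if found:
            matches[concept] = found
    return matches


def expand_terms(matches, ontology=TOY_ONTOLOGY):
    """List every surface form of each matched concept, for a broader search."""
    return {concept: sorted(ontology[concept]) for concept in matches}


refined = refine_question("Which experiments used Riboflextine HCL?")
expanded = expand_terms(refined)
print(refined)
print(expanded)
```

In this toy setting the question is recognized as asking about an Experiment that involves an API, and the expansion step lets a retrieval component also look for documents that say "study" or "trial" instead of "experiments".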

Figure 1 shows the architecture of the YODA system and its components. The POPE ontology provides an initial semantic vocabulary for annotating pharmaceutical documents with semantic content. This vocabulary is used to extract entities (or concepts) and the relations between them. We use a classification model based on conditional random fields [3] to tag document text with predefined entity types such as TABLET, API, MANUFACTURING_PROCESS and OPERATING_CONDITION. Instances of the entities will be populated using a similarity algorithm as described in [4]. These instances can be written in RDF [5] or RDFS [6], notations that provide a basic framework for expressing metadata on the Web. English-language queries will be classified and reformulated, with the help of the POPE ontology, into SPARQL queries that return exact answers to users. In our presentation, we will describe this system in some detail and present examples of its use.
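To make the last step concrete, the sketch below stores a few subject-predicate-object triples and answers the question "Which experiments used Riboflextine HCL?" by matching triple patterns, which is the core idea behind a SPARQL basic graph pattern. It is a minimal sketch under stated assumptions: the identifiers (`ex:Experiment_12`, `pope:usesMaterial`, and so on), the `pope:` and `ex:` namespace URIs, and the helper functions are all hypothetical, and the matcher is a plain-Python stand-in rather than a real SPARQL engine. The query text is shown only to indicate what a reformulated question might look like; it is not executed here.

```python
# Hypothetical facts; identifiers and namespaces are invented for illustration.
TRIPLES = {
    ("ex:Experiment_12", "rdf:type", "pope:Experiment"),
    ("ex:Experiment_12", "pope:usesMaterial", "ex:Riboflextine_HCL"),
    ("ex:Experiment_47", "rdf:type", "pope:Experiment"),
    ("ex:Experiment_47", "pope:usesMaterial", "ex:Lactose"),
    ("ex:Riboflextine_HCL", "rdf:type", "pope:API"),
}

# What the reformulated question might look like in SPARQL (not executed here).
SPARQL_TEXT = """
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX pope: <http://example.org/pope#>
PREFIX ex:   <http://example.org/data#>
SELECT ?exp WHERE {
  ?exp rdf:type pope:Experiment .
  ?exp pope:usesMaterial ex:Riboflextine_HCL .
}
"""

# The same two patterns as plain tuples; terms starting with "?" are variables.
PATTERNS = [
    ("?exp", "rdf:type", "pope:Experiment"),
    ("?exp", "pope:usesMaterial", "ex:Riboflextine_HCL"),
]


def match(pattern, triple, binding):
    """Unify one pattern with one triple; return the extended binding or None."""
    new_binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if new_binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new_binding


def solve(patterns, triples):
    """Return every variable binding that satisfies all patterns."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


answers = sorted(b["?exp"] for b in solve(PATTERNS, TRIPLES))
print(answers)
```

With these toy facts, only the experiment that is both typed as an experiment and linked to the compound satisfies both patterns, so the answer is the experiment identifier itself rather than a list of documents.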

[1] P. Beringer, A. DerMarderosian and L. Felton. Remington: The Science and Practice of Pharmacy, 21st Edition, Lippincott Williams & Wilkins, University of the Sciences, Philadelphia, 2006.

[2] V. Venkatasubramanian, C. Zhao, G. Joglekar, A. Jain, L. Hailemariam, P. Sureshbabu, P. Akkisetti, K. Morris and G.V. Reklaitis. "Ontological Informatics Infrastructure for Chemical Product Design and Process Development", Computers and Chemical Engineering, CPC 7 Special Issue, 30(10-12), 2006, 1482-1496. (Invited paper. Won the Best Paper Prize for 2006 from CACE.)

[3] A. McCallum and W. Li. Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons, Seventh Conference on Natural Language Learning (CoNLL), 2003.

[4] M. Vargas-Vera and E. Motta. AQUA - Ontology-Based Question Answering System, Advances in Artificial Intelligence, 2004.

[5] P. Hayes. RDF Model Theory, W3C Working Draft, February 2002. URL: http://www.w3.org/TR/rdf-mt/

[6] D. Brickley and R. Guha. Resource Description Framework (RDF) Schema Specification 1.0, Candidate Recommendation, World Wide Web Consortium, 2000. URL: http://www.w3.org/TR/2000/CR-rdf-schema-20000327
