Data Management Schema Design for Effective Nanoparticle Formulation for Probing and Treating Neurological Disease
- Type: Conference Presentation
- Conference Type: AIChE Annual Meeting
- Presentation Date: November 8, 2021
- Duration: 15 minutes
- Skill Level: Intermediate
- PDHs: 0.50
Database management systems are already common in clinical medical applications. For example, The Brain Database is a clinical neuroanatomy database developed over three decades ago. However, pre-clinical data management schemas involve different complexities than clinical applications. Pre-clinical databases need management schemas that logically connect experimental methodologies, which can be highly repetitive, vary in quality, are prone to rapid iteration and evolution, and are often specific to a research lab or facility. Literature on effective database management for nanoformulations used preclinically is limited or non-existent, but similar database approaches exist in computational cell biology for managing data-rich cell process data. For example, BioNumbers is a computational cell biology database that stores and organizes key quantitative features from cell biology. The BioNumbers database is composed mainly via natural language processing of the literature and contains one specific number per property. In contrast, nanoformulation-brain databases focus on organizing wet lab experimental research prior to the publication phase and often contain repeated experiments with varying values for each property. A successful standardized data management schema for pre-clinical nanotherapeutic experimental pipelines has three main effects: (1) increased sustainability for data organization and storage, (2) increased insight into related variables and their interactions, and (3) increased searchability of specific results or methods.
To show the efficacy of a well-designed pre-clinical database management schema for nanoformulation data, we created a database using PostgreSQL that performs three functions: (1) obtains the nanoparticle formulation variables that produce specific nanoparticle sizes and charges; (2) connects nanoformulations to all related experimental elements, including nanoformulation methodology, biological specimen prep, and biological specimen characterization; and (3) defines a standard lab protocol for regularly updating the database as new nanoformulation experiments are carried out.
Our first step was to design the entity-relationship diagram to relate three main aspects: experiment, nanoformulation, and biological specimen (e.g., application). We then created a process flow diagram as a visual representation of data management from the chemical engineering, experimentalist perspective (Figure 1). This process flow diagram was converted into an entity-relationship diagram using LucidChart. With LucidChart, we were able to visualize the relationship details within the database, including all variables, keys, and variable types. From LucidChart, we directly exported the database schema in PostgreSQL and subsequently imported the data onto a local server. We hosted the database locally because the overall data size was small (51.5 kilobytes). Before we uploaded our data to the schema, we cleaned and standardized the data sets, which were collected from nine independent researchers.
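As an illustration of how the three aspects might map onto tables, the following is a minimal sketch, not the schema presented in the talk. All table and column names are hypothetical, and Python's built-in sqlite3 module stands in for PostgreSQL so the example runs without a server.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the schema exported from LucidChart.
SCHEMA = """
CREATE TABLE researcher (
    researcher_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE experiment (
    experiment_id INTEGER PRIMARY KEY,
    researcher_id INTEGER NOT NULL REFERENCES researcher(researcher_id),
    run_date      TEXT NOT NULL
);
CREATE TABLE nanoformulation (
    formulation_id INTEGER PRIMARY KEY,
    experiment_id  INTEGER NOT NULL REFERENCES experiment(experiment_id),
    method         TEXT NOT NULL,  -- e.g. 'single emulsion'
    size_nm        REAL,
    zeta_mv        REAL,
    pdi            REAL
);
CREATE TABLE biological_specimen (
    specimen_id   INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(experiment_id),
    species       TEXT
);
"""


def build_schema(conn):
    """Create the illustrative tables on an open connection."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce REFERENCES
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
build_schema(conn)

# List the tables that were created, in alphabetical order.
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

In this sketch the experiment table is the hub: both formulations and specimens point back to it, which is one way to relate the three aspects without linking formulations and specimens directly.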
After the database was created, we developed standard queries for accessing nanoformulation and biological data based on commonly used variables. For the nanoformulation data, we used three key physical characteristics of nanoformulations to inform our queries: size, zeta potential, and polydispersity index (PDI). Nanoparticle size and surface charge can influence nanoparticle passage across the blood-brain barrier and penetration within the brain parenchyma. Zeta potential is an indirect measure of nanoparticle surface charge, and PDI is an indicator of particle size uniformity. We also developed standardized queries that allow us to search biological characterization information, including catalase activity, cytotoxicity over time, and glutathione levels, for experiments that use nanoformulations in biological specimens. Finally, we developed a standardized method of inputting data into a cloud-based database by creating group spreadsheets with commonly used variables for each nanoformulation technique and characterization method. We additionally standardized a method for assigning unique keys to each individual nanoformulation, researcher, characterization experiment, and biological specimen based on input order and time information.
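One way a key built from input order and time information could look is sketched below. The key format, the `make_key` helper, and the `NF` prefix are assumptions made for illustration; the abstract does not specify the actual format.

```python
from datetime import datetime


def make_key(prefix: str, entered_at: datetime, order: int) -> str:
    """Build a key from an entity prefix, the entry date, and the input order.

    Hypothetical format: PREFIX-YYYYMMDD-NNNN. Keys are unique as long as
    `order` is never reused for the same prefix on the same day.
    """
    if order < 1:
        raise ValueError("order must be 1 or greater")
    return f"{prefix}-{entered_at:%Y%m%d}-{order:04d}"


# Third nanoformulation ("NF") entered on the given date.
key = make_key("NF", datetime(2021, 11, 8), 3)
print(key)  # NF-20211108-0003
```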
Once the database was set up, the data were input, and the queries were written, we evaluated our data management schema's effectiveness via two methods. The first approach evaluated the time it took for each query to run and return results. The first set of queries selects all nanoparticle formulations with specific characterization features in size, zeta potential, polydispersity, or a mix of these three variables. The second set of queries obtains all nanoparticle formulations and their formulation variables for specific nanoparticle features for single emulsion, double emulsion, or nanoprecipitation methodologies. The slowest query ran with an average time of 72.5 milliseconds. Before the database existed, a researcher who wanted a list of nanoformulation methodologies and their variables for a specific characterization result would need to ask other researchers personally or perform additional nanoformulation experiments. Both options are time-consuming and ineffective. With a standard database, researchers can efficiently access previous experiments from other researchers by searching directly for a variable of interest.
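The timing evaluation can be sketched as averaging wall-clock time over repeated runs of a query. This is a minimal illustration rather than the measurement procedure used in the study: the `mean_query_ms` helper, the table layout, and the toy data are assumptions, and sqlite3 stands in for PostgreSQL.

```python
import sqlite3
import statistics
import time


def mean_query_ms(conn, sql, params=(), repeats=20):
    """Return the mean wall-clock time in milliseconds over `repeats` runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()  # fetch all rows so retrieval is timed too
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings)


# Toy table with 200 made-up particle sizes (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nanoformulation (formulation_id INTEGER PRIMARY KEY, size_nm REAL)"
)
conn.executemany(
    "INSERT INTO nanoformulation (size_nm) VALUES (?)",
    [(float(size),) for size in range(20, 220)],
)

avg_ms = mean_query_ms(
    conn,
    "SELECT formulation_id FROM nanoformulation WHERE size_nm BETWEEN ? AND ?",
    (50, 100),
)
print(f"mean query time: {avg_ms:.3f} ms")
```

Timings from an in-memory toy table are not comparable to the 72.5 millisecond figure reported above; the sketch only shows the shape of the measurement.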
The second evaluation method visualizes our nanoformulations' physical characteristics for each query. Visualizing the nanoformulation data is important because it allows us to check that the queries do what we intended and provides valuable scientific insight into our formulation data. For example, if we query the size, zeta potential, and PDI, it is clear that the queries successfully narrow our data set to formulations within user-defined parameters. Our first query restricted the size of formulated nanoparticles to between 50 and 100 nanometers. This query reduced the total results from 538 formulations to 237. Our second query added a restriction on zeta potential values to lie between -10 and 10 millivolts, which reduced our results from 237 formulations to 160. Finally, our third query added a restriction on PDI values to between 0.0 and 0.2, which brought the number of viable nanoformulations to 109. These evaluation plots also provide unique insight into how our different nanoformulation methods compare to one another. For example, we can now query our data for specific nanoparticle feature ranges and then compare our single emulsion methodologies to our double emulsion methodologies. We are also able to gain insight from the zeta potential chart on its own. From queries 2 and 3, we learned that although our nanoformulations are within the neutral range (-10 to 10 mV), they still tend towards negative values. The database successfully relates formulations and methodologies to specific formulation outcomes.
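The three successive restrictions described above can be sketched as one parameterized query whose range filters are added step by step. The `count_in_ranges` helper, the column names, and the five toy rows are hypothetical; sqlite3 stands in for PostgreSQL, and the counts below come from the toy rows, not from the study's 538 formulations.

```python
import sqlite3


def count_in_ranges(conn, size_nm, zeta_mv=None, pdi=None):
    """Count formulations inside inclusive (low, high) ranges.

    The size range is required; zeta potential and PDI ranges are optional.
    """
    clauses = ["size_nm BETWEEN ? AND ?"]
    params = list(size_nm)
    if zeta_mv is not None:
        clauses.append("zeta_mv BETWEEN ? AND ?")
        params.extend(zeta_mv)
    if pdi is not None:
        clauses.append("pdi BETWEEN ? AND ?")
        params.extend(pdi)
    sql = "SELECT COUNT(*) FROM nanoformulation WHERE " + " AND ".join(clauses)
    return conn.execute(sql, params).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nanoformulation (size_nm REAL, zeta_mv REAL, pdi REAL)")
conn.executemany(
    "INSERT INTO nanoformulation VALUES (?, ?, ?)",
    [
        (75.0, -5.0, 0.10),   # inside all three ranges
        (80.0, -8.0, 0.30),   # PDI too high
        (60.0, -25.0, 0.15),  # zeta potential too negative
        (150.0, -3.0, 0.10),  # too large
        (90.0, 2.0, 0.05),    # inside all three ranges
    ],
)

# Same ranges as in the text: 50-100 nm, -10 to 10 mV, PDI 0.0-0.2.
counts = [
    count_in_ranges(conn, (50, 100)),
    count_in_ranges(conn, (50, 100), zeta_mv=(-10, 10)),
    count_in_ranges(conn, (50, 100), zeta_mv=(-10, 10), pdi=(0.0, 0.2)),
]
print(counts)  # [4, 3, 2]
```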
Developing and utilizing a database management schema on already established experimental pipelines provided new insights, effective data storage, and data querying options. We were able to query our nanoformulations to obtain approximately 100 nanoformulation batches that fit the requirements imposed in this project. We also have shown a proof-of-concept that our experimental data can benefit from a well-structured database management schema. Additionally, we are currently expanding the database to match nanoformulation experiments to biological experiments. We are doing this by creating additional tables for biological specimen variables and related biological characterization methods. For example, an in vivo rat experiment can be connected to behavioral testing results, litter information, and any tissue processing. Once biological information is input into the database, we use an experiment tag to connect nanoformulations and their characterizations to the biological specimen they were tested in. The database organizes these connections for efficient querying, and then we develop queries for nanoformulations based on biological features such as species, age, or behavioral results.
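A shared experiment tag of this kind could support a join such as the one sketched below, which looks up nanoformulations by a biological feature. The table layout, the tag values, and the `formulations_for_species` helper are hypothetical, and sqlite3 stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: both carry an experiment_tag column used for the join.
conn.executescript(
    """
    CREATE TABLE nanoformulation (
        formulation_id TEXT PRIMARY KEY,
        experiment_tag TEXT NOT NULL,
        size_nm        REAL
    );
    CREATE TABLE biological_specimen (
        specimen_id    TEXT PRIMARY KEY,
        experiment_tag TEXT NOT NULL,
        species        TEXT NOT NULL
    );
    INSERT INTO nanoformulation VALUES
        ('NF-1', 'EXP-A', 82.0),
        ('NF-2', 'EXP-B', 120.0),
        ('NF-3', 'EXP-A', 64.0);
    INSERT INTO biological_specimen VALUES
        ('BS-1', 'EXP-A', 'rat'),
        ('BS-2', 'EXP-B', 'mouse');
    """
)


def formulations_for_species(conn, species):
    """Return IDs of formulations tested in specimens of the given species."""
    sql = """
        SELECT DISTINCT n.formulation_id
        FROM nanoformulation AS n
        JOIN biological_specimen AS b
          ON b.experiment_tag = n.experiment_tag
        WHERE b.species = ?
        ORDER BY n.formulation_id
    """
    return [row[0] for row in conn.execute(sql, (species,))]


rat_formulations = formulations_for_species(conn, "rat")
print(rat_formulations)  # ['NF-1', 'NF-3']
```

The same pattern would extend to other biological features, such as age or behavioral results, by filtering on additional specimen columns.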
Finally, we plan to develop standard practices for discussing, planning, and participating in better data management. These practices will lead to better data utilization and data longevity, and greater reproducibility and interpretability of datasets generated from independent researchers. Insights into relationships and trends in nanomedicine formulations have benefitted from developing this database management schema. A standard data management schema for experimental nanoformulation experiments decreases the need for repetitive experimentation, connects nanoformulation variables with biological outcome in an interpretable and efficient way, and enables science-informed querying of any standardized variables. The overall benefits of a standardized database contribute to efficient experimentation and cross-experimental insight, which will ultimately improve pre-clinical to clinical translation of nanoformulations.
 Anselmo, A.C. and S. Mitragotri, Nanoparticles in the clinic: An update. Bioengineering & Translational Medicine, 2019. 4(3).
 Wetheim, S., The Brain Database: A Multimedia Neuroscience Database for Research and Teaching. 1989.
 Loew, L.M. and J.C. Schaff, The Virtual Cell: a software environment for computational cell biology. Trends in Biotechnology, 2001. 19(10): p. 401-406.
 Milo, R., et al., BioNumbers--the database of key numbers in molecular and cell biology. Nucleic Acids Research, 2010. 38(suppl_1): p. D750-D753.