(721e) Organization of MongoDB Database in MoDeNa Project

Authors: 
Preisig, H. A., Norwegian University of Science and Technology (NTNU)
Birgen, C., Chemical Engineering, Norwegian University of Science and Technology

MoDeNa aims to develop, demonstrate and assess an easy-to-use multi-scale software-modelling framework, released under an open-source licence, that delivers models with feasible computational loads for the process and product design of complex materials. The framework links four scales: nano, micro, meso and macro. An orchestrator links all of the scales, which is a necessary condition for an integral approach (MoDeNa).

The orchestrator employs a document-oriented NoSQL datastore as its back-end repository. The datastore handles a wide variety of data types, including execution state, inputs and outputs. NoSQL was chosen primarily for its flexibility. Unlike relational datastores, it does not require a predefined schema covering all of these data types. As the data grow in size and their structure evolves, a NoSQL datastore can absorb the changes without the modification of a complex relational schema (Gunter et al., 2012).
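The schema flexibility described above can be illustrated with a minimal sketch. It uses plain Python dictionaries in place of a real datastore, and every field name and value (`kind`, `model`, `task`, `wall_time_s`, and so on) is a hypothetical placeholder rather than part of the MoDeNa data model.

```python
import json

# A plain list stands in for one document collection (hypothetical data).
collection = []

# Three records of different shape; no shared schema is declared anywhere.
collection.append({"kind": "input", "model": "example_model", "points": [[1.0, 2.0]]})
collection.append({"kind": "output", "model": "example_model", "value": 3.5})
collection.append({"kind": "state", "task": "task-1", "status": "running"})

# A record added later can carry a field that no earlier record has,
# without any change to the records already stored.
collection.append(
    {"kind": "state", "task": "task-2", "status": "done", "wall_time_s": 12.4}
)


def find(docs, **criteria):
    """Return the documents whose fields match all given key/value pairs."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


states = find(collection, kind="state")
print(json.dumps(states, indent=2))
```

In a relational design, the extra `wall_time_s` field would typically call for a change to a table definition; here the older record simply lacks the key.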

The datastore of choice is MongoDB, a schemaless document store developed as an open-source project (Shermin, 2013). Compared with other document-oriented datastores, it offers a powerful yet simple query language, ease of administration, and good performance on read-heavy workloads where most of the data fit into memory (Gunter et al., 2012). Its relative weakness on huge datasets and write-heavy workloads is a trade-off for the MoDeNa project.

This presentation focuses on the organization of the datastore. In MongoDB, data are stored as documents, and documents of similar type are grouped into collections. A document can also embed other documents, which simplifies read operations. In the MoDeNa project, the input data are in JSON format and hold the models and the initial data points inserted by a user. Calculated data points are stored along with the input data. The execution state data are generated by the software and hold the state and intermediate results of all tasks in the system. The input data and the execution state data are kept in separate datastores. The data structure will be illustrated further in the presentation.
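A rough sketch of this organization follows, again with plain Python dictionaries standing in for the two datastores. Only `_id` is a MongoDB convention (the primary-key field of a document); the collection names `models` and `tasks`, the helper functions and all other field names are assumptions made for illustration and are not taken from the MoDeNa code.

```python
import json

# Two separate stores, mirroring the separation described above.
# Each dict maps a hypothetical collection name to a list of documents.
input_store = {"models": []}
state_store = {"tasks": []}


def insert_model(store, model_id, inputs, outputs, initial_points):
    """Store a model document with its user-supplied initial points embedded."""
    doc = {
        "_id": model_id,
        "inputs": inputs,
        "outputs": outputs,
        "initial_points": list(initial_points),  # embedded documents
        "calculated_points": [],                 # filled in later
    }
    store["models"].append(doc)
    return doc


def add_calculated_point(store, model_id, point):
    """Append a calculated point to the model document it belongs to."""
    for doc in store["models"]:
        if doc["_id"] == model_id:
            doc["calculated_points"].append(point)
            return doc
    raise KeyError(model_id)


def record_task_state(store, task_id, state, intermediate=None):
    """Store the state and intermediate results of one task."""
    doc = {"_id": task_id, "state": state, "intermediate": intermediate or {}}
    store["tasks"].append(doc)
    return doc


model = insert_model(
    input_store,
    "example_model",
    inputs=["x"],
    outputs=["y"],
    initial_points=[{"x": 0.0, "y": 1.0}, {"x": 1.0, "y": 3.0}],
)
add_calculated_point(input_store, "example_model", {"x": 0.5, "y": 2.0})
task = record_task_state(state_store, "task-1", "running", {"iteration": 3})

# The whole model document, embedded points included, is plain JSON.
restored = json.loads(json.dumps(model))
print(json.dumps(model, indent=2))
```

Because the points are embedded in the model document, a single lookup by `_id` returns the model together with its initial and calculated data, which is the read-side convenience mentioned above.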

References

  1. Shermin, M. (2013). An Access Control Model for NoSQL Databases.

  2. Gunter, D., Cholia, S., Jain, A., Kocher, M., Persson, K., Ramakrishnan, L., ... & Ceder, G. (2012, November). Community Accessible Datastore of High-Throughput Calculations: Experiences from the Materials Project. In High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion (pp. 1244-1251). IEEE.

  3. MoDeNa project website. Retrieved May 11, 2015 from http://www.modenaproject.eu/