A DICOM Extension Supporting Data Acquisition in Synthetic Biology | AIChE

Authors 

Bultelle, M. A., Imperial College
Kitney, R. I., Imperial College


The use of standard parts is a common approach that has been successfully adopted in several engineering disciplines. In biology, initiatives to date have mainly focused on maximising the public availability of bioparts, without a strong focus on data quality. To improve the quality of data, the Centre for Synthetic Biology and Innovation at Imperial College has developed an automated characterisation pipeline for biological parts, which comprises:
1. A robust data standard for the acquisition of experimental data.
2. A common IT-spine to track and store all data as they are processed and curated.
3. A robust dissemination strategy, enabling public access to high quality biopart information.
The Synthetic Biology Information System (SynBIS) has been developed to implement points 2 and 3 above. It will centralise biological knowledge from the characterised part data, such that only high quality information is disseminated.
This paper presents our work with regard to point 1: the development of a robust data standard for the acquisition of experimental data. In order for this standard to successfully support our characterisation pipeline, we pursued the following objectives:

• Enable the encoding of good quantitative and qualitative information.
• Empower reproducibility of the same results when the experiment is repeated in a different location.
• Support the representation of modular data.
• Optimise the storage of massive amounts of characterisation data.
• Provide services oriented to the automation and exchange of information between data acquisition points and central repositories.

DICOM-SB

We undertook a review of existing standards that might be extended for synthetic biology. The review found that an existing standard, Digital Imaging and Communications in Medicine (DICOM), was able to represent experimental data whilst complying with the objectives above. DICOM is an extensive standard supported by a large industrial consortium (NEMA) that includes the main medical hardware manufacturers. Given its long-term success and proven industry orientation, we decided to take advantage of that work and experience and develop our synthetic biology data acquisition standard as an extension of DICOM. The Synthetic Biology extension for DICOM (DICOM-SB, working title) provides a framework that allows the integration of wet lab experimental data acquisition modalities into a common data model. It enhances the basic architecture inherited from DICOM, allowing the encoding of new synthetic biology data acquisition modalities that are not present in the general standard (NEMA PS3 / ISO 12052, n.d.).
DICOM encodes data objects as a series of items (or data elements), such that each item is identified by a predefined attribute (also called a tag). Each attribute is associated with a data type (e.g. integer, float, character, string), which in DICOM is called a Value Representation (VR). Finally, the value to be represented is encoded at the end (the last bytes) of each item, preceded by the length of that value.
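As an illustration of the item layout described above, the following minimal Python sketch packs and unpacks one data element in the explicit-VR, little-endian form with a 2-byte length field, which is the form DICOM uses for short VRs such as US or LO. The tag (0011,1001) and the helper names are hypothetical; they are not DICOM-SB attributes, and the sketch ignores the longer header used by VRs such as OB or SQ.

```python
import struct

# group, element, 2-character VR, 2-byte value length (all little endian)
HEADER = "<HH2sH"

def encode_element(group: int, element: int, vr: str, value: bytes) -> bytes:
    """Pack one item: tag, VR, value length, then the value bytes last."""
    if len(value) % 2:
        # DICOM values have even length; this sketch only space-pads text.
        value += b" "
    header = struct.pack(HEADER, group, element, vr.encode("ascii"), len(value))
    return header + value

def decode_element(buf: bytes, offset: int = 0):
    """Return ((group, element), vr, value, next_offset) for the item at offset."""
    group, element, vr, length = struct.unpack_from(HEADER, buf, offset)
    start = offset + struct.calcsize(HEADER)
    value = buf[start:start + length]
    return (group, element), vr.decode("ascii"), value, start + length

# Hypothetical private tag holding an unsigned 16-bit event count.
item = encode_element(0x0011, 0x1001, "US", struct.pack("<H", 50000))
tag, vr, value, next_offset = decode_element(item)
print(tag, vr, struct.unpack("<H", value)[0])
```

Because every item carries its own length, a reader can walk a byte stream item by item using the returned offset, without knowing in advance which attributes are present.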
Because the number of DICOM attributes is so large, building consistent objects that include all the required information can become a tedious task if attributes have to be searched for and chosen one by one. To ease this task, DICOM clusters the attributes describing properties of one entity or group of entities into the same Information Module. When designing a DICOM object to encode a certain data structure, modules are therefore the minimal building blocks that are combined to assemble it. The module specification also determines which attributes are mandatory (and thus must always be completed) and which can be left incomplete. Finally, the different Information Entities (IEs) are built from the Information Modules, and Information Object Definitions (IODs) are aggregated from IEs.
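The composition of attributes into modules, modules into IEs, and IEs into an IOD can be sketched as nested records with a check for mandatory attributes. This is a minimal Python sketch that follows only the mandatory/optional distinction described above; every class, module, and attribute name in it is hypothetical and does not come from DICOM or DICOM-SB.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Attribute:
    name: str
    mandatory: bool

@dataclass(frozen=True)
class Module:
    name: str
    attributes: tuple[Attribute, ...]

@dataclass(frozen=True)
class InformationEntity:
    name: str
    modules: tuple[Module, ...]

@dataclass(frozen=True)
class IOD:
    name: str
    entities: tuple[InformationEntity, ...]

    def missing_mandatory(self, values: dict) -> list[str]:
        """Names of mandatory attributes that have no value in `values`."""
        return [
            attr.name
            for entity in self.entities
            for module in entity.modules
            for attr in module.attributes
            if attr.mandatory and values.get(attr.name) is None
        ]

# Hypothetical modules and attributes, for illustration only.
equipment = Module("Equipment", (Attribute("Manufacturer", True),
                                 Attribute("SoftwareVersion", False)))
protocol = Module("Protocol", (Attribute("ProtocolName", True),))
series_ie = InformationEntity("Series", (equipment, protocol))
example_iod = IOD("ExampleIOD", (series_ie,))

print(example_iod.missing_mandatory({"Manufacturer": "ACME"}))
```

Reusing a module such as the hypothetical `equipment` in several IODs is what saves the designer from selecting its attributes one by one each time.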

Each and every data object must implement a standard IOD, and their entities are related following a hierarchical information model. We have defined the DICOM-SB Hierarchical Information Model, comprising the following Information Entities:

• The biopart is the object of study, and thus this entity is located at the top of the hierarchy.
• Each biopart can be studied by a characterisation experiment, whose objective is to perform all the experimental procedures required to generate a standard biopart datasheet.
• Each characterisation experiment comprises a set of assays / modalities that are repeated over a number of days. Each single repeat of a modality, performed with a specific piece of equipment and following a specific protocol, constitutes a series.
• Each series can include a number of modalities, each encoding the information relating to a specific data acquisition activity on a specific piece of equipment.
The experimental measurements for each data acquisition activity are stored in the corresponding attributes of each modality. Our current protocol uses two different types of modality:

• Cell population measurements: a plate reader provides pairs of correlated time series of GFP intensity and total amount of cells per wave.
• Single cell measurements: a flow cytometer provides a series of individual GFP measurements per cell.
The remaining modality fields, as well as the rest of the higher entities in the data model, store the metadata required to classify, process, analyse and disseminate the experimental measurements.
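The hierarchy described above (biopart, characterisation experiment, series, modality) can be pictured as nested containers, with the measurements held at the modality level and the metadata held in the remaining fields. The Python sketch below is an assumption-laden illustration: the class and field names are hypothetical and do not reflect the actual DICOM-SB attribute set.

```python
from dataclasses import dataclass, field

@dataclass
class PlateReaderModality:
    """Cell population measurements: two correlated time series."""
    gfp_intensity: list[float]
    cell_amount: list[float]

    def __post_init__(self):
        # The two series are paired point by point, so lengths must match.
        if len(self.gfp_intensity) != len(self.cell_amount):
            raise ValueError("paired time series must have the same length")

    def measurement_count(self) -> int:
        return len(self.gfp_intensity)

@dataclass
class FlowCytometryModality:
    """Single cell measurements: one GFP value per cell."""
    gfp_per_cell: list[float]

    def measurement_count(self) -> int:
        return len(self.gfp_per_cell)

@dataclass
class Series:
    equipment: str   # metadata: which instrument produced the repeat
    protocol: str    # metadata: which protocol was followed
    modalities: list = field(default_factory=list)

@dataclass
class CharacterisationExperiment:
    series: list[Series] = field(default_factory=list)

@dataclass
class Biopart:
    name: str
    experiments: list[CharacterisationExperiment] = field(default_factory=list)

    def measurement_count(self) -> int:
        """Total number of measurements stored below this biopart."""
        return sum(m.measurement_count()
                   for e in self.experiments
                   for s in e.series
                   for m in s.modalities)

# Hypothetical example data.
day1 = Series("plate reader A", "protocol v1",
              [PlateReaderModality([0.1, 0.4, 0.9], [10.0, 20.0, 40.0])])
day2 = Series("cytometer B", "protocol v1",
              [FlowCytometryModality([1.2, 0.8, 1.1, 0.9])])
part = Biopart("example-promoter", [CharacterisationExperiment([day1, day2])])
print(part.measurement_count())
```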
Once a new data model (IOD) is set, DICOM must offer a service that allows communication between the different data acquisition points and repositories. In a data acquisition context like ours, there is a need for (at least) one service that stores the acquired IOD in a data repository. For that purpose we have developed a web service that implements a DICOM Message Service Element (DIMSE): the Store service. The combination of an IOD and the corresponding DIMSE service forms a Service-Object Pair (SOP).
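The pairing of an IOD with a service can be sketched as follows. This is a hypothetical in-memory Python illustration, not the web service described above: the class names, the IOD name, the UID, and the failure status value are all assumptions, and only the use of 0x0000 to signal success follows the DIMSE convention.

```python
from dataclasses import dataclass, field

STATUS_SUCCESS = 0x0000  # DIMSE convention: 0x0000 signals success
STATUS_FAILURE = 0xC000  # illustrative non-zero failure value for this sketch

@dataclass(frozen=True)
class SOPClass:
    """A Service-Object Pair: one IOD combined with one service."""
    iod_name: str
    service: str

@dataclass
class StoreService:
    """Accepts instances of the SOP classes it supports and keeps them."""
    supported: set
    repository: dict = field(default_factory=dict)

    def store(self, sop_class: SOPClass, instance_uid: str, dataset: bytes) -> int:
        if sop_class not in self.supported:
            return STATUS_FAILURE
        self.repository[instance_uid] = dataset
        return STATUS_SUCCESS

# Hypothetical SOP class and instance identifier.
plate_reader_store = SOPClass("PlateReaderIOD", "Store")
service = StoreService(supported={plate_reader_store})
status = service.store(plate_reader_store, "1.2.3.4", b"encoded data set")
print(hex(status), len(service.repository))
```

Keeping the supported SOP classes explicit lets a repository refuse objects it cannot interpret, rather than storing them silently.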

Discussion

A standard like DICOM-SB can provide very important advantages when supporting data acquisition processes in synthetic biology.
One major advantage is optimised data storage. Instead of using text-based representations like most standards (e.g. XML, SBML, SBOL), DICOM encodes data in binary format. While this makes no difference when dealing with text strings or characters, it offers significant savings when dealing with numbers: a byte in a binary representation can take up to 256 different values; conversely, text-based representations use 1 byte per digit, meaning only 10 different values per byte. In sum: per byte, binary encoding distinguishes about 25 times as many values as decimal text (256 versus 10), without any data compression; the saving in file size for a given number depends on its magnitude and precision. This feature becomes especially interesting when dealing with data-intensive modalities, such as flow cytometry (up to 50000 events per file), microarrays, etc.
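To make the comparison concrete, the Python sketch below measures the size of the same values stored as fixed-width binary and as newline-separated decimal text. The 16-bit width, the separator, the sample values, and the function name are assumptions chosen for illustration; the ratio obtained in practice depends on the magnitude and precision of the values.

```python
import struct

def encoded_sizes(values):
    """Return (binary_bytes, text_bytes) for unsigned 16-bit integers."""
    # Binary: every value takes exactly 2 bytes, whatever its magnitude.
    binary = struct.pack(f"<{len(values)}H", *values)
    # Text: one byte per decimal digit plus one separator byte per value.
    text = "".join(f"{v}\n" for v in values).encode("ascii")
    return len(binary), len(text)

# Hypothetical file: 50000 events, each with the value 50000.
events = [50000] * 50000
binary_bytes, text_bytes = encoded_sizes(events)
print(binary_bytes, text_bytes, text_bytes / binary_bytes)
```

Markup overhead in text formats such as XML would add to the text size measured here, since the sketch counts only digits and a single separator per value.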
Communication is another very relevant feature, especially when applied to a data acquisition process. The fact that a standard data representation is always packaged with a corresponding communication service facilitates the automated distribution of results between measuring equipment and repositories. Medical imaging has benefited from this automation for more than a decade, thanks to DICOM being embedded in most of its commercial modalities. The adoption of a similar standard by the suppliers of synthetic biology modalities would be a key milestone towards achieving the same level of automation. We have trialled DICOM-SB with the Poh lab at NTU in Singapore. Based on the results, we are confident about the potential benefits of the standard and its acceptance by industry.
Far from competing with other synthetic biology standards, we think DICOM-SB complements what is currently available in the area. For example, SBML is widely accepted as the standard for encoding simulation models; SBOL is very powerful in providing qualitative descriptions of bioparts (Galdzicki et al., 2014, 2012) and their interactions (Roehner et al., 2014), as well as sequence annotations. But neither of them offers a solution for encoding experimental data as DICOM-SB does. We aim to empower the coexistence of complementary standards, and therefore use both DICOM-SB and SBOL within our SynBIS platform. We integrate them using a property of SBOL 2.0 (Roehner et al., 2014) that allows the annotation of bioparts using links to resources encoded following a different standard (in our case, DICOM-SB).

References

Galdzicki, M., Clancy, K.P., Oberortner, E., Pocock, M., Quinn, J.Y., Rodriguez, C.A., Roehner, N., Wilson, M.L., Adam, L., Anderson, J.C., Bartley, B.A., Beal, J., Chandran, D., Chen, J., Densmore, D., Endy, D., Grünberg, R., Hallinan, J., Hillson, N.J., Johnson, J.D., Kuchinsky, A., Lux, M., Misirli, G., Peccoud, J., Plahar, H.A., Sirin, E., Stan, G.-B., Villalobos, A., Wipat, A., Gennari, J.H., Myers, C.J., Sauro, H.M., 2014. The Synthetic Biology Open Language (SBOL) provides a community standard for communicating designs in synthetic biology. Nat. Biotechnol. 32, 545–550. doi:10.1038/nbt.2891
Galdzicki, M., Wilson, M., Rodriguez, C.A., Pocock, M.R., Oberortner, E., Adam, L., Adler, A., Anderson, J.C., Beal, J., Cai, Y., Chandran, D., Densmore, D., Drory, O.A., Endy, D., Gennari, J.H., Grünberg, R., Ham, T.S., Hillson, N.J., Johnson, J.D., Kuchinsky, A., Lux, M.W., Madsen, C., Misirli, G., Myers, C.J., Olguin, C., Peccoud, J., Plahar, H., Platt, D., Roehner, N., Sirin, E., Smith, T.F., Stan, G.-B., Villalobos, A., Wipat, A., Sauro, H.M., 2012. Synthetic Biology Open Language (SBOL) Version 1.1.0.
NEMA PS3 / ISO 12052, n.d. Digital Imaging and Communications in Medicine (DICOM) Standard. Rosslyn, VA, USA.
Roehner, N., Oberortner, E., Pocock, M., Beal, J., Clancy, K., Madsen, C., Misirli, G., Wipat, A., Sauro, H., Myers, C.J., 2014. Proposed Data Model for the Next Version of the Synthetic Biology Open Language. ACS Synth. Biol. doi:10.1021/sb500176h