(698a) A Novel Constrained Total Least Squares Formulation for the Identification of Gene Networks From Highly Noisy and Correlated Measurements

Conference

AIChE Annual Meeting

Year

2010

Proceeding

Computing and Systems Technology Division

Session

Thursday, November 11, 2010 - 3:15pm to 3:35pm

Authors

Guner, U. - Presenter, Georgia Institute of Technology

Realff, M. - Presenter, Georgia Institute of Technology

Lee, J. H. - Presenter, Korea Advanced Institute of Science and Technology (KAIST)

Genes, proteins, and metabolites can regulate one another in various ways. Regulatory proteins bind to DNA to affect the transcription of genes. Proteins can also combine to form multi-protein complexes that take part in various regulatory functions [1]. All these interactions form a complex network of regulatory control. Experimentally, it is quite difficult to obtain information on the levels of gene regulation. A key objective in systems biology is to map out and model the topological and dynamical properties of these networks.

Recently, different types of genomic data have been obtained to understand transcription regulation, e.g., DNA sequence data, micro-array gene expression data, and protein-DNA binding data. The advent of such diverse data has motivated various researchers to develop computational methods to model transcription regulation [2]. DNA-protein binding data provides information about the regulators involved in transcription. Time-series micro-array expression experiments are the main source of dynamic information about the expression of thousands of genes that are activated or repressed in response to external stimuli [3].

Extensive studies on gene regulatory network modeling using time-series data have focused on linear discrete-time model equations. In this model, the expression level of a gene is assumed to be the concentration of its transcript. The concentration of a particular transcript $i$ at time point $t+1$, $x_i(t+1)$, is given by a linear function of the concentrations of the other RNA species at time point $t$, $x_j(t)$:

$$x_i(t+1) = \sum_{j=1}^{N} a_{ij}\, x_j(t) + e_i(t) \qquad (1)$$

where $N$ is the number of transcripts in the network and $a_{ij}$ is the regulatory strength between the gene pair $i$ and $j$. $e_i(t)$ is the error term for the difference between observation and the model. The errors are assumed to have a Gaussian distribution with zero mean and standard deviation $\sigma$. The aim is to estimate the parameter values, the $a_{ij}$'s, from micro-array observations of the concentrations, thereby reconstructing the gene network. A negative $a_{ij}$ indicates inhibition, and a positive value of $a_{ij}$ stands for activation between the gene pair. In general, only a small subset of all RNA species regulates a particular transcript, which means most of the $a_{ij}$'s are zero. In other words, gene networks are sparse [4].
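As a rough illustration of the linear discrete-time model, the sketch below simulates a small sparse network. The 4-gene matrix `A_true`, the initial state, the noise level, and the function name `simulate_network` are all hypothetical choices made for this example, not values from the study.

```python
import numpy as np

def simulate_network(A, x0, n_steps, sigma, rng):
    """Iterate x(t+1) = A x(t) + e(t), with e(t) ~ N(0, sigma^2 I)."""
    n_genes = A.shape[0]
    X = np.empty((n_genes, n_steps))
    X[:, 0] = x0
    for t in range(n_steps - 1):
        e = sigma * rng.standard_normal(n_genes)  # model error at this step
        X[:, t + 1] = A @ X[:, t] + e
    return X  # column t holds the concentrations at time point t

# Hypothetical sparse network: most regulatory strengths are zero.
# Negative entries represent inhibition, positive entries activation.
A_true = np.array([
    [0.8,  0.0, -0.3, 0.0],
    [0.4,  0.7,  0.0, 0.0],
    [0.0,  0.0,  0.9, 0.0],
    [0.0, -0.5,  0.0, 0.6],
])

rng = np.random.default_rng(0)
X = simulate_network(A_true, x0=np.ones(4), n_steps=10, sigma=0.01, rng=rng)
print(X.shape)  # (4, 10)
```

Each column of `X` plays the role of one micro-array time point; identification amounts to recovering `A_true` from such columns.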

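Microarray data is usually subject to high levels of additive and multiplicative errors [5]. Therefore, one can write the measured concentration levels for genes as follows:

$$\tilde{x}_i(t) = x_i(t) + \Delta x_i(t), \qquad \Delta x_i(t) = x_i(t)\,\varepsilon_i(t) + \delta_i(t) \qquad (2)$$

In this equation, $\tilde{x}_i(t)$ is the measured value, $x_i(t)$ is the unknown true value for the concentration of the $i$-th gene at the $t$-th time point, and $\Delta x_i(t)$ is the measurement error. The terms $x_i(t)\,\varepsilon_i(t)$ and $\delta_i(t)$ correspond to the multiplicative and additive parts of the measurement error.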
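The error model can be illustrated by corrupting a matrix of true concentrations with both noise types. This is a minimal sketch assuming independent zero-mean Gaussian multiplicative and additive terms; the function name, the standard deviations, and the example values are assumptions for illustration only.

```python
import numpy as np

def add_measurement_noise(X_true, sd_mult, sd_add, rng):
    """Return (measured values, measurement error) for true concentrations X_true."""
    eps_mult = sd_mult * rng.standard_normal(X_true.shape)  # multiplicative part
    eps_add = sd_add * rng.standard_normal(X_true.shape)    # additive part
    delta = X_true * eps_mult + eps_add                     # total measurement error
    return X_true + delta, delta

rng = np.random.default_rng(0)
X_true = np.array([[1.0, 2.0, 4.0],
                   [0.5, 0.5, 0.5]])   # hypothetical 2 genes x 3 time points
X_meas, delta = add_measurement_noise(X_true, sd_mult=0.1, sd_add=0.05, rng=rng)
print(np.round(delta, 3))              # realized measurement errors
```

Because the multiplicative part scales with the true level, highly expressed genes carry larger absolute errors than weakly expressed ones under this assumption.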

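Using equations (1) and (2), one can write the model for all genes in terms of the measured values,

$$\tilde{x}(t+1) - \Delta x(t+1) = A\,[\tilde{x}(t) - \Delta x(t)] + e(t) \qquad (3)$$

where $\tilde{x}(t) = [\tilde{x}_1(t), \dots, \tilde{x}_N(t)]^T$ is the vector of measured concentrations, $\Delta x(t) = [\Delta x_1(t), \dots, \Delta x_N(t)]^T$ and $e(t) = [e_1(t), \dots, e_N(t)]^T$ are the vectors of measurement errors and model errors, and $A$ is the $N \times N$ matrix of regulatory strengths, whose $(i,j)$ entry is the regulatory strength between the gene pair $i$ and $j$.

Equation (3) can be written for all time points, $t = 1, \dots, M-1$, as follows:

$$\tilde{X}_2 - \Delta X_2 = A\,(\tilde{X}_1 - \Delta X_1) + E \qquad (4)$$

where $\tilde{X}_1 = [\tilde{x}(1), \dots, \tilde{x}(M-1)]$, $\tilde{X}_2 = [\tilde{x}(2), \dots, \tilde{x}(M)]$, $\Delta X_1 = [\Delta x(1), \dots, \Delta x(M-1)]$, $\Delta X_2 = [\Delta x(2), \dots, \Delta x(M)]$, and $E = [e(1), \dots, e(M-1)]$, with $M$ the number of time points.

One can see that the measurement-error terms on the two sides of equation (4), $\Delta X_2$ and $\Delta X_1$, are serially correlated, as they have the same columns except for the first column of $\Delta X_1$ and the last column of $\Delta X_2$.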
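The shared-column structure can be seen directly by building the two lagged data matrices from a single time series. The sketch below assumes the measurements are stored as a genes-by-time-points array; `lagged_matrices` is a hypothetical helper name.

```python
import numpy as np

def lagged_matrices(X):
    """Split a (genes x time points) array into predictor and response blocks."""
    X1 = X[:, :-1]  # columns for time points 1 .. M-1
    X2 = X[:, 1:]   # columns for time points 2 .. M
    return X1, X2

X = np.arange(10.0).reshape(2, 5)   # hypothetical 2 genes, M = 5 time points
X1, X2 = lagged_matrices(X)

# Every column except the first of X1 and the last of X2 appears in both blocks,
# so any error in those columns enters both sides of the regression.
print(np.array_equal(X1[:, 1:], X2[:, :-1]))  # True
```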

A significant problem from the regression standpoint is that both the independent and the dependent variables have high levels of noise. Moreover, these noise terms are serially correlated. Other challenging characteristics include the limited amount of available data and the sparse but unknown structure of the parameter matrix. There is limited access to the topology information of the network through noisy protein-DNA binding data.

Many parameter estimation algorithms have been applied to this problem in the gene network identification literature [1]. Here, we will benchmark different regression methods for this model. In the context of this problem, the most commonly used method is least squares estimation. In classical least squares regression theory, the errors are assumed to be confined to the response variables. However, in this model the predictor variables are also noisy, so the least squares estimator is not appropriate (see the error term in the predictor variables in equation (4)). Total least squares is another fitting method that is appropriate when there are errors in both the independent and the dependent variables [6]. Constrained total least squares (CTLS) is a further improvement over total least squares that addresses the correlation between the errors in both variable types. This method is particularly well suited to this problem. We will introduce a novel CTLS formulation for this particular problem that is capable of integrating possible time-independent correlation in gene concentrations.
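To make the difference between the two estimators concrete, the sketch below fits the regulatory matrix by ordinary least squares and by classical total least squares (via the singular value decomposition). It is an illustration under assumptions, not the formulation proposed here: the function names, the random test data, and the noise level are hypothetical, the errors are treated as independent across entries with equal variance, and the basic TLS solution used here needs at least twice as many time-point pairs as genes.

```python
import numpy as np

def fit_ls(X1, X2):
    """Least squares estimate of A in X2 ~ A X1 (errors assumed only in X2)."""
    B, *_ = np.linalg.lstsq(X1.T, X2.T, rcond=None)  # solves X1^T B = X2^T
    return B.T

def fit_tls(X1, X2):
    """Classical total least squares estimate of A in X2 ~ A X1.

    Treats the errors in X1 and X2 as independent with equal variance.
    """
    n = X1.shape[0]
    Z = np.hstack([X1.T, X2.T])            # augmented data matrix [X1^T  X2^T]
    _, _, Vh = np.linalg.svd(Z)
    V = Vh.T
    V12 = V[:n, n:]                        # blocks of the right singular vectors
    V22 = V[n:, n:]                        # for the n smallest singular values
    return -np.linalg.solve(V22.T, V12.T)  # A = (-V12 V22^{-1})^T

rng = np.random.default_rng(0)
A_true = np.array([[0.8, -0.3],
                   [0.0,  0.9]])           # hypothetical 2-gene network
X1_true = rng.standard_normal((2, 200))    # stand-in for true predictor data
X2_true = A_true @ X1_true

noise = 0.3                                # same noise level on both sides
X1 = X1_true + noise * rng.standard_normal(X1_true.shape)
X2 = X2_true + noise * rng.standard_normal(X2_true.shape)

print("LS  error:", np.linalg.norm(fit_ls(X1, X2) - A_true))
print("TLS error:", np.linalg.norm(fit_tls(X1, X2) - A_true))
```

With noisy predictors, least squares estimates are typically biased toward zero, whereas TLS accounts for errors on both sides. Neither estimator in this sketch models the serial correlation between the two error blocks.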

The objective of the CTLS method is to minimize the following objective function:

(5)

The word "constrained" in CTLS refers to the model constraint given in equation (4).
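For orientation, a generic unweighted CTLS problem for a model of this type can be sketched as follows. This is an illustrative form under stated assumptions, not necessarily the formulation proposed in this work: the model error is neglected, $\tilde{X}_1$ and $\tilde{X}_2$ denote the measured concentrations at time points $1, \dots, M-1$ and $2, \dots, M$ arranged as columns, $A$ is the matrix of regulatory strengths, and $\Delta X = [\Delta x(1), \dots, \Delta x(M)]$ collects each measurement-error column exactly once.

```latex
\min_{A,\;\Delta X}\ \|\Delta X\|_F^2
\quad \text{subject to} \quad
\tilde{X}_2 - \Delta X\,S_2 = A\,\bigl(\tilde{X}_1 - \Delta X\,S_1\bigr),
\qquad
S_1 = \begin{bmatrix} I_{M-1} \\ 0 \end{bmatrix},
\quad
S_2 = \begin{bmatrix} 0 \\ I_{M-1} \end{bmatrix}
```

Here $S_1$ and $S_2$ are assumed $M \times (M-1)$ selection matrices that drop the last and the first column of $\Delta X$, respectively. Because both error blocks are built from the same $\Delta X$, the shared columns are penalized once rather than treated as independent, which is what distinguishes CTLS from plain total least squares in this setting.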

We will benchmark the performance of our CTLS formulation against least squares, total least squares, and partial least squares methods with respect to different noise levels, problems, correlation structures, and data sizes through in-silico examples. Our CTLS formulation is also compared to the CTLS application of Kim et al. [8], and we demonstrate a significant improvement over their method. Furthermore, we will incorporate appropriate constraints in our problem formulation to address the sparseness of networks and evaluate its performance.

REFERENCES

[1] Driscoll, M. E., Gardner, T. S., Identification and control of gene networks in living organisms via supervised and unsupervised learning. Journal of Process Control 16 (2006), 303-311.

[2] Sun, N., Carroll, R. J., Zhao, H., Bayesian error analysis model for reconstructing transcriptional regulatory networks. PNAS 103 (21) (2006), 7988-7993.

[3] Ernst, J., Vainas, O., Harbison, C. T., Simon, I., Bar-Joseph, Z., Reconstructing dynamic regulatory maps. Molecular Systems Biology 3 (74) (2007), 1-13.

[4] Ideker, T., Thorsson, V., Siegel, A. F., Hood, L. E., Testing for differentially-expressed genes by maximum-likelihood analysis of microarray data. Journal of Computational Biology 7 (2000), 805-817.

[5] Gardner, T.S., Faith, J. J., Reverse-engineering transcription control networks. Physics of Life Reviews 2 (2005), 65-88.

[6] Bansal, M., Della Gatta, G., di Bernardo, D., Inference of gene regulatory networks and compound mode of action from time course gene expression profiles. Bioinformatics 22 (2006), 815-822.

[7] Van Huffel, S., Vandewalle, J., The Total Least Squares Problem: Computational Aspects and Analysis. Society for Industrial and Applied Mathematics, Philadelphia (1991).

[8] Kim, J., Bates, D. G., Postlethwaite, I., Harrison, P., Cho, K., BMC Bioinformatics 8 (2007), 8.

Session

Modeling and Identification