Data management (low density microarrays)

Raw data: Initially, two fluorescence intensity values are obtained for each gene from two replica spots printed on the microarray (example). The two replica spot measurements should deviate only minimally from each other. Larger deviations indicate technical problems and call for visual inspection of spot intensity and quality.
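
The replica spot check described above can be sketched as follows. This is only an illustration, not the facility's actual procedure: the function name, the relative-difference measure and the 20% tolerance are assumptions made for the example, and the gene labels are placeholders.

```python
def flag_discordant_spots(spots, max_rel_diff=0.2):
    """Return the genes whose two replica spots disagree too strongly.

    spots: dict mapping gene name -> (intensity_spot1, intensity_spot2).
    max_rel_diff: hypothetical tolerance for |a - b| / mean(a, b).
    """
    flagged = []
    for gene, (a, b) in spots.items():
        mean = (a + b) / 2.0
        if mean <= 0:
            # No usable signal on either spot: inspect visually.
            flagged.append(gene)
        elif abs(a - b) / mean > max_rel_diff:
            flagged.append(gene)
    return flagged


# Placeholder genes: the first pair agrees well, the second does not.
example = {"gene_1": (1000.0, 1040.0), "gene_2": (500.0, 900.0)}
print(flag_discordant_spots(example))
```

Genes returned by such a check would be candidates for visual inspection of the scanned image.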

Raw data acquisition: Each microarray is scanned six times at increasing photomultiplier (PMT) settings to account for the differences in fluorescence signals that result from fluorophore-cRNA/oligonucleotide probe hybridization. Specifically, abundant mRNAs give strong signals at low scan intensity but saturated, non-linear signals at higher scan intensity, whereas low-abundance mRNAs may only be detected at maximal scan intensity. Scanning generates six image files (the original scans), which are stored as .TIFF files. For each gene, twelve numerical values are obtained (six scans times two replica spots); these are integrated into a single intensity value (called the I value) by the MAVI software, which was developed at MWG Biotech (for an example see here: pdf, 451kB).

Normalization: Since the number of genes on the Inflammation array is small and most of them are highly regulated, "classical" normalization between two samples using the total fluorescence intensity derived from all gene expression measurements is not possible.

We use a number of housekeeping genes (view list) that are detected over a range of different signal intensities to calculate an average housekeeping gene intensity for each individual array (example).

The average housekeeping gene intensity is then used to calculate a relative signal intensity for each inflammatory gene, called the IPC value (example). This relative intensity value enables comparisons across different experiments. To obtain the IPC value, the logarithmic mean intensity of all housekeeping genes is calculated first; the intensity value of each gene is then expressed as a percentage of this mean, divided by 100 (that is, as a fraction of the mean).
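
The IPC calculation can be illustrated with a short sketch. It assumes that "logarithmic mean" means averaging the log-transformed housekeeping intensities and transforming back (a geometric mean); the text does not spell this out. The function names and numbers are invented for the example.

```python
import math


def housekeeping_mean(intensities):
    """Assumed 'logarithmic mean': average the logs, then back-transform."""
    logs = [math.log(v) for v in intensities]
    return math.exp(sum(logs) / len(logs))


def ipc_values(gene_intensities, housekeeping_intensities):
    """Express each gene's I value as a fraction of the housekeeping mean."""
    reference = housekeeping_mean(housekeeping_intensities)
    return {gene: value / reference for gene, value in gene_intensities.items()}


# Invented numbers: the housekeeping mean of 100 and 400 is about 200.
housekeeping = [100.0, 400.0]
genes = {"gene_a": 50.0, "gene_b": 800.0}
print(ipc_values(genes, housekeeping))
```

Because every array is scaled to its own housekeeping reference, the resulting values can be compared between arrays, which is the purpose stated above.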

Lower detection level: We use Arabidopsis oligonucleotide probes to determine background hybridization (example). We also use the average signal of each inflammatory gene across a large number of different samples. Signal values that lie at least two standard deviations (SD) above this average indicate relevant (basal) expression of the gene (example).
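
The two-SD criterion can be written down in a few lines. This sketch assumes the sample standard deviation is used and that the reference signals for one gene are available as a plain list; both are assumptions, as are the function names and numbers.

```python
import statistics


def expression_threshold(reference_signals, n_sd=2.0):
    """Mean of a gene's signals across reference samples plus n_sd SDs."""
    mean = statistics.mean(reference_signals)
    sd = statistics.stdev(reference_signals)  # sample SD (assumption)
    return mean + n_sd * sd


def is_expressed(signal, reference_signals, n_sd=2.0):
    """True if the signal lies at least n_sd SDs above the reference mean."""
    return signal >= expression_threshold(reference_signals, n_sd)


# Invented reference signals for one gene across five samples.
reference = [10.0, 12.0, 14.0, 16.0, 18.0]
print(expression_threshold(reference))
print(is_expressed(25.0, reference), is_expressed(18.0, reference))
```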

Ratio: We routinely label all cRNAs with Cy3 and hybridize each sample on a single microarray. Samples derived from the same biological experiment form a group (e.g. control, treatment 1, treatment 2, ...). We always name the control (B) and the treatments (S1) to (Sn). Dividing the gene expression value (I or IPC) of one sample in a group by the value of another sample in the same group gives a measure of relative gene expression, which is called the ratio of gene expression. For standard ratio comparisons (e.g. (S1)/(B), (S2)/(B) etc.) we have created several Excel macros (example).
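
The standard comparisons against the control can be sketched as below. This is not the Excel macro mentioned above: the data layout (one dictionary of gene values per sample) and the handling of zero or missing control values are assumptions made for the illustration.

```python
def expression_ratios(samples, control="B"):
    """Divide each treatment's gene values by the control's values.

    samples: dict mapping sample name (e.g. "B", "S1") -> {gene: I or IPC value}.
    Returns a dict keyed by comparison label, e.g. "(S1)/(B)".
    """
    base = samples[control]
    ratios = {}
    for name, values in samples.items():
        if name == control:
            continue
        label = f"({name})/({control})"
        ratios[label] = {}
        for gene, value in values.items():
            denom = base.get(gene)
            # Leave the ratio undefined if the control value is missing or zero.
            ratios[label][gene] = value / denom if denom else None
    return ratios


# Invented values for one control and one treatment.
group = {
    "B": {"gene_a": 2.0, "gene_b": 0.0},
    "S1": {"gene_a": 8.0, "gene_b": 5.0},
}
print(expression_ratios(group))
```

Returning `None` instead of raising an error keeps a single undetectable control value from aborting the comparison for the whole group.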

Data derived from pairwise comparisons can be depicted as bar graphs using a SigmaPlot macro. The SigmaPlot graph also contains the intensity values for each gene under investigation. To make the data easy to navigate, genes are ordered into functional groups (example).

CytoBASE: Registered users can view and analyze all their experiments, as well as a set of publicly accessible experiments, using CytoBASE.

Customized analysis: Data often need to be extracted and prepared for presentations or publications (e.g. as a table or SigmaPlot graph). We have used a number of different formats, e.g. formatted tables and bar graphs, to meet these needs. Depending on the demands of a microarray project, we offer help based on our own experience to analyze the data in more detail.

Result files

All users obtain a "standard result file" which is sent via email and can also be retrieved from CytoBASE. For an example see here. BASE always contains the most recent version of a results file.


Data management (high density microarrays)

DNA oligonucleotide microarray platform and raw data acquisition and storage
The microarray laboratory is equipped with an Agilent G2565C scanner with a maximum resolution of 2 µm, which allows automated scanning of up to 48 slides, using arrays that carry up to 1 million oligonucleotide probes per slide. Reverse transcription, amplification, fluorophore labeling, cRNA fragmentation and hybridization are all performed with Agilent chemistry and reagents, following highly standardized workflows. The lab also has experience with a number of other cDNA synthesis, amplification and labeling protocols. To maintain a high level of flexibility and to ensure the highest reliability of output data, many quality control (QC) routines have been introduced into sample processing (e.g. assessment of the absolute and relative quality of input RNA samples; comparison of yields, fluorescence incorporation rates and fragment lengths of labeled cRNA samples). This comprehensive monitoring makes it possible to selectively repeat single reaction steps for “outlier samples” within an experimental series before the final (and most expensive) microarray hybridization step. After microarray scanning, the resulting TIFF images are subjected to raw data extraction using Feature Extraction (FE) software (V10.7), largely with the recommended default protocols. All relevant information pertaining to the current study (e.g. processed sample characteristics, array batches used, adverse effects, ...) is routinely documented in standardized Excel formats and is finally attached to CytoBASE along with the microarray results.

DNA microarray data analysis

Microarray data analysis can be divided into three major steps:

Step 1: Quality control and data transformation procedures

This first part of the analysis concerns the overall technical quality of the microarray data. Local impairments in hybridization performance are identified, documented and, where necessary, marked by manually flagging the affected regions. QC reports generated by the FE software, which capture many different QC-relevant parameters (e.g. “behavior” of exogenous spike-in transcripts, number and local distribution of different outlier spots, overall signal intensity distribution, …), are thoroughly inspected for every microarray hybridization. If a particular microarray data set turns out to be of insufficient quality, the respective hybridization is repeated immediately. At this stage it is also checked whether biological positive and negative controls behave as expected, and whether consistently altered signal intensities of genes that were overexpressed, deleted or knocked down can be used to unequivocally confirm, in retrospect, correct sample identity and faultless performance of the underlying biological experiment. Next, optimal data normalization and transformation procedures are established in a study-specific manner. For single-color mRNA expression microarrays, a linear scaling approach (based on the 75th percentile of each array's intensity distribution), combined with the introduction of appropriate surrogate values for unreliably low intensity measurements, is used by default. However, different data transformation strategies may become necessary in particular cases. Finally, principal component analysis (PCA) is performed to identify “outlier data sets” and to relate the degree of intra-class variability to the extent of inter-class variability (an important criterion for judging whether the number of replicates per class is sufficient for “in-depth analyses”).
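
The default normalization for single-color arrays (75th percentile scaling plus surrogate values) can be sketched as follows. This is not the lab's actual pipeline. The common target value, the surrogate floor, the order of the two operations and the percentile interpolation method are all assumptions chosen for the illustration.

```python
import statistics


def percentile75(values):
    """75th percentile of one array's intensities (inclusive interpolation)."""
    return statistics.quantiles(values, n=4, method="inclusive")[2]


def normalize_arrays(arrays, target=1000.0, floor=10.0):
    """Scale each array linearly so that its 75th percentile equals `target`,
    then replace values below `floor` with `floor` as a surrogate value.

    arrays: dict mapping array name -> list of raw intensities.
    target, floor: hypothetical settings, not taken from the text.
    """
    normalized = {}
    for name, values in arrays.items():
        scale = target / percentile75(values)
        normalized[name] = [max(v * scale, floor) for v in values]
    return normalized


# Two invented arrays that differ only by a factor of two in overall signal.
raw = {"array_1": [1.0, 2.0, 3.0, 4.0, 5.0], "array_2": [2.0, 4.0, 6.0, 8.0, 10.0]}
print(normalize_arrays(raw, target=100.0, floor=30.0))
```

In this example the two arrays become identical after scaling, and the lowest measurement on each is replaced by the surrogate floor.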

Step 2: Standardizable and initial biological data analysis

After FE-mediated data extraction, followed by QC and data transformation, microarray data are further processed using Excel macros and R scripts. These tools have been developed in the microarray lab specifically to pre-process Agilent microarray data. They are required i) to convert processed raw data into clearly arranged Excel sheets, ii) to reduce the complexity of the data by selecting only the most informative part of the data initially acquired per gene, iii) to incorporate adequate sorting keys and supplemental information that facilitate navigation through the complex data, and finally iv) to introduce meaningful ratios of relative gene expression in the case of single-color microarray experiments. In summary, data are routinely converted into an Excel format (hereafter referred to as the “Standardized Data Extract” file or “SDE”) that enables even less experienced scientists to get a first impression of the most prominent gene expression changes and the overall data reliability within the experimental system under investigation. In addition to the SDE, supplemental Excel tools and data visualization formats are generated and will be further improved in the course of this project. Explanatory files and manuals will be provided accordingly. Along with the standardized data files, a report summarizing the most important aspects of the microarray data analyzed so far will be routinely generated and provided. All of these files will be deposited in CytoBASE.

Step 3: Individualized and advanced biological data analyses

Depending on the outcome of step 2 (see above), the study is then open for individualized “in-depth analyses”. Because of the diversity of questions to be addressed and analysis steps to be considered at this stage, generalized analysis routes can no longer be applied from here on. Instead, a large number of additional data analysis and visualization options become available. Examples are the application of sophisticated filtering strategies, clustering approaches, gene ontology analyses, gene set enrichment analyses and significance tests, as well as the generation of heatmaps, figures, tables or text sections for publications. Furthermore, different kinds of pathway analyses (e.g. the superimposition of data on pre-determined pathway maps) can be highly instructive for the progression of an ongoing study.