Normalization

Normalization is done with Affymetrix Power Tools (APT).

There are two kinds of Affymetrix expression arrays: 3' Gene Expression Arrays and Whole-Transcript (WT) Expression Arrays.

See all Affymetrix expression arrays.

3' Gene Expression Arrays can be normalized only at the gene level, while Whole-Transcript Expression Arrays can be normalized at either the exon or the gene level (the arrays called Gene Level can in fact also be normalized at the exon level).

Babelomics has to detect the type of chip the user submits and classify it as 3' or Whole-Transcript. To do this we need a list of all the array names Babelomics accepts, together with their classification.
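The lookup described above could be sketched as a small shell function. This is only an illustration: the function name and the class labels ("3prime", "whole-transcript") are made up here, and the table holds just the two chips used in the examples on this page, not the real list of accepted arrays.

```shell
#!/bin/sh
# Hypothetical lookup table: map an array name to its class.
# Only the two chips used in the examples of this page are listed;
# the real table has to cover every array Babelomics accepts.
classify_array() {
    case "$1" in
        HG-U133A_2)        echo "3prime" ;;
        MoGene-1_0-st-v1)  echo "whole-transcript" ;;
        *)                 echo "unsupported"; return 1 ;;
    esac
}
```

An unknown name prints "unsupported" and returns a non-zero status, so the caller can reject the data set before any APT command is run.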

Several "Library Files" are needed to normalize each kind of chip using APT. This is the complete list of Library Files provided by Affymetrix. In general we will have to download almost all of them.

In general we will need... The Analysis pipeline will be:
  1. Convert CEL files to a text format.
    1. From the text formatted files extract the array-type and check that all arrays in a data set are from the same type.
    2. Check that the type of chip is among those that Babelomics can deal with.
    3. Classify the chip as 3' or Whole-Transcript in order to set the kind of analysis that can be done (in the future Babelomics will offer the possibility of normalizing Whole-Transcript arrays at exon level).
    4. Extract the array dimension from the text-formatted files and check that the array information is complete, i.e. that the number of lines in the array is the expected one.
  2. Generate a matrix of raw data to be used afterwards (given at probe level).
  3. Normalize the data (including present-absent calls if required). The user will be able to choose the options:
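Checks 1.1 and 1.4 of the pipeline above could look roughly like the sketch below. It rests on assumptions about the text-formatted CEL files that should be verified against real apt-cel-convert output: that the header has "Cols=" and "Rows=" lines, that the chip type appears in the "DatHeader=" line as a token ending in ".1sq", and that the [INTENSITY] section holds one data line per cell. The function names are hypothetical.

```shell
#!/bin/sh
# Print the chip type of one text-format CEL file.
# ASSUMPTION: the chip type is the token ending in ".1sq" on the
# "DatHeader=" line (e.g. "HG-U133A_2.1sq").
cel_array_type() {
    sed -n 's/^DatHeader=.*[[:space:]]\([^[:space:]]*\)\.1sq.*/\1/p' "$1" | head -n 1
}

# Succeed (and print the type) only if all given files share one chip type.
same_array_type() {
    first=$(cel_array_type "$1")
    [ -n "$first" ] || return 1
    for f in "$@"; do
        [ "$(cel_array_type "$f")" = "$first" ] || return 1
    done
    echo "$first"
}

# Succeed only if the [INTENSITY] section has Cols x Rows data lines.
# ASSUMPTION: data lines start with optional blanks followed by a digit.
cel_is_complete() {
    awk '
        /^Cols=/ { v = $0; sub(/^Cols=/, "", v); cols = v + 0 }
        /^Rows=/ { v = $0; sub(/^Rows=/, "", v); rows = v + 0 }
        /^\[/    { in_int = ($0 ~ /^\[INTENSITY\]/); next }
        in_int && /^[ \t]*[0-9]/ { n++ }
        END { exit !(cols > 0 && rows > 0 && n == cols * rows) }
    ' "$1"
}
```

Both checks return a non-zero status on failure, so they can be chained with && before moving on to raw data extraction.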

See APT complete documentation for more details.

The following sections explain how to carry out each of these three steps using APT.

Affymetrix Power Tools (APT) for CEL file conversion to a text file

The same code is used for all kinds of arrays, either 3' or WT.

Indicating a directory where to find the CEL files

apt-cel-convert -f text \
                -o data_processed/txt_converted_cel_files \
                data_raw/expression/*.CEL

Indicating a text file with the paths to the CEL files

apt-cel-convert -f text \
                -o data_processed/txt_converted_cel_files \
                --cel-files cell_paths_file.txt

Note:
--cel-files: file specifying cel files to process, one per line with the first line being 'cel_files'.
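A file in that format can be generated from a directory of CEL files. The helper below is a minimal sketch; the function name is made up, and it assumes the files carry the upper-case .CEL extension used in the examples on this page.

```shell
#!/bin/sh
# Write a list file for --cel-files: the header line 'cel_files',
# followed by one CEL path per line.
# $1: directory holding the CEL files, $2: list file to write
make_cel_list() {
    cel_dir=$1
    list_file=$2
    {
        echo "cel_files"
        for cel in "$cel_dir"/*.CEL; do
            # Skip the unexpanded pattern when the directory has no CEL files.
            if [ -e "$cel" ]; then
                echo "$cel"
            fi
        done
    } > "$list_file"
}

# Example (paths as used on this page):
#   make_cel_list data_raw/expression cell_paths_file.txt
```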

Affymetrix Power Tools (APT) for raw data extraction

3' arrays

apt-cel-extract -o data_processed/raw_intensities_informed.txt \
                -d data_raw/annotation/HG-U133A_2.cdf \
                data_raw/expression/*.CEL

Note
--cel-files option is also available.

Whole-Transcript arrays

apt-cel-extract -o data_processed/raw_intensities_informed.txt \
                -c data_raw/annotation/MoGene-1_0-st-v1.r4.analysis-lib-files/MoGene-1_0-st-v1.r4.clf \
                -p data_raw/annotation/MoGene-1_0-st-v1.r4.analysis-lib-files/MoGene-1_0-st-v1.r4.pgf \
                -b data_raw/annotation/MoGene-1_0-st-v1.r4.analysis-lib-files/MoGene-1_0-st-v1.r4.bgp \
                data_raw/expression/*.CEL

Note
--cel-files option is also available.
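The two commands above differ only in the library-file options: -d with a CDF file for 3' arrays, and -c, -p and -b with the CLF, PGF and BGP files for Whole-Transcript arrays. A hypothetical helper (name and class labels invented for this sketch) could pick the right set once the chip has been classified:

```shell
#!/bin/sh
# Print the library-file options that match an array class.
# $1: array class, "3prime" or "whole-transcript" (hypothetical labels)
# $2: path of the library files without the extension
apt_library_args() {
    case "$1" in
        3prime)           echo "-d $2.cdf" ;;
        whole-transcript) echo "-c $2.clf -p $2.pgf -b $2.bgp" ;;
        *)                echo "unknown array class: $1" >&2; return 1 ;;
    esac
}

# Example; the unquoted substitution relies on word splitting,
# so the paths must not contain spaces:
#   apt-cel-extract -o data_processed/raw_intensities_informed.txt \
#                   $(apt_library_args 3prime data_raw/annotation/HG-U133A_2) \
#                   data_raw/expression/*.CEL
```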

Affymetrix Power Tools (APT) for Gene Expression Level normalization

3' arrays

apt-probeset-summarize -o data_processed/data_normalized/apt/ \
                       -d data_raw/annotation/HG-U133A_2.cdf \
                       -a pm-mm,mas5-detect.calls=1.pairs=1 \
                       -a rma \
                       -a rma-sketch \
                       -a plier-mm \
                       -a plier-mm-sketch \
                       data_raw/expression/*.CEL

Note
--cel-files option is also available.

Whole-Transcript arrays

apt-probeset-summarize -o data_processed/data_normalized/exon_level \
                       -c data_raw/annotation/MoGene-1_0-st-v1.r4.analysis-lib-files/MoGene-1_0-st-v1.r4.clf \
                       -p data_raw/annotation/MoGene-1_0-st-v1.r4.analysis-lib-files/MoGene-1_0-st-v1.r4.pgf \
                       -b data_raw/annotation/MoGene-1_0-st-v1.r4.analysis-lib-files/MoGene-1_0-st-v1.r4.bgp \
                       --qc-probesets data_raw/annotation/MoGene-1_0-st-v1.r4.analysis-lib-files/MoGene-1_0-st-v1.r4.qcc \
                       -a dabg \
                       -a rma \
                       -a rma-sketch \
                       -a plier-gcbg \
                       -a plier-gcbg-sketch \
                       data_raw/expression/*.CEL

Note
--cel-files option is also available.
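Since the user is meant to choose the normalization options, the -a arguments could be built from that choice instead of always running every analysis. The sketch below is one possible reading of the commands above, not a settled design: it assumes the user picks "rma" or "plier", that PLIER means plier-mm on 3' arrays and plier-gcbg on Whole-Transcript arrays, and that present-absent calls mean the mas5-detect chain on 3' arrays and dabg on Whole-Transcript arrays. The function name and class labels are hypothetical.

```shell
#!/bin/sh
# Print the -a options for apt-probeset-summarize.
# $1: array class ("3prime" or "whole-transcript"; hypothetical labels)
# $2: method chosen by the user ("rma" or "plier")
# $3: pass "calls" to add present-absent calls (optional)
apt_analysis_args() {
    class=$1
    method=$2
    calls=${3:-}
    case "$class:$method" in
        3prime:rma|whole-transcript:rma) args="-a rma" ;;
        3prime:plier)                    args="-a plier-mm" ;;
        whole-transcript:plier)          args="-a plier-gcbg" ;;
        *) echo "unsupported combination: $class $method" >&2; return 1 ;;
    esac
    if [ "$calls" = "calls" ]; then
        case "$class" in
            3prime)           args="$args -a pm-mm,mas5-detect.calls=1.pairs=1" ;;
            whole-transcript) args="$args -a dabg" ;;
        esac
    fi
    echo "$args"
}
```

The -sketch variants (rma-sketch, plier-mm-sketch, plier-gcbg-sketch) are left out to keep the example short; they could be added as further method names.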

Affymetrix Power Tools (APT) for exon expression... that is a different story.