CLC Genomics Workbench

Digital Gene Expression

CLC bio offers Digital Gene Expression analyses as an integrated part of CLC Genomics Workbench.

Based on input from existing and potential customers, we have chosen to implement mRNA seq as the first method for supporting Digital Gene Expression.

The screenshot shows a table view of an expression sample generated from a sequence file of NGS mRNA reads. The table gives gene expression values (reads per kilobase of exon model per million mapped reads: RPKM), along with statistics on read counts, exons and transcripts. For each gene, the assembly result can be opened, allowing examination of the reads for that gene. (beta version)
Figure 1: A table view of an expression sample generated from a sequence file of NGS mRNA reads.
The implemented mRNA seq method uses the statistics described by Mortazavi A, et al., "Mapping and quantifying mammalian transcriptomes by RNA-Seq", Nat Methods. 2008 Jul;5(7):621-8.

One advantage of this model is that the statistics are based on RPKM (Reads Per Kilobase of exon model per Million mapped reads), which is a simple and effective way of normalizing the expression level of a gene when using Digital Gene Expression.
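As a rough illustration of the normalization, RPKM can be computed from three numbers: the reads mapped to a gene's exons, the total exon length in bases, and the total number of mapped reads in the experiment. The sketch below follows the definition in the Mortazavi et al. paper. The function name and the example figures are hypothetical and are not taken from the Workbench.

```python
def rpkm(exon_reads, exon_length_bp, total_mapped_reads):
    """Reads Per Kilobase of exon model per Million mapped reads."""
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return exon_reads * 1e9 / (exon_length_bp * total_mapped_reads)


# Hypothetical numbers: 500 reads on a 2,000 bp exon model,
# in an experiment with 10 million mapped reads.
example = rpkm(500, 2000, 10_000_000)
print(example)  # 25.0
```

Because both gene length and sequencing depth are divided out, a long gene in a deeply sequenced sample and a short gene in a shallow one end up on the same scale.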

An mRNA seq workflow in CLC Genomics Workbench could look like this:

  1. All reads are mapped to all known genes in the chromosomes. In the first round, all reads that map uniquely to a gene are counted; the non-unique matches are then distributed proportionally among the genes they match.
  2. For the RPKM, only reads mapping to exon sequences are of interest. The RPKM value is normalized for the total exon length and the total number of matches in an experiment, making it possible to compare different experiments.
  3. Using the general Gene Expression functionality in CLC Genomics Workbench, this mRNA seq experiment can be compared to microarray data to validate the mRNA seq experiments. This functionality is also available in CLC Main Workbench.
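
Step 1 of the workflow above can be sketched as follows. This is a minimal illustration and not the Workbench's implementation: it assumes that non-unique reads are shared among their candidate genes in proportion to each gene's unique read count (the approach described by Mortazavi et al.), and the function name, data structures and even-split fallback are hypothetical.

```python
from collections import defaultdict


def distribute_reads(unique_counts, multi_reads):
    """Assign multi-mapping reads to genes in proportion to unique counts.

    unique_counts: dict of gene name -> number of uniquely mapped reads
    multi_reads: list of tuples, each holding the genes one read matches
    """
    totals = defaultdict(float)
    for gene, n in unique_counts.items():
        totals[gene] += n
    for genes in multi_reads:
        weights = [unique_counts.get(g, 0) for g in genes]
        weight_sum = sum(weights)
        for gene, w in zip(genes, weights):
            if weight_sum > 0:
                share = w / weight_sum
            else:
                # Assumption: split evenly when no candidate has unique reads
                share = 1 / len(genes)
            totals[gene] += share
    return dict(totals)


# Hypothetical data: one read matches both genes and is split 3:1.
counts = distribute_reads({"geneA": 30, "geneB": 10}, [("geneA", "geneB")])
print(counts)  # {'geneA': 30.75, 'geneB': 10.25}
```

The resulting fractional counts can then be fed into the RPKM normalization from step 2.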

Discovery of novel transcript variants

In addition to the above, the mRNA seq functionality can be used for discovering putative exons (intronic regions containing a relatively high number of matches). Such a region may be present in a novel transcript variant. Using the existing primer design tools in the Workbench, primers can be designed to verify putative exons.

Integrated annotation functionality

The user can use the GTF/GFF file annotation plugin to annotate the genes on the reference sequences. This provides a flexible way of creating reference sequences using data from various sources (such as the UCSC Genome Browser or Ensembl).
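
To show how such annotations relate to the expression values, the sketch below reads exon records from GTF-style lines and sums the exon length per gene, which is the length used when normalizing to RPKM. It is an illustration only and does not reflect how the plugin works internally: it assumes the standard nine tab-separated GTF columns with a `gene_id "..."` attribute, assumes each gene lies on a single sequence, and merges overlapping exons from different transcripts. The function name and sample records are hypothetical.

```python
import re
from collections import defaultdict

GENE_ID = re.compile(r'gene_id "([^"]+)"')


def exon_lengths(gtf_lines):
    """Return gene_id -> total exon length in bases, merging overlaps."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = GENE_ID.search(fields[8])
        if match:
            # GTF coordinates are 1-based and inclusive
            exons[match.group(1)].append((int(fields[3]), int(fields[4])))

    lengths = {}
    for gene, intervals in exons.items():
        total, cur_start, cur_end = 0, None, None
        for start, end in sorted(intervals):
            if cur_end is None or start > cur_end:
                if cur_end is not None:
                    total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
            else:
                cur_end = max(cur_end, end)
        total += cur_end - cur_start + 1
        lengths[gene] = total
    return lengths


# Hypothetical records: two overlapping exons and one separate exon.
sample = [
    'chr1\tdemo\texon\t100\t199\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tdemo\texon\t150\t249\t.\t+\t.\tgene_id "g1"; transcript_id "t2";',
    'chr1\tdemo\texon\t400\t449\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
]
print(exon_lengths(sample))  # {'g1': 200}
```

GFF files use a different attribute syntax, so the regular expression would need adjusting for them.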

The user manual describes Digital Gene Expression in detail in several places.