Microarray Gene Set Practical
In this practical we will re-analyze microarray data from a study on mouse embryoid bodies.
Summary of Experiment as Described in Gene Expression Omnibus:
This experiment was specifically designed to measure neural targets of Shh signaling, we sought to profile the genes upregulated by Hh signaling in the ventral neural tube to obtain a valid dataset. To obtain ventral-specific markers, we generated retinoic acid-treated EBs grown in the presence or absence of HH-Ag. We did not observe induction of ventral Hh markers in RA-treated constitutive Gli1FLAG EBs and used these for the control, baseline set. The presence of FoxA2, Nkx2.9 and Nkx6.1 amongst the top 10 genes based on expression levels suggests that profiling significantly enriches for Hh-dependent cell types. As expected, the benchmark standard Gli1 was not up-regulated in our array, since it is constitutively expressed in the control as well.
Keywords: neural progenitors, embryoid bodies, differentiation, Hedgehog, retinoic acid
The full paper is available under PMID 17442700.
The array data was deposited in Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=GSE4936).
The array data was downloaded and processed with updated probe annotation (using the customCDF annotations) targeting RefSeq transcripts.
Comparing the mean expression level in control embryoid bodies with that in sonic hedgehog-induced embryoid bodies gives the mean expression difference between the two conditions. Since you are specifically interested in the most up-regulated genes, you decide to analyze the top 300 genes (attached as top300upregulated.txt).
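The ranking step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transcript identifiers and expression values are made-up toy data, the function name `top_upregulated` is hypothetical, and the values are assumed to be on a log scale so that a simple difference of means is meaningful. It is not the procedure used to produce the attached file.

```python
from statistics import mean

def top_upregulated(expression, control_cols, treated_cols, n=300):
    """Rank transcripts by mean(treated) - mean(control) and return the top n.

    expression maps a transcript ID to a list of expression values,
    one value per array; the column index lists say which arrays
    belong to which condition.
    """
    diffs = {}
    for transcript, values in expression.items():
        control_mean = mean(values[i] for i in control_cols)
        treated_mean = mean(values[i] for i in treated_cols)
        diffs[transcript] = treated_mean - control_mean
    # Largest positive difference first, i.e. most up-regulated first.
    ranked = sorted(diffs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

# Toy data (placeholder IDs): two control arrays, then two Shh-induced arrays.
toy = {
    "NM_toy1": [5.0, 5.2, 8.1, 7.9],
    "NM_toy2": [6.0, 6.1, 6.0, 6.2],
    "NM_toy3": [9.0, 8.8, 7.0, 7.2],
}
top = top_upregulated(toy, control_cols=[0, 1], treated_cols=[2, 3], n=2)
for transcript, diff in top:
    print(transcript, round(diff, 2))
```

With real data, `n=300` would give a list comparable in size to the one used in this practical.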
To better summarize the set of differentially expressed genes, you decide to explore the functional enrichment of gene sets within this list.
1. Go to DAVID and search for enriched gene sets. You will need to upload the file with the list of RefSeq IDs (top300upregulated.txt), specify that the identifiers in the list are REFSEQ_MRNA, and finally specify that you are uploading a gene list (and not a background list).
- How many significant gene sets and clusters of gene sets do you find?
- How does DAVID correct for multiple testing?
- Change the types of gene sets used in the analysis.
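To make the idea of an enrichment test with multiple-testing correction concrete, here is a minimal generic sketch: a one-sided hypergeometric test for a single gene set, followed by a Benjamini-Hochberg adjustment across several gene sets. This is an illustration of one common approach, not DAVID's implementation; the function names and the example p-values are made up, and DAVID's own statistics should be checked in its documentation.

```python
from math import comb

def hypergeom_pvalue(hits, list_size, set_size, background_size):
    """P(X >= hits) when list_size genes are drawn without replacement
    from a background of background_size genes, set_size of which
    belong to the gene set."""
    max_hits = min(list_size, set_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up example: 12 of 300 listed genes fall in a set of 100 genes,
# against a background of 20000 genes.
p = hypergeom_pvalue(12, 300, 100, 20000)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(p, adjusted)
```

The more gene sets are tested, the more the raw p-values are inflated by the correction, which is worth keeping in mind when changing the types of gene sets included in the analysis.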
2. Convert the RefSeq transcript IDs to Official Gene Symbols using the DAVID conversion tool:
http://david.abcc.ncifcrf.gov/conversion.jsp
The results have been downloaded and cleaned up in Excel to keep only a list of gene symbols (attached, top300upregulated.converted.genesymbols.txt).
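The same clean-up could also be scripted instead of done by hand. The sketch below assumes a tab-delimited export with a header row and a column named `To` holding the gene symbol; that layout, the column names, and the toy rows are assumptions for illustration, so check the header of the file you actually downloaded and adjust `symbol_column` accordingly.

```python
import csv
import io

def extract_symbols(handle, symbol_column="To"):
    """Return unique, non-empty gene symbols in order of first appearance."""
    reader = csv.DictReader(handle, delimiter="\t")
    seen = set()
    symbols = []
    for row in reader:
        symbol = (row.get(symbol_column) or "").strip()
        # Skip identifiers that were not converted and repeated symbols.
        if symbol and symbol not in seen:
            seen.add(symbol)
            symbols.append(symbol)
    return symbols

# Toy export with placeholder transcript IDs (hypothetical layout).
toy_export = io.StringIO(
    "From\tTo\tSpecies\n"
    "NM_toy1\tFoxa2\tMus musculus\n"
    "NM_toy2\tNkx6-1\tMus musculus\n"
    "NM_toy3\tFoxa2\tMus musculus\n"
    "NM_toy4\t\tMus musculus\n"
)
symbols = extract_symbols(toy_export)
print("\n".join(symbols))
```

For a real file, replace the `io.StringIO` object with `open(path, newline="")`. Several transcripts can map to the same gene, which is why duplicates are dropped.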
3. (Optional) Using the list of gene symbols, use GoMiner to find enriched categories. You will have to provide an email address, since the results are emailed back.
In GoMiner you should also use the attached background gene names file (total.txt).
- Are the results similar to DAVID?
- Enable the TF binding sites analysis (Step 12: TF Binding).
- Which site do you prefer?
--
RickardSandberg - 16 May 2008