9. Pathway Finder
Which known pathways play a role in your data?
- In molecular biology the concept of pathways is important; small molecules, proteins, genes, etc. interact, resulting in specific phenotypic outcomes at all levels in biology.
- Quite a lot of this knowledge is stored as pathways in databases. An extensive overview of these resources can be found at http://www.pathguide.org/. To name a few:
- R2 allows you to see whether biological pathways might play a role in your dataset of choice.
- In this tutorial we’ll use array data of a set of 62 Medulloblastoma tumors. In some Medulloblastoma tumors the gene beta-catenin is mutated. This specific dataset has clinical annotation for beta catenin mutations. We’re going to investigate this in a pathway context.
9.2. Step 1: Selecting data
Make sure that the Single Dataset option is selected in field 1 of the step by step guide.
In field 2 locate and select the ‘Tumor Medulloblastoma PLoS One- Kool - 62 MAS5.0 -u133p2’ dataset by clicking ‘Change Dataset’.
In field 3 select ‘KEGG PathwayFinder by Gene correlation’.
You might not know the exact gene symbol for beta catenin. R2 can also find a gene symbol by an alternative name, so we’ll try ‘catenin’ (Figure 1).
9.3. Step 2: Choose the right gene
R2 has found several suggestions containing the word catenin; hovering over a gene symbol gives additional information. Based on that information choose CTNNB1 and take the probeset with the highest average expression, as this is most likely the probeset that best represents the mRNA concentration.
Scroll down, leave the other options as they are, and click ‘Submit’.
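The probeset choice in this step comes down to comparing average expression levels across the samples. As a minimal sketch of that rule (not R2's own code), the snippet below picks the probeset with the highest mean signal. The probeset ids and expression values are hypothetical placeholders, not values from the Medulloblastoma dataset.

```python
from statistics import mean


def pick_probeset(expression_by_probeset):
    """Return the probeset id whose mean expression across samples is highest."""
    return max(expression_by_probeset, key=lambda p: mean(expression_by_probeset[p]))


# Hypothetical probeset ids and made-up expression values, for illustration only.
ctnnb1_probesets = {
    "probeset_a": [1200.0, 1450.0, 1310.0],
    "probeset_b": [85.0, 110.0, 92.0],
    "probeset_c": [400.0, 380.0, 415.0],
}

best = pick_probeset(ctnnb1_probesets)
print(best)
```

In R2 itself this comparison is made by eye from the information shown for each probeset; the sketch only makes the selection rule explicit.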
9.4. Step 3: Correlating pathways with a gene
R2 calculates for all genes in the KEGG pathways whether their expression correlates with that of CTNNB1. Next it calculates for each pathway whether it contains a significant number of correlating genes, that is, whether the genes correlating with CTNNB1 are overrepresented in that pathway (for an in-depth discussion see the R2 tutorial ‘Find genes correlating with your gene of interest’). The result is returned as a list of pathways; Figure 3.
An overall explanation is printed above the list: of all genes present in all KEGG pathways, ~540 correlate with CTNNB1 with a p-value < 0.05. In the table the KEGG pathways are listed ranked by their p-value for overrepresentation (background in red) or under-representation (in green) of these genes. The brightly colored letters in front of the pathway name are hyperlinked. R links to a list of the genes, K leads to the original KEGG pathway on the Japanese servers, and A links to an image of the KEGG pathway that is provided with hover-over information for all genes in the pathway. We’ll discuss the first two later; for now, click on the A in front of the ‘SNARE interactions in vesicular transport’ pathway.
R2 opens a new window in your browser (Figure 4). Genes that have a positive correlation with CTNNB1 are shown in darker green, and those with a negative correlation in red. Hovering over the genes with the mouse pointer presents additional information; some of the gene boxes represent multiple genes (Figure 5). Although it does not occur in this example, multiple genes within one box may show both positive and negative correlations. In that case the box is filled proportionally with red and green.
The result, however, is not quite convincing; apparently CTNNB1 expression does not correlate with pathways. We’re going to try it the other way around: which pathways correlate with a beta catenin mutation?
Return to list view (still open in another tab of your browser) and go to the R2 main page by clicking the link in the upper left corner of the screen.
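The over-representation calculation described in this step can be illustrated with a small sketch. This section does not state which statistic R2 uses, so the hypergeometric upper-tail test below is an assumption, chosen because it is a common way to test for enrichment. Only the ~540 correlating genes come from the text above; the total gene count, pathway names, pathway sizes, and hit counts are hypothetical.

```python
from math import comb


def overrepresentation_p(hits_in_pathway, pathway_size, total_hits, total_genes):
    """Upper-tail hypergeometric probability P(X >= hits_in_pathway).

    X is the number of correlating genes found when pathway_size genes are
    drawn at random from total_genes, of which total_hits correlate.
    """
    upper = min(pathway_size, total_hits)
    favourable = sum(
        comb(total_hits, i) * comb(total_genes - total_hits, pathway_size - i)
        for i in range(hits_in_pathway, upper + 1)
    )
    return favourable / comb(total_genes, pathway_size)


TOTAL_GENES = 5000  # hypothetical number of genes across all KEGG pathways
TOTAL_HITS = 540    # genes correlating with CTNNB1 (approximate figure from the text)

# Hypothetical pathways: name -> (pathway size, correlating genes in the pathway).
pathways = {"pathway_1": (40, 12), "pathway_2": (60, 6)}

# Rank pathways by ascending p-value, most over-represented first.
ranked = sorted(
    ((name, overrepresentation_p(hits, size, TOTAL_HITS, TOTAL_GENES))
     for name, (size, hits) in pathways.items()),
    key=lambda item: item[1],
)
for name, p in ranked:
    print(name, p)
```

A pathway ranks high when it contains clearly more correlating genes than its size would predict by chance. Testing for under-representation would use the lower tail instead.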
9.5. Step 4: Finding pathways relevant in groups of genes
In field 3 on the R2 start page select ‘KEGG PathwayFinder by Groups’.
9.6. Step 5: Relating to data annotation
This set of tumors is annotated with several clinical and molecular biology parameters in so-called tracks. One of them is the presence of a beta catenin mutation: bcat mutation. Select this track (Figure 7).
9.7. Step 6: Determining differentially expressed pathways
R2 calculates for all genes in the KEGG pathways whether they are differentially expressed between the groups of tumors having a mutation and those that do not have one. In a subsequent calculation the overrepresentation of these genes in the individual pathways is determined. From the resulting list it is obvious that the Wnt pathway has a strong overrepresentation of genes that are differentially expressed between the two groups.
Click on the R link to let R2 create a list of these genes.
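To make the group comparison in this step concrete, here is a minimal sketch that scores each gene by how differently it is expressed between mutated and non-mutated tumors. The text does not say which test R2 applies, so the Welch t-statistic used here is an assumption. The gene names and expression values are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch t-statistic for two groups (needs >= 2 values and some spread)."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical genes: name -> (values in mutated tumors, values in non-mutated tumors).
expression = {
    "gene_1": ([9.1, 9.4, 8.9], [5.0, 5.3, 4.8]),
    "gene_2": ([6.0, 6.4, 5.7], [6.1, 5.9, 6.3]),
}

# Rank genes by the absolute size of the group difference, largest first.
scores = {gene: welch_t(mut, wt) for gene, (mut, wt) in expression.items()}
ranked = sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)
print(ranked)
```

Genes that pass a chosen significance cutoff would then be tested for over-representation per pathway, as in the gene correlation analysis.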
9.8. Step 7: Verifying a pathway
A list of hyperlinked genes is returned; sort them by descending R-value by clicking on the R column header twice.
Each gene-symbol is hyperlinked to a graph representing the specific results; click the top gene in the list: AXIN2.
9.9. Step 8: Correlating with the expression of a gene
The graph shows that expression of the Wnt pathway gene AXIN2 correlates very well with the presence of a beta catenin mutation in the tumors. The same goes for a significantly overrepresented set of genes in this pathway. This specific group of tumors is also known as the Wnt subtype in the Medulloblastoma field.
9.10. Final remarks / future directions
We hope that this tutorial has been helpful. The R2 support team.