2. Using Datasets
Selecting or searching for datasets in R2
- Working with datasets.
- R2 allows you to perform all kinds of analyses on a single well-annotated dataset or on a selection of datasets at the same time. Which analyses are available depends on the option selected in field 1.
- R2 contains mRNA gene expression profiles for more than 70,000 individual human samples. The samples are grouped in so-called datasets. Each dataset has its own characteristics, such as tissue type, tumor type, or origin from cell-line experiments.
- The Tumor Neuroblastoma public - Versteeg - 88 - MAS5.0 - u133p2 dataset will be used as an example dataset to guide you through most of the tutorial. Working with multiple datasets will be discussed later on.
2.2. Step 1: Selecting a dataset
In R2 a large number of datasets is available for analysis and visualization. The numbered items in the main window guide you through all the steps necessary to perform a task. In field 1 select “single dataset”; in field 2 choose “change”.
A pull-down menu appears containing the large collection of datasets available for all the types of analyses R2 offers.
Click on the desired dataset.
Did you know that dataset names are informative?
Datasets have structured names in R2, following the pattern: type_of_dataset - author - number_of_samples - normalization - chiptype. Datasets are listed alphabetically.
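The naming rule can be illustrated with a small sketch that splits a dataset name into its five fields. This is not part of R2: the `DatasetName` type and the `parse_dataset_name` function are hypothetical helpers, and the sketch assumes that every name follows the five-field pattern with " - " as the separator, which may not hold for all datasets.

```python
from typing import NamedTuple


class DatasetName(NamedTuple):
    """The five fields of an R2 dataset name (hypothetical helper type)."""
    dataset_type: str
    author: str
    n_samples: int
    normalization: str
    chip_type: str


def parse_dataset_name(name: str) -> DatasetName:
    """Split a name on ' - ' and convert the sample count to an integer.

    Assumption: the name has exactly five ' - ' separated fields.
    """
    parts = [part.strip() for part in name.split(" - ")]
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, found {len(parts)}: {name!r}")
    dataset_type, author, n_samples, normalization, chip_type = parts
    return DatasetName(dataset_type, author, int(n_samples), normalization, chip_type)


# The example dataset used throughout this tutorial.
example = parse_dataset_name(
    "Tumor Neuroblastoma public - Versteeg - 88 - MAS5.0 - u133p2"
)
print(example.author, example.n_samples, example.chip_type)
```

For the tutorial's example dataset this yields the author Versteeg, 88 samples, MAS5.0 normalization and the u133p2 chip type.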
2.3. Step 2: Advanced selection of datasets
Besides the pull-down menu, you can also use the “advanced dataset selection” tool. The advanced dataset selection facilitates searching through datasets using keywords and other filter options, such as the minimum size of a dataset or the date a dataset was published. An example search would be finding all colon samples that are part of a mixed dataset consisting of normal tissue and tumor samples.
Click on the “Advanced” link. A new screen shows a table whose headers can be filled with search entries to narrow the search down to datasets meeting your criteria. Enter ‘Neuro’ in the class column and ‘50’ in the ‘#’ column, and select ‘greater than’ from the pull-down menu. This returns all the datasets containing the search term ‘Neuro’ and having more than 50 samples.
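The logic of this search can be sketched outside R2 as a filter over a table of dataset rows. The `DatasetRow` type and the `search` function below are hypothetical, the two "Placeholder" rows are invented purely for illustration, and case-insensitive substring matching is an assumption about how the class filter behaves.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRow:
    """One row of the advanced selection table (hypothetical structure)."""
    name: str
    dataset_class: str
    n_samples: int


def search(rows: List[DatasetRow], class_term: str, min_samples: int) -> List[DatasetRow]:
    """Keep rows whose class contains class_term and that have more than
    min_samples samples (the 'greater than' option)."""
    term = class_term.lower()
    return [
        row for row in rows
        if term in row.dataset_class.lower() and row.n_samples > min_samples
    ]


rows = [
    # The tutorial's example dataset.
    DatasetRow("Tumor Neuroblastoma public - Versteeg - 88 - MAS5.0 - u133p2",
               "Neuroblastoma", 88),
    # Invented placeholder rows, not real R2 datasets.
    DatasetRow("Placeholder dataset A", "Neuroblastoma", 30),
    DatasetRow("Placeholder dataset B", "Colon", 120),
]

hits = search(rows, "Neuro", 50)
for row in hits:
    print(row.name)
```

Only the 88-sample neuroblastoma row passes both filters: placeholder A has too few samples and placeholder B does not match ‘Neuro’.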
In the row for the dataset containing 88 samples, clicking on ‘Neuroblastoma’ in the class 3 column reveals a detailed info box with additional dataset information from the R2 database. When the dataset is publicly available, clicking on the GEO ID link redirects to the GEO repository database, where raw data files are available. A PubMed link is listed if the dataset is linked to a publication listed in PubMed. Note: Clicking on an exclamation mark also shows detailed dataset information.
Select “Across Datasets” in field 1. Note that in field 2 different options become available compared to the “single dataset” option.
The analysis methods that become available after selecting the “Across Datasets” option in field 1 are discussed in the tutorial “Working with multiple datasets”.
Did you know that clicking on an exclamation balloon provides additional info?
Clicking on the GEO ID link redirects to the GEO repository database, where raw data files are available. A PubMed link is listed if the dataset is linked to a publication listed in PubMed.
2.4. Step 3: Data Scopes
- R2 can also be restricted to display only a subselection of all the available datasets. These subselections are called data scopes and can be selected from within R2 via the left-hand menu item ‘change data scope’. From here you can use one of the preset scopes. This is also the place where you can remove a scope that has been set. One obvious reason why scopes can be handy is the focussed view on the available data.
- Data scopes can also be set directly in the internet address line (URL), which can be handy when a manuscript needs to refer to R2. For now, you do need to provide a link directly to the server (usually hgserver1.amc.nl/cgi-bin/r2/main.cgi?&dscope=NRBL).
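A scoped link like the one above can also be assembled programmatically, for example when preparing references for a manuscript. The sketch below is a minimal illustration: the `scope_url` function is hypothetical, the `http://` scheme is an assumption (the address above is given without one), and the empty leading `&` of the original query string is dropped on the assumption that it carries no meaning.

```python
from urllib.parse import urlencode

# Server address taken from the text above; the http:// scheme is an assumption.
R2_MAIN = "http://hgserver1.amc.nl/cgi-bin/r2/main.cgi"


def scope_url(scope: str, base: str = R2_MAIN) -> str:
    """Return a link that opens R2 with the given data scope preselected."""
    return f"{base}?{urlencode({'dscope': scope})}"


url = scope_url("NRBL")
print(url)  # http://hgserver1.amc.nl/cgi-bin/r2/main.cgi?dscope=NRBL
```

Using `urlencode` rather than plain string concatenation keeps the link valid if a scope name ever contains characters that need escaping.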
Did you know that the R2 support team regularly scans public repositories for interesting datasets to expand the R2 database?
If you want to see a dataset added to R2, please send an email to firstname.lastname@example.org. Such an email should contain a reference to the publicly accessible files, such as a Gene Expression Omnibus number (GSE*****). Your own private datasets can also be added to R2 with user/group restricted access. Please send us an email at email@example.com to inquire about the procedure for making your data available in R2 (see also chapter 22).
2.5. Final remarks / future directions
Everything described in this chapter can be performed in the R2: genomics analysis and visualization platform (http://r2platform.com / http://r2.amc.nl).
If you run into any quirks or annoyances, don’t hesitate to contact R2 support (firstname.lastname@example.org).
We hope that this tutorial has been helpful.
The R2 support team.