Welcome to Granatum! This is a graphical single-cell RNA-seq (scRNA-seq) analysis pipeline for genomics scientists. The pipeline will graphically guide you through the analysis of scRNA-seq data, starting from expression and metadata tables. It uses a comprehensive set of modules for quality control / normalization, clustering, differential gene expression / enrichment analysis, protein network interaction visualization, and cell pseudo-time pathway construction.

Please cite: Zhu, Xun et al. “Granatum: A Graphical Single-Cell RNA-Seq Analysis Pipeline for Genomics Scientists.” Genome Medicine 9.1 (2017)

Note 1: if the browser window (or tab) is accidentally closed, you may resume from where you left off by opening the last page in your browser history.

Note 2: depending on dataset size, some steps may take time. Please allow computations to complete even if your browser appears to hang.

Background

Video tutorial: link to the video

Survey (suggestions are welcome!): link to the survey

Manuscript: link to the manuscript

Manual: download PDF

License: download text

DIY

To run the server on your own computer, download it from this link:

Download server file

To use the file, you need VirtualBox installed:

Download VirtualBox

After starting VirtualBox, click "File" -> "Import Appliance...", provide the file, and perform the import.

Then launch Granatum, wait for it to load, and point your web browser to the following address:

http://localhost:8028/

A video of this can be viewed as well:

View video on YouTube

Thank you! If you have any questions, please contact us: lana.garmire.group@gmail.com

Upload


You can upload your own data or try Granatum on our sample data.

Are your data from human or mouse? Make a selection under "Species". Then provide your Expression and Metadata tables as comma-separated values (CSV) files.


Before uploading your data, please refer to our format specification.
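A quick consistency check before uploading can save a failed run. The sketch below is only illustrative: it assumes the expression table has genes as rows and cells as columns (gene IDs in the first column), and that the metadata table has one row per cell with the cell ID in the first column. These layout assumptions and the function name `check_tables` are hypothetical; the format specification is the authority.

```python
import csv
import io


def check_tables(expression_csv, metadata_csv):
    """Return a list of problems; an empty list means the tables look consistent.

    Assumed layout (not taken from the format specification):
    - expression: header row of cell IDs, then one row per gene
    - metadata: header row, then one row per cell, cell ID first
    """
    expr_rows = [r for r in csv.reader(io.StringIO(expression_csv)) if r]
    meta_rows = [r for r in csv.reader(io.StringIO(metadata_csv)) if r]
    if not expr_rows or not meta_rows:
        return ["one of the tables is empty"]

    problems = []
    expr_cells = expr_rows[0][1:]
    meta_cells = [row[0] for row in meta_rows[1:]]

    # Every cell in the expression table should have a metadata row, and vice versa.
    for cell in sorted(set(expr_cells) - set(meta_cells)):
        problems.append(f"cell {cell!r} has no metadata row")
    for cell in sorted(set(meta_cells) - set(expr_cells)):
        problems.append(f"metadata cell {cell!r} is not in the expression table")

    # Expression values should be numeric.
    for row in expr_rows[1:]:
        for value in row[1:]:
            try:
                float(value)
            except ValueError:
                problems.append(f"non-numeric value {value!r} for gene {row[0]!r}")
    return problems
```

For example, `check_tables(open("expression.csv").read(), open("metadata.csv").read())` would return the list of mismatches for two local files (the file names here are placeholders).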

Example human data (Kim, et al. 2016):



If you would like to add more datasets, click Add another dataset on the next page.


Summary of datasets uploaded

Last dataset uploaded

Batch-effect removal

Remove confounding effects from data generated in batches. Box plots give expression statistics for a random sampling of up to 96 cells. Select a batch grouping label (factor) then click "Remove batch effect". If multiple datasets were separately uploaded, the "dataset" factor can be used.
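To give a feel for what batch-effect removal does, the sketch below centers one gene's expression within each batch and then restores the overall mean. This is a deliberately simplified stand-in, not the method Granatum applies; the function name `center_batches` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def center_batches(values, batches):
    """Shift each batch so its mean matches the overall mean.

    values:  expression of one gene across cells
    batches: batch label of each cell, in the same order
    """
    overall = mean(values)

    # Collect the values belonging to each batch.
    by_batch = defaultdict(list)
    for value, batch in zip(values, batches):
        by_batch[batch].append(value)
    batch_means = {batch: mean(vals) for batch, vals in by_batch.items()}

    # Subtract the batch mean and add back the overall mean.
    return [value - batch_means[batch] + overall
            for value, batch in zip(values, batches)]
```

After this adjustment, every batch has the same mean for the gene, while differences between cells within a batch are preserved.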



Outlier removal

Remove unusual cells, e.g., those damaged by capture. Select cells by clicking points in the plot and/or using "Auto-identify", then click "Remove selected".





Selected cells:

Normalization

Adjust expression levels to correct for artificial differences between cells, e.g., differences in sequencing depth. When a rescaling/normalization button is clicked, the box plot (showing expression statistics for up to 96 randomly selected cells) will reflect the changes. For example, clicking "Rescaling to geometric mean" will cause the red dots (geometric means) to align. Note that clicking more than one rescaling/normalization button applies each adjustment to the already adjusted values (use "Reset" to go back to the unadjusted data).
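The idea behind rescaling to the geometric mean can be sketched as follows. This is an illustration under stated assumptions, not Granatum's implementation: it assumes the geometric mean of each cell is taken over its positive values only, and that every cell is scaled to a common target (here, the geometric mean of the per-cell geometric means). The function names are hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean of the positive entries; 0.0 if there are none."""
    positives = [v for v in values if v > 0]
    if not positives:
        return 0.0
    return math.exp(sum(math.log(v) for v in positives) / len(positives))


def rescale_to_geometric_mean(cells):
    """cells maps a cell ID to its list of expression values.

    Each cell is multiplied by a factor so that all cells end up with
    the same geometric mean. Zeros stay zero.
    """
    per_cell = {cell: geometric_mean(vals) for cell, vals in cells.items()}
    usable = [g for g in per_cell.values() if g > 0]
    if not usable:
        return {cell: list(vals) for cell, vals in cells.items()}

    target = geometric_mean(usable)
    rescaled = {}
    for cell, vals in cells.items():
        # Cells with no positive values are left unchanged.
        factor = target / per_cell[cell] if per_cell[cell] > 0 else 1.0
        rescaled[cell] = [v * factor for v in vals]
    return rescaled
```

Because each call multiplies the current values, running a second adjustment on the output compounds with the first, which is why a reset is needed to return to the unadjusted data.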


Imputation

The large number of drop-outs might pose problems for downstream analyses. It is thus often appropriate to infer whether a zero in the dataset is a drop-out (that is, a non-zero expression level incorrectly assayed as zero) and, if it is, to infer its original expression level.



Gene filtering

Remove genes having very low expression and/or those with little variation (dispersion) by moving the sliders. It is recommended to keep at least 2,000 genes.
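The two sliders can be thought of as two thresholds applied per gene. The sketch below assumes dispersion is measured as variance divided by mean, which is one common choice and may not match the measure Granatum uses; the function name `filter_genes` and the threshold parameters are hypothetical.

```python
from statistics import mean, pvariance


def filter_genes(expression, min_mean, min_dispersion):
    """Keep genes whose mean expression and dispersion both pass a threshold.

    expression maps a gene ID to its expression values across cells.
    Dispersion is assumed here to be variance / mean.
    """
    kept = {}
    for gene, values in expression.items():
        avg = mean(values)
        if avg <= 0:
            continue  # a gene that is never expressed is always dropped
        dispersion = pvariance(values) / avg
        if avg >= min_mean and dispersion >= min_dispersion:
            kept[gene] = values
    return kept
```

Comparing `len(expression)` with `len(kept)` gives the starting and post-filtering gene counts, which helps when checking against the recommended minimum of 2,000 genes.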



Starting number of genes:
Post-filtering number of genes:

Clustering

Select a clustering method and enter a number of clusters (or check the box for auto selection), then click "Run clustering".




Differential expression

Identify differentially expressed genes between clusters. With the number of cores set to 2, this step runs for approximately 30 minutes on the Kim, et al. 2016 dataset (116 cells, 3,788 genes, 3 clusters) when using a VirtualBox Appliance with 8 GB RAM and an Intel i7 processor. Note: the progress bar does not accurately reflect progress; please give the calculations time to complete.

Once complete, the enrichment of differentially expressed genes in KEGG pathways and GO terms can be calculated.





Tabs indicate cluster numbers. Genes are sorted by absolute Z-score.
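Sorting by absolute Z-score means that strongly down-regulated genes rank alongside strongly up-regulated ones. A minimal sketch, with hypothetical gene names and scores:

```python
def sort_by_abs_z(genes):
    """Sort (gene, z_score) pairs by |z_score|, largest first."""
    return sorted(genes, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical values for illustration only.
example = [("GENE1", 1.2), ("GENE2", -3.5), ("GENE3", 2.0)]
ranked = sort_by_abs_z(example)
```

Here `GENE2` ranks first despite its negative score, because only the magnitude is used for ordering.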

Protein network

Proteins from top differentially expressed genes are visualized with connecting lines indicating documented biochemical interactions. Go to the next step by clicking "Proceed" (bottom right of page).



Pseudo-time construction

Cells are ordered in pseudo-time using similarities between their expression profiles.