SurvExpress Tutorial

SurvExpress: An online biomarker validation tool for cancer gene expression data using survival analysis.


Validation of multi-gene biomarkers for clinical outcomes is one of the most important issues in cancer prognosis. The large number of publicly available cancer datasets is an important source of information for virtual validation. Nevertheless, assessing the prognostic performance of a gene expression signature across datasets is difficult for biologists and physicians and time-consuming for statisticians and bioinformaticians. Therefore, to facilitate performance comparisons and validation of survival biomarkers for cancer outcomes, we developed SurvExpress, a cancer-wide gene expression database with clinical outcomes and a web-based tool that provides survival analysis and risk assessment of cancer datasets. The main input of SurvExpress is just the biomarker gene list. We generated a cancer database comprising more than 20,000 samples and 130 datasets with censored clinical information, covering tumors from over 20 tissues. We implemented a web interface to perform biomarker validation and comparisons in this database, where a multivariate survival analysis can be completed in about one minute. We show the utility and simplicity of SurvExpress in two biomarker applications, for breast and lung cancer. Compared to other tools, SurvExpress is the largest, most versatile, and quickest free tool available. SurvExpress web can be accessed at …xpress (a tutorial is included). The website was implemented in JSP, JavaScript, MySQL, and R.

Citation: Aguirre-Gamboa R, Gomez-Rueda H, Martínez-Ledesma E, Martínez-Torteya A, Chacolla-Huaringa R, et al. (2013) SurvExpress: An Online Biomarker Validation Tool and Database for Cancer Gene Expression Data Using Survival Analysis. PLoS ONE 8(9): e74250. doi:10.1371/journal.pone.0074250

SurvExpress Web Tool & Database

SurvExpress Tutorial
(See attachment section below)

- January 29, 2015
* A factor in "Other Factors" can be prefixed with "@" to specify a stratification transformation without including the factor in the model. For example:
"@T.STAGE/I:I,IA,IB/II:II,IIA,IIB,IIC/III:III,3A,3B,4A,4B" specifies that the strata (Kaplan-Meier plots) will be partitioned into I, II, and III using the listed values.

* When stratification is specified, the re-estimated betas are included in the text/table output.

* Fixed a bug in which, when stratification was specified, some samples were left out of their strata.
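The "@" rule can be illustrated with a short Python sketch. It is not SurvExpress code: the function names `split_terms` and `partition_samples` are hypothetical, and the sketch assumes that "@" terms can be mixed with ordinary terms using "+" and that a raw clinical value maps to a stratum by exact match. Samples whose value matches no stratum are collected under the key `None` instead of being silently dropped, which is the kind of omission the bug fix above addresses.

```python
from collections import defaultdict


def split_terms(spec):
    """Split an "Other Factors" string into (model terms, strata-only terms).

    Assumption: a term prefixed with "@" only partitions the Kaplan-Meier
    plots and is not added to the Cox model.
    """
    model_terms, strata_terms = [], []
    for term in spec.split("+"):
        term = term.strip()
        if term.startswith("@"):
            strata_terms.append(term[1:])  # drop the "@" marker
        else:
            model_terms.append(term)
    return model_terms, strata_terms


def partition_samples(value_by_sample, partition):
    """Group sample IDs by stratum label.

    partition maps each stratum label to the raw values it contains, e.g.
    {"I": ["I", "IA", "IB"]}. Unmatched samples are collected under None.
    """
    lookup = {raw: label for label, raws in partition.items() for raw in raws}
    strata = defaultdict(list)
    for sample, value in value_by_sample.items():
        strata[lookup.get(value)].append(sample)
    return dict(strata)


model, strata = split_terms(
    "@T.STAGE/I:I,IA,IB/II:II,IIA,IIB,IIC/III:III,3A,3B,4A,4B+N.STAGE")
print(model)  # ['N.STAGE']

stage_partition = {"I": ["I", "IA", "IB"],
                   "II": ["II", "IIA", "IIB", "IIC"],
                   "III": ["III", "3A", "3B", "4A", "4B"]}
print(partition_samples({"s1": "IA", "s2": "IIC", "s3": "3B", "s4": "IV"},
                        stage_partition))
# {'I': ['s1'], 'II': ['s2'], 'III': ['s3'], None: ['s4']}
```

Keeping unmatched samples visible makes it easy to check that every sample was assigned somewhere before the per-stratum curves are drawn.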

- January 14, 2015
An "Other Factors" option has been added. It lets users add other factors, such as tumor stage, to the Cox model. The available variables (factors) are listed in the "Clinical characteristics plot" (see Figure 15 in the tutorial).
This field has a special format. Factors are separated by "+". Each factor can optionally be recoded into new values: groups are separated by "/", each group's name is separated from its values by ":", and values are separated by ",". Numeric values can also be recoded using relational and even more complex expressions.
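As an illustration of this format, the following minimal Python sketch splits an "Other Factors" string into factors and their recoding groups. It is not the SurvExpress parser: the name `parse_other_factors` and the (name, groups) return shape are hypothetical, and it handles only plain comma-separated value lists, not the more complex expressions mentioned above (a "+" inside such an expression would break this naive split).

```python
def parse_other_factors(spec):
    """Parse an "Other Factors" string into a list of (name, groups) pairs.

    groups is an ordered list of (label, [values]) tuples and is empty when
    the factor is used without conversion.
    """
    factors = []
    for term in spec.split("+"):                      # factors: "+"
        name, *group_specs = term.strip().split("/")  # groups: "/"
        groups = []
        for group in group_specs:
            label, _, values = group.partition(":")    # name vs values: ":"
            groups.append((label, values.split(",")))  # values: ","
        factors.append((name, groups))
    return factors


print(parse_other_factors("N.STAGE/NO:==0/YES:==1+T.STAGE"))
# [('N.STAGE', [('NO', ['==0']), ('YES', ['==1'])]), ('T.STAGE', [])]
```

The groups are kept as an ordered list rather than a dictionary because the order in which they are written can matter when conditions overlap.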
Examples (only the text between the quotation marks is needed):
(1) "N.STAGE": Adds N.STAGE as a variable of the model. No conversion is performed.
(2) "N.STAGE+T.STAGE": Adds both the N.STAGE and T.STAGE variables. No conversion is performed on either.
(3) "T.STAGE/I:I,IA,IB/II:II,IIA,IIB,IIC/III:III,3A,3B,4A,4B": Converts T.STAGE to I, II, or III according to the corresponding list of values.
(4) "N.STAGE/NO:==0/YES:==1+T.STAGE": Adds both N.STAGE and T.STAGE, converting N.STAGE. Here N.STAGE is assumed to be numeric and is converted to a factor with the values NO (for N.STAGE = 0) and YES (for N.STAGE = 1). NA values are kept as NA.
(5) "N.STAGE/NO:==0|": Similar to (4), but NA values are also assigned to NO.
(6) "AGE/KID:<12/TEEN:<=18/ADULT:>18": Converts AGE (numeric) to KID, TEEN, or ADULT.
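The recoding in examples (3) to (6) can be sketched in Python as below. This is an illustration under stated assumptions, not the SurvExpress implementation: `is_na`, `matches`, and `convert` are hypothetical names; the supported operators are an assumed set; groups are tried in the order written and the first match wins (which is what places an age of 5 in KID rather than TEEN in example (6)); and a trailing "|" is read as "also match NA", following example (5).

```python
import math
import operator

# Assumed relational prefixes; two-character operators first so "<=" is not read as "<".
_OPS = [("==", operator.eq), ("<=", operator.le), (">=", operator.ge),
        ("<", operator.lt), (">", operator.gt)]


def is_na(value):
    """Treat None and float NaN as missing (NA)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def matches(value, condition):
    """Check one condition such as "IA", "==0", "<=18" or "==0|"."""
    na_included = condition.endswith("|")  # assumption: trailing "|" also captures NA
    cond = condition[:-1] if na_included else condition
    if is_na(value):
        return na_included
    for symbol, op in _OPS:
        if cond.startswith(symbol):
            # Numeric comparison; assumes the clinical value is numeric.
            return op(float(value), float(cond[len(symbol):]))
    return str(value) == cond  # plain label: exact string match


def convert(value, groups):
    """Return the label of the first group with a matching condition, else None (NA).

    groups is an ordered list of (label, [conditions]) pairs.
    """
    for label, conditions in groups:
        if any(matches(value, c) for c in conditions):
            return label
    return None


age_groups = [("KID", ["<12"]), ("TEEN", ["<=18"]), ("ADULT", [">18"])]
print([convert(a, age_groups) for a in (5, 15, 40, None)])
# ['KID', 'TEEN', 'ADULT', None]
```

With `[("NO", ["==0"]), ("YES", ["==1"])]` a missing value stays `None`, as in example (4); changing the condition to `"==0|"` sends it to NO instead, as in example (5).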

"Gene expression by risk group" (boxplots) has been improved to accommodate p-value labels at the top and gene names at the bottom.
- Before 2015:
* A "Network" option has been included.
* Many database updates. As of January 2015: 20 tissues, 180 datasets, and 25,403 samples.

SurvExpress Tutorial final-4.pdf (4.22 MB)