Find CNH+ and the associated tumor purity and ploidy for samples from a TCGA study
Source: R/analyze_TCGA_study.R
analyze_TCGA_study.Rd
analyze_TCGA_study
Takes copy number data from a TCGA study and, for each sample, finds
CNH+ and the associated tumor purity and ploidy. Optionally, both CNH+ and CNH can be computed.
Arguments
- study_name
TCGA study name
- da
data frame containing Sample, Chromosome, Start, End, Num_Probes, Segment_Mean
- grid
matrix of purity and ploidy values over which to search for CNH+
- k
number of candidate solutions to return
- both
whether to find CNH as well as CNH+ (default is FALSE, i.e., CNH+ only)
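As a rough illustration of the input expected in da, the sketch below builds a toy data frame with the six listed columns and checks that none is missing. All values are made up, and the names da_toy and required_cols are hypothetical helpers, not part of the package.

```r
# Toy input with the six columns listed for `da`; all values are made up.
da_toy <- data.frame(
  Sample       = c("S1", "S1", "S2"),
  Chromosome   = c("1", "2", "1"),
  Start        = c(1000, 5000, 1000),
  End          = c(4000, 9000, 8000),
  Num_Probes   = c(120, 80, 300),
  Segment_Mean = c(0.12, -0.45, 0.03),
  stringsAsFactors = FALSE
)

required_cols <- c("Sample", "Chromosome", "Start", "End",
                   "Num_Probes", "Segment_Mean")

# Required columns that are absent (empty if the input is well formed).
missing_cols <- setdiff(required_cols, names(da_toy))
if (length(missing_cols) > 0) {
  stop("missing columns: ", paste(missing_cols, collapse = ", "))
}

# Number of segments per sample, a quick sanity check before the analysis.
segments_per_sample <- table(da_toy$Sample)
print(segments_per_sample)
```

A check of this kind can catch a renamed or dropped column before a long grid search is started.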
Details
The example shows how to:
1) download RCN profiles for samples from a TCGA study;
2) read in a Supplementary file from van Dijk et al., which contains the sample names of the TCGA tumor samples analyzed by the authors, as well as other data (CNH, survival data);
3) select from the downloaded TCGA RCN data only those samples that were analyzed by van Dijk et al.;
4) make a grid of purities and ploidies over which the CNH+ solution will be searched for;
5) analyze the TCGA study, i.e., for each sample find CNH+ and the associated purity, ploidy pair;
6) compare survival curves of subjects with CNH+ below/above the median and make a KM plot.
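To make the idea of a purity, ploidy search grid concrete, here is a minimal base R sketch that enumerates every combination of a coarse purity sequence and a coarse ploidy sequence. It is a stand-in only: the structure returned by make_grid may differ, and the name grid_toy and the coarse step sizes are assumptions chosen for readability.

```r
# Hypothetical stand-in for a purity/ploidy grid; make_grid() may return
# a different structure.
purity <- seq(0.2, 1, by = 0.1)
ploidy <- seq(1.5, 5, by = 0.5)

# Every (purity, ploidy) combination as one row of a two-column matrix.
grid_toy <- as.matrix(expand.grid(purity = purity, ploidy = ploidy))

dim(grid_toy)       # rows = number of combinations, columns = 2
head(grid_toy, 3)   # first few candidate pairs
```

A finer step gives a denser grid and therefore more candidate pairs to evaluate per sample.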
References
van Dijk E, van den Bosch T, Lenos KJ, El Makrini K, Nijman LE, van Essen HF, Lansu N, Boekhout M, Hageman JH, Fitzgerald RC, et al. (2021). “Chromosomal copy number heterogeneity predicts survival rates across cancers.” Nature Communications, 12(1), 1-12.
Examples
if (FALSE) {
# Ex:
library('TCGAbiolinks')
library('stringi')
library('openxlsx')
library('dplyr')
#
# TCGA study
study_name = 'READ'
#
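# legacy = TRUE queries the GDC legacy archive (hg19);
# newer TCGAbiolinks releases may no longer accept this argument
query = TCGAbiolinks::GDCquery(legacy = TRUE,
                               project = paste0('TCGA-', study_name),
                               data.category = "Copy number variation",
                               file.type = 'hg19.seg',
                               platform = "Affymetrix SNP Array 6.0",
                               sample.type = 'Primary Tumor')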
# download segmented_scna_hg19 data
TCGAbiolinks::GDCdownload(query = query,
                          method = "api", files.per.chunk = 10)  # alternative: method = "client"
data = TCGAbiolinks::GDCprepare(query = query)
da = as.data.frame(data)
#
# read in the Supplement to van Dijk et al. with results for tumor samples from TCGA studies
vD = openxlsx::read.xlsx("https://static-content.springer.com/esm/art%3A10.1038%2Fs41467-021-23384-6/MediaObjects/41467_2021_23384_MOESM4_ESM.xlsx")
#
# filter samples from the study
vDx = vD %>% filter(Type == study_name)
#
# match vD and TCGA
im = match(vDx$Samplename, unique(da$Sample))
#
# data frame with RCN for the study samples which were considered by van Dijk et al.
da_vD = da %>% filter(Sample %in% unique(da$Sample)[im])
#
# grid
grid = make_grid(purity = seq(0.2, 1, 0.01), ploidy = seq(1.5, 5, 0.01))
#
# analyze TCGA study
oo = analyze_TCGA_study(study_name, da_vD, grid, k=2)
res = read.csv(paste0(study_name, '_results.csv'))
# match samples from results file and the survival data in vDx
im_vDx_res = match(vDx$Samplename, res$sample)
#
# survival analysis (below/above median CNH+)
gg_cnhplus = plot_survival(study_name,
                           vDx$OS, vDx$OS_event,
                           res$cnh_plus[im_vDx_res],
                           type = 'unfiltered CNH+', ylim = c(0, 1))
gg_cnhplus
}
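The last two steps of the example (aligning results to the survival table with match, then comparing subjects below and above the median CNH+) can be sketched on toy data. This is not the package's plot_survival; it assumes the survival package is installed and uses its survfit and survdiff functions instead. The data frames vDx_toy and res_toy and all their values are made up for illustration.

```r
library(survival)  # assumed to be installed

# Toy stand-ins for vDx (survival data) and res (per-sample CNH+).
vDx_toy <- data.frame(
  Samplename = paste0("S", 1:8),
  OS         = c(5, 12, 20, 7, 30, 15, 9, 25),  # follow-up time
  OS_event   = c(1, 1, 0, 1, 0, 1, 1, 0)        # 1 = death, 0 = censored
)
res_toy <- data.frame(
  sample   = paste0("S", c(8, 1, 3, 2, 5, 4, 7)),  # S6 has no result
  cnh_plus = c(0.02, 0.09, 0.03, 0.07, 0.01, 0.08, 0.06)
)

# Align results to the survival table; unmatched samples become NA.
idx <- match(vDx_toy$Samplename, res_toy$sample)
cnh_plus <- res_toy$cnh_plus[idx]

# Split at the median of the available values (NA stays NA).
high <- cnh_plus > median(cnh_plus, na.rm = TRUE)
vDx_toy$group <- factor(ifelse(high, "above median", "below median"))

# Kaplan-Meier curves per group and a log-rank test; NA rows are dropped.
fit <- survfit(Surv(OS, OS_event) ~ group, data = vDx_toy)
lr  <- survdiff(Surv(OS, OS_event) ~ group, data = vDx_toy)

plot(fit, lty = 1:2, xlab = "Time", ylab = "Overall survival")
legend("bottomleft", legend = levels(vDx_toy$group), lty = 1:2)
```

Because match returns NA for samples without a result, it is worth checking how many subjects actually enter the comparison (here via fit$n) before interpreting the curves.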