
Harmonization of quality metrics and power calculation in multi-omic studies

RiuNet: Institutional repository of the Polytechnic University of Valencia


Tarazona Campos, S.; Balzano-Nogueira, L.; Gómez-Cabrero, D.; Schmidt, A.; Imhof, A.; Hankemeier, T.; Tegnér, J.... (2020). Harmonization of quality metrics and power calculation in multi-omic studies. Nature Communications. 11(1):1-13. https://doi.org/10.1038/s41467-020-16937-8

Please use this identifier to cite or link to this item: http://hdl.handle.net/10251/162371

Item Metadata

Title: Harmonization of quality metrics and power calculation in multi-omic studies
Author: Tarazona Campos, Sonia; Balzano-Nogueira, Leandro; Gómez-Cabrero, David; Schmidt, Andreas; Imhof, Axel; Hankemeier, Thomas; Tegnér, Jesper; Westerhuis, Johan A.; Conesa, Ana
UPV Unit: Universitat Politècnica de València. Departamento de Estadística e Investigación Operativa Aplicadas y Calidad - Departament d'Estadística i Investigació Operativa Aplicades i Qualitat
Issued date: 2020
Abstract: Multi-omic studies combine measurements at different molecular levels to build comprehensive models of cellular systems. The success of a multi-omic data analysis strategy depends largely on the adoption of adequate ...
Copyrights: Attribution (by)
Source: Nature Communications (ISSN: 2041-1723)
DOI: 10.1038/s41467-020-16937-8
Publisher: Nature Publishing Group
Publisher version: https://doi.org/10.1038/s41467-020-16937-8
Project ID:
info:eu-repo/grantAgreement/EC/FP7/306000/EU/User-driven Development of Statistical Methods for Experimental Planning, Data Gathering, and Integrative Analysis of Next Generation Sequencing, Proteomics and Metabolomics data/
info:eu-repo/grantAgreement/MINECO//BIO2012-40244/ES/DESARROLLO DE RECURSOS COMPUTACIONALES PARA LA CARACTERIZACION Y ANOTACION FUNCIONAL DE ARN NO CODIFICANTE./
info:eu-repo/grantAgreement/DFG//SFB 1064/
Thanks: This work has been funded by FP7 STATegra project agreement 306000 and Spanish MINECO grant BIO2012-40244. In addition, work in the Imhof lab has been funded by the Deutsche Forschungsgemeinschaft (DFG; CIPSM and SFB1064). The work of L.B.-N. has been ...
Type: Article

