Harmonization of quality metrics and power calculation in multi-omic studies

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia

dc.contributor.author Tarazona Campos, Sonia es_ES
dc.contributor.author Balzano-Nogueira, Leandro es_ES
dc.contributor.author Gómez-Cabrero, David es_ES
dc.contributor.author Schmidt, Andreas es_ES
dc.contributor.author Imhof, Axel es_ES
dc.contributor.author Hankemeier, Thomas es_ES
dc.contributor.author Tegnér, Jesper es_ES
dc.contributor.author Westerhuis, Johan A. es_ES
dc.contributor.author Conesa, Ana es_ES
dc.date.accessioned 2021-02-25T04:49:37Z
dc.date.available 2021-02-25T04:49:37Z
dc.date.issued 2020-06-18 es_ES
dc.identifier.issn 2041-1723 es_ES
dc.identifier.uri http://hdl.handle.net/10251/162371
dc.description.abstract [EN] Multi-omic studies combine measurements at different molecular levels to build comprehensive models of cellular systems. The success of a multi-omic data analysis strategy depends largely on the adoption of adequate experimental designs, and on the quality of the measurements provided by the different omic platforms. However, the field lacks a comparative description of performance parameters across omic technologies and a formulation for experimental design in multi-omic data scenarios. Here, we propose a set of harmonized Figures of Merit (FoM) as quality descriptors applicable to different omic data types. Employing this information, we formulate the MultiPower method to estimate and assess the optimal sample size in a multi-omics experiment. MultiPower supports different experimental settings, data types and sample sizes, and includes graphical tools for experimental design decision-making. MultiPower is complemented with MultiML, an algorithm to estimate sample size for machine learning classification problems based on multi-omic data. es_ES
dc.description.sponsorship This work has been funded by FP7 STATegra project agreement 306000 and Spanish MINECO grant BIO2012-40244. In addition, work in the Imhof lab has been funded by the Deutsche Forschungsgemeinschaft (DFG; CIPSM and SFB1064). The work of L.B.-N. has been funded by the University of Florida Startup funds. es_ES
dc.language English es_ES
dc.publisher Nature Publishing Group es_ES
dc.relation.ispartof Nature Communications es_ES
dc.rights Attribution (by) es_ES
dc.subject.classification STATISTICS AND OPERATIONS RESEARCH es_ES
dc.title Harmonization of quality metrics and power calculation in multi-omic studies es_ES
dc.type Article es_ES
dc.identifier.doi 10.1038/s41467-020-16937-8 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/306000/EU/User-driven Development of Statistical Methods for Experimental Planning, Data Gathering, and Integrative Analysis of Next Generation Sequencing, Proteomics and Metabolomics data/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//BIO2012-40244/ES/DESARROLLO DE RECURSOS COMPUTACIONALES PARA LA CARACTERIZACION Y ANOTACION FUNCIONAL DE ARN NO CODIFICANTE./ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/DFG//SFB 1064/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Estadística e Investigación Operativa Aplicadas y Calidad - Departament d'Estadística i Investigació Operativa Aplicades i Qualitat es_ES
dc.description.bibliographicCitation Tarazona Campos, S.; Balzano-Nogueira, L.; Gómez-Cabrero, D.; Schmidt, A.; Imhof, A.; Hankemeier, T.; Tegnér, J.... (2020). Harmonization of quality metrics and power calculation in multi-omic studies. Nature Communications. 11(1):1-13. https://doi.org/10.1038/s41467-020-16937-8 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1038/s41467-020-16937-8 es_ES
dc.description.upvformatpinicio 1 es_ES
dc.description.upvformatpfin 13 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 11 es_ES
dc.description.issue 1 es_ES
dc.identifier.pmid 32555183 es_ES
dc.identifier.pmcid PMC7303201 es_ES
dc.relation.pasarela S\414882 es_ES
dc.contributor.funder European Commission es_ES
dc.contributor.funder University of Florida es_ES
dc.contributor.funder Deutsche Forschungsgemeinschaft es_ES
dc.contributor.funder Ministerio de Economía y Competitividad es_ES
dc.contributor.funder Center for Integrated Protein Science Munich es_ES


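The abstract describes MultiPower as a method to estimate the optimal sample size of a multi-omics experiment. For readers new to power calculation, the sketch below shows the kind of per-feature sample-size computation such tools build on. It is a generic textbook approach, not the MultiPower algorithm: it assumes a two-group comparison with a standardized effect size (Cohen's d), a normal approximation to the test statistic, and independent features. To account for testing many omic features at once, a target false discovery rate (FDR) is converted into a per-test significance level alpha* = power * m1 * FDR / ((m - m1) * (1 - FDR)), where m is the number of features and m1 the number assumed to be truly changed. This follows from setting the expected FDR, m0 * alpha* / (m0 * alpha* + m1 * power), equal to the target. The function names and example values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def per_test_alpha(m, m1, fdr, power):
    """Per-test significance level that keeps the expected FDR at `fdr`.

    Hypothetical helper. Assumes m1 of m features are truly changed,
    each detected with probability `power`, and independent tests.
    """
    if not 0 < m1 < m:
        raise ValueError("need 0 < m1 < m")
    m0 = m - m1                          # features with no true change
    expected_true_positives = power * m1
    # Solve fdr = m0*a / (m0*a + expected_true_positives) for a.
    return expected_true_positives * fdr / (m0 * (1 - fdr))


def samples_per_group(effect_size, alpha, power):
    """Normal-approximation sample size per group for a two-sided,
    two-sample test with standardized effect size (Cohen's d)."""
    z = NormalDist()                     # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = z.inv_cdf(power)            # quantile for the desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


if __name__ == "__main__":
    # Hypothetical study: 10,000 features, 500 assumed truly changed.
    alpha_star = per_test_alpha(m=10_000, m1=500, fdr=0.05, power=0.8)
    print(f"per-test alpha: {alpha_star:.5f}")
    print("n per group, single test:", samples_per_group(1.0, 0.05, 0.8))
    print("n per group, FDR-adjusted:", samples_per_group(1.0, alpha_star, 0.8))
```

With these illustrative values the per-test alpha drops to roughly 0.0022, so the group size needed to detect d = 1 with 80% power roughly doubles compared with a single test at alpha = 0.05. Real multi-omic designs differ by platform in noise, effect sizes and number of features, which is why a method like MultiPower works per omic type rather than with a single formula.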