
MultiBaC: A strategy to remove batch effects between different omic data types

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia



dc.contributor.author Ugidos, Manuel es_ES
dc.contributor.author Tarazona Campos, Sonia es_ES
dc.contributor.author Prats-Montalbán, José Manuel es_ES
dc.contributor.author Ferrer, Alberto es_ES
dc.contributor.author Conesa, Ana es_ES
dc.date.accessioned 2021-03-05T04:32:24Z
dc.date.available 2021-03-05T04:32:24Z
dc.date.issued 2020-10 es_ES
dc.identifier.issn 0962-2802 es_ES
dc.identifier.uri http://hdl.handle.net/10251/163188
dc.description.abstract [EN] The diversity of omic technologies has expanded in recent years, together with the number of omic data integration strategies. However, multiomic data generation is costly, and many research groups cannot afford research projects where many different omic data types are generated, at least at the same time. As most researchers share their data in public repositories, different omic datasets of the same biological system obtained at different labs can be combined to construct a multiomic study. However, data obtained at different labs or at different moments in time are typically subject to batch effects that need to be removed for successful data integration. While there are methods to correct batch effects on the same data types obtained in different studies, they cannot be applied to correct lab or batch effects across omics. This impairs multiomic meta-analysis. Fortunately, in many cases, at least one omics platform (i.e., gene expression) is repeatedly measured across labs, together with the additional omic modalities that are specific to each study. This creates an opportunity for batch analysis. We have developed MultiBaC (Multiomic Batch-effect Correction), a strategy to correct batch effects from multiomic datasets distributed across different labs or data acquisition events. Our strategy is based on the existence of at least one shared data type, which allows data prediction across omics. We validate this approach both on simulated data and on a case where the multiomic design is fully shared by two labs, so that batch effect correction within the same omic modality using traditional methods can be compared with the MultiBaC correction across data types. Finally, we apply MultiBaC to a true multiomic data integration problem to show that we are able to improve the detection of meaningful biological effects. es_ES
dc.description.sponsorship The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is part of a research project that is totally funded by Conselleria d'Educacio, Cultura i Esport (Generalitat Valenciana) through PROMETEO grants program for excellence research groups. es_ES
dc.language English es_ES
dc.publisher SAGE Publications es_ES
dc.relation.ispartof Statistical Methods in Medical Research es_ES
dc.rights All rights reserved es_ES
dc.subject Batch effect correction es_ES
dc.subject Multiomic integration es_ES
dc.subject Multivariate methods es_ES
dc.subject Biostatistics es_ES
dc.subject.classification STATISTICS AND OPERATIONS RESEARCH es_ES
dc.title MultiBaC: A strategy to remove batch effects between different omic data types es_ES
dc.type Article es_ES
dc.identifier.doi 10.1177/0962280220907365 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/GVA//PROMETEO%2F2016%2F093/ES/The Next Systems Biology: desarrollo de métodos estadísticos para la biología de sistemas multiómica/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Estadística e Investigación Operativa Aplicadas y Calidad - Departament d'Estadística i Investigació Operativa Aplicades i Qualitat es_ES
dc.description.bibliographicCitation Ugidos, M.; Tarazona Campos, S.; Prats-Montalbán, JM.; Ferrer, A.; Conesa, A. (2020). MultiBaC: A strategy to remove batch effects between different omic data types. Statistical Methods in Medical Research. 29(10):2851-2864. https://doi.org/10.1177/0962280220907365 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1177/0962280220907365 es_ES
dc.description.upvformatpinicio 2851 es_ES
dc.description.upvformatpfin 2864 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 29 es_ES
dc.description.issue 10 es_ES
dc.identifier.pmid 32131696 es_ES
dc.relation.pasarela S\408088 es_ES
dc.contributor.funder Generalitat Valenciana es_ES
dc.description.references Kupfer, P., Guthke, R., Pohlers, D., Huber, R., Koczan, D., & Kinne, R. W. (2012). Batch correction of microarray data substantially improves the identification of genes differentially expressed in Rheumatoid Arthritis and Osteoarthritis. BMC Medical Genomics, 5(1). doi:10.1186/1755-8794-5-23 es_ES
dc.description.references Gregori, J., Villarreal, L., Méndez, O., Sánchez, A., Baselga, J., & Villanueva, J. (2012). Batch effects correction improves the sensitivity of significance tests in spectral counting-based comparative discovery proteomics. Journal of Proteomics, 75(13), 3938-3951. doi:10.1016/j.jprot.2012.05.005 es_ES
dc.description.references Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., & Smyth, G. K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43(7), e47-e47. doi:10.1093/nar/gkv007 es_ES
dc.description.references Gagnon-Bartsch, J. A., & Speed, T. P. (2011). Using control genes to correct for unwanted variation in microarray data. Biostatistics, 13(3), 539-552. doi:10.1093/biostatistics/kxr034 es_ES
dc.description.references Nueda, M. J., Ferrer, A., & Conesa, A. (2011). ARSyN: a method for the identification and removal of systematic noise in multifactorial time course microarray experiments. Biostatistics, 13(3), 553-566. doi:10.1093/biostatistics/kxr042 es_ES
dc.description.references Jansen, J. J., Hoefsloot, H. C. J., van der Greef, J., Timmerman, M. E., Westerhuis, J. A., & Smilde, A. K. (2005). ASCA: analysis of multivariate data obtained from an experimental design. Journal of Chemometrics, 19(9), 469-481. doi:10.1002/cem.952 es_ES
dc.description.references Nueda, M. J., Conesa, A., Westerhuis, J. A., Hoefsloot, H. C. J., Smilde, A. K., Talón, M., & Ferrer, A. (2007). Discovering gene expression patterns in time course microarray experiments by ANOVA–SCA. Bioinformatics, 23(14), 1792-1800. doi:10.1093/bioinformatics/btm251 es_ES
dc.description.references Giordan, M. (2013). A Two-Stage Procedure for the Removal of Batch Effects in Microarray Studies. Statistics in Biosciences, 6(1), 73-84. doi:10.1007/s12561-013-9081-1 es_ES
dc.description.references Nyamundanda, G., Poudel, P., Patil, Y., & Sadanandam, A. (2017). A Novel Statistical Method to Diagnose, Quantify and Correct Batch Effects in Genomic Studies. Scientific Reports, 7(1). doi:10.1038/s41598-017-11110-6 es_ES
dc.description.references Reese, S. E., Archer, K. J., Therneau, T. M., Atkinson, E. J., Vachon, C. M., de Andrade, M., … Eckel-Passow, J. E. (2013). A new statistic for identifying batch effects in high-throughput genomic data that uses guided principal component analysis. Bioinformatics, 29(22), 2877-2883. doi:10.1093/bioinformatics/btt480 es_ES
dc.description.references Papiez, A., Marczyk, M., Polanska, J., & Polanski, A. (2018). BatchI: Batch effect Identification in high-throughput screening data using a dynamic programming algorithm. Bioinformatics, 35(11), 1885-1892. doi:10.1093/bioinformatics/bty900 es_ES
dc.description.references Keel, B. N., Zarek, C. M., Keele, J. W., Kuehn, L. A., Snelling, W. M., Oliver, W. T., … Lindholm-Perry, A. K. (2018). RNA-Seq Meta-analysis identifies genes in skeletal muscle associated with gain and intake across a multi-season study of crossbred beef steers. BMC Genomics, 19(1). doi:10.1186/s12864-018-4769-8 es_ES
dc.description.references Li, M. D., Burns, T. C., Morgan, A. A., & Khatri, P. (2014). Integrated multi-cohort transcriptional meta-analysis of neurodegenerative diseases. Acta Neuropathologica Communications, 2(1). doi:10.1186/s40478-014-0093-y es_ES
dc.description.references Andres-Terre, M., McGuire, H. M., Pouliot, Y., Bongen, E., Sweeney, T. E., Tato, C. M., & Khatri, P. (2015). Integrated, Multi-cohort Analysis Identifies Conserved Transcriptional Signatures across Multiple Respiratory Viruses. Immunity, 43(6), 1199-1211. doi:10.1016/j.immuni.2015.11.003 es_ES
dc.description.references Sandhu, V., Labori, K. J., Borgida, A., Lungu, I., Bartlett, J., Hafezi-Bakhtiari, S., … Haibe-Kains, B. (2019). Meta-Analysis of 1,200 Transcriptomic Profiles Identifies a Prognostic Model for Pancreatic Ductal Adenocarcinoma. JCO Clinical Cancer Informatics, (3), 1-16. doi:10.1200/cci.18.00102 es_ES
dc.description.references Huang, H., Liu, C.-C., & Zhou, X. J. (2010). Bayesian approach to transforming public gene expression repositories into disease diagnosis databases. Proceedings of the National Academy of Sciences, 107(15), 6823-6828. doi:10.1073/pnas.0912043107 es_ES
dc.description.references Pelechano, V., & Pérez-Ortín, J. E. (2010). There is a steady-state transcriptome in exponentially growing yeast cells. Yeast, 27(7), 413-422. doi:10.1002/yea.1768 es_ES
dc.description.references García-Martínez, J., Aranda, A., & Pérez-Ortín, J. E. (2004). Genomic Run-On Evaluates Transcription Rates for All Yeast Genes and Identifies Gene Regulatory Mechanisms. Molecular Cell, 15(2), 303-313. doi:10.1016/j.molcel.2004.06.004 es_ES
dc.description.references Pelechano, V., Chávez, S., & Pérez-Ortín, J. E. (2010). A Complete Set of Nascent Transcription Rates for Yeast Genes. PLoS ONE, 5(11), e15442. doi:10.1371/journal.pone.0015442 es_ES
dc.description.references Zid, B. M., & O’Shea, E. K. (2014). Promoter sequences direct cytoplasmic localization and translation of mRNAs during starvation in yeast. Nature, 514(7520), 117-121. doi:10.1038/nature13578 es_ES
dc.description.references Freeberg, M. A., Han, T., Moresco, J. J., Kong, A., Yang, Y.-C., Lu, Z., … Kim, J. K. (2013). Pervasive and dynamic protein binding sites of the mRNA transcriptome in Saccharomyces cerevisiae. Genome Biology, 14(2), R13. doi:10.1186/gb-2013-14-2-r13 es_ES
dc.description.references McKinlay, A., Araya, C. L., & Fields, S. (2011). Genome-Wide Analysis of Nascent Transcription in Saccharomyces cerevisiae. G3 Genes|Genomes|Genetics, 1(7), 549-558. doi:10.1534/g3.111.000810 es_ES
dc.description.references Castells-Roca, L., García-Martínez, J., Moreno, J., Herrero, E., Bellí, G., & Pérez-Ortín, J. E. (2011). Heat Shock Response in Yeast Involves Changes in Both Transcription Rates and mRNA Stabilities. PLoS ONE, 6(2), e17272. doi:10.1371/journal.pone.0017272 es_ES
dc.description.references Wold, S., Sjöström, M., & Eriksson, L. (2001). PLS-regression: a basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems, 58(2), 109-130. doi:10.1016/s0169-7439(01)00155-1 es_ES
dc.description.references Folch-Fortuny, A., Vitale, R., de Noord, O. E., & Ferrer, A. (2017). Calibration transfer between NIR spectrometers: New proposals and a comparative study. Journal of Chemometrics, 31(3), e2874. doi:10.1002/cem.2874 es_ES
dc.description.references García Muñoz, S., MacGregor, J. F., & Kourti, T. (2005). Product transfer between sites using Joint-Y PLS. Chemometrics and Intelligent Laboratory Systems, 79(1-2), 101-114. doi:10.1016/j.chemolab.2005.04.009 es_ES
dc.description.references Andrade, J. M., Gómez-Carracedo, M. P., Krzanowski, W., & Kubista, M. (2004). Procrustes rotation in analytical chemistry, a tutorial. Chemometrics and Intelligent Laboratory Systems, 72(2), 123-132. doi:10.1016/j.chemolab.2004.01.007 es_ES
dc.description.references Hurley, J. R., & Cattell, R. B. (2007). The procrustes program: Producing direct rotation to test a hypothesized factor structure. Behavioral Science, 7(2), 258-262. doi:10.1002/bs.3830070216 es_ES
dc.description.references Hartigan, J. A., & Wong, M. A. (1979). Algorithm AS 136: A K-Means Clustering Algorithm. Applied Statistics, 28(1), 100. doi:10.2307/2346830 es_ES

