
WGANVO: odometría visual monocular basada en redes adversarias generativas

RiuNet: Institutional Repository of the Universitat Politècnica de València


dc.contributor.author Cremona, Javier es_ES
dc.contributor.author Uzal, Lucas es_ES
dc.contributor.author Pire, Taihú es_ES
dc.date.accessioned 2022-05-24T07:18:52Z
dc.date.available 2022-05-24T07:18:52Z
dc.date.issued 2022-04-01
dc.identifier.issn 1697-7912
dc.identifier.uri http://hdl.handle.net/10251/182820
dc.description.abstract [EN] Traditional Visual Odometry (VO) systems, direct or feature-based, are susceptible to matching errors between images. Furthermore, monocular configurations can only estimate localization up to a scale factor, making it impossible to use them out of the box in robotics or virtual reality applications. Recently, several Computer Vision problems have been successfully tackled by Deep Learning algorithms. In this paper we introduce a Deep Learning-based monocular Visual Odometry system called WGANVO. Specifically, we train a GAN-based neural network to regress a motion estimate. The resulting model receives a pair of images and estimates the relative motion between them. We train the neural network using a semi-supervised approach. In contrast to traditional geometry-based monocular systems, our Deep Learning-based method is able to estimate the absolute scale of the scene without extra information or prior knowledge. We evaluate WGANVO on the well-known KITTI dataset. We show that our system works in real time, and the accuracy obtained encourages further development of Deep Learning-based localization systems. es_ES
dc.description.abstract [ES] Los sistemas tradicionales de odometría visual (VO), directos o basados en características visuales, son susceptibles de cometer errores de correspondencia entre imágenes. Además, las configuraciones monoculares sólo son capaces de estimar la localización sujeto a un factor de escala, lo que hace imposible su uso inmediato en aplicaciones de robótica o realidad virtual. Recientemente, varios problemas de Visión por Computadora han sido abordados con éxito por algoritmos de Aprendizaje Profundo. En este trabajo presentamos un sistema de odometría visual monocular basado en Aprendizaje Profundo llamado WGANVO. Específicamente, entrenamos una red neuronal basada en GAN para regresionar una estimación de movimiento. El modelo resultante recibe un par de imágenes y estima el movimiento relativo entre ellas. Entrenamos la red neuronal utilizando un enfoque semi-supervisado. A diferencia de los sistemas monoculares tradicionales basados en geometría, nuestro método basado en Deep Learning es capaz de estimar la escala absoluta de la escena sin información extra ni conocimiento previo. Evaluamos WGANVO en el conocido conjunto de datos KITTI. Demostramos que nuestro sistema funciona en tiempo real y la precisión obtenida alienta a seguir desarrollando sistemas de localización basados en Aprendizaje Profundo. es_ES
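
The abstracts above describe the architecture only at a high level: a GAN-derived network that receives a pair of consecutive frames and regresses the relative 6-DoF motion between them, trained semi-supervised with a Wasserstein adversarial objective (cf. the Gulrajani et al., 2017 reference below). The following PyTorch sketch illustrates that idea with a critic carrying an auxiliary pose head, in the style of semi-supervised GAN discriminators (Salimans et al., 2016); the class name, layer sizes, and 6-parameter motion encoding are illustrative assumptions, not the authors' implementation (see https://github.com/CIFASIS/wganvo for the actual code).

import torch
import torch.nn as nn

class PoseCritic(nn.Module):
    # Hypothetical critic: scores image pairs (WGAN) and regresses motion.
    def __init__(self):
        super().__init__()
        # Shared trunk over two RGB frames stacked on the channel axis (6 ch).
        self.features = nn.Sequential(
            nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.critic = nn.Linear(256, 1)  # unbounded Wasserstein score, no sigmoid
        self.pose = nn.Linear(256, 6)    # 3 translation + 3 rotation parameters

    def forward(self, pair):
        h = self.features(pair)
        return self.critic(h), self.pose(h)

net = PoseCritic()
pair = torch.randn(4, 6, 128, 416)  # batch of 4 stacked frame pairs (assumed size)
score, motion = net(pair)
print(score.shape, motion.shape)    # torch.Size([4, 1]) torch.Size([4, 6])

Because the pose head is trained with ground-truth motion on labeled pairs while the adversarial term shapes the shared features on all pairs, the regressor can in principle learn metric scale from the training data, which is what lets the system sidestep the monocular scale ambiguity mentioned in the abstract.
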
dc.description.sponsorship This work was funded by CIFASIS, Centro Franco Argentino de Ciencias de la Información y de Sistemas (CONICET-UNR), through the Executing Unit project PUE 0015-2016. es_ES
dc.language Español es_ES
dc.publisher Universitat Politècnica de València es_ES
dc.relation.ispartof Revista Iberoamericana de Automática e Informática industrial es_ES
dc.rights Reconocimiento - No comercial - Compartir igual (by-nc-sa) es_ES
dc.subject Localization es_ES
dc.subject Neural networks es_ES
dc.subject Mobile robots es_ES
dc.subject Localización es_ES
dc.subject Redes Neuronales es_ES
dc.subject Robots Móviles es_ES
dc.title WGANVO: odometría visual monocular basada en redes adversarias generativas es_ES
dc.title.alternative WGANVO: monocular visual odometry based on generative adversarial networks es_ES
dc.type Artículo es_ES
dc.identifier.doi 10.4995/riai.2022.16113
dc.relation.projectID info:eu-repo/grantAgreement/CONICET-UNR//PUE 0015-2016 es_ES
dc.rights.accessRights Abierto es_ES
dc.description.bibliographicCitation Cremona, J.; Uzal, L.; Pire, T. (2022). WGANVO: odometría visual monocular basada en redes adversarias generativas. Revista Iberoamericana de Automática e Informática industrial. 19(2):144-153. https://doi.org/10.4995/riai.2022.16113 es_ES
dc.description.accrualMethod OJS es_ES
dc.relation.publisherversion https://doi.org/10.4995/riai.2022.16113 es_ES
dc.description.upvformatpinicio 144 es_ES
dc.description.upvformatpfin 153 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 19 es_ES
dc.description.issue 2 es_ES
dc.identifier.eissn 1697-7920
dc.relation.pasarela OJS\16113 es_ES
dc.contributor.funder Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina es_ES
dc.contributor.funder Universidad Nacional de Rosario, Argentina es_ES
dc.description.references Agrawal, P., Carreira, J., Malik, J., 2015. Learning to See by Moving. In: Proceedings of the International Conference on Computer Vision. pp. 37-45. https://doi.org/10.1109/ICCV.2015.13 es_ES
dc.description.references Almalioglu, Y., Saputra, M. R. U., de Gusmao, P. P. B., Markham, A., Trigoni, N., 2019. GANVO: Unsupervised Deep Monocular Visual Odometry and Depth Estimation with Generative Adversarial Networks. In: Proceedings of the IEEE International Conference on Robotics and Automation. pp. 5474-5480. https://doi.org/10.1109/ICRA.2019.8793512 es_ES
dc.description.references Comport, A. I., Malis, E., Rives, P., 2010. Real-time quadrifocal visual odometry. International Journal of Robotics Research, 245-266. https://doi.org/10.1177/0278364909356601 es_ES
dc.description.references Cremona, J., Uzal, L., Pire, T., 2021. WGANVO Repository. https://github.com/CIFASIS/wganvo [Online; accessed 19-August-2021]. es_ES
dc.description.references Engel, J., Agrawal, K. K., Chen, S., Gulrajani, I., Donahue, C., Roberts, A., 2019. GANSynth: Adversarial Neural Audio Synthesis. In: Proceedings of the International Conference on Learning Representations. URL: https://openreview.net/pdf?id=H1xQVn09FX es_ES
dc.description.references Engel, J., Koltun, V., Cremers, D., 2018. Direct Sparse Odometry. IEEE Transactions on Pattern Analysis and Machine Intelligence, 611-625. https://doi.org/10.1109/TPAMI.2017.2658577 es_ES
dc.description.references Engel, J., Schöps, T., Cremers, D., 2014. LSD-SLAM: Large-Scale Direct Monocular SLAM. In: Proceedings of the European Conference on Computer Vision. pp. 834-849. https://doi.org/10.1007/978-3-319-10605-2_54 es_ES
dc.description.references Facil, J. M., Ummenhofer, B., Zhou, H., Montesano, L., Brox, T., Civera, J., 2019. CAM-Convs: Camera-Aware Multi-Scale Convolutions for Single-View Depth. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 11818-11827. https://doi.org/10.1109/CVPR.2019.01210 es_ES
dc.description.references Forster, C., Pizzoli, M., Scaramuzza, D., 2014. SVO: Fast semi-direct monocular visual odometry. In: Proceedings of the IEEE International Conference on Robotics and Automation. pp. 15-22. https://doi.org/10.1109/ICRA.2014.6906584 es_ES
dc.description.references Geiger, A., Lenz, P., Stiller, C., Urtasun, R., 2013. Vision Meets Robotics: The KITTI Dataset. International Journal of Robotics Research, 1231-1237. https://doi.org/10.1177/0278364913491297 es_ES
dc.description.references Geiger, A., Lenz, P., Urtasun, R., 2012. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 3354-3361. https://doi.org/10.1109/CVPR.2012.6248074 es_ES
dc.description.references Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y., 2014. Generative adversarial nets. In: Proceedings of the Advances in Neural Information Processing Systems. pp. 2672-2680. es_ES
dc.description.references Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A. C., 2017. Improved Training of Wasserstein GANs. In: Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (Eds.), Proceedings of the Advances in Neural Information Processing Systems. Vol. 30. Curran Associates, Inc. URL: https://proceedings.neurips.cc/paper/2017/file/892c3b1c6dccd52936e27cbd0ff683d6-Paper.pdf es_ES
dc.description.references Hartley, R., Zisserman, A., 2003. Multiple View Geometry in Computer Vision. Cambridge University Press, New York, USA. https://doi.org/10.1017/CBO9780511811685 es_ES
dc.description.references Karras, T., Laine, S., Aila, T., 2019. A Style-Based Generator Architecture for Generative Adversarial Networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4396-4405. https://doi.org/10.1109/CVPR.2019.00453 es_ES
dc.description.references Kendall, A., Cipolla, R., 2017. Geometric Loss Functions for Camera Pose Regression with Deep Learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 6555-6564. https://doi.org/10.1109/CVPR.2017.694 es_ES
dc.description.references Kendall, A., Grimes, M., Cipolla, R., 2015. PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization. In: Proceedings of the International Conference on Computer Vision. pp. 2938-2946. https://doi.org/10.1109/ICCV.2015.336 es_ES
dc.description.references Krizhevsky, A., Sutskever, I., Hinton, G. E., 2012. ImageNet Classification with Deep Convolutional Neural Networks. In: Pereira, F., Burges, C. J. C., Bottou, L., Weinberger, K. Q. (Eds.), Proceedings of the Advances in Neural Information Processing Systems. Vol. 25. Curran Associates, Inc. URL: https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf es_ES
dc.description.references Krombach, N., Droeschel, D., Behnke, S., 2016. Combining Feature-based and Direct Methods for Semi-dense Real-time Stereo Visual Odometry. In: Proceedings of the International Conference on Intelligent Autonomous Systems. pp. 855-868. https://doi.org/10.1007/978-3-319-48036-7_62 es_ES
dc.description.references LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521 (7553), 436. URL: https://www.nature.com/articles/nature14539 https://doi.org/10.1038/nature14539 es_ES
dc.description.references Li, R., Wang, S., Long, Z., Gu, D., 2018. UnDeepVO: Monocular Visual Odometry through Unsupervised Deep Learning. In: Proceedings of the IEEE International Conference on Robotics and Automation. pp. 7286-7291. https://doi.org/10.1109/ICRA.2018.8461251 es_ES
dc.description.references Li, S., Xue, F., Wang, X., Yan, Z., Zha, H., 2019. Sequential Adversarial Learning for Self-Supervised Deep Visual Odometry. In: Proceedings of the International Conference on Computer Vision. https://doi.org/10.1109/ICCV.2019.00294 es_ES
dc.description.references Lowe, D. G., 1999. Object recognition from local scale-invariant features. In: Proceedings of the International Conference on Computer Vision. pp. 1150-1157. https://doi.org/10.1109/ICCV.1999.790410 es_ES
dc.description.references Min, Z., Yang, Y., Dunn, E., 2020. VOLDOR: Visual Odometry From Log-Logistic Dense Optical Flow Residuals. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4897-4908. https://doi.org/10.1109/CVPR42600.2020.00495 es_ES
dc.description.references Mur-Artal, R., Montiel, J. M. M., Tardós, J. D., 2015. ORB-SLAM: A Versatile and Accurate Monocular SLAM System. IEEE Transactions on Robotics, 1147-1163. https://doi.org/10.1109/TRO.2015.2463671 es_ES
dc.description.references Mur-Artal, R., Tardós, J. D., 2017. ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras. IEEE Transactions on Robotics, 1255-1262. https://doi.org/10.1109/TRO.2017.2705103 es_ES
dc.description.references Nistér, D., Naroditsky, O., Bergen, J., 2004. Visual odometry. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 652-659. https://doi.org/10.1109/CVPR.2004.1315094 es_ES
dc.description.references Pire, T., Fischer, T., Castro, G., De Cristóforis, P., Civera, J., Jacobo Berlles, J., 2017. S-PTAM: Stereo Parallel Tracking and Mapping. Journal of Robotics and Autonomous Systems, 27-42. https://doi.org/10.1016/j.robot.2017.03.019 es_ES
dc.description.references Radford, A., Metz, L., Chintala, S., 2015. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. In: Computing Research Repository (CoRR). URL: http://arxiv.org/abs/1511.06434 es_ES
dc.description.references Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X., 2016. Improved Techniques for Training GANs. In: Proceedings of the International Conference on Neural Information Processing Systems. pp. 2234-2242. es_ES
dc.description.references Scaramuzza, D., Fraundorfer, F., 2011. Visual Odometry [Tutorial]. IEEE Robotics and Automation Magazine, 80-92. https://doi.org/10.1109/MRA.2011.943233 es_ES
dc.description.references Siciliano, B., Khatib, O., 2016. Springer Handbook of Robotics. Springer Publishing Company, Incorporated. https://doi.org/10.1007/978-3-319-32552-1 es_ES
dc.description.references Tateno, K., Tombari, F., Laina, I., Navab, N., 2017. CNN-SLAM: Real-Time Dense Monocular SLAM with Learned Depth Prediction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 6565-6574. https://doi.org/10.1109/CVPR.2017.695 es_ES
dc.description.references Thrun, S., Burgard, W., Fox, D., 2005. Probabilistic Robotics. The MIT Press. es_ES
dc.description.references Tulyakov, S., Liu, M.-Y., Yang, X., Kautz, J., 2018. MoCoGAN: Decomposing motion and content for video generation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 1526-1535. https://doi.org/10.1109/CVPR.2018.00165 es_ES
dc.description.references Umeyama, S., 1991. Least-squares estimation of transformation parameters between two point patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 376-380. https://doi.org/10.1109/34.88573 es_ES
dc.description.references Wang, S., Clark, R., Wen, H., Trigoni, N., 2017. DeepVO: Towards end-to-end visual odometry with deep Recurrent Convolutional Neural Networks. In: Proceedings of the IEEE International Conference on Robotics and Automation. pp. 2043-2050. https://doi.org/10.1109/ICRA.2017.7989236 es_ES
dc.description.references Yang, N., Wang, R., Stückler, J., Cremers, D., 2018. Deep Virtual Stereo Odometry: Leveraging Deep Depth Prediction for Monocular Direct Sparse Odometry. In: Proceedings of the European Conference on Computer Vision. pp. 835-852. https://doi.org/10.1007/978-3-030-01237-3_50 es_ES
dc.description.references Yi, X., Walia, E., Babyn, P., 2019. Generative adversarial network in medical imaging: A review. Medical Image Analysis 58, 101552. URL: https://www.sciencedirect.com/science/article/pii/S1361841518308430 https://doi.org/10.1016/j.media.2019.101552 es_ES
dc.description.references Yin, Z., Shi, J., 2018. GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 1983-1992. https://doi.org/10.1109/CVPR.2018.00212 es_ES

