The NoisyOffice Database: A Corpus To Train Supervised Machine Learning Filters For Image Processing

RiuNet: Institutional repository of the Polytechnic University of Valencia


Castro-Bleda, MJ.; España Boquera, S.; Pastor Pellicer, J.; Zamora Martínez, FJ. (2020). The NoisyOffice Database: A Corpus To Train Supervised Machine Learning Filters For Image Processing. The Computer Journal. 63(11):1658-1667. https://doi.org/10.1093/comjnl/bxz098

Please use this identifier to cite or link to this item: http://hdl.handle.net/10251/156936

Item Metadata

Title: The NoisyOffice Database: A Corpus To Train Supervised Machine Learning Filters For Image Processing
Authors: Castro-Bleda, Maria Jose; España Boquera, Salvador; Pastor Pellicer, Joan; Zamora Martínez, Francisco Julián
UPV Unit: Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació
Issued date: 2020
Abstract:
[EN] This paper presents the 'NoisyOffice' database. It consists of images of printed text documents with noise mainly caused by uncleanliness from a generic office, such as coffee stains and footprints on documents or ...
Subjects: Optical character recognition, Image processing, Binarization, Denoising, Super resolution, Machine learning, Neural networks, Deep learning
Copyrights: All rights reserved
Source:
The Computer Journal (ISSN: 0010-4620)
DOI: 10.1093/comjnl/bxz098
Publisher:
Oxford University Press
Publisher version: https://doi.org/10.1093/comjnl/bxz098
Project ID:
info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2017-85854-C4-2-R/ES/AMIC-UPV: ANALISIS AFECTIVO DE INFORMACION MULTIMEDIA CON COMUNICACION INCLUSIVA Y NATURAL/
Thanks:
This research was undertaken as part of the project TIN2017-85854-C4-2-R, jointly funded by the Spanish MINECO and FEDER funds.
Type: Article

