A Convolutional Neural Network-Based Chinese Text Detection Algorithm via Text Structure Modeling
I. INTRODUCTION
With the increasing penetration of portable multimedia recording devices (such as smartphones and tablets), multimedia content has proliferated on image and video sharing websites, e.g., YouTube and Flickr. Extracting text information from these natural images and videos is conducive to a wide range of applications, such as image classification, scene recognition, and video retrieval. Although traditional optical character recognition (OCR) systems perform well at extracting text information from scanned documents, their performance on natural images and videos can drop significantly. The biggest challenge in using OCR systems in natural environments is detecting text regions, because the background in natural images and videos is much larger in size and much more complex in texture. To quantify and track progress in text location in natural images, several competitions have been held in recent years, including four ICDAR Text Location Competitions in 2003, 2005, 2011, and 2013 [1]–[4]. However, even the best-performing algorithm reported in ICDAR 2013 can localize only 66% of the words in the dataset [4], which clearly shows that there is still considerable room for improvement.