Abstract
In this working note, we describe our participation in the ImageCLEF 2016 Handwritten Document Retrieval Task. We focused mainly on hyphenation detection from line images and on information retrieval using n-best results. The hyphenation detection step extracts image features from the beginning and end of each line and uses a binary classifier to determine whether the line contains a hyphenation. A spelling correction step then removes errors from the word formed by concatenating the broken fragment at the end of one line with the fragment at the beginning of the next line. The final text retrieval step employs a suffix stripping algorithm to normalize word tense and form, and a TF-IDF scheme to rank the retrieved relevant segments for our submission.
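As a rough illustration of the final retrieval step only, the sketch below combines suffix stripping with TF-IDF ranking of text segments. It is not the system described in this note: the suffix list, the function names (`strip_suffix`, `tokenize`, `rank_segments`), the particular TF-IDF weighting, and the sample segments are all assumptions made for the example, and a real system would likely use a full stemmer such as Porter's algorithm.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny suffix list (an assumption for this sketch);
# a real system would likely use a full suffix stripping algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def strip_suffix(word):
    """Remove the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, split on whitespace, and normalize each token."""
    return [strip_suffix(tok) for tok in text.lower().split()]


def rank_segments(query, segments):
    """Return (segment index, score) pairs sorted by descending TF-IDF score."""
    docs = [Counter(tokenize(seg)) for seg in segments]
    n_docs = len(docs)
    query_terms = set(tokenize(query))
    scores = []
    for idx, counts in enumerate(docs):
        total = sum(counts.values())
        score = 0.0
        for term in query_terms:
            # Document frequency: number of segments containing the term.
            df = sum(1 for d in docs if term in d)
            if df == 0 or total == 0:
                continue
            tf = counts[term] / total      # relative term frequency
            idf = math.log(n_docs / df)    # plain, unsmoothed IDF
            score += tf * idf
        scores.append((idx, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up segments standing in for recognized text.
segments = [
    "the letters were posted yesterday",
    "he walked to the market",
    "posting a letter takes time",
]
ranking = rank_segments("letter posted", segments)
```

Because "letters", "posted", and "posting" are reduced to the stems "letter" and "post", both the first and third segments match the query even though their surface forms differ, while the second segment scores zero.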
Original language | English (US) |
---|---|
Pages (from-to) | 347-355 |
Number of pages | 9 |
Journal | CEUR Workshop Proceedings |
Volume | 1609 |
State | Published - 2016 |
Event | 2016 Working Notes of Conference and Labs of the Evaluation Forum, CLEF 2016 - Evora, Portugal. Duration: Sep 5 2016 → Sep 8 2016 |
Keywords
- Handwriting recognition
- Hyphenation detection
- Text retrieval
ASJC Scopus subject areas
- General Computer Science