character base tesseract

Hi again,

In this wrapper, I wonder why, for some languages other than English, the Tesseract API with tessdata-best.traineddata returns results that are **character-based** rather than _word-based_, as they are for English. For example:

**Thai**
```
69 confidence: 93.2952651977539 - [63, 74, 74, 85]; ห
70 confidence: 93.29107666015625 - [77, 74, 83, 85]; า
71 confidence: 93.30585479736328 - [75, 64, 100, 93]; ให
72 confidence: 93.0483627319336 - [101, 70, 105, 85]; ้
73 confidence: 93.2821044921875 - [111, 69, 116, 85]; ร
```
**Eng**

```
0 confidence: 96.37889099121094 - [358, 42, 443, 66]; FOCUS
1 confidence: 95.37885284423828 - [147, 263, 328, 294]; LEADERS
2 confidence: 95.37885284423828 - [341, 266, 653, 294]; CONCENTRATE
3 confidence: 90.43708801269531 - [116, 315, 506, 342]; SINGLE-MINDEDLY
```
Do you have any suggestions for config settings that would make the results _word-based_ or _text-line-based_?
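
If no such setting exists, one fallback I can think of is merging the character boxes into text lines after recognition. Below is a minimal sketch of that idea, not anything taken from the wrapper: `Symbol`, `union_box`, `overlaps_vertically`, and `merge_into_lines` are hypothetical names, and I am assuming the boxes above are `[left, top, right, bottom]` and that results arrive in reading order.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    text: str
    confidence: float
    box: tuple  # assumed layout: (left, top, right, bottom)


def union_box(symbols):
    """Smallest box that contains every symbol in the group."""
    return (
        min(s.box[0] for s in symbols),
        min(s.box[1] for s in symbols),
        max(s.box[2] for s in symbols),
        max(s.box[3] for s in symbols),
    )


def overlaps_vertically(a, b):
    """True if boxes a and b share any vertical extent."""
    return a[1] < b[3] and b[1] < a[3]


def merge_into_lines(symbols, sep=""):
    """Group consecutive symbols that overlap vertically into one line.

    Input order is kept, so the merged text follows the recognizer's
    reading order. Use sep=" " for word-level input such as English.
    """
    groups = []
    for sym in symbols:
        if groups and overlaps_vertically(union_box(groups[-1]), sym.box):
            groups[-1].append(sym)
        else:
            groups.append([sym])
    return [
        Symbol(
            text=sep.join(s.text for s in g),
            confidence=sum(s.confidence for s in g) / len(g),  # plain mean
            box=union_box(g),
        )
        for g in groups
    ]


if __name__ == "__main__":
    # The five Thai results from the output above.
    thai = [
        Symbol("ห", 93.2952651977539, (63, 74, 74, 85)),
        Symbol("า", 93.29107666015625, (77, 74, 83, 85)),
        Symbol("ให", 93.30585479736328, (75, 64, 100, 93)),
        Symbol("้", 93.0483627319336, (101, 70, 105, 85)),
        Symbol("ร", 93.2821044921875, (111, 69, 116, 85)),
    ]
    for line in merge_into_lines(thai):
        print(line.box, line.text, round(line.confidence, 2))
```

This only rebuilds lines, not words: Thai has no spaces between words, so splitting a line into words would need a separate segmentation step. Averaging the confidences is also just one possible choice.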

[image sample](http://letstalkthai.com/2015/04/15/talking-thai-english-phrasebook-by-paiboon-publishing/)
