How to train the SVM for connected components in CDAInput processing  #8

@zywangzy

We're using an SVM (Support Vector Machine) as a classifier to clean up the input to CDA, one of the steps in our current text detection pipeline. Here are some tips on how to train the SVM.

My current solution is:

  1. Create a directory called SVM_training under directory Strabo.CommandLine\data\, so its path is Strabo.CommandLine\data\SVM_training.

  2. Then create two subdirectories in it: pos and neg, which contain the positive and negative training sample images for the SVM.

  3. After running Strabo, you will usually find small image chunks consisting of connected components (some are numbers or characters, others are noise) in the directory Strabo.CommandLine\data\intermediate\. Manually copy the images that contain only numbers or characters to the pos directory, and copy the images that contain only noise to the neg directory.
    Be careful: the images in pos should contain ONLY numbers and characters, with NO noise. Similarly, the images in neg should contain ONLY noise, with NO numbers or characters. Any image that mixes numbers and noise should NOT be placed in either directory.

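  4. For our code to pre-process the copied images, each directory needs a text file listing the files it contains. We can generate these with the ls command inside each directory (the syntax below is PowerShell, where ls is an alias for Get-ChildItem and -name prints only the file names).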
    cd pos
    ls *.png -name > pos_list.txt
    cd ..\neg
    ls *.png -name > neg_list.txt

  5. Then run Strabo with preprocess set to true. The code will automatically compute the features of each CC (connected component) in each training sample listed in pos_list.txt and neg_list.txt, and label each CC with 1 for true (is a number) or 0 for false (not a number). It then saves the features in Strabo.CommandLine\data\SVM_training\features.txt, with each line representing one CC in the format [feature #1] [feature #2] [class (0 or 1)].

  6. After preprocessing, the code trains the SVM model on the training data saved in features.txt and saves the model in Strabo.CommandLine\data\SVM_training\SVM_model.txt. On later runs, if you set the loadLocalModel variable to true, the code loads this file as a pre-trained model.
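
If you would rather not type the directory and listing commands from steps 1, 2 and 4 by hand, they can be sketched in a few lines of Python. This is only an illustration, not part of Strabo: the function names `write_file_list` and `prepare_training_dirs` are made up here, and the sketch assumes the lists should hold one bare file name per line in UTF-8, which you should check against what the preprocessing code actually reads.

```python
from __future__ import annotations

from pathlib import Path


def write_file_list(sample_dir: Path, list_name: str) -> list[str]:
    """Write the names of all .png files in sample_dir to sample_dir/list_name.

    Assumption: one bare file name per line, sorted, UTF-8 encoded.
    """
    names = sorted(p.name for p in sample_dir.glob("*.png"))
    text = "".join(name + "\n" for name in names)
    (sample_dir / list_name).write_text(text, encoding="utf-8")
    return names


def prepare_training_dirs(root: Path) -> dict[str, list[str]]:
    """Create pos/ and neg/ under root if missing and (re)write their lists.

    Hypothetical helper: mirrors steps 1, 2 and 4, but does not copy any
    images. Sorting samples into pos and neg is still a manual job (step 3).
    """
    lists = {}
    for label in ("pos", "neg"):
        sample_dir = root / label
        sample_dir.mkdir(parents=True, exist_ok=True)
        lists[label] = write_file_list(sample_dir, f"{label}_list.txt")
    return lists
```

For example, `prepare_training_dirs(Path("Strabo.CommandLine") / "data" / "SVM_training")` would create both subdirectories if needed and rewrite pos_list.txt and neg_list.txt from whatever .png files are currently in them, so it is safe to rerun after adding more samples.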

You should now be able to carry out these steps and train the SVM that classifies connected components in CDAInput.
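
Before relying on the trained model, it can help to sanity-check features.txt, for instance to confirm that both classes are present. The sketch below is not Strabo code: it assumes the three fields on each line are separated by whitespace and that the features are plain numbers, which matches the format described in step 5 but is not verified against the actual file. The names `parse_feature_line` and `summarize_features` are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def parse_feature_line(line: str) -> tuple[float, float, int]:
    """Parse one '[feature #1] [feature #2] [class]' line.

    Assumption: fields are whitespace-separated and the class is 0 or 1.
    """
    parts = line.split()
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {line!r}")
    label_value = float(parts[2])
    if label_value not in (0.0, 1.0):
        raise ValueError(f"class must be 0 or 1, got {parts[2]!r}")
    return float(parts[0]), float(parts[1]), int(label_value)


def summarize_features(lines: Iterable[str]) -> Counter:
    """Count how many connected components carry each class label."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        _, _, label = parse_feature_line(line)
        counts[label] += 1
    return counts
```

Passing the lines of features.txt (for example via `open(path)`) to `summarize_features` returns a counter keyed by label. If one label has a count of zero or the two counts are badly lopsided, the pos and neg directories probably need more samples before retraining.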
