Description
We're using an SVM (Support Vector Machine) for classification, to optimize the input of CDA (one of the steps in our current text-detection pipeline). Here are some tips on how to train the SVM:
My current solution is:
- Create a directory called SVM_training under the directory Strabo.CommandLine\data\, so that its path is Strabo.CommandLine\data\SVM_training.
- Then create two subdirectories in it, pos and neg, which will hold the positive and negative training sample images for the SVM.
- After running Strabo, you will usually find small image chunks, each consisting of connected components (some are numbers or characters, others are noise), in the directory Strabo.CommandLine\data\intermediate\. Manually copy the images that contain only numbers to the pos directory, and copy the images that contain only noise to the neg directory.
  Be careful: the images in pos should ONLY contain numbers and characters, and should NOT contain any noise. Similarly, the images in neg should ONLY contain noise, and should NOT contain any numbers or characters. Any image that mixes numbers and noise should NOT be placed in either directory.
- To have our code preprocess the copied images, we need a text file listing the files in each directory. We can generate it with the ls command (PowerShell syntax) in each directory:
    cd pos
    ls *.png -name > pos_list.txt
    cd ..\neg
    ls *.png -name > neg_list.txt
- Then run Strabo with preprocess set to true. The code will automatically compute the features of each CC (connected component) in every training sample listed in pos_list.txt and neg_list.txt, label each CC 1 for true (is a number) or 0 for false (not a number), and save the features in Strabo.CommandLine\data\SVM_training\features.txt, one CC per line, in the format [feature #1] [feature #2] [class (0 or 1)].
- After preprocessing, the code will train the SVM model on the training data saved in features.txt and save the model to Strabo.CommandLine\data\SVM_training\SVM_model.txt. On later runs, if you set the loadLocalModel variable to true, the code will load this file as a pre-trained model.
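If you would rather script the directory setup and the file lists than type the commands by hand, the first steps above could be sketched in Python roughly as follows. This is only an illustration, not part of Strabo: the helper names prepare_training_dirs and write_file_list are made up here, and it assumes that Strabo only needs bare file names, one per line, in pos_list.txt and neg_list.txt (the same thing ls -name prints).

```python
from pathlib import Path


def prepare_training_dirs(data_dir):
    """Create SVM_training/pos and SVM_training/neg under data_dir.

    Hypothetical helper: data_dir is expected to be Strabo.CommandLine/data.
    """
    training_dir = Path(data_dir) / "SVM_training"
    for name in ("pos", "neg"):
        # exist_ok=True makes the call safe to repeat on an existing layout.
        (training_dir / name).mkdir(parents=True, exist_ok=True)
    return training_dir


def write_file_list(sample_dir, list_name):
    """Write the names of all .png files in sample_dir, one per line.

    Assumption: the list file holds bare file names (no directory part),
    and it is written into the same directory as the images.
    """
    sample_dir = Path(sample_dir)
    names = sorted(p.name for p in sample_dir.glob("*.png"))
    (sample_dir / list_name).write_text("".join(n + "\n" for n in names))
    return names
```

A typical call would be prepare_training_dirs on the Strabo.CommandLine\data path, followed, once the hand-sorted images have been copied in, by write_file_list for pos with pos_list.txt and for neg with neg_list.txt.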
Now you should be able to carry out these steps and train the SVM that classifies connected components in CDAInput yourself!
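Before relying on a trained model, it can help to check that features.txt looks sane, for example that both classes are actually present. Below is a minimal Python sketch for that. It is not part of Strabo, the function names are invented for illustration, and it assumes that each line holds whitespace-separated numeric values with the class (0 or 1) as the last field, as the [feature #1] [feature #2] [class (0 or 1)] format suggests.

```python
from collections import Counter


def parse_feature_line(line):
    """Split one line into (features, label); the last field is the class.

    Assumption: fields are separated by whitespace and are all numeric.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"expected features and a class, got: {line!r}")
    label = float(fields[-1])
    if label not in (0.0, 1.0):
        raise ValueError(f"class must be 0 or 1, got: {fields[-1]!r}")
    return [float(v) for v in fields[:-1]], int(label)


def load_features(path):
    """Read every non-empty line of a features file into a list of samples."""
    samples = []
    with open(path) as fh:
        for line in fh:
            if line.strip():
                samples.append(parse_feature_line(line))
    return samples


def class_counts(samples):
    """Count how many connected components carry each label."""
    return Counter(label for _, label in samples)
```

If class_counts reports zero samples for either label, the pos or neg directory (or its list file) was probably empty when preprocessing ran.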