This repository was archived by the owner on May 14, 2020. It is now read-only.

1.0.0-M01 Functional Specification #19

@kbastani

1.0.0-M01 Functional Specification

The Graphify alpha release implements its own classifier, based on cosine similarity. In this release the cosine similarity classifier will be removed and replaced by classifiers from Apache Spark MLlib.
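For readers unfamiliar with the approach being replaced, the following is a minimal sketch of classification by cosine similarity. It is not Graphify's implementation: the bag-of-words vectors, the `classify` helper, and the sample class vectors are assumptions made for illustration, whereas the alpha release derives its features from patterns stored in Neo4j.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def classify(text, class_vectors):
    """Return the label whose vector is most similar to the text."""
    query = Counter(text.lower().split())
    return max(
        class_vectors,
        key=lambda label: cosine_similarity(query, class_vectors[label]),
    )


# Hypothetical per-class term counts, used only for this illustration.
class_vectors = {
    "positive": Counter("great fun great acting".split()),
    "negative": Counter("dull boring dull plot".split()),
}

print(classify("great acting and a fun plot", class_vectors))
```

The label with the highest similarity wins, so no separate training step exists beyond accumulating the class vectors.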

Decouple Machine Learning from Feature Extraction

Two types of classifiers will be available: binary classifiers and multi-class classifiers. Binary classifiers will use logistic regression. Multi-class classifiers will use Naive Bayes.

Graphify's core will become a multi-dimensional feature extraction and selection library. To avoid ambiguity, what was previously called "training" Graphify's graph model will now be called "feature extraction".

The training of learning models will now be done exclusively in the Apache Spark module of Graphify.

Feature Extraction Module

The feature extraction module learns features using hierarchical pattern recognition, described here: http://www.kennybastani.com/2014/07/using-3d-visualization-to-debug-graph.html

Users will be able to extract features by ingesting text of any length through a REST API endpoint.

Example:

Extract features

URL: http://localhost:7474/services/graphify/features/extract

POST:

{
   "label":[
      "Document classification"
   ],
   "text":[
      "Documents may be classified according to their subjects or according to other attributes (such as document type, author, printing year etc.). In the rest of this article only subject classification is considered. There are two main philosophies of subject classification of documents: The content based approach and the request based approach."
   ]
}
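A client could build this request as sketched below, using only the Python standard library. The URL and the `label` and `text` fields come from the example above. The `build_extract_request` helper and the `Content-Type` header are assumptions, since the specification does not state which headers the endpoint expects. The request is built but not sent, so the sketch runs without a server.

```python
import json
import urllib.request

# Endpoint taken from the specification above.
EXTRACT_URL = "http://localhost:7474/services/graphify/features/extract"


def build_extract_request(labels, texts, url=EXTRACT_URL):
    """Build (but do not send) a POST request for feature extraction."""
    body = json.dumps({"label": list(labels), "text": list(texts)})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        # Assumption: the endpoint accepts a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_extract_request(
    ["Document classification"],
    ["Documents may be classified according to their subjects."],
)
# To send it to a running server: urllib.request.urlopen(request)
```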

The result of feature extraction builds upon previously extracted features, stored in Neo4j. Running this on an empty database yields the following model:

Figure: Graphify Data Model 1

Feature selection module

The feature selection module aggregates features and builds the feature vectors used to create machine learning models. In the alpha release this was simple: you passed in your data and the learning model was updated automatically. That was easy to use, but it offered little configuration. The feature selection module lets you specify which feature vectors to prepare for training a classifier.

Select features

Once features have been extracted, the next step is to select those features into targets in order to build either a binary classifier or a multi-class classifier.

Feature selection produces either a binary target or a multi-class target. A target selects features and builds the feature vectors used to build learning models.

Create binary feature target

URL: http://localhost:7474/services/graphify/targets/create

POST:

{
  "labels":[
    "positive",
    "negative"
  ],
  "type": "binary"
}

Response:

{
  "targetId": 1
}

Create multi-class feature target

URL: http://localhost:7474/services/graphify/targets/create

POST:

{
  "labels":[
    "invoice",
    "purchase order",
    "credit memo"
  ],
  "type": "multi"
}

Response:

{
  "targetId": 2
}
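The two request bodies above differ only in their labels and `type` value, so a client might validate them with one helper before posting. The sketch below is hypothetical: `build_target_payload` is not part of Graphify, and the label-count rules (exactly two labels for a binary target, at least two for a multi-class target) are assumptions inferred from the examples rather than stated requirements.

```python
import json


def build_target_payload(labels, target_type):
    """Return the JSON body for targets/create after basic checks."""
    if target_type not in ("binary", "multi"):
        raise ValueError(f"unknown target type: {target_type!r}")
    labels = list(labels)
    if len(set(labels)) != len(labels):
        raise ValueError("labels must be unique")
    # Assumption: a binary target takes exactly two labels.
    if target_type == "binary" and len(labels) != 2:
        raise ValueError("a binary target needs exactly two labels")
    # Assumption: a multi-class target takes at least two labels.
    if target_type == "multi" and len(labels) < 2:
        raise ValueError("a multi-class target needs at least two labels")
    return json.dumps({"labels": labels, "type": target_type})


binary_body = build_target_payload(["positive", "negative"], "binary")
multi_body = build_target_payload(
    ["invoice", "purchase order", "credit memo"], "multi"
)
```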

Training module

Once feature targets have been built, those targets are used to generate machine learning classifiers.

Build learning models

There are two types of learning models. Each ingests a feature target and produces a model that is used to classify text. The type is specified when creating a target, and the targetId preserves which classification algorithm to use: logistic regression for binary classification, Naive Bayes for multi-class classification.
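The lookup from targetId to algorithm can be pictured as follows. This is only an illustration: the in-memory `TARGETS` dictionary stands in for whatever Graphify stores in Neo4j, and `algorithm_for_target` is a hypothetical name. Only the mapping from target type to algorithm is taken from the specification.

```python
# Hypothetical stand-in for stored targets, keyed by targetId.
TARGETS = {
    1: {"labels": ["positive", "negative"], "type": "binary"},
    2: {"labels": ["invoice", "purchase order", "credit memo"], "type": "multi"},
}

# Algorithm per target type, as stated in the specification.
ALGORITHM_BY_TYPE = {
    "binary": "logistic regression",
    "multi": "naive bayes",
}


def algorithm_for_target(target_id):
    """Resolve the classification algorithm from a stored target."""
    target = TARGETS[target_id]
    return ALGORITHM_BY_TYPE[target["type"]]


print(algorithm_for_target(1))
print(algorithm_for_target(2))
```

Because the type travels with the target, the training request itself does not need to name an algorithm.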

Train learning model

URL: http://localhost:7474/services/graphify/models/train/{targetId}

Example POST to http://localhost:7474/services/graphify/models/train/1:

{
  "trainingRatio": 0.5
}
  • trainingRatio: The fraction of the data from feature extraction to train on (for example, 0.5 trains on half). The remaining data is used to score the model's accuracy.

Response:

{
  "modelId": 1,
  "accuracy": 0.9652241686460808
}
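The relationship between trainingRatio and the reported accuracy can be sketched as a hold-out split. The code below is an assumption about how such a split is commonly done, not a description of Graphify's internals: the shuffling, the fixed seed, and the `split_by_ratio` and `accuracy` helpers are all hypothetical.

```python
import random


def split_by_ratio(examples, training_ratio, seed=0):
    """Shuffle the examples and split them into (training, held-out) lists."""
    if not 0.0 < training_ratio < 1.0:
        raise ValueError("training_ratio must be between 0 and 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * training_ratio)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted, actual):
    """Fraction of held-out examples whose predicted label is correct."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need equally sized, non-empty label lists")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labeled examples: (text, label) pairs.
examples = [
    (f"doc {i}", "positive" if i % 2 else "negative") for i in range(10)
]
train, held_out = split_by_ratio(examples, 0.5)
print(len(train), len(held_out))
```

A higher ratio gives the model more data to learn from but leaves fewer held-out examples, which makes the accuracy estimate noisier.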

Update learning model

URL: http://localhost:7474/services/graphify/models/update/{modelId}

POST:

{
  "trainingRatio": 0.6
}

Response:

{
  "modelId": 1,
  "accuracy": 0.981237734532345
}

Classification module

The classification module uses a trained machine learning model to predict the class of unlabeled input.

Classify text

URL: http://localhost:7474/services/graphify/classify/{modelId}

Example POST to http://localhost:7474/services/graphify/classify/1:

{
  "text":"it is movies like these that make a jaded movie viewer thankful for the invention of the timex indiglo watch"
}

Response:

[
  {
    "label":"positive",
    "confidence":0.76
  },
  {
    "label":"negative",
    "confidence":0.24
  }
]
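A client consuming this response would typically pick the label with the highest confidence. The sketch below parses the response shown above. The `top_prediction` helper and its optional `threshold` parameter are hypothetical additions for illustration, not part of the API.

```python
import json

# Response body copied from the example above.
response_body = """[
  {"label": "positive", "confidence": 0.76},
  {"label": "negative", "confidence": 0.24}
]"""


def top_prediction(body, threshold=0.0):
    """Return the most confident label, or None if it is below the threshold."""
    predictions = json.loads(body)
    best = max(predictions, key=lambda p: p["confidence"])
    return best["label"] if best["confidence"] >= threshold else None


print(top_prediction(response_body))
```

Passing a threshold lets the caller treat low-confidence results as undecided instead of forcing a label.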
