# 1.0.0-M01 Functional Specification

The Graphify alpha release implements its own classifier, in the form of cosine similarity. The cosine similarity classifier will be pulled out and replaced by an assortment of classifiers from Apache Spark MLlib.
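
For context, the idea behind a cosine similarity classifier can be sketched in a few lines of Python. This is an illustration only, not Graphify's actual implementation: the function names and the bag-of-words vectors below are hypothetical, and the real alpha classifier works on features stored in the graph.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(feature, 0.0) for feature, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def classify(text_vector, class_vectors):
    """Return (label, score) pairs sorted by descending similarity."""
    scores = [
        (label, cosine_similarity(text_vector, vector))
        for label, vector in class_vectors.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical word-count vectors, one per class, for illustration only.
class_vectors = {
    "positive": Counter("great fun great acting".split()),
    "negative": Counter("dull boring dull plot".split()),
}
query = Counter("great plot".split())
ranked = classify(query, class_vectors)
```

The first entry of `ranked` is the class whose vector points in the direction closest to the query.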
## Decouple Machine Learning from Feature Extraction

Two types of classifiers will be available: binary and multi-class. Binary classifiers will use logistic regression, and multi-class classifiers will use Naive Bayes.

Graphify's core will become a multi-dimensional feature extraction and selection library. "Training" Graphify's graph model will be disambiguated and replaced with the term "Feature Extraction".

The training of learning models will now be done exclusively in the Apache Spark module of Graphify.
## Feature Extraction Module

The feature extraction module learns features using hierarchical pattern recognition, described here: http://www.kennybastani.com/2014/07/using-3d-visualization-to-debug-graph.html

Users will be able to extract features by ingesting text of any length through a REST API endpoint.

Example:
### Extract features

URL: `http://localhost:7474/services/graphify/features/extract`

POST:

```
{
   "label":[
      "Document classification"
   ],
   "text":[
      "Documents may be classified according to their subjects or according to other attributes (such as document type, author, printing year etc.). In the rest of this article only subject classification is considered. There are two main philosophies of subject classification of documents: The content based approach and the request based approach."
   ]
}
```

Feature extraction builds upon previously extracted features, which are stored in Neo4j. Running this request on an empty database yields the following model:

![Graphify Data Model 1](http://i.imgur.com/x4WeGtG.png?1)
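
A client could prepare the extraction request roughly as follows. This is a minimal sketch using only the Python standard library; the helper name `build_extract_request` and the `Content-Type` header are assumptions, since the specification does not state which headers the endpoint expects. The request is built but not sent.

```python
import json
import urllib.request

# Endpoint from the specification; host and port depend on the Neo4j setup.
EXTRACT_URL = "http://localhost:7474/services/graphify/features/extract"


def build_extract_request(labels, texts, url=EXTRACT_URL):
    """Build (but do not send) a POST request for feature extraction."""
    body = json.dumps({"label": labels, "text": texts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Assumption: the endpoint accepts a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_extract_request(
    ["Document classification"],
    ["Documents may be classified according to their subjects."],
)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```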
## Feature selection module

The feature selection module aggregates features and builds the feature vectors that are used to create machine learning models. In the alpha release this was relatively easy to do: you passed in your data and the learning model was updated automatically. While this was easy, it offered little configuration. The feature selection module lets you specify which feature vectors to prepare for training a classifier.
### Select features

Once features have been extracted, the next step is to select those features into targets in order to build either a binary classifier or a multi-class classifier.

Feature selection produces either a `binary target` or a `multi-class target`. Targets are used to select features and to build the feature vectors from which learning models are built.
#### Create binary feature target

URL: `http://localhost:7474/services/graphify/targets/create`

POST:

```
{
  "labels":[
    "positive",
    "negative"
  ],
  "type": "binary"
}
```

Response:

```
{
  "targetId": 1
}
```
#### Create multi-class feature target

URL: `http://localhost:7474/services/graphify/targets/create`

POST:

```
{
  "labels":[
    "invoice",
    "purchase order",
    "credit memo"
  ],
  "type": "multi"
}
```

Response:

```
{
  "targetId": 2
}
```
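
The two request bodies above differ only in their labels and `type`, so a client could build both with one helper. The sketch below is hypothetical: `build_target_payload` is not part of Graphify, and the rule that a binary target names exactly two labels is an assumption drawn from the example, not a documented constraint.

```python
import json


def build_target_payload(labels, target_type):
    """Return the JSON body for the targets/create endpoint as a string."""
    if target_type not in ("binary", "multi"):
        raise ValueError(f"unknown target type: {target_type!r}")
    # Assumption: a binary target names exactly two labels, as in the example.
    if target_type == "binary" and len(labels) != 2:
        raise ValueError("a binary target needs exactly two labels")
    return json.dumps({"labels": list(labels), "type": target_type})


binary_body = build_target_payload(["positive", "negative"], "binary")
multi_body = build_target_payload(
    ["invoice", "purchase order", "credit memo"], "multi"
)
```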
## Training module

Once feature targets have been built, those targets are used to generate machine learning classifiers.
### Build learning models

There are two types of learning models that ingest a feature target and generate a model that is used to classify text. Those types are specified when creating a target. The `targetId` preserves information on the type of classification algorithm to use. For binary classification, logistic regression is used. For multi-class classification, Naive Bayes is used.
#### Train learning model

URL: `http://localhost:7474/services/graphify/models/train/{targetId}`

Example POST to `http://localhost:7474/services/graphify/models/train/1`:

```
{
  "trainingRatio": 0.5
}
```
- `trainingRatio`: The fraction of the training data from feature extraction to train on (for example, `0.5` for half). The remaining data is used to score the model's accuracy.

Response:

```
{
  "modelId": 1,
  "accuracy": 0.9652241686460808
}
```
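
The effect of `trainingRatio` can be illustrated with a plain holdout split: one part of the labeled data trains the model, and the rest is held back to compute accuracy. The sketch below shows the general technique under stated assumptions. The function names are hypothetical, and whether Graphify shuffles before splitting or how it seeds the split is not specified.

```python
import random


def split_by_training_ratio(examples, training_ratio, seed=0):
    """Shuffle the examples and split them into (training, held-out) lists."""
    if not 0.0 < training_ratio < 1.0:
        raise ValueError("training_ratio must be strictly between 0 and 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * training_ratio)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted, actual):
    """Fraction of held-out examples whose predicted label matches the true one."""
    if not actual:
        raise ValueError("cannot score an empty held-out set")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical labeled documents, for illustration only.
examples = [
    (f"doc {i}", "positive" if i % 2 == 0 else "negative") for i in range(10)
]
train, held_out = split_by_training_ratio(examples, 0.5)
```

With a ratio of `0.5` and ten examples, five are used for training and five for scoring.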
#### Update learning model

URL: `http://localhost:7474/services/graphify/models/update/{modelId}`

POST:

```
{
  "trainingRatio": 0.6
}
```

Response:

```
{
  "modelId": 1,
  "accuracy": 0.981237734532345
}
```
## Classification module

The classification module uses a trained machine learning model to predict the class of unlabeled input.
### Classify text

URL: `http://localhost:7474/services/graphify/classify/{modelId}`

Example POST to `http://localhost:7474/services/graphify/classify/1`:

```
{
  "text":"it is movies like these that make a jaded movie viewer thankful for the invention of the timex indiglo watch"
}
```

Response:

```
[
  {
    "label": "positive",
    "confidence": 0.76
  },
  {
    "label": "negative",
    "confidence": 0.24
  }
]
```
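
A client will usually want the single most likely label from this response. The sketch below parses a response of the shape shown above and picks the entry with the highest confidence; `top_prediction` is a hypothetical helper, and the sample body is copied from the example rather than fetched from a server.

```python
import json


def top_prediction(response_body):
    """Return (label, confidence) for the highest-confidence prediction."""
    predictions = json.loads(response_body)
    best = max(predictions, key=lambda p: p["confidence"])
    return best["label"], best["confidence"]


# Sample body copied from the example response.
sample_response = """
[
  {"label": "positive", "confidence": 0.76},
  {"label": "negative", "confidence": 0.24}
]
"""

label, confidence = top_prediction(sample_response)
print(label, confidence)  # prints: positive 0.76
```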

