Writing An Analytic Module
Analytic modules are one of two ways to extend MiddGuard for your investigation. They transform input data and save the changes to a database. The other type of extension is a visualization module, which creates a view of some data in the browser. As a general rule, if you're not creating a visualization, you want to write an analytic module.
Analytic modules are the backing code for analytic nodes. In MiddGuard's front-end environment, we create nodes from modules and connect nodes to one another. Multiple nodes can be based on the same module. A node uses a combination of its module and its connections to other nodes to perform a transformation on some input and save it as output. That output becomes the input for other nodes. Each node has a one-to-one relationship to a table in MiddGuard's database.
Analytic modules only need one JavaScript file, but can be as complex as
necessary. Although you don't need to, it's useful to place that file in a
directory with the module's name to allow the module to grow to multiple files
and stay organized. That directory lives adjacent to the main app.js.

    .
    ├── app.js
    └── tweet-timestamp-difference
        └── index.js

Our module tweet-timestamp-difference will take in information from two
Twitter users' timelines and, for each day of the week and hour of the day,
calculate the difference between the number of tweets the two people sent.
We can assume we have already written modules to retrieve the tweets and aggregate them by day of the week and hour.
inputs declare what the module takes in. Each element of the array, an input
group, refers to the output from an analytic node. Input groups are named so we
can refer to them later on. Each input group also enumerates the attributes it
uses. Each attribute is a column in the corresponding output from the connected
node.
We have two input groups, tweets1 and tweets2. Each has attributes day,
hour, and count: the day of week, hour of day, and number of tweets a person
sent at that day and hour.

    exports.inputs = [
      {name: 'tweets1', inputs: ['day', 'hour', 'count']},
      {name: 'tweets2', inputs: ['day', 'hour', 'count']}
    ];

Here's an example of what the incoming data for tweets1 might look like in
tabular form:
| day | hour | count |
|---|---|---|
| 0 | 0 | 56 |
| 0 | 1 | 34 |
| 0 | 2 | 78 |
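
In code, those rows would typically arrive as an array of plain objects, one per row, keyed by column name. The sketch below assumes that shape (it is what a Knex `select` normally resolves to) and uses a hypothetical `sampleTweets1` array holding the values from the table above. The check itself is only an illustration and not something MiddGuard requires.

```javascript
// Hypothetical sample data mirroring the table above; not produced by MiddGuard.
var sampleTweets1 = [
  {day: 0, hour: 0, count: 56},
  {day: 0, hour: 1, count: 34},
  {day: 0, hour: 2, count: 78}
];

// The attributes declared for the tweets1 input group.
var declared = ['day', 'hour', 'count'];

// Check that every row carries each declared attribute.
var allPresent = sampleTweets1.every(function(row) {
  return declared.every(function(attribute) {
    return Object.prototype.hasOwnProperty.call(row, attribute);
  });
});

console.log(allPresent);
```
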
outputs enumerates the attributes for each element that this module outputs.
Each element is a row inserted directly into a node's database table. Each node
only has one output (its table), so there is no need here for multiple "output
groups". In the front-end interface we line up input attributes (like day,
hour, and count in the inputs) and output attributes like the ones below.
Our output attributes are a day of the week, an hour of the day, the number of
tweets sent by two different people at that day and hour (count1 and
count2), and the difference between count1 and count2.

    exports.outputs = [
      'day',
      'hour',
      'count1',
      'count2',
      'difference'
    ];

An example of the output data for our module in tabular form is:
| day | hour | count1 | count2 | difference |
|---|---|---|---|---|
| 0 | 0 | 56 | 23 | 33 |
| 0 | 1 | 34 | 20 | 14 |
| 0 | 2 | 78 | 54 | 24 |
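
To make the arithmetic behind that table concrete, here is a minimal sketch that derives the three output rows from in-memory sample rows. The arrays are hypothetical: the tweets2 counts are simply read off the count2 column above, and the difference is taken as an absolute value, as the module's handle function does. It uses plain JavaScript only, with no database involved.

```javascript
// Hypothetical sample rows; tweets2 counts come from the count2 column above.
var tweets1 = [
  {day: 0, hour: 0, count: 56},
  {day: 0, hour: 1, count: 34},
  {day: 0, hour: 2, count: 78}
];
var tweets2 = [
  {day: 0, hour: 0, count: 23},
  {day: 0, hour: 1, count: 20},
  {day: 0, hour: 2, count: 54}
];

// Pair each tweets1 row with the tweets2 row for the same day and hour.
var output = tweets1.map(function(row1) {
  var row2 = tweets2.filter(function(candidate) {
    return candidate.day === row1.day && candidate.hour === row1.hour;
  })[0];
  var count2 = row2 ? row2.count : 0;
  return {
    day: row1.day,
    hour: row1.hour,
    count1: row1.count,
    count2: count2,
    difference: Math.abs(row1.count - count2)
  };
});

console.log(output);
```
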
displayName is a string containing a prettier version of the module's name in
the file system. This is how the module and its nodes will be identified
throughout the front-end interface.
Good display names should be short and descriptive. We'll name ours by replacing hyphens with spaces and capitalizing the words.
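
The naming rule just described can be sketched as a small helper. `toDisplayName` is a hypothetical function written for illustration; MiddGuard does not provide it or derive display names automatically.

```javascript
// Hypothetical helper: turn a hyphenated directory name into a display name.
function toDisplayName(directoryName) {
  return directoryName.split('-').map(function(word) {
    // Capitalize the first letter of each word.
    return word.charAt(0).toUpperCase() + word.slice(1);
  }).join(' ');
}

console.log(toDisplayName('tweet-timestamp-difference'));
// prints "Tweet Timestamp Difference"
```

The module itself just assigns the resulting string directly.
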

    exports.displayName = 'Tweet Timestamp Difference';

createTable is a function used to create the tables for nodes based on this
module. The function is always passed two arguments, tableName and knex.
tableName is the name MiddGuard has assigned the node's table at runtime.
Modifying it here before creating the table will make MiddGuard unable to find
the table later.
knex is an instance of Knex, a SQL generator.
With Knex, we can use many relational databases and write the same code to
perform SQL statements. This instance of Knex is already connected to the
MiddGuard database.
Its schema-building functions, like knex.schema.createTable, return
Promises. Creating a table in the database is an asynchronous operation, so
createTable must return that Promise to let MiddGuard know when Knex has
finished creating the table.
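
The following sketch shows why returning the Promise matters. It runs without a database by using `makeFakeKnex`, a hypothetical stand-in that mimics only the one schema call used here and resolves asynchronously. The real Knex instance passed in by MiddGuard does far more, so treat this as an illustration of the control flow under that assumption.

```javascript
// Hypothetical stand-in for the Knex instance MiddGuard passes in. It records
// the requested columns and resolves on a later tick.
function makeFakeKnex(log) {
  return {
    schema: {
      createTable: function(tableName, build) {
        var columns = [];
        build({
          integer: function(name) { columns.push(name); }
        });
        return new Promise(function(resolve) {
          setTimeout(function() {
            log.push({table: tableName, columns: columns});
            resolve();
          }, 0);
        });
      }
    }
  };
}

// A createTable in the style of this guide: it returns the Promise.
function createTable(tableName, knex) {
  return knex.schema.createTable(tableName, function(table) {
    table.integer('day');
    table.integer('hour');
  });
}

var log = [];
var done = createTable('example_table', makeFakeKnex(log)).then(function() {
  // Runs only after the (fake) table has been created.
  return log;
});
```

Because `createTable` returns the Promise, the caller can chain `.then` and be sure the table exists before using it. Without the `return`, the caller would receive `undefined` and have nothing to wait on.
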
The createTable function should closely resemble the module's outputs, since
each output needs a column in the table.

    exports.createTable = function(tableName, knex) {
      return knex.schema.createTable(tableName, function(table) {
        table.integer('day');
        table.integer('hour');
        table.integer('count1');
        table.integer('count2');
        table.integer('difference');
      });
    };

handle is the function called to transform input data into output data. It is
the core of our module.
Like createTable, handle is passed a variable defined by MiddGuard,
context. context contains all the information about how a node based on this
module is connected to other nodes in the same graph. See the
context guide for more details and examples.
The important parts of context for our function are context.inputs and
context.table. For each input our module accepts, context.inputs has a key
with that input's name. These are context.inputs.tweets1 and
context.inputs.tweets2. tweets1 and tweets2 have Knex database connections
already assigned to the tables where tweets1 and tweets2 are stored,
respectively.
context.table is the module's output. Like each of the inputs in
context.inputs it has a Knex connection used to insert rows into a node's
table (or run any other query on the table).
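
As a rough picture of the parts of context described above, the sketch below builds a fake context by hand and runs a select-then-insert round trip against it. `fakeTable` is a hypothetical helper that mimics only `select` and `insert`. The real `knex` properties are Knex query builders, and the real context carries more than these two keys.

```javascript
// Hypothetical stand-in for an input or output table. Only `select` and
// `insert` are mimicked, and both return Promises.
function fakeTable(rows) {
  return {
    knex: {
      select: function() {
        return Promise.resolve(rows.slice());
      },
      insert: function(newRows) {
        Array.prototype.push.apply(rows, newRows);
        return Promise.resolve();
      }
    }
  };
}

// A hand-built context with the two inputs and the output table.
var context = {
  inputs: {
    tweets1: fakeTable([{day: 0, hour: 0, count: 56}]),
    tweets2: fakeTable([{day: 0, hour: 0, count: 23}])
  },
  table: fakeTable([])
};

// Read both inputs, insert one combined row, then read the output back.
var result = Promise.all([
  context.inputs.tweets1.knex.select('*'),
  context.inputs.tweets2.knex.select('*')
]).then(function(results) {
  var row1 = results[0][0];
  var row2 = results[1][0];
  return context.table.knex.insert([{
    day: row1.day,
    hour: row1.hour,
    count1: row1.count,
    count2: row2.count,
    difference: Math.abs(row1.count - row2.count)
  }]);
}).then(function() {
  return context.table.knex.select('*');
});
```

A fake like this can be handy for trying out a handle function before wiring the module into a graph.
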
It's useful to see our inputs and outputs alongside the handle function as a reference for the context.

    exports.inputs = [
      {name: 'tweets1', inputs: ['day', 'hour', 'count']},
      {name: 'tweets2', inputs: ['day', 'hour', 'count']}
    ];
    exports.outputs = ['day', 'hour', 'count1', 'count2', 'difference'];

And the handle function, annotated.

    exports.handle = function(context) {
      // We'll insert objects with each of the output attributes into
      // the `week` array, then insert `week` into `context.table`.
      var tweets1 = context.inputs.tweets1,
          tweets2 = context.inputs.tweets2,
          week = [];

      // Select everything from each of the inputs (tweets1 and tweets2).
      return Promise.join(tweets1.knex.select('*'), tweets2.knex.select('*'),
        function(rows1, rows2) {
          // Iterate through the hours of the day and days of the week.
          _.range(24).forEach(function(hour) {
            _.range(7).forEach(function(day) {
              // Find the row for this hour and day in each input. An input
              // may have no row for a slot, so fall back to a count of 0.
              var row1 = _.find(rows1, {hour: hour, day: day});
              var row2 = _.find(rows2, {hour: hour, day: day});
              var count1 = row1 ? row1.count : 0;
              var count2 = row2 ? row2.count : 0;

              // Add the combined row to the `week` array.
              week.push({
                day: day,
                hour: hour,
                count1: count1,
                count2: count2,
                difference: Math.abs(count1 - count2)
              });
            });
          });

          // Insert everything into the table.
          return context.table.knex.insert(week);
        });
    };

Here's the complete contents of index.js in tweet-timestamp-difference.

    .
    ├── app.js
    └── tweet-timestamp-difference
        └── index.js

Note that we use two external dependencies, lodash and bluebird. We can
require these in the module just like in any other Node.js module.

    var _ = require('lodash');
    var Promise = require('bluebird');

    exports.inputs = [
      {name: 'tweets1', inputs: ['day', 'hour', 'count']},
      {name: 'tweets2', inputs: ['day', 'hour', 'count']}
    ];

    exports.outputs = [
      'day',
      'hour',
      'count1',
      'count2',
      'difference'
    ];

    exports.displayName = 'Tweet Timestamp Difference';

    exports.createTable = function(tableName, knex) {
      return knex.schema.createTable(tableName, function(table) {
        table.integer('day');
        table.integer('hour');
        table.integer('count1');
        table.integer('count2');
        table.integer('difference');
      });
    };

    exports.handle = function(context) {
      var tweets1 = context.inputs.tweets1,
          tweets2 = context.inputs.tweets2,
          week = [];

      return Promise.join(tweets1.knex.select('*'), tweets2.knex.select('*'),
        function(rows1, rows2) {
          _.range(24).forEach(function(hour) {
            _.range(7).forEach(function(day) {
              // An input may have no row for a slot, so fall back to 0.
              var row1 = _.find(rows1, {hour: hour, day: day});
              var row2 = _.find(rows2, {hour: hour, day: day});
              var count1 = row1 ? row1.count : 0;
              var count2 = row2 ? row2.count : 0;

              week.push({
                day: day,
                hour: hour,
                count1: count1,
                count2: count2,
                difference: Math.abs(count1 - count2)
              });
            });
          });

          return context.table.knex.insert(week);
        });
    };
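
The handle function searches both inputs once for each of the 168 day/hour slots. One possible variation, sketched below under the assumption that each input has at most one row per day and hour, indexes the rows by a "day:hour" key first so each slot becomes a direct lookup. `indexCounts` and `buildWeek` are hypothetical names used for illustration. This is an alternative to the approach above and not part of the module as written.

```javascript
// Build a lookup from "day:hour" to count for one input's rows.
function indexCounts(rows) {
  var index = {};
  rows.forEach(function(row) {
    index[row.day + ':' + row.hour] = row.count;
  });
  return index;
}

// Produce one output row per day/hour slot, treating missing slots as 0.
function buildWeek(rows1, rows2) {
  var counts1 = indexCounts(rows1),
      counts2 = indexCounts(rows2),
      week = [];
  for (var hour = 0; hour < 24; hour++) {
    for (var day = 0; day < 7; day++) {
      var key = day + ':' + hour;
      var count1 = counts1[key] || 0;
      var count2 = counts2[key] || 0;
      week.push({
        day: day,
        hour: hour,
        count1: count1,
        count2: count2,
        difference: Math.abs(count1 - count2)
      });
    }
  }
  return week;
}

// Hypothetical sample rows for a single slot.
var week = buildWeek(
  [{day: 0, hour: 0, count: 56}],
  [{day: 0, hour: 0, count: 23}]
);

console.log(week.length); // prints 168
console.log(week[0]);
```

Inside handle, `buildWeek` would take the two resolved row arrays, and its result would be passed to `context.table.knex.insert`.
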