Introduction to visualization

vg lets you visualize graphs by emitting them as a graphviz-formatted output stream. Here, I'll show some techniques for visualizing various elements of the vg universe.

Visualization example

Basic visualization in vg is done by rendering the graphviz (DOT) output of vg view -d with a graphviz layout program such as dot. For example, from the test/ directory of the vg repository, the following builds a small graph from a VCF and a reference FASTA and renders it to x.svg:

vg construct -v tiny/tiny.vcf.gz -r tiny/tiny.fa \
    | vg view -d - \
    | dot -Tsvg -o x.svg

Figure: pangenome graph with a few sites of variation

You will need the graphviz tools installed (for example, via sudo apt-get install graphviz on Debian or Ubuntu Linux).

Understand bidirectional sequence graph representation

Variation graphs in vg are "train track graphs": they implicitly include their reverse complement, and edges can "reverse", going from the forward strand to the reverse strand, akin to the way the two rails of a train track work.

This lets us represent inversions without duplicating the inverted sequence, which achieves one of the goals of variation graphs: representing annotations and information about variation with minimal duplication. It also avoids the multiple mapping problem that would arise if two disparate positions in the graph encoded the same sequence. Finally, this is a standard way of modeling graphs and exactly matches the model encoded in GFA, so we can use graphs from any source that produces GFA.
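Reading a node along its reverse strand spells the reverse complement of its forward sequence, which is why the reverse strand never has to be stored separately. As a minimal illustration, the revcomp helper below is hypothetical (it is not a vg command) and assumes the standard rev and tr utilities are available:

```shell
# Hypothetical helper (not part of vg): print the reverse complement of a
# DNA string, i.e. the sequence spelled by a node read on its reverse strand.
revcomp() {
    # Reverse the string, then swap each base for its complement.
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# A node whose forward sequence is GATTACA spells TGTAATC on its reverse strand.
revcomp GATTACA
```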

Call the two sides of each node its "start" and its "end". Reading in the forward direction goes from start to end, and reading in reverse goes from end to start. If edges may attach to either side of the node they leave and either side of the node they enter, we get four types of edges.

We record the edge type in protobuf and JSON with two flags, from_start and to_end, which mark the sides that are attached in the non-standard orientation.

The default edge goes from the end of one node to the start of the next, which our serialization format records as "from_start": false, "to_end": false.

To express a transition from the forward strand of one node to the reverse strand of the next, we set "from_start": false, "to_end": true.

A transition from the reverse strand of one node to the forward strand of the next is "from_start": true, "to_end": false.

The remaining combination, "from_start": true, "to_end": true, is the reverse-strand version of the default. We render it the same way as the forward-strand default because it doesn't provide any additional information; it's just an alternative way of saying the same thing.
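The mapping from the two flags to node sides can be sketched as a small shell function. This is a hypothetical helper written for illustration, not part of vg; it assumes the flags are passed as the strings true and false, as they appear in the JSON serialization, and it uses the start/end terminology for node sides.

```shell
# Hypothetical helper (not part of vg): given an edge's from/to node IDs and
# its from_start/to_end flags, print which side of each node the edge joins.
describe_edge() {
    local from=$1 to=$2 from_start=$3 to_end=$4
    # Default orientation: end of "from" to start of "to".
    local from_side=end to_side=start
    if [ "$from_start" = true ]; then
        from_side=start   # edge leaves the start of "from"
    fi
    if [ "$to_end" = true ]; then
        to_side=end       # edge enters the end of "to"
    fi
    echo "$from:$from_side -> $to:$to_side"
}

describe_edge 1 2 false false   # 1:end -> 2:start   (default)
describe_edge 1 2 false true    # 1:end -> 2:end     (forward strand to reverse strand)
describe_edge 1 2 true false    # 1:start -> 2:start (reverse strand to forward strand)
describe_edge 1 2 true true     # 1:start -> 2:end   (default, on the reverse strand)
```

The last call joins the same two sides as describe_edge 2 1 false false, which prints 2:end -> 1:start; that is why the both-true form carries no extra information.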

All four edge types can be represented in vg format, in GFA, and in the graphviz output that was used to render the illustrations on this page. In the graphviz output, the different types of edges are modeled using graphviz ports, which let us attach an edge to a particular corner of a node.
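To see what ports look like in DOT, here is a hand-written sketch, not vg output. It uses graphviz compass points such as e (east) and w (west) as placeholders; the specific points chosen here are assumptions for the example, since vg's own output picks its ports itself. The file name ports.dot is likewise hypothetical.

```shell
# Illustrative only: a hand-written DOT file showing how graphviz ports pin an
# edge to a chosen point on a node. vg's real output chooses its own ports.
cat > ports.dot <<'EOF'
digraph edge_types {
    rankdir=LR;
    node [shape=box];
    // default edge: end (east side) of 1 to start (west side) of 2
    1:e -> 2:w;
    // from_start=false, to_end=true: end of 2 to end of 3
    2:e -> 3:e;
}
EOF

# Render only if graphviz is installed.
if command -v dot >/dev/null 2>&1; then
    dot -Tsvg ports.dot -o ports.svg
fi
```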

Note that in practice we don't usually need to draw arrowheads, although they can help disambiguate visualizations such as the preceding examples. You can add them back by piping vg's graphviz output through sed s/arrowhead=none/arrowhead=normal/g.
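For repeated use, the sed filter can be wrapped in a small shell function and dropped into the pipeline from the visualization example. The function name show_arrows is hypothetical, and the commented pipeline assumes you are in vg's test/ directory with vg and graphviz installed.

```shell
# Hypothetical helper: re-enable the arrowheads that vg's graphviz output
# turns off. Reads DOT on stdin and writes the modified DOT to stdout.
show_arrows() {
    sed 's/arrowhead=none/arrowhead=normal/g'
}

# Example pipeline, run from vg's test/ directory:
# vg construct -v tiny/tiny.vcf.gz -r tiny/tiny.fa \
#     | vg view -d - \
#     | show_arrows \
#     | dot -Tsvg -o x.svg
```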

Some complex examples

Now that you know what these graph representations mean, you can interpret the following examples. They are test cases we used while extending vg to handle cyclic graphs and bidirectional edges. They mix cases that were initially problematic, and here they show what vg can express and visualize:

Figure: a weird-looking twisty graph
