Improved HTML formatter #118

alajovic · 2025-10-22T17:35:39Z

This PR fixes several problems with the HTML formatter. The main issue with the current formatter is that the order of some element classes is reversed. To illustrate: save the following code to example.txt and run blark format -of html example.txt.

TYPE ST_MyStruct :
    STRUCT
        foo : BOOL;
    END_STRUCT
END_TYPE

In the output, you get (line breaks added for clarity):

<span class="term IDENTIFIER">
<span class="rule structure_type_declaration">
ST_MyStruct

This is incorrect: IDENTIFIER should have been nested within structure_type_declaration, not the other way around. The same happens with variable definitions:

<span class="rule var1_list">
<span class="rule var1">
<span class="term IDENTIFIER">
<span class="rule variable_name">
foo

The order of var1_list and var1 is correct, but IDENTIFIER again comes before variable_name.
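The intended nesting can be sketched in a few lines of Python. This is only an illustration, not blark's code: nest_spans is a hypothetical helper, the class path is assumed to be listed from the outermost grammar rule to the innermost terminal, and the class names are written without the rule/term prefix for brevity.

```python
from html import escape


def nest_spans(path, text):
    """Wrap text in one <span> per class, with path[0] as the outermost element."""
    html = escape(text)
    # Build from the inside out so that the last entry of the path ends up innermost.
    for cls in reversed(path):
        html = f'<span class="{escape(cls)}">{html}</span>'
    return html


print(nest_spans(["var1_list", "var1", "variable_name", "IDENTIFIER"], "foo"))
```

With this ordering, IDENTIFIER is the innermost span and variable_name encloses it, which is the structure a CSS descendant selector would expect.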

In addition to that, the current HTML formatter has a few other undesirable properties:

It converts all spaces to &nbsp;. This significantly increases the output length, and it's not even necessary: the HTML output needs to be styled with CSS anyway, and in CSS, literal spaces can be preserved simply with white-space: pre.
All HTML classes include either rule or term at the beginning. In my opinion, this just inflates the output with no real benefit.
Reserved characters like <, >, and & are passed through instead of being converted to character entities (&lt; etc.). This is not in accordance with the HTML spec.
Code origins (block.origin.identifier in Python terms) are not included in the output. This becomes a problem when styling HTML produced from an entire TcPOU, as there are no HTML elements that could be used as separators between different sections in the TcPOU, so it's hard to distinguish what belongs where.
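The first and third points can be illustrated with the standard library alone. The snippet below is a rough sketch rather than the formatter's code: the sample line and the stylesheet string are made up for the example, and the .blark_code selector refers to a class introduced further down in this description.

```python
from html import escape

# Hypothetical source line containing leading spaces and reserved characters.
line = "        foo : BOOL;  // a < b & c"

# Behaviour described above: every space becomes a six-character entity.
nbsp_version = escape(line).replace(" ", "&nbsp;")

# Alternative: escape only the reserved characters and keep literal spaces.
plain_version = escape(line)

# Assumed stylesheet rule that makes the browser preserve those spaces.
CSS = ".blark_code { white-space: pre; }"

print(plain_version)
print(len(nbsp_version), len(plain_version))
```

The escaped text keeps its indentation as ordinary spaces, and the length difference grows with every indented line.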

There may be more; I don't really recall because I initially started working on this a while ago.

The code in this PR fixes all of that. Since lxml is already one of blark's main dependencies and thus comes "for free", I chose to use it internally. The resulting HTML is thus guaranteed to be syntactically correct. I have added three new HTML classes to make styling easier:

class blark: This wraps the entire HTML output.
class blark_origin: Contains the code origin. Can be styled with display: none if one doesn't want to see it.
class blark_code: Contains the code itself.
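A minimal sketch of how such a wrapper could be assembled with an element tree is shown below. Several things here are assumptions: the standard library's xml.etree.ElementTree stands in for lxml (lxml.etree offers a largely compatible API) so that the example has no dependencies, the div/span tag names are guesses, and build_document is a hypothetical function. Only the three class names come from this PR.

```python
import xml.etree.ElementTree as ET


def build_document(origin, tokens):
    """Return HTML for one code block.

    tokens is a list of (class_path, text) pairs, where class_path runs
    from the outermost rule to the innermost terminal.
    """
    root = ET.Element("div", {"class": "blark"})
    origin_el = ET.SubElement(root, "div", {"class": "blark_origin"})
    origin_el.text = origin
    code_el = ET.SubElement(root, "div", {"class": "blark_code"})
    for class_path, text in tokens:
        parent = code_el
        # Each class becomes a span nested inside the previous one.
        for cls in class_path:
            parent = ET.SubElement(parent, "span", {"class": cls})
        parent.text = text
    # The serializer escapes reserved characters in text nodes.
    return ET.tostring(root, encoding="unicode", method="html")


print(build_document(
    "example.txt",
    [(["structure_type_declaration", "IDENTIFIER"], "ST_MyStruct"),
     (["expression"], "a < b")],
))
```

Because the markup is produced by serializing a tree instead of concatenating strings, tags are always balanced and text is escaped on output.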

Please review and let me know if you identify any potential issues.

klauer · 2025-10-28T23:56:46Z

This looks great overall! Thanks for the nice contribution. I agree with your assessment and your careful choices in the refactor.

Would you be willing to add a couple of simple tests to the suite to get a bit of coverage on this? My recollection is that there weren't any (sorry, checking on the phone isn't so easy).

alajovic · 2025-10-29T12:27:27Z

The HTML formatter does not currently seem to have any tests. I can try adding some. Any wishes/pointers?

klauer · 2025-10-30T19:10:39Z

A simple smoke test that runs some basic structured text through the formatter would be perfectly acceptable. It would be primarily just to get coverage on the lines.

If you want to take it a step further to ensure that there aren't any regressions in the process, validating a couple pairs of source code against the expected XML objects (or output source I suppose) would be great!
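One possible shape for such a regression check, sketched with the standard library only: parse the formatter output, fail if tags are unbalanced, and compare the class path of each text node against an expected value. SpanChecker and check_html are hypothetical names, the sample string is hand-written rather than real blark output, and the sketch assumes the output contains no void elements such as br.

```python
from html.parser import HTMLParser


class SpanChecker(HTMLParser):
    """Track open elements and record the class path leading to each text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # class attribute of each currently open element
        self.leaves = []  # (class path, text) for each non-blank text node

    def handle_starttag(self, tag, attrs):
        self.stack.append(dict(attrs).get("class", ""))

    def handle_endtag(self, tag):
        assert self.stack, f"unexpected closing tag </{tag}>"
        self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.leaves.append((tuple(self.stack), data))


def check_html(html_text):
    """Return the (class path, text) pairs; fail if the tags are unbalanced."""
    checker = SpanChecker()
    checker.feed(html_text)
    checker.close()
    assert not checker.stack, f"unclosed elements: {checker.stack}"
    return checker.leaves


sample = '<span class="variable_name"><span class="IDENTIFIER">foo</span></span>'
print(check_html(sample))
```

In a real test, the sample string would be replaced by the output of the formatter (for example from blark format -of html, as used in the PR description), and the returned pairs compared against stored expectations.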

I'll leave it up to your ambitions. Regardless of which you choose it's much appreciated.

klauer · 2025-12-03T16:16:16Z

Turns out the existing blark CLI tests did cover these lines by way of --output-format that I had forgotten about.

Testing this in isolation wasn't as simple as I recalled either, so I did some updates adding explicit html output smoke tests in a separate branch. I'll merge it (along with your changes) shortly.

Did you by chance come up with some decent CSS to display the output of this?

Separately, I was wondering if a "compact" output mode of sorts that shows only the top-level and bottom-level classes might be useful. That is to say, seeing expression, assignment_expression, ..., expression_term, unary_expression, UNARY_OPERATOR for a single - (minus) is, while accurate with respect to the grammar, overly verbose. Perhaps being able to squash that to expression, unary_expression could be an alternative? Or is it useful to you as-is in its current form?
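To make the idea concrete, here is a rough sketch of what such squashing could look like. It is not part of the PR: compact is a hypothetical helper, the elided middle of the chain is simply left out of the sample, and dropping the terminal is a guess based on the expression, unary_expression example above. It relies on the Lark convention that rule names are lowercase and terminal names are uppercase.

```python
def compact(chain):
    """Reduce a class chain to its first and last grammar rule."""
    # Lark convention: rule names are lowercase, terminal names are uppercase.
    rules = [name for name in chain if not name.isupper()]
    if len(rules) <= 2:
        return rules
    return [rules[0], rules[-1]]


chain = ["expression", "assignment_expression", "expression_term",
         "unary_expression", "UNARY_OPERATOR"]
print(compact(chain))
```

For the sample chain this yields expression and unary_expression, matching the squashed form suggested above.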

CI maintenance and adding testing to #118

alajovic · 2025-12-04T23:10:30Z

Thanks for bringing this to a close. It was still somewhere in the back of my mind, but the depths of The Backlog of Little Things are deceptive and perilous...

I experimented with CSS for a while, but never really pushed things past the cobbled-together level of quality. So I don't think it's worth sharing in the current form.

Syntax highlighting was actually my initial motivation for playing with blark. True story: when I discovered in Beckhoff's documentation that it was possible to turn on syntax highlighting in TwinCAT, I rushed into the settings menu, found the setting – and realized that it was already enabled! The highlighting was so bleak that I had been genuinely convinced that it wasn't on. xD

As for the compacting of HTML classes, I don't think it's necessary or even too desirable. The current granularity does inflate the output size to some extent, but I don't think it's too bad. And in CSS, it's generally better to have more options to style over than less.

klauer · 2025-12-08T18:01:53Z

Thanks again for the contribution!

> the depths of The Backlog of Little Things are deceptive and perilous...

I find myself lost there more often than not, too...

> The highlighting was so bleak that I had been genuinely convinced that it wasn't on. xD

It really is bad in the TwinCAT IDE, isn't it? There's so much potential for quality-of-life improvements there that never seem to make it in new releases, sadly.

> And in CSS, it's generally better to have more options to style over than less.

Works for me! I shy away from front-end web stuff for the most part, so your perspective here is much appreciated.

Improved HTML formatter

b61bfae

klauer added a commit that referenced this pull request Dec 3, 2025

Merge pull request #120 from klauer/pr118_update

7834929

CI maintenance and adding testing to #118

klauer merged commit b61bfae into klauer:master Dec 3, 2025
4 of 19 checks passed
