Skip to content

Conversation

@alajovic
Copy link
Contributor

This PR fixes several problems with the HTML formatter. The main issue with the current formatter is that the order of some element classes is reversed. To illustrate: save the following code to example.txt and run blark format -of html example.txt.

TYPE ST_MyStruct :
    STRUCT
        foo : BOOL;
    END_STRUCT
END_TYPE

In the output, you get (line breaks added for clarity)

<span class="term IDENTIFIER">
<span class="rule structure_type_declaration">
ST_MyStruct

This is incorrect: IDENTIFIER should have been nested within structure_type_declaration, not the other way around. The same happens with variable definitions:

<span class="rule var1_list">
<span class="rule var1">
<span class="term IDENTIFIER">
<span class="rule variable_name">
foo

Order of var1_list and var1 is correct, but IDENTIFIER again comes before variable_name.

In addition to that, the current HTML formatter has a few other undesirable properties:

  • It converts all spaces to &nbsp;. This significantly increases the output length, and it's not even necessary: the HTML output needs to be styled with CSS anyway, and in CSS, literal space preservation can be simply achieved with white-space: pre.
  • All HTML classes include either rule or term at the beginning. In my opinion, this just inflates the output with no real benefit.
  • Reserved characters like <, >, and & are passed through instead of being converted to entity names (&lt; etc.). This is not in accordance with the HTML spec.
  • Code origins (block.origin.identifier in python terms) are not included in the output. This becomes a problem when styling a HTML produced from an entire TcPOU, as there are no HTML entities that could be used as separators between different sections in the TcPOU, so it's hard to distinguish what belongs where.

There may be more; I don't really recall because I initially started working on this a while ago.

The code in this PR fixes all of that. Since lxml is already one of blark's main dependencies and thus comes "for free", I chose to use it internally. The resulting HTML is thus guaranteed to be syntactically correct. I have added three new HTML classes to make styling easier:

  • class blark: This wraps the entire HTML output.
  • class blark_origin: Contains the code origin. Can be styled with display: none if one doesn't want to see it.
  • class blark_code: Contains the code itself.

Please review and let me know if you identify any potential issues.

@klauer
Copy link
Owner

klauer commented Oct 28, 2025

This looks great overall! Thanks for the nice contribution. I agree with your assessment and your careful choices in the refactor.

Would you be willing to add a couple simple tests to the suite to get a bit of coverage on this? My recollection is that there wasn't any (sorry, checking on the phone isn't so easy)

@alajovic
Copy link
Contributor Author

HTML formatter does not currently seem to have any tests. I can try adding some. Any wishes/pointers?

@klauer
Copy link
Owner

klauer commented Oct 30, 2025

A simple smoke test that runs some basic structured text through the formatter would be perfectly acceptable. It would be primarily just to get coverage on the lines.

If you want to take it a step further to ensure that there aren't any regressions in the process, validating a couple pairs of source code against the expected XML objects (or output source I suppose) would be great!

I'll leave it up to your ambitions. Regardless of which you choose it's much appreciated.

@klauer
Copy link
Owner

klauer commented Dec 3, 2025

Turns out the existing blark CLI tests did cover these lines by way of --output-format that I had forgotten about.

Testing this in isolation wasn't as simple as I recalled either, so I did some updates adding explicit html output smoke tests in a separate branch. I'll merge it (along with your changes) shortly.

Did you by chance come up with some decent CSS to display the output of this?

Separately, I was wondering if a "compact" output mode of sorts that shows the top-level and bottom-level class might be useful. That is to say, seeing expression, assignment_expression, ..., expression_term, unary_expression, UNARY_OPERATOR: - (minus) is - while accurate with respect to the grammar - overly verbose. Perhaps being able to squash that to expression, unary_expression could be an alternative? Or is it useful as-is in its current form to you?

klauer added a commit that referenced this pull request Dec 3, 2025
CI maintenance and adding testing to #118
@klauer klauer merged commit b61bfae into klauer:master Dec 3, 2025
4 of 19 checks passed
@alajovic
Copy link
Contributor Author

alajovic commented Dec 4, 2025

Thanks for bringing this to a closure. It was still somewhere in the back of my mind, but the depths of The Backlog of Little Things are deceptive and perilous...

I experimented with CSS for a while, but never really pushed things past the cobbled-together level of quality. So I don't think it's worth sharing in the current form.

Syntax highlighting was actually my initial motivation for playing with blark. True story: when I discovered in Beckhoff's documentation that it was possible to turn on syntax highlighting in TwinCAT, I rushed into the settings menu, found the setting – and realized that it was already enabled! The highlighting was so bleak that I had been genuinely convinced that it wasn't on. xD

As for the compacting of HTML classes, I don't think it's necessary or even too desirable. The current granularity does inflate the output size to some extent, but I don't think it's too bad. And in CSS, it's generally better to have more options to style over than less.

@klauer
Copy link
Owner

klauer commented Dec 8, 2025

Thanks again for the contribution!

the depths of The Backlog of Little Things are deceptive and perilous...

I find myself lost there more often than not, too...

The highlighting was so bleak that I had been genuinely convinced that it wasn't on. xD

It really is bad in the TwinCAT IDE, isn't it? There's so much potential for quality-of-life improvements there that never seem to make it in new releases, sadly.

And in CSS, it's generally better to have more options to style over than less.

Works for me! I shy away from front-end web stuff for the most part, so your perspective here is much appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants