Skip to content

Conversation

@bbannier
Copy link
Member

@bbannier bbannier commented Jan 6, 2026

This is basically what if zeek-cut and btest-diff had a baby. The intended use case is to give users an easy to use tool which helps avoid baselining full Zeek logs.

@bbannier bbannier self-assigned this Jan 6, 2026
@bbannier bbannier force-pushed the topic/bbannier/btest-diff-columns branch 5 times, most recently from bc8846f to 76cc235 Compare January 6, 2026 19:30
@bbannier bbannier marked this pull request as ready for review January 6, 2026 19:34
@bbannier bbannier force-pushed the topic/bbannier/btest-diff-columns branch 6 times, most recently from 3ac6a15 to 96bc2e5 Compare January 6, 2026 21:18
@bbannier bbannier changed the title Add btest-diff-columns Add btest-diff-cut Jan 6, 2026
@bbannier bbannier force-pushed the topic/bbannier/btest-diff-columns branch from 96bc2e5 to c8bd835 Compare January 6, 2026 21:19
Copy link
Contributor

@awelzel awelzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main feedback is for removing the -f requirement and naming the env variable TEST_DIFF_CUTTER so it looks good when used together with TEST_DIFF_CANONIFIER in one line.

The other comments are pretty nitty: EAFP and naming.

@bbannier bbannier force-pushed the topic/bbannier/btest-diff-columns branch from c8bd835 to 5604d67 Compare January 7, 2026 10:04
@bbannier bbannier requested a review from awelzel January 7, 2026 10:07
@bbannier bbannier force-pushed the topic/bbannier/btest-diff-columns branch from 5604d67 to 703680b Compare January 7, 2026 10:10
@awelzel
Copy link
Contributor

awelzel commented Jan 7, 2026

Very nitty: The commit message mentiones btest-diff-columns :-)

@bbannier bbannier force-pushed the topic/bbannier/btest-diff-columns branch 5 times, most recently from a6ec62f to 95bc091 Compare January 7, 2026 11:39
Copy link
Contributor

@awelzel awelzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but should maybe have a look from @rsmmr or @ckreibich . I think this will help a lot. Thanks for doing the work!

This is basically _what if `zeek-cut` and `btest-diff` had a baby_. The
intended use case is to give users an easy to use tool which helps avoid
baselining full Zeek logs.
I also fixed deprecation warnings around how we specify the license in
`pyproject.toml`.
@bbannier bbannier force-pushed the topic/bbannier/btest-diff-columns branch from 95bc091 to 975789f Compare January 7, 2026 11:58
@rsmmr
Copy link
Member

rsmmr commented Jan 7, 2026

I think it would be better for this to live inside the Zeek tree. The use case is very Zeek-specific, and then we cut just hardcode it for what it's needed there (i.e., using zeek-cut).

@bbannier
Copy link
Member Author

bbannier commented Jan 7, 2026

I think it would be better for this to live inside the Zeek tree. The use case is very Zeek-specific, and then we cut just hardcode it for what it's needed there (i.e., using zeek-cut).

As written this is actually more generic than the name btest-diff-cut suggest and implements support for arbitrary preprocessors for btest-diff, e.g., it can work with zeek-cut for Zeek TSV logs, or jq for JSON. That said, I don't feel strongly about were this lives, and most users probably use BTest together with Zeek anyway. Since I always want the latest I install BTest separately, but that might not be very common.

@ckreibich
Copy link
Member

ckreibich commented Jan 8, 2026

Looking at this again I have a question — take this example from one of our btests:

# @TEST-EXEC: zeek-cut -m ts uid id.orig_h id.orig_p id.resp_h id.resp_p id.ctx.inits id.inits proto service orig_pkts resp_pkts < conn.log > conn.log.cut
# @TEST-EXEC: btest-diff conn.log.cut

The idea is that we'd save the detour through the .cut file as follows, yes?

# @TEST-EXEC: TEST_DIFF_CUTTER=zeek-cut btest-diff-cut -m ts uid id.orig_h id.orig_p id.resp_h id.resp_p id.ctx.inits id.inits proto service orig_pkts resp_pkts conn.log

If so, this seems very similar to saying

# @TEST-EXEC: TEST_DIFF_CANONIFIER="zeek-cut -m ts uid id.orig_h id.orig_p id.resp_h id.resp_p id.ctx.inits id.inits proto service orig_pkts resp_pkts" btest-diff conn.log

so I wonder whether this is worth it. If you guys think so, it's fine with me, but then I'm inclined to agree with Robin to make this fully about zeek-cut, which seems will likely be the bulk of the use.

@bbannier
Copy link
Member Author

bbannier commented Jan 8, 2026

If so, this seems very similar to saying

# @TEST-EXEC: TEST_DIFF_CANONIFIER="zeek-cut -m ts uid id.orig_h id.orig_p id.resp_h id.resp_p id.ctx.inits id.inits proto service orig_pkts resp_pkts" btest-diff conn.log

Yes, exactly that. The only real difference is that this script would allow putting TEST_DIFF_CUTTER=zeek-cut into btest.cfg once and then use this script consistently instead of naked btest-diff.

so I wonder whether this is worth it. If you guys think so, it's fine with me, but then I'm inclined to agree with Robin to make this fully about zeek-cut, which seems will likely be the bulk of the use.

The real question is why even with this available users are still tripped up by us adding columns to logs since their tests baseline arbitrary logs. The idea here was to make being explicit about "interesting data" (and by extension: the expected schema) more "natural" so users are nudged towards writing more robust tests. I will close this, but we should probably still think about how to address this (more documentation, updated templates, ...?).

@bbannier bbannier closed this Jan 8, 2026
@bbannier bbannier deleted the topic/bbannier/btest-diff-columns branch January 8, 2026 07:10
@awelzel
Copy link
Contributor

awelzel commented Jan 8, 2026

I will close this

😞

The usage in Zeek or packages would be more like this:

# @TEST-EXEC: btest-diff-cut -m uid service conn.log

# @TEST-EXEC: TEST_DIFF_CANNONIFIER='zeek-cut -m uid service' btest-diff conn.log

And yes, I think the reduction in noise makes a difference if used all over the place - particularly if you want a second cannonifier and pipes. I'd actually rather have users come up with their own helpers than us recommending the second way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants