
Add GitLab and GitHub CI tests#133

Open
rpgoldman wants to merge 5 commits into logic-and-learning-lab:main from rpgoldman:main-with-tests

Conversation

@rpgoldman
Contributor

These two tests aim to do the same thing: the GitLab CI jobs run on SIFT's local GitLab repo, and the GitHub actions are intended to run on GitHub.

The tests use SIFT's prt regression test framework and run on the examples distributed with Popper. prt is included in the Docker image. The Docker image is built from the popper-prt repo, the source for which we will make available.

@rpgoldman rpgoldman force-pushed the main-with-tests branch 2 times, most recently from 31d060c to 598a84b Compare January 6, 2026 04:26
@andrewcropper
Collaborator

Is this different from #126?

@rpgoldman
Contributor Author

This is the better one. I need to check why adding setuptools as a dependency caused a test breakage here.

[Notes mostly for myself:]

Here's the log with error and backtrace:
synthesis-length.log

Failure is here:

                    if size >= settings.max_literals:
                        assert(False)

I'm not sure that this is deterministically repeatable.

@rpgoldman
Contributor Author

I reran the tests and now they all pass. That implies there's something non-deterministic here. The open question is: "did I break something and make it non-deterministic, or was it always non-deterministic?"

If this is non-deterministic, and there's an error condition here, does that mean that this condition (the size being too large) should be handled differently? I.e., should Popper treat this as a normal kind of learning failure, backtrack, and try a different hypothesis (possibly with new constraints), or should Popper be tweaked so that this error condition cannot happen?
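To make that alternative concrete, here is a minimal sketch of the "normal learning failure" option, with illustrative names only (`MaxLiteralsExceeded`, `check_hypothesis_size`, and `search_step` are not Popper's actual API): instead of hitting a bare `assert(False)`, the size bound raises a typed exception that the search loop catches and treats like any other pruned hypothesis.

```python
class MaxLiteralsExceeded(Exception):
    """Illustrative: raised when a candidate hypothesis exceeds the size bound."""


def check_hypothesis_size(size, max_literals):
    # Instead of `assert(False)`, surface the overflow as a recoverable
    # condition that the search loop can catch and backtrack from.
    if size >= max_literals:
        raise MaxLiteralsExceeded(f"hypothesis size {size} >= bound {max_literals}")


def search_step(size, max_literals):
    # Sketch of a main loop treating the overflow as an ordinary
    # learning failure rather than a hard crash.
    try:
        check_hypothesis_size(size, max_literals)
        return "continue"   # hypothesis is within bounds; keep going
    except MaxLiteralsExceeded:
        return "backtrack"  # prune this hypothesis and try another
```

Either answer (recover or make the condition impossible) could be reasonable; this just shows how cheaply the first one can be expressed.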

The secondary question is whether this behavior invalidates the test structure here. I.e., do you want to go ahead and merge this, and perhaps add a separate issue for fixing the size overflow? It's possible that we should treat this test as an expected failure, until that can be fixed (since we would like to be able to use the tests as a way to vet Popper changes).

You are obviously the person to make these judgment calls.

@andrewcropper
Collaborator

The open question is: "did I break something and make it non-deterministic, or was it always non-deterministic?"

I do not know, but I have not seen this error before. And the code should never reach that condition.

I think some of your code has changed.

Looking at the log, when Popper generates this hypothesis:

2.8 s INFO: f(V0,V1):- zero(V1),empty(V0).
2.8 s INFO: f(V0,V1):- tail(V0,V3),f(V3,V2),succ(V2,V1).

Then it should have reached this code point and returned:
https://github.com/rpgoldman/Popper/blob/598a84b9567008da6eaf13fabd433741023d4274/popper/loop.py#L203

I do not see how any non-determinism could prevent that.

I just tried your code but I cannot replicate the crash.

@rpgoldman
Contributor Author

@andrewcropper I've just run this 20 times locally, and the test on synthesis-length passes every time. I have just rebased this PR onto your updated main. The only differences between this PR and main now are in the github and gitlab actions and setup.py. So I cannot have done anything to break Popper itself.

I note that your fixes to alan.pl were not in place when the test failed. Perhaps that was related? At any rate, this looks safe to merge (see the "Files changed" to verify).

@rpgoldman rpgoldman marked this pull request as ready for review February 4, 2026 01:11
@rpgoldman
Contributor Author

OK, I just ran all the tests locally and everything looks fine. But I got a different failure on GitHub. I wonder if this has to do with running the tests in parallel on the GH runner? I will see if tweaking the way things run helps. So for now, I am marking this as draft again.

@rpgoldman rpgoldman marked this pull request as draft February 4, 2026 02:30
@andrewcropper
Collaborator

I have no idea and I do not know what CI/GH runner etc are, but I am confident that the bug is not a standalone Popper one.

@rpgoldman
Contributor Author

I have no idea and I do not know what CI/GH runner etc are, but I am confident that the bug is not a standalone Popper one.

I agree — I’m not seeing this issue locally or in the GitLab CI.

GitHub runner has only 4 cores, so lower maxparallel from 16 to 4.
@rpgoldman
Contributor Author

@andrewcropper I think that I have fixed the problem on GitHub: the problem appears to have been related to the fact that I was trying to run 16 concurrent jobs on a machine (the GitHub "runner") that has only 4 cores.

I dropped the number of concurrent jobs and now the tests pass reliably.
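The fix above can be made robust against runners with different core counts by querying the machine at test time. A hedged sketch (`choose_max_parallel` is an illustrative helper, not part of prt or Popper):

```python
import os


def choose_max_parallel(requested=16):
    # Cap the requested degree of parallelism at the number of CPUs
    # visible to the process. GitHub-hosted runners commonly expose
    # 4 cores, so requesting 16 concurrent jobs oversubscribes them.
    cores = os.cpu_count() or 1
    return min(requested, cores)
```

With something like this, the same test configuration would pick 4 on a 4-core GitHub runner and up to 16 on a larger GitLab machine, instead of hard-coding either number.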

I hope that this will help you in maintaining and improving Popper, because it provides a way to detect changes that degrade the behavior of Popper. Of course, this kind of end-to-end test can only do so much, but it should make it easier to move 3 steps forward without moving 2 steps back!

@andrewcropper andrewcropper marked this pull request as ready for review February 23, 2026 09:07
@andrewcropper
Collaborator

Super basic questions, which are related to this message:

The tests use SIFT's prt regression test framework and run on the examples distributed with Popper. prt is included in the Docker image. The Docker image is built from the popper-prt repo, the source for which we will make available.

Q1. Where are the actual tests? In a docker image that is not yet available?
Q2. How are the tests actually run?
Q3. How will this all be maintained? Does the docker image need revising when Popper changes?
