
Add GitLab and GitHub CI tests#133

Open
rpgoldman wants to merge 5 commits into logic-and-learning-lab:main from rpgoldman:main-with-tests

Conversation

@rpgoldman
Contributor

These two tests aim to do the same thing: the GitLab CI jobs run on SIFT's local GitLab repo, and the GitHub actions are intended to run on GitHub.

The tests use SIFT's prt regression test framework and run on the examples distributed with Popper. prt is included in the Docker image. The Docker image is built from the popper-prt repo, the source for which we will make available.

@rpgoldman rpgoldman force-pushed the main-with-tests branch 2 times, most recently from 31d060c to 598a84b Compare January 6, 2026 04:26
@andrewcropper
Collaborator

Is this different from #126?

@rpgoldman
Contributor Author

This is the better one. I need to check why adding setuptools as a dependency caused a test breakage here.

[Notes mostly for myself:]

Here's the log with error and backtrace:
synthesis-length.log

Failure is here:

                    if size >= settings.max_literals:
                        assert(False)

I'm not sure that this is deterministically repeatable.

@rpgoldman
Contributor Author

I reran the tests and now they all pass. That implies there's something non-deterministic here. The open question is: "did I break something and make it non-deterministic, or was it always non-deterministic?"

If this is non-deterministic, and there's an error condition here, does that mean that this condition (the size being too large) should be handled differently? I.e., should Popper treat this as a normal kind of learning failure, backtrack, and try a different hypothesis (possibly with new constraints), or should Popper be tweaked so that this error condition cannot happen?
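To make that alternative concrete, here is a minimal sketch of the "normal learning failure" option, with illustrative names only (`MaxLiteralsExceeded`, `check_hypothesis_size`, and `search_step` are not Popper's actual API): instead of hitting a bare `assert(False)`, the size bound raises a typed exception that the search loop catches and treats like any other pruned hypothesis.

```python
class MaxLiteralsExceeded(Exception):
    """Illustrative: raised when a candidate hypothesis exceeds the size bound."""


def check_hypothesis_size(size, max_literals):
    # Instead of `assert(False)`, surface the overflow as a recoverable
    # condition that the search loop can catch and backtrack from.
    if size >= max_literals:
        raise MaxLiteralsExceeded(f"hypothesis size {size} >= bound {max_literals}")


def search_step(size, max_literals):
    # Sketch of a main loop treating the overflow as an ordinary
    # learning failure rather than a hard crash.
    try:
        check_hypothesis_size(size, max_literals)
        return "continue"   # hypothesis is within bounds; keep going
    except MaxLiteralsExceeded:
        return "backtrack"  # prune this hypothesis and try another
```

Either answer (recover or make the condition impossible) could be reasonable; this just shows how cheaply the first one can be expressed.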

The secondary question is whether this behavior invalidates the test structure here. I.e., do you want to go ahead and merge this, and perhaps add a separate issue for fixing the size overflow? It's possible that we should treat this test as an expected failure, until that can be fixed (since we would like to be able to use the tests as a way to vet Popper changes).

You are obviously the person to make these judgment calls.

@andrewcropper
Collaborator

The open question is: "did I break something and make it non-deterministic, or was it always non-deterministic?"

I do not know, but I have not seen this error before. And the code should never reach that condition.

I think some of your code has changed.

Looking at the log, when Popper generates this hypothesis:

2.8 s INFO: f(V0,V1):- zero(V1),empty(V0).
2.8 s INFO: f(V0,V1):- tail(V0,V3),f(V3,V2),succ(V2,V1).

Then it should have reached this code point and returned:
https://github.com/rpgoldman/Popper/blob/598a84b9567008da6eaf13fabd433741023d4274/popper/loop.py#L203

I do not see how any non-determinism could prevent that.

I just tried your code but I cannot replicate the crash.

@rpgoldman
Contributor Author

@andrewcropper I've just run this 20 times locally, and the test on synthesis-length passes every time. I have just rebased this PR onto your updated main. The only differences between this PR and main now are in the github and gitlab actions and setup.py. So I cannot have done anything to break Popper itself.

I note that your fixes to alan.pl were not in place when the test failed. Perhaps that was related? At any rate, this looks safe to merge (see the "Files changed" to verify).

@rpgoldman rpgoldman marked this pull request as ready for review February 4, 2026 01:11
@rpgoldman
Contributor Author

OK, I just ran all the tests locally and everything looks fine. But I got a different failure on GitHub. I wonder if this has to do with running the tests in parallel on the GH runner? I will see if tweaking the way things run helps. So for now, I am marking this as draft again.

@rpgoldman rpgoldman marked this pull request as draft February 4, 2026 02:30
@andrewcropper
Collaborator

I have no idea and I do not know what CI/GH runner etc are, but I am confident that the bug is not a standalone Popper one.

@rpgoldman
Contributor Author

I have no idea and I do not know what CI/GH runner etc are, but I am confident that the bug is not a standalone Popper one.

I agree — I’m not seeing this issue locally or in the GitLab CI.

GitHub runner has only 4 cores, so lower maxparallel from 16 to 4.
@rpgoldman
Contributor Author

@andrewcropper I think that I have fixed the problem on GitHub: the problem appears to have been related to the fact that I was trying to run 16 concurrent jobs on a machine (the GitHub "runner") that has only 4 cores.

I dropped the number of concurrent jobs and now the tests pass reliably.
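The fix above can be made robust against runners with different core counts by querying the machine at test time. A hedged sketch (`choose_max_parallel` is an illustrative helper, not part of prt or Popper):

```python
import os


def choose_max_parallel(requested=16):
    # Cap the requested degree of parallelism at the number of CPUs
    # visible to the process. GitHub-hosted runners commonly expose
    # 4 cores, so requesting 16 concurrent jobs oversubscribes them.
    cores = os.cpu_count() or 1
    return min(requested, cores)
```

With something like this, the same test configuration would pick 4 on a 4-core GitHub runner and up to 16 on a larger GitLab machine, instead of hard-coding either number.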

I hope that this will help you in maintaining and improving Popper, because it provides a way to detect changes that degrade the behavior of Popper. Of course, this kind of end-to-end test can only do so much, but it should make it easier to move 3 steps forward without moving 2 steps back!

@andrewcropper andrewcropper marked this pull request as ready for review February 23, 2026 09:07
@andrewcropper
Collaborator

Super basic questions, which are related to this message:

The tests use SIFT's prt regression test framework and run on the examples distributed with Popper. prt is included in the Docker image. The Docker image is built from the popper-prt repo, the source for which we will make available.

Q1. Where are the actual tests? In a docker image that is not yet available?
Q2. How are the tests actually run?
Q3. How will this all be maintained? Does the docker image need revising when Popper changes?
