Add Nonogram Task #183
Sarajir
commented
Dec 20, 2025
- Implement Nonogram puzzle task for VMEvalKit
- Support 7 pattern types: cross, square, circle, checkerboard, letter_t, diagonal, random
- Difficulty scaling: easy (5x5 to 6x6), medium (7x7 to 10x10), hard (12x12 to 15x15)
- Ensure uniqueness and no empty rows/columns
- Register task in TASK_CATALOG.py
- Complete documentation in NONOGRAM.md
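The hint, empty-line, and uniqueness requirements listed above can be illustrated with a minimal sketch. This is not the code from this PR: the names `line_hints`, `has_empty_line`, and `count_solutions` are hypothetical, and the brute-force search is only meant for small grids.

```python
from itertools import groupby, product
from typing import List, Sequence

def line_hints(line: Sequence[int]) -> List[int]:
    """Run lengths of consecutive filled cells (1 = filled); [0] for an empty line."""
    runs = [len(list(g)) for filled, g in groupby(line) if filled]
    return runs or [0]

def has_empty_line(grid: List[List[int]]) -> bool:
    """True if any row or column contains no filled cell."""
    return any(line_hints(l) == [0] for l in list(grid) + list(zip(*grid)))

def prefix_ok(prefix: Sequence[int], hint: List[int]) -> bool:
    """Can a partially filled column still be extended to match its hint?"""
    runs = [len(list(g)) for filled, g in groupby(prefix) if filled]
    target = [h for h in hint if h > 0]
    if len(runs) > len(target):
        return False
    if not runs:
        return True
    *done, last = runs
    if done != target[:len(done)]:
        return False
    # The last run may still grow only if the prefix ends on a filled cell.
    if prefix[-1]:
        return last <= target[len(done)]
    return last == target[len(done)]

def count_solutions(row_hints, col_hints, limit: int = 2) -> int:
    """Count grids matching all hints, stopping once `limit` is reached."""
    n_rows, n_cols = len(row_hints), len(col_hints)
    # Hypothetical brute force: enumerate every candidate line per row.
    candidates = [
        [list(p) for p in product((0, 1), repeat=n_cols) if line_hints(p) == h]
        for h in row_hints
    ]
    grid: List[List[int]] = []

    def place(r: int) -> int:
        if r == n_rows:
            cols = zip(*grid)
            return int(all(line_hints(c) == h for c, h in zip(cols, col_hints)))
        total = 0
        for cand in candidates[r]:
            grid.append(cand)
            if all(prefix_ok(c, h) for c, h in zip(zip(*grid), col_hints)):
                total += place(r + 1)
            grid.pop()
            if total >= limit:
                break
        return total

    return place(0)

# A 5x5 cross pattern and a 2x2 diagonal, used as small examples.
cross = [[0, 0, 1, 0, 0],
         [0, 0, 1, 0, 0],
         [1, 1, 1, 1, 1],
         [0, 0, 1, 0, 0],
         [0, 0, 1, 0, 0]]
cross_rows = [line_hints(r) for r in cross]
cross_cols = [line_hints(c) for c in zip(*cross)]
diagonal = [[1, 0], [0, 1]]
diag_rows = [line_hints(r) for r in diagonal]
diag_cols = [line_hints(c) for c in zip(*diagonal)]
```

A puzzle would be accepted only when `has_empty_line(grid)` is false and `count_solutions(...)` returns exactly 1. The 2x2 diagonal shows why the uniqueness check matters: its hints are also satisfied by the anti-diagonal.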
LukeLIN-web
left a comment
LGTM. This covers a lot of cases, so please test it comprehensively and check the different cases.
    "Keep the camera view fixed in the top-down perspective and maintain the grid structure unchanged. "
    "Stop the video when all cells are correctly filled and the complete pattern is revealed."
)
Write an evaluator prompt.
- Add nonogram_task evaluation prompt to eval_prompt.py
- Fix missing comma after subway_pathfinding_task entry
- Add comprehensive test script for nonogram_task (test_nonogram.py)
- Resolve merge conflict in TASK_CATALOG.py (keep nonogram task)

- Keep both symmetry_completion_task and nonogram_task evaluation prompts
- Resolve merge conflict by including both entries
  "tower_of_hanoi_task": "Check if exactly one disk moved between frames. Verify the move is legal (top disk moved to empty peg or larger disk). Compare final disk positions to expected.",
- "symmetry_completion_task": "Verify that the right half of the grid in the final frame is correctly mirrored from the left half, creating a symmetric pattern. Check that all missing cells have been filled correctly to complete the vertical symmetry."
+ "symmetry_completion_task": "Verify that the right half of the grid in the final frame is correctly mirrored from the left half, creating a symmetric pattern. Check that all missing cells have been filled correctly to complete the vertical symmetry.",
+ "nonogram_task": "Verify that all cells in the final frame are correctly filled according to the row and column hints. Check that the filled cells match the expected pattern and that all hints are satisfied. Each row and column must have the correct sequence of filled blocks as indicated by the hints."
Can the evaluator actually know these hints?
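One possible way to make the hints available to the evaluator is to append them to the evaluation prompt for each sample. The sketch below assumes the solution grid is available when the prompt is built; `build_eval_prompt` and `format_hints` are hypothetical helpers, not part of VMEvalKit.

```python
from itertools import groupby
from typing import List, Sequence

def line_hints(line: Sequence[int]) -> List[int]:
    """Run lengths of consecutive filled cells (1 = filled); [0] for an empty line."""
    runs = [len(list(g)) for filled, g in groupby(line) if filled]
    return runs or [0]

def format_hints(hints: List[List[int]]) -> str:
    """Render hints as text, e.g. [[3], [1, 1]] -> '3; 1 1'."""
    return "; ".join(" ".join(str(n) for n in h) for h in hints)

def build_eval_prompt(base_prompt: str, solution: List[List[int]]) -> str:
    """Append the concrete row and column hints to a generic evaluation prompt."""
    row_hints = [line_hints(r) for r in solution]
    col_hints = [line_hints(c) for c in zip(*solution)]
    return (
        f"{base_prompt} "
        f"Row hints (top to bottom): {format_hints(row_hints)}. "
        f"Column hints (left to right): {format_hints(col_hints)}."
    )

# Usage with a 3x3 T-shaped pattern as an illustrative solution grid.
letter_t = [[1, 1, 1],
            [0, 1, 0],
            [0, 1, 0]]
prompt = build_eval_prompt(
    "Verify that all cells in the final frame match the hints.", letter_t
)
```

With the hints spelled out in the prompt text, the evaluator would not need to read them from the rendered frame.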
@@ -0,0 +1,357 @@
"""
Remove this file. What I meant is that you need to generate the dataset, use a video model to generate videos, and check whether the prompts and first images are reasonable, not create a test file.
These tests are useless; they do the same thing as running the dataset generation.