@swar00pduthks swar00pduthks commented Jan 15, 2026

Problem

Closes: #3054, #2772

Solution

New Feature: This PR introduces run-level graph visualization, which lets users see the relationships between dataset versions, job versions, and run nodes to simplify data troubleshooting and impact analysis.

Scalable Architecture: To let this feature scale from day one, we implement the CQRS (Command Query Responsibility Segregation) pattern with two denormalized tables, each optimized for a different graph view, plus monthly partitioning:

  • Parent-level run graph (run_parent_lineage_denormalized): business users see streamlined graphs with only top-level pipeline run nodes (~50 nodes for typical workloads)
  • Full detailed run graph (run_lineage_denormalized): engineers get comprehensive graphs with all child run nodes for deep debugging (~13,500 nodes for detailed analysis)
  • Monthly partitioning: keeps queries fast as data grows, with automatic retention management
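
For reviewers less familiar with monthly range partitioning, here is a minimal sketch of what partition creation and retention could look like. It assumes PostgreSQL declarative partitioning. The class name `MonthlyPartitionSketch`, the `_yYYYY_mMM` partition naming, and the retention rule are hypothetical illustrations, not the code in this PR's PartitionManagementService or its migrations.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of monthly partition handling; names and naming scheme are assumptions. */
final class MonthlyPartitionSketch {

  /** Builds a partition name such as run_lineage_denormalized_y2026_m01 (assumed convention). */
  public static String partitionName(String table, YearMonth month) {
    return String.format("%s_y%04d_m%02d", table, month.getYear(), month.getMonthValue());
  }

  /** DDL for one partition covering [first day of month, first day of next month). */
  public static String createPartitionSql(String table, YearMonth month) {
    LocalDate from = month.atDay(1);
    LocalDate to = month.plusMonths(1).atDay(1); // exclusive upper bound
    return String.format(
        "CREATE TABLE IF NOT EXISTS %s PARTITION OF %s FOR VALUES FROM ('%s') TO ('%s')",
        partitionName(table, month), table, from, to);
  }

  /** Returns the months that fall before the retention window and could be dropped. */
  public static List<YearMonth> expired(
      List<YearMonth> existing, YearMonth current, int retentionMonths) {
    YearMonth oldestKept = current.minusMonths(retentionMonths);
    List<YearMonth> result = new ArrayList<>();
    for (YearMonth month : existing) {
      if (month.isBefore(oldestKept)) {
        result.add(month);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    YearMonth current = YearMonth.of(2026, 1);
    System.out.println(createPartitionSql("run_lineage_denormalized", current));
    List<YearMonth> existing =
        Arrays.asList(YearMonth.of(2025, 10), YearMonth.of(2025, 11), YearMonth.of(2025, 12));
    for (YearMonth month : expired(existing, current, 2)) {
      System.out.println("DROP TABLE IF EXISTS " + partitionName("run_lineage_denormalized", month));
    }
  }
}
```

Dropping a whole partition is a metadata operation rather than a row-by-row delete, which is the usual reason to pair range partitioning with a retention policy.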

Performance: Parent run graphs render in ~0.1s and detailed run graphs in ~0.8s, even with millions of historical runs. The denormalized tables pre-compute the 5-8 table joins that a lineage query would otherwise need at read time, which enables sub-second graph traversal and rendering.
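
To illustrate why pre-joined rows make traversal cheap, the sketch below walks downstream runs over flattened rows held in memory. It is an assumption-laden illustration: the `Row` shape (run, parent run, dataset version, input/output flag) and the class `DenormalizedRunGraphSketch` are hypothetical and do not reflect the actual columns of run_lineage_denormalized or the DenormalizedLineageService API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: breadth-first downstream traversal over pre-joined lineage rows. */
final class DenormalizedRunGraphSketch {

  /** One flattened row: a run, its parent (null if top-level), and a dataset version it read or wrote. */
  public static final class Row {
    final String runId;
    final String parentRunId;
    final String datasetVersion;
    final boolean isOutput;

    public Row(String runId, String parentRunId, String datasetVersion, boolean isOutput) {
      this.runId = runId;
      this.parentRunId = parentRunId;
      this.datasetVersion = datasetVersion;
      this.isOutput = isOutput;
    }

    /** In the parent-level view, child runs collapse into their parent run. */
    String node(boolean parentLevel) {
      return parentLevel && parentRunId != null ? parentRunId : runId;
    }
  }

  /** Returns every node reachable downstream of start, in breadth-first discovery order. */
  public static Set<String> downstream(List<Row> rows, String start, boolean parentLevel) {
    Map<String, List<String>> outputsByNode = new HashMap<>();
    Map<String, List<String>> consumersByDataset = new HashMap<>();
    for (Row row : rows) {
      String node = row.node(parentLevel);
      if (row.isOutput) {
        outputsByNode.computeIfAbsent(node, k -> new ArrayList<>()).add(row.datasetVersion);
      } else {
        consumersByDataset.computeIfAbsent(row.datasetVersion, k -> new ArrayList<>()).add(node);
      }
    }
    Set<String> visited = new LinkedHashSet<>();
    Deque<String> queue = new ArrayDeque<>();
    visited.add(start);
    queue.add(start);
    while (!queue.isEmpty()) {
      String node = queue.poll();
      for (String dataset : outputsByNode.getOrDefault(node, Collections.emptyList())) {
        for (String consumer : consumersByDataset.getOrDefault(dataset, Collections.emptyList())) {
          if (visited.add(consumer)) { // add() returns false for already-seen nodes
            queue.add(consumer);
          }
        }
      }
    }
    visited.remove(start);
    return visited;
  }

  public static void main(String[] args) {
    List<Row> rows = Arrays.asList(
        new Row("etl.extract", "etl", "raw@v1", true),
        new Row("etl.load", "etl", "raw@v1", false),
        new Row("etl.load", "etl", "clean@v1", true),
        new Row("report", null, "clean@v1", false));
    System.out.println(downstream(rows, "etl.extract", false)); // detailed view
    System.out.println(downstream(rows, "etl", true)); // parent-level view
  }
}
```

In this toy data the detailed view reaches both `etl.load` and `report`, while the parent-level view collapses the two `etl` child runs into one node and reaches only `report`, which mirrors the smaller node counts quoted for the parent-level graph.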

Note: All database schema changes require discussion. Please link the issue for context.

One-line summary: Introduces run-level graph visualization with denormalized tables and partitioning for scalable, sub-second lineage traversal supporting both parent-level (business) and detailed (technical) graph views.

Checklist

  • [x] You've signed-off your work
  • [ ] Your changes are accompanied by tests (if relevant)
  • [x] Your change contains a small diff and is self-contained
  • [ ] You've updated any relevant documentation (if relevant)
  • [x] You've included a one-line summary of your change for the CHANGELOG.md (Depending on the change, this may not be necessary).
  • [x] You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • [x] You've included a header in any source code files (if relevant)

merobi-hub and others added 27 commits May 18, 2025 18:24
* Arcade demo in about page of docs.

Signed-off-by: merobi-hub <merobi@gmail.com>

* Move demo to homepage component.

Signed-off-by: merobi-hub <merobi@gmail.com>

---------

Signed-off-by: merobi-hub <merobi@gmail.com>
Signed-off-by: swar00pduthks <swaroopduthks@gmail.com>
…19.1 (MarquezProject#3005)

Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Signed-off-by: swar00pduthks <swaroopduthks@gmail.com>
…arquezProject#3004)

Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Signed-off-by: swar00pduthks <swaroopduthks@gmail.com>
Signed-off-by: Willy Lulciuc <willy.lulciuc@gmail.com>
Signed-off-by: swar00pduthks <swaroopduthks@gmail.com>
Signed-off-by: Willy Lulciuc <willy.lulciuc@gmail.com>
Signed-off-by: swar00pduthks <swaroopduthks@gmail.com>
Signed-off-by: swar00pduthks <swaroopduthks@gmail.com>
…test configurations

Signed-off-by: swar00pduthks <swaroopduthks@gmail.com>
…urity

Signed-off-by: swar00pduthks <swaroopduthks@gmail.com>
- Added PartitionManagementService to handle creation and cleanup of database partitions.
- Implemented DatasetVersionData and RunData models for lineage data representation.
- Created migration scripts (V75, V76) for partitioned denormalized tables and management functions.

Signed-off-by: swar00pduthks <swaroopduthks@gmail.com>
@boring-cyborg boring-cyborg bot added the api API layer changes label Jan 15, 2026
codecov bot commented Jan 15, 2026

Codecov Report

❌ Patch coverage is 74.71042% with 131 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.90%. Comparing base (a89b89c) to head (291ef28).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ons/V77__backfill_denormalized_lineage_tables.java | 23.46% | 72 Missing and 3 partials ⚠️ |
| ...va/marquez/service/DenormalizedLineageService.java | 73.28% | 33 Missing and 2 partials ⚠️ |
| ...rc/main/java/marquez/db/mappers/RunDataMapper.java | 88.50% | 7 Missing and 3 partials ⚠️ |
| ...a/marquez/db/mappers/DatasetVersionDataMapper.java | 90.90% | 3 Missing and 1 partial ⚠️ |
| ...va/marquez/service/PartitionManagementService.java | 94.23% | 3 Missing ⚠️ |
| api/src/main/java/marquez/MarquezApp.java | 80.00% | 1 Missing ⚠️ |
| ...c/main/java/marquez/api/ColumnLineageResource.java | 97.36% | 0 Missing and 1 partial ⚠️ |
| api/src/main/java/marquez/db/LineageDao.java | 0.00% | 1 Missing ⚠️ |
| ...rc/main/java/marquez/tracing/TracingSQLLogger.java | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3084      +/-   ##
============================================
- Coverage     81.18%   80.90%   -0.29%     
- Complexity     1506     1604      +98     
============================================
  Files           268      276       +8     
  Lines          7356     7850     +494     
  Branches        325      353      +28     
============================================
+ Hits           5972     6351     +379     
- Misses         1226     1332     +106     
- Partials        158      167       +9     


…ineage services

Signed-off-by: swar00pduthks <swaroopduthks@gmail.com>
Development

Successfully merging this pull request may close these issues.

runtime lineage graph similar to job & dataset