@jrgemignani jrgemignani commented Dec 18, 2025

An AI coding tool was used to make these changes. A human has reviewed and double checked the work and regression tests along the way. Still, this doesn't guarantee correctness.

Note: This lays the groundwork for enabling multiple labels.
      Some of the columns introduced will likely change. The label oids
      are also likely to change, since we need to eliminate the dependence on the
      specific label tables. Right now these tables are stubs that keep
      the same ids, so that the regression tests don't show too many
      differences.

Note: This is experimental and may not go anywhere. If you like it or want it, please let us know!

Note: I will leave this PR open for a few days before merging it into the branch.

Key changes:

  • Replaced per-label vertex tables with a single _ag_label_vertex table containing
    columns: id (graphid), properties (agtype), labels (oid).

  • Fixed vertex operations (CREATE, MATCH, SET, DELETE, VLE) to use unified
    _ag_label_vertex table instead of label-specific tables.

  • Fixed vertex label display: Added _label_name_from_table_oid() function to
    extract label names from the labels column OID rather than parsing graphid
    values (which is table-per-label specific)

  • Fixed graphid generation: Updated transform_cypher_node() in cypher_clause.c to
    use label_relation instead of default_vertex_relation when building id
    expressions, ensuring labeled vertices receive label-specific sequence values in
    their id.

  • Disconnected all label id extraction from the vertex ids. Operations now
    treat the vertex id purely as an identifier and no longer derive label
    information from it.

  • Fixed regression tests.
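
The id and label handling described above can be sketched roughly as follows. This is an illustrative Python model, not AGE code. It assumes the usual graphid layout (label id in the upper 16 bits, entry id in the lower 48 bits), and the names UnifiedVertexTable, make_graphid, label_of and the oid values are hypothetical stand-ins for illustration only. The point it shows is that the label name is resolved from the labels column rather than from the bits of the id.

```python
from dataclasses import dataclass, field

# Assumed graphid layout: the low 48 bits hold the per-label sequence value,
# the bits above them hold the label id.
ENTRY_ID_BITS = 48
ENTRY_ID_MASK = (1 << ENTRY_ID_BITS) - 1


def make_graphid(label_id: int, entry_id: int) -> int:
    """Pack a label id and a label-specific sequence value into one integer."""
    return (label_id << ENTRY_ID_BITS) | (entry_id & ENTRY_ID_MASK)


@dataclass
class UnifiedVertexTable:
    """Toy stand-in for _ag_label_vertex: rows of (id, properties, labels)."""
    rows: list = field(default_factory=list)
    # Hypothetical catalog mapping a label table oid to its label name.
    label_names: dict = field(default_factory=dict)

    def insert(self, graphid: int, properties: dict, labels_oid: int) -> None:
        self.rows.append((graphid, properties, labels_oid))

    def label_of(self, graphid: int) -> str:
        """Resolve the label from the labels column, never from the id bits."""
        for row_id, _props, labels_oid in self.rows:
            if row_id == graphid:
                return self.label_names[labels_oid]
        raise KeyError(graphid)
```

For example, a vertex inserted with make_graphid(3, 1) and a labels oid that the catalog maps to "Person" is reported as "Person" by label_of, regardless of which label id happens to be encoded in the id.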

Differences in the regression test output are due to:

  • Non-ordered output being reordered due to the new unified table.

  • Different query plans with explain due to the new unified table.

  • Displaying specific vertex label tables instead of the unified table.
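
As a rough illustration of the first point, the sketch below models why unordered results can come back in a different order. It is a simplification under stated assumptions: it assumes the per-label layout is scanned one label table after another and that the unified table returns rows in insertion order, neither of which is guaranteed by PostgreSQL. The function names are hypothetical.

```python
def scan_per_label(tables: dict) -> list:
    """Scan each label table in turn, so rows come back grouped by label."""
    return [row for rows in tables.values() for row in rows]


def scan_unified(rows: list) -> list:
    """Scan the single unified table, modeled here as insertion order."""
    return list(rows)


# The same three vertices, created in the order Person, City, Person.
inserted = [("Person", 1), ("City", 1), ("Person", 2)]

per_label = {"Person": [], "City": []}
for label, entry in inserted:
    per_label[label].append((label, entry))

old_order = scan_per_label(per_label)
new_order = scan_unified(inserted)
```

Both scans return the same set of rows, but in a different sequence, which is enough to change the expected output of any test query that has no ORDER BY.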

modified: regress/expected/age_global_graph.out
modified: regress/expected/age_load.out
modified: regress/expected/cypher_create.out
modified: regress/expected/cypher_delete.out
modified: regress/expected/cypher_match.out
modified: regress/expected/cypher_merge.out
modified: regress/expected/cypher_remove.out
modified: regress/expected/cypher_set.out
modified: regress/expected/cypher_subquery.out
modified: regress/expected/cypher_vle.out
modified: regress/expected/expr.out
modified: regress/expected/graph_generation.out
modified: regress/expected/index.out
modified: regress/expected/list_comprehension.out
modified: regress/sql/age_global_graph.sql
modified: regress/sql/age_load.sql
modified: regress/sql/cypher_create.sql
modified: regress/sql/cypher_set.sql
modified: regress/sql/graph_generation.sql
modified: regress/sql/index.sql
modified: sql/age_main.sql
modified: src/backend/catalog/ag_label.c
modified: src/backend/commands/label_commands.c
modified: src/backend/executor/cypher_create.c
modified: src/backend/executor/cypher_merge.c
modified: src/backend/executor/cypher_set.c
modified: src/backend/executor/cypher_utils.c
modified: src/backend/nodes/cypher_copyfuncs.c
modified: src/backend/nodes/cypher_outfuncs.c
modified: src/backend/nodes/cypher_readfuncs.c
modified: src/backend/parser/cypher_clause.c
modified: src/backend/utils/adt/age_global_graph.c
modified: src/backend/utils/adt/age_vle.c
modified: src/backend/utils/adt/agtype.c
modified: src/backend/utils/load/ag_load_labels.c
modified: src/backend/utils/load/age_load.c
modified: src/include/catalog/ag_label.h
modified: src/include/commands/label_commands.h
modified: src/include/nodes/cypher_nodes.h
modified: regress/expected/pgvector.out
modified: regress/sql/pgvector.sql

@github-actions github-actions bot added the override-stale label (to keep issues/PRs untouched by the stale action) Dec 18, 2025
uhayat commented Dec 19, 2025

It's nice that someone is working on multi-label support. I see this is quite an invasive change to the architecture of the AGE storage layer. I have not reviewed the code, so I will stick to a few design-level questions first:

  1. Is this supposed to be merged into the master branch in the future, or will it stay a separate project?
  2. What are the pros and cons of this new storage architecture compared to the previous one?

jrgemignani commented Dec 19, 2025

@uhayat

This is just a test branch for multiple labels atm. The plan is to keep it in sync with master. Whether it stays its own feature branch or gets merged back into master remains to be seen. The end goal is to enable multiple labels with sane performance and to stay as compatible as possible with the single-label version, so that migration is easy.

This first PR is just a test to unify the vertex labels and gauge the impact on performance. A follow-up PR, which may or may not happen, would eliminate the other vertex tables completely, although that one isn't necessary to move forward.

The pros and cons are a good question. There are three comparisons to weigh: one vertex table versus many, multiple labels versus a single label, and following the Cypher specification versus not following it.

My focus is on finding the most effective and flexible approach. Often, what looks good on paper falls short in reality, so this branch is meant to answer that question.

@jrgemignani jrgemignani force-pushed the unified_vertex_table branch 3 times, most recently from 16461e6 to e6c142c December 20, 2025 10:09
NOTE: This PR was built with AI tools and a human.

Implemented a unified vertex table architecture to be built off of.

Key changes:

* Replaced per-label vertex tables with a single _ag_label_vertex table containing
columns: id (graphid), properties (agtype), labels (oid).

* Fixed vertex operations (CREATE, MATCH, SET, DELETE, VLE) to use unified
_ag_label_vertex table instead of label-specific tables.

* Fixed vertex label display: Added _label_name_from_table_oid() function to
extract label names from the labels column OID rather than parsing graphid
values (which is table-per-label specific)

* Fixed graphid generation: Updated transform_cypher_node() in cypher_clause.c to
use label_relation instead of default_vertex_relation when building id
expressions, ensuring labeled vertices receive label-specific sequence values in
their id.

* Disconnected all label id extraction from the vertex ids. Operations now
treat the vertex id purely as an identifier and no longer derive label
information from it.

* Fixed regression tests.

Differences in the regression test output are due to:

* Non-ordered output being reordered due to the new unified table.

* Different query plans with explain due to the new unified table.

* Displaying specific vertex label tables instead of the unified table.

modified:   Makefile
modified:   regress/expected/age_global_graph.out
modified:   regress/expected/age_load.out
modified:   regress/expected/cypher_create.out
modified:   regress/expected/cypher_delete.out
modified:   regress/expected/cypher_match.out
modified:   regress/expected/cypher_merge.out
modified:   regress/expected/cypher_remove.out
modified:   regress/expected/cypher_set.out
modified:   regress/expected/cypher_subquery.out
modified:   regress/expected/cypher_vle.out
modified:   regress/expected/expr.out
modified:   regress/expected/graph_generation.out
modified:   regress/expected/index.out
modified:   regress/expected/list_comprehension.out
new file:   regress/expected/unified_vertex_table.out
modified:   regress/sql/age_global_graph.sql
modified:   regress/sql/age_load.sql
modified:   regress/sql/cypher_create.sql
modified:   regress/sql/cypher_set.sql
modified:   regress/sql/graph_generation.sql
modified:   regress/sql/index.sql
new file:   regress/sql/unified_vertex_table.sql
modified:   sql/age_main.sql
modified:   src/backend/catalog/ag_label.c
modified:   src/backend/commands/label_commands.c
modified:   src/backend/executor/cypher_create.c
modified:   src/backend/executor/cypher_delete.c
modified:   src/backend/executor/cypher_merge.c
modified:   src/backend/executor/cypher_set.c
modified:   src/backend/executor/cypher_utils.c
modified:   src/backend/nodes/cypher_copyfuncs.c
modified:   src/backend/nodes/cypher_outfuncs.c
modified:   src/backend/nodes/cypher_readfuncs.c
modified:   src/backend/parser/cypher_clause.c
modified:   src/backend/utils/adt/age_global_graph.c
modified:   src/backend/utils/adt/age_vle.c
modified:   src/backend/utils/adt/agtype.c
modified:   src/backend/utils/load/ag_load_labels.c
modified:   src/backend/utils/load/age_load.c
modified:   src/include/catalog/ag_label.h
modified:   src/include/commands/label_commands.h
modified:   src/include/executor/cypher_utils.h
modified:   src/include/nodes/cypher_nodes.h
modified:   regress/expected/pgvector.out
modified:   regress/sql/pgvector.sql
@jrgemignani jrgemignani merged commit fee0fa2 into apache:Dev_Multiple_Labels Dec 20, 2025
7 checks passed