Skip to content

Conversation

@benbellick
Copy link
Member

There are lots of places where the validity of anchor/reference 0 is not explicitly mentioned. This PR adds explicitly clarification in the proto comments and the site docs to clarify that anchor/reference 0 is valid.

Closes #899

@benbellick benbellick force-pushed the benbellick/consistent-0-anchor branch from c851ebc to e95927f Compare November 24, 2025 19:49
// to a scalar function in the associated YAML file. Required; avoid
// using anchor/reference zero.
// to a scalar function in the associated YAML file.
// Required; 0 is a valid anchor/reference.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only one I am not sure about... I find the language here a bit confusing. Is it saying the two separate things:

  1. this field is required, and
  2. you must not use anchor/reference 0
    ?

Or is it saying "it is required not to use anchor/reference 0"?

I think it is the former. In which case there is nothing special about ScalarFunction.function_reference that prevents it from being 0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm that the intent of this is:

  1. It is required that this field be set.
  2. 0 is a valid anchor / reference

@benbellick benbellick force-pushed the benbellick/consistent-0-anchor branch 2 times, most recently from 0c0da1c to 098e7f9 Compare November 24, 2025 19:58
@benbellick benbellick force-pushed the benbellick/consistent-0-anchor branch from 098e7f9 to f32460f Compare November 24, 2025 20:11

// Note: type_variation_reference fields within Type messages reference a
// type_variation_anchor defined in the plan's extension declarations. The value
// 0 represents the system-preferred variation and is a valid reference value.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't sure where a good place to put this was considering type_variation_reference shows up a ton of times.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For type variation anchor defined in the extension, it is better to not use 0 because it is reserved IMO. Perhaps, we can document this in the type extension proto message as well as type variation documentation?

@benbellick benbellick marked this pull request as ready for review November 24, 2025 20:13
@nielspardon
Copy link
Member

Given that the proposed documentation changes say "non-negative" would it make sense to always say >= 0 in the proto comments to be super specific?

I do remember the substrait-validator having a check for the 0 anchors reporting a warning. Should probably be also updated at some point in time.

@vbarua vbarua added the documentation Improvements or additions to documentation label Dec 2, 2025
@vbarua
Copy link
Member

vbarua commented Dec 2, 2025

I can provide some guidance on why the non-zero language exists for references. Effectively, if a uint32 field in a protobuf message is not set explicilty, the value defaults to 0. This makes it difficult to distinguish unset references from those intentionally set to 0. When serializing protobuf for debugging/logging, by default uint32 fields set to 0 are not output either which can make plans difficult read.

The stance of the project has generally been

0 is a valid anchor/reference, but prefer non-zero values for ergonomics

What do you think about including the explicit guidance to avoid zero values?

Copy link
Member

@vbarua vbarua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating this. Left on minor comment, but otherwise looks good to me.

@drin
Copy link
Member

drin commented Dec 13, 2025

Should we also update tutorials for URN anchors to start from 0?
https://github.com/substrait-io/substrait/blob/main/site/docs/tutorial/final_plan.json

Edit: I just reread this to understand:

0 is a valid anchor/reference, but prefer non-zero values for ergonomics

I guess the idea is that anchors are unique, unordered values and they're all valid as long as they're unique (within a plan) at any given time?


// Note: type_variation_reference fields within Type messages reference a
// type_variation_anchor defined in the plan's extension declarations. The value
// 0 represents the system-preferred variation and is a valid reference value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For type variation anchor defined in the extension, it is better to not use 0 because it is reserved IMO. Perhaps, we can document this in the type extension proto message as well as type variation documentation?

// A surrogate key used in the context of a single plan to reference a
// specific type variation
// specific type variation.
// 0 is a valid anchor/reference, but prefer non-zero values for ergonomics.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we be explicit about using non-zero here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

consistently clarify that 0 is allowed value in anchors / references

6 participants