Auto-classify saved queries using DataFusion logical plan #116

@zfarrell

Builds on #108.

When a saved query is created or a new version is added, we should automatically classify it by analyzing the SQL through DataFusion's logical plan. This gives us structured metadata about query shape without any manual tagging.

Classification fields

Derive from the logical plan at save time and store on saved_query_versions:

  • category — one of: full_scan, point_lookup, filtered_scan, aggregation, join, projection
  • num_tables — number of table references in the plan
  • has_predicate — plan contains a Filter node
  • has_join — plan contains a Join node
  • has_aggregation — plan contains an Aggregate node
  • has_group_by — aggregate node has group-by expressions
  • has_order_by — plan contains a Sort node
  • has_limit — plan contains a Limit node
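
As a rough illustration of how these flags could fall out of a single tree walk, here is a minimal sketch. The `Plan` enum is a hypothetical, simplified stand-in for DataFusion's `LogicalPlan` (which has node kinds such as `TableScan`, `Filter`, `Join`, `Aggregate`, `Sort`, and `Limit`), and the names `Flags`, `collect`, and `classify` are made up for this example. Deriving `category` from the flags is left out, since the precedence rules are not specified here.

```rust
/// Hypothetical, simplified stand-in for DataFusion's `LogicalPlan`.
/// Only the node kinds the classification fields care about are modeled.
#[allow(dead_code)]
#[derive(Debug)]
enum Plan {
    TableScan { table: String },
    Projection { input: Box<Plan> },
    Filter { input: Box<Plan> },
    Aggregate { input: Box<Plan>, group_by: Vec<String> },
    Sort { input: Box<Plan> },
    Limit { input: Box<Plan> },
    Join { left: Box<Plan>, right: Box<Plan> },
}

/// Boolean/count fields from the list above (category is not derived here).
#[derive(Debug, Default, PartialEq)]
struct Flags {
    num_tables: usize,
    has_predicate: bool,
    has_join: bool,
    has_aggregation: bool,
    has_group_by: bool,
    has_order_by: bool,
    has_limit: bool,
}

/// Recursively visit every node and record what was seen.
fn collect(plan: &Plan, flags: &mut Flags) {
    match plan {
        // Every scan counts as one table reference, so a self-join counts twice.
        Plan::TableScan { .. } => flags.num_tables += 1,
        Plan::Projection { input } => collect(input, flags),
        Plan::Filter { input } => {
            flags.has_predicate = true;
            collect(input, flags);
        }
        Plan::Aggregate { input, group_by } => {
            flags.has_aggregation = true;
            flags.has_group_by |= !group_by.is_empty();
            collect(input, flags);
        }
        Plan::Sort { input } => {
            flags.has_order_by = true;
            collect(input, flags);
        }
        Plan::Limit { input } => {
            flags.has_limit = true;
            collect(input, flags);
        }
        Plan::Join { left, right } => {
            flags.has_join = true;
            collect(left, flags);
            collect(right, flags);
        }
    }
}

/// Entry point: walk the plan once and return the populated flags.
fn classify(plan: &Plan) -> Flags {
    let mut flags = Flags::default();
    collect(plan, &mut flags);
    flags
}
```

With this sketch, a bare `TableScan` yields `num_tables = 1` and every flag `false`, while a plan that nests a join under a filter and a grouped aggregate sets `has_join`, `has_predicate`, `has_aggregation`, and `has_group_by`. A real implementation would match on DataFusion's own plan type rather than this toy enum.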

User-provided metadata

Store on saved_queries (not per-version):

  • tags — freeform text array for user-defined labels (e.g. baseline, regression_candidate)
  • description — human-readable description

Override columns

Store on saved_query_versions alongside auto-classified fields:

  • category_override — nullable, takes precedence over auto-detected category when set
  • table_size_override — nullable, takes precedence over auto-detected table size when set

Read APIs should return COALESCE(category_override, category) as the effective category.
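
If the effective value is ever resolved in Rust rather than in SQL, the same COALESCE semantics map onto `Option::or`. This is only a sketch: the helper name `effective` is hypothetical, and it assumes both columns are read as `Option` values.

```rust
/// Mirrors COALESCE(override, detected): the override wins when it is set,
/// otherwise the auto-detected value is used (which may itself be absent).
fn effective<T>(override_value: Option<T>, detected: Option<T>) -> Option<T> {
    override_value.or(detected)
}
```

For example, `effective(Some("join"), Some("aggregation"))` returns `Some("join")`, and `effective(None, Some("aggregation"))` returns `Some("aggregation")`.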

Implementation approach

Classification runs in Rust, where DataFusion's logical plan is already available. The flow is: parse SQL → build logical plan → walk the plan tree → populate classification columns. No external dependencies are needed.
