ottogroup · gutzbenj · Jan 20, 2026 · Jan 20, 2026 · Jan 20, 2026 · Jan 20, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -21,6 +21,13 @@ Types of changes:
 - Allow setting `monitor_only` on check bundles and on individual checks
 - Map provider/table-not-found errors to a unified `table_exists` metric
 - Support identifier-type filters without explicit column/value for naming; add `identifier_format` option (`identifier`, `filter_name`, `column_name`) to control the result identifier column
+- Treat identifier filters with missing/null `value` as a configurable placeholder for logging and naming (defaults to `ALL`)
+- Add `identifier_placeholder` option to configure the placeholder value used when identifier filters lack a value; defaults to `ALL` and is applied to the result IDENTIFIER column and logging for clearer partition naming.
+
+### Fixed
+
+- Quote table identifiers in bulk SELECTs when loading data into DuckDB memory to avoid BigQuery binder errors for identifiers that look like project IDs (e.g., `EC0601`). Added an integration test covering the quoting behavior.
+- Ensure MatchRateCheck only requires the check column from the left table; the right table now only contributes join and filter columns to avoid unnecessary column selection and errors.
 
 ## [0.9.0] - 2026-01-16
 

diff --git a/README.md b/README.md
@@ -193,6 +193,12 @@ Koality supports an `identifier` filter type which can be used to mark the field
 
 If an identifier-type filter is defined without a concrete `column` or `value` (for example in global `defaults`), it is treated as a naming-only hint and will not be turned into a WHERE clause; this is useful when you only want to control the result identifier column name (e.g., `SHOP_ID`) across checks.
 
+Behavior for missing identifier values
+
+When an identifier-type filter is present but its `value` is missing or explicitly `null`, Koality substitutes a configurable placeholder for logging and naming (`defaults.identifier_placeholder`, default: `ALL`) to avoid `None` appearing in metric messages. You can override the placeholder at bundle or check level by setting `identifier_placeholder` in the corresponding defaults.
+
+Additional docs: see docs/identifier_placeholder.md for usage examples and configuration details.
+
 ### Filter Properties
 
 | Property        | Description                                                                    |

diff --git a/docs/checks/matchrate.md b/docs/checks/matchrate.md
@@ -0,0 +1,33 @@
+# MatchRateCheck — check_column placement
+
+Guidance:
+
+- The `check_column` for `MatchRateCheck` must be a column present in the left-hand table only.
+- The right-hand table is only expected to provide join columns and filter columns; it must not be required to contain `check_column`.
+
+Configuration example:
+
+```yaml
+- defaults:
+    check_type: MatchRateCheck
+    check_column: product_number                   # must exist on the left table
+    join_columns_left:
+      - BQ_PARTITIONTIME
+      - shopId
+      - product_number
+    join_columns_right:
+      - BQ_PARTITIONTIME
+      - value.shopId
+      - product_number
+  checks:
+    - left_table: project.dataset.left_table
+      right_table: project.dataset.right_table
+      filters:
+        shop_id:
+          value: SHOP01
+```
+
+Notes:
+
+- The executor only requests the `check_column` from the left table during bulk loading; the right table will only be queried for its join and filter columns.
+- This avoids errors when the right table does not contain the `check_column` or when its identifier resembles a BigQuery project ID.
diff --git a/docs/configuration.md b/docs/configuration.md
@@ -406,6 +406,8 @@ filters:
 
 The identifier value appears in check results and failure messages. How it's formatted depends on the `identifier_format` global setting.
 
+For more details about naming-only identifier filters and the `identifier_placeholder` option, see the guide: [Identifier filters and naming](../identifier_filters.md).
+
 ### Date Filters
 
 When `type: date` is set, the value is automatically parsed as a date. Supported formats:

diff --git a/docs/identifier_placeholder.md b/docs/identifier_placeholder.md
@@ -0,0 +1,66 @@
+Identifier Placeholder
+
+When an identifier-type filter (type: identifier) is defined but the filter's value is missing or explicitly `null`, Koality uses a configurable placeholder string to fill the result IDENTIFIER column and logging messages. This avoids `None` or empty identifiers in persisted results and monitoring UIs.
+
+Configuration
+
+You can set the placeholder in the following locations (more specific levels override less specific ones):
+
+- `defaults.identifier_placeholder` (global default)
+- `check_bundles.<bundle>.defaults.identifier_placeholder` (bundle-level)
+- `check_bundles.<bundle>.checks.<i>.identifier_placeholder` (check-level)
+
+If not set, the placeholder defaults to `ALL`.
+
+Examples
+
+1) Global default placeholder:
+
+```yaml
+defaults:
+  identifier_placeholder: UNKNOWN
+  filters:
+    shop_id:
+      column: shop_code
+      type: identifier
+      value: null
+```
+
+Result: IDENTIFIER uses `UNKNOWN` for checks that rely on the `shop_id` identifier when no concrete value is provided.
+
+2) Bundle-level override:
+
+```yaml
+check_bundles:
+  - name: my_bundle
+    defaults:
+      identifier_placeholder: ALL_SHOPS
+      filters:
+        shop_id:
+          column: shop_code
+          type: identifier
+          value: null
+```
+
+3) Check-level override:
+
+```yaml
+check_bundles:
+  - name: my_bundle
+    defaults:
+      filters:
+        shop_id:
+          column: shop_code
+          type: identifier
+    checks:
+      - check_type: CountCheck
+        identifier_placeholder: SHOP_UNKNOWN
+        filters:
+          shop_id:
+            value: null
+```
+
+Notes
+
+- The placeholder is only applied for naming/logging and does not produce a WHERE clause when the identifier filter lacks a `value` and no column is provided.
+- Use a descriptive placeholder (e.g., `ALL`, `UNKNOWN`, `ALL_SHOPS`) to make results easier to interpret in dashboards and logs.
diff --git a/src/koality/checks.py b/src/koality/checks.py
@@ -41,6 +41,7 @@ def __init__(
         *,
         filters: dict[str, Any] | None = None,
         identifier_format: str = "identifier",
+        identifier_placeholder: str = "ALL",
         date_info: str | None = None,
         extra_info: str | None = None,
         monitor_only: bool = False,
@@ -62,6 +63,7 @@ def __init__(
 
         # Identifier format configuration
         self.identifier_format = identifier_format
+        self.identifier_placeholder = identifier_placeholder
 
         # for where filter handling
         self.filters = self.get_filters(filters or {})
@@ -70,7 +72,11 @@ def __init__(
         identifier_filter_result = self.get_identifier_filter(self.filters)
         if identifier_filter_result:
             filter_name, filter_config = identifier_filter_result
-            value = filter_config.get("value", "ALL")
+            # If value key is missing or explicitly None, treat as identifier_placeholder (meaning "no specific value")
+            if "value" in filter_config and filter_config["value"] is not None:
+                value = filter_config["value"]
+            else:
+                value = self.identifier_placeholder
             column = filter_config.get("column", "")
 
             if self.identifier_format == "identifier":
@@ -465,6 +471,7 @@ def __init__(
         *,
         filters: dict[str, Any] | None = None,
         identifier_format: str = "identifier",
+        identifier_placeholder: str = "ALL",
         date_info: str | None = None,
         extra_info: str | None = None,
         monitor_only: bool = False,
@@ -481,6 +488,7 @@ def __init__(
             upper_threshold=upper_threshold,
             filters=filters,
             identifier_format=identifier_format,
+            identifier_placeholder=identifier_placeholder,
             date_info=date_info,
             extra_info=extra_info,
             monitor_only=monitor_only,
@@ -563,6 +571,7 @@ def __init__(
         *,
         filters: dict[str, Any] | None = None,
         identifier_format: str = "identifier",
+        identifier_placeholder: str = "ALL",
         date_info: str | None = None,
         extra_info: str | None = None,
         monitor_only: bool = False,
@@ -578,6 +587,7 @@ def __init__(
             upper_threshold=upper_threshold,
             filters=filters,
             identifier_format=identifier_format,
+            identifier_placeholder=identifier_placeholder,
             date_info=date_info,
             extra_info=extra_info,
             monitor_only=monitor_only,
@@ -628,6 +638,7 @@ def __init__(
         *,
         filters: dict[str, Any] | None = None,
         identifier_format: str = "identifier",
+        identifier_placeholder: str = "ALL",
         date_info: str | None = None,
         extra_info: str | None = None,
         monitor_only: bool = False,
@@ -645,6 +656,7 @@ def __init__(
             upper_threshold=upper_threshold,
             filters=filters,
             identifier_format=identifier_format,
+            identifier_placeholder=identifier_placeholder,
             date_info=date_info,
             extra_info=extra_info,
             monitor_only=monitor_only,
@@ -699,6 +711,7 @@ def __init__(
         extra_info: str | None = None,
         filters: dict[str, Any] | None = None,
         identifier_format: str = "identifier",
+        identifier_placeholder: str = "ALL",
         date_info: str | None = None,
     ) -> None:
         """Initialize the values in set check."""
@@ -718,6 +731,7 @@ def __init__(
             upper_threshold=upper_threshold,
             filters=filters,
             identifier_format=identifier_format,
+            identifier_placeholder=identifier_placeholder,
             date_info=date_info,
             extra_info=extra_info,
             monitor_only=monitor_only,
@@ -855,6 +869,7 @@ def __init__(
         *,
         filters: dict[str, Any] | None = None,
         identifier_format: str = "identifier",
+        identifier_placeholder: str = "ALL",
         date_info: str | None = None,
         extra_info: str | None = None,
         monitor_only: bool = False,
@@ -870,6 +885,7 @@ def __init__(
             upper_threshold=upper_threshold,
             filters=filters,
             identifier_format=identifier_format,
+            identifier_placeholder=identifier_placeholder,
             date_info=date_info,
             extra_info=extra_info,
             monitor_only=monitor_only,
@@ -921,6 +937,7 @@ def __init__(
         distinct: bool = False,
         filters: dict[str, Any] | None = None,
         identifier_format: str = "identifier",
+        identifier_placeholder: str = "ALL",
         date_info: str | None = None,
         extra_info: str | None = None,
         monitor_only: bool = False,
@@ -942,6 +959,7 @@ def __init__(
             upper_threshold=upper_threshold,
             filters=filters,
             identifier_format=identifier_format,
+            identifier_placeholder=identifier_placeholder,
             date_info=date_info,
             extra_info=extra_info,
             monitor_only=monitor_only,
@@ -1235,6 +1253,7 @@ def __init__(
         filters_left: dict[str, Any] | None = None,
         filters_right: dict[str, Any] | None = None,
         identifier_format: str = "identifier",
+        identifier_placeholder: str = "ALL",
         date_info: str | None = None,
     ) -> None:
         """Initialize the match rate check."""
@@ -1270,6 +1289,7 @@ def __init__(
             upper_threshold=upper_threshold,
             filters=filters,
             identifier_format=identifier_format,
+            identifier_placeholder=identifier_placeholder,
             date_info=date_info,
             extra_info=extra_info,
             monitor_only=monitor_only,
@@ -1405,6 +1425,7 @@ def __init__(
         *,
         filters: dict[str, Any] | None = None,
         identifier_format: str = "identifier",
+        identifier_placeholder: str = "ALL",
         date_info: str | None = None,
         extra_info: str | None = None,
         monitor_only: bool = False,
@@ -1434,6 +1455,7 @@ def __init__(
             upper_threshold=upper_threshold,
             filters=filters,
             identifier_format=identifier_format,
+            identifier_placeholder=identifier_placeholder,
             date_info=date_info,
             extra_info=extra_info,
             monitor_only=monitor_only,
@@ -1569,6 +1591,7 @@ def __init__(
         *,
         filters: dict[str, Any] | None = None,
         identifier_format: str = "identifier",
+        identifier_placeholder: str = "ALL",
         date_info: str | None = None,
         extra_info: str | None = None,
         monitor_only: bool = False,
@@ -1611,6 +1634,7 @@ def __init__(
             upper_threshold=math.inf,
             filters=filters,
             identifier_format=identifier_format,
+            identifier_placeholder=identifier_placeholder,
             date_info=date_info,
             extra_info=extra_info,
             monitor_only=monitor_only,

diff --git a/src/koality/executor.py b/src/koality/executor.py
@@ -215,6 +215,33 @@ def get_data_requirements(self) -> defaultdict[str, defaultdict[str, set]]:  # n
         """
         data_requirements = defaultdict(lambda: defaultdict(set))
         for check in self.checks:
+            # Skip synthetic JOIN table entries created for MatchRateCheck; handle left/right tables explicitly
+            if isinstance(check, MatchRateCheck):
+                # Add columns and filter columns for left table
+                if check.check_column and check.check_column != "*":
+                    data_requirements[check.left_table]["columns"].add(check.check_column)
+                for _filter in check.filters_left.values():
+                    if "column" in _filter:
+                        data_requirements[check.left_table]["columns"].add(_filter["column"])
+                data_requirements[check.left_table]["columns"].update(check.join_columns_left)
+
+                # Add only filter and join columns for right table (check_column is only from left table)
+                for _filter in check.filters_right.values():
+                    if "column" in _filter:
+                        data_requirements[check.right_table]["columns"].add(_filter["column"])
+                data_requirements[check.right_table]["columns"].update(check.join_columns_right)
+
+                # Store unique filter configurations for both tables
+                filter_key_left = frozenset(
+                    (name, frozenset(config.items())) for name, config in check.filters_left.items()
+                )
+                filter_key_right = frozenset(
+                    (name, frozenset(config.items())) for name, config in check.filters_right.items()
+                )
+                data_requirements[check.left_table]["filters"].add(filter_key_left)
+                data_requirements[check.right_table]["filters"].add(filter_key_right)
+                continue
+
             table_name = check.table
             check_filters = check.filters
             # Add check-specific columns and filter columns to the requirements
@@ -224,16 +251,12 @@ def get_data_requirements(self) -> defaultdict[str, defaultdict[str, set]]:  # n
                 if "column" in _filter:
                     data_requirements[table_name]["columns"].add(_filter["column"])
 
-            # For MatchRateCheck, add columns from both left and right tables
-            if isinstance(check, MatchRateCheck):
-                data_requirements[check.left_table]["columns"].update(check.join_columns_left)
-                data_requirements[check.right_table]["columns"].update(check.join_columns_right)
-                for _filter in check.filters_left.values():
-                    if "column" in _filter:
-                        data_requirements[check.left_table]["columns"].add(_filter["column"])
-                for _filter in check.filters_right.values():
-                    if "column" in _filter:
-                        data_requirements[check.right_table]["columns"].add(_filter["column"])
+            if isinstance(check, IqrOutlierCheck):
+                check_filters = {k: v for k, v in check.filters.items() if v.get("type") != "date"}
+
+            # Store unique filter configurations for each table
+            filter_key = frozenset((name, frozenset(config.items())) for name, config in check_filters.items())
+            data_requirements[table_name]["filters"].add(filter_key)
 
             if isinstance(check, IqrOutlierCheck):
                 check_filters = {k: v for k, v in check.filters.items() if v.get("type") != "date"}
@@ -271,10 +294,16 @@ def fetch_data_into_memory(self, data_requirements: defaultdict[str, defaultdict
             if all_filters_sql:
                 final_where_clause = "WHERE " + " OR ".join(all_filters_sql)
 
+            # Determine appropriate table quoting depending on database provider
+            if self.database_provider and getattr(self.database_provider, "type", "").lower() == "bigquery":
+                table_ref = f"`{table}`"
+            else:
+                table_ref = f'"{table}"'
+
             # Construct the bulk SELECT query
             select_query = f"""
             SELECT {columns}
-            FROM {table}
+            FROM {table_ref}
             {final_where_clause}
             """  # noqa: S608
 
@@ -318,6 +347,7 @@ def execute_checks(self) -> None:
                 check_kwargs["database_accessor"] = self.config.database_accessor
                 check_kwargs["database_provider"] = self.database_provider
                 check_kwargs["identifier_format"] = self.config.defaults.identifier_format
+                check_kwargs["identifier_placeholder"] = self.config.defaults.identifier_placeholder
                 check_instance = check_factory(**check_kwargs)
                 self.checks.append(check_instance)