Skip to content

Feature: City/Postal Code Separator & Country Detection Cleaners #171

@bosd

Description

@bosd

Problem

Legacy ERP systems often store city and postal code in a single field, leading to mixed data during migrations. This is a very common data quality issue.

Examples encountered in migrations:

  • "104 Reykjavík" (Icelandic: postal prefix)
  • "75001 Paris" (French: postal prefix)
  • "Amsterdam 1012 AB" (Dutch: postal suffix)
  • "3080-055 Figueira Da Foz" (Portuguese: hyphenated postal)
  • "13014 Kuwait" (Middle East: numeric prefix)
  • "London SW1A 1AA" (UK: alphanumeric suffix)

Proposed Solution

1. City/Postal Separator

Add a separate_city_postal() cleaner function that:

  1. Detects postal code patterns by country/region
  2. Separates city name from postal code
  3. Returns both values (tuple or dict)
from odoo_data_flow.clean import separate_city_postal

# Returns (city, postal_code) tuple
city, postal = separate_city_postal("75001 Paris", country="FR")
# city = "Paris", postal = "75001"

# Or with country detection from postal pattern
city, postal = separate_city_postal("Amsterdam 1012 AB")
# city = "Amsterdam", postal = "1012 AB"

# Handle edge cases gracefully
city, postal = separate_city_postal("Some City")
# city = "Some City", postal = ""

2. Country Code Detection/Recovery

When country code is missing, attempt to recover it from other available data:

from odoo_data_flow.clean import detect_country

# From phone number (international prefix)
country = detect_country(phone="+31 6 12345678")
# country = "NL"

# From postal code pattern
country = detect_country(postal="1012 AB")
# country = "NL"

# From city name (known major cities)
country = detect_country(city="Amsterdam")
# country = "NL"

# Combined detection (uses all available hints)
country = detect_country(
    phone="+33 1 23456789",
    postal="75001",
    city="Paris"
)
# country = "FR"

Country-specific postal patterns to support

Country Format Position Example
NL 1234 AB suffix Amsterdam 1012 AB
BE 1234 prefix 1000 Brussel
DE 12345 prefix 10115 Berlin
FR 75001 prefix 75001 Paris
UK SW1A 1AA suffix London SW1A 1AA
US 12345(-6789) suffix New York 10001
PT 1234-567 prefix 3080-055 Figueira Da Foz
IS 123 prefix 104 Reykjavík
ES 12345 prefix 28001 Madrid
IT 12345 prefix 00100 Roma

Phone prefix patterns for country detection

Prefix Country
+31 NL
+32 BE
+33 FR
+44 GB
+49 DE
+1 US/CA
etc. ...

Use Case

Partner/address imports where:

  • Source data has combined city+postal fields
  • Country code is missing but can be inferred from phone/postal/city

This would complement the existing clean.zip_code() and clean.city() functions in the feature/production-ready-etl branch.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions