forked from tfrancoi/odoo_csv_import
-
Notifications
You must be signed in to change notification settings - Fork 1
Open
Description
Problem
Legacy ERP systems often store city and postal code in a single field, leading to mixed data during migrations. This is a very common data quality issue.
Examples encountered in migrations:
"104 Reykjavík"(Icelandic: postal prefix)"75001 Paris"(French: postal prefix)"Amsterdam 1012 AB"(Dutch: postal suffix)"3080-055 Figueira Da Foz"(Portuguese: hyphenated postal)"13014 Kuwait"(Middle East: numeric prefix)"London SW1A 1AA"(UK: alphanumeric suffix)
Proposed Solution
1. City/Postal Separator
Add a separate_city_postal() cleaner function that:
- Detects postal code patterns by country/region
- Separates city name from postal code
- Returns both values (tuple or dict)
from odoo_data_flow.clean import separate_city_postal
# Returns (city, postal_code) tuple
city, postal = separate_city_postal("75001 Paris", country="FR")
# city = "Paris", postal = "75001"
# Or with country detection from postal pattern
city, postal = separate_city_postal("Amsterdam 1012 AB")
# city = "Amsterdam", postal = "1012 AB"
# Handle edge cases gracefully
city, postal = separate_city_postal("Some City")
# city = "Some City", postal = ""2. Country Code Detection/Recovery
When country code is missing, attempt to recover it from other available data:
from odoo_data_flow.clean import detect_country
# From phone number (international prefix)
country = detect_country(phone="+31 6 12345678")
# country = "NL"
# From postal code pattern
country = detect_country(postal="1012 AB")
# country = "NL"
# From city name (known major cities)
country = detect_country(city="Amsterdam")
# country = "NL"
# Combined detection (uses all available hints)
country = detect_country(
phone="+33 1 23456789",
postal="75001",
city="Paris"
)
# country = "FR"Country-specific postal patterns to support
| Country | Format | Position | Example |
|---|---|---|---|
| NL | 1234 AB |
suffix | Amsterdam 1012 AB |
| BE | 1234 |
prefix | 1000 Brussel |
| DE | 12345 |
prefix | 10115 Berlin |
| FR | 75001 |
prefix | 75001 Paris |
| UK | SW1A 1AA |
suffix | London SW1A 1AA |
| US | 12345(-6789) |
suffix | New York 10001 |
| PT | 1234-567 |
prefix | 3080-055 Figueira Da Foz |
| IS | 123 |
prefix | 104 Reykjavík |
| ES | 12345 |
prefix | 28001 Madrid |
| IT | 12345 |
prefix | 00100 Roma |
Phone prefix patterns for country detection
| Prefix | Country |
|---|---|
| +31 | NL |
| +32 | BE |
| +33 | FR |
| +44 | GB |
| +49 | DE |
| +1 | US/CA |
| etc. | ... |
Use Case
Partner/address imports where:
- Source data has combined city+postal fields
- Country code is missing but can be inferred from phone/postal/city
This would complement the existing clean.zip_code() and clean.city() functions in the feature/production-ready-etl branch.
Metadata
Metadata
Assignees
Labels
No labels