APIv2: visit:country_name, visit:region_name, visit:city_name dimensions #4328
Merged
16 commits
5cf59c6 Add data migration for creating and syncing location_data table and d… (macobo)
17b93c0 Migration to populate location data (macobo)
f64d499 Daily cron to refresh location dataset if changed (macobo)
6aed886 Add support for visit:country_name, visit:region_name and visit:city_… (macobo)
981577f Update queue name (macobo)
4e34c6c Update documentation (macobo)
7391de6 Explicit structs (macobo)
ca2ab6d Improve docs further (macobo)
9f31843 Migration comment (macobo)
5179dae Add queues (macobo)
fe20726 Add error when already loaded (macobo)
f6a76ca Test for filtering by new dimensions (macobo)
ac2c71d Update deps (macobo)
e14241f Merge remote-tracking branch 'origin/master' into location-sync (macobo)
a0ac147 dimension -> select_dimension (macobo)
632d62f Update a test (macobo)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| defmodule Plausible.ClickhouseLocationData do | ||
| @moduledoc """ | ||
| Schema for storing location id <-> translation mappings in ClickHouse | ||
| | ||
| Read indirectly through the `location_data_dict` dictionary, via ALIAS columns in | ||
| the `events_v2`, `sessions_v2` and `imported_locations` tables. | ||
| """ | ||
| use Ecto.Schema | ||
| | ||
| @primary_key false | ||
| schema "location_data" do | ||
| field :type, Ch, type: "LowCardinality(String)" | ||
| field :id, :string | ||
| field :name, :string | ||
| end | ||
| end |
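
As a rough mental model of what the ALIAS columns do, the sketch below emulates a `dictGet` lookup on a complex-key dictionary keyed by `(type, id)` using a plain Elixir map. `LocationLookupSketch` and the sample rows are hypothetical, and returning `""` for a missing key assumes the `name` attribute keeps ClickHouse's default value for `String`; this is not how Plausible itself reads the data.

```elixir
defmodule LocationLookupSketch do
  @moduledoc """
  Hypothetical in-memory model of `location_data_dict`: rows of
  %{type, id, name} keyed by the composite key {type, id}.
  """

  # Builds the composite-key map from rows shaped like ClickhouseLocationData.
  def build(rows) do
    Map.new(rows, fn %{type: type, id: id, name: name} -> {{type, id}, name} end)
  end

  # Mimics dictGet('location_data_dict', 'name', tuple(type, id));
  # a missing key falls back to "" (assumed String default).
  def dict_get(dict, type, id) do
    Map.get(dict, {type, id}, "")
  end
end

# Hypothetical sample rows; ids are strings, as in the location_data table.
dict =
  LocationLookupSketch.build([
    %{type: "country", id: "EE", name: "Estonia"},
    %{type: "city", id: "12345", name: "Example City"}
  ])

# prints "Estonia"
IO.inspect(LocationLookupSketch.dict_get(dict, "country", "EE"))
```

Keying on the `(type, id)` pair is what lets one table hold countries, subdivisions and cities without their ids colliding.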
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,140 @@ | ||
| defmodule Plausible.DataMigration.LocationsSync do | ||
| @moduledoc """ | ||
| Data migration that stores location names in ClickHouse. | ||
| Runs only when `Location.version()` changes, either as an Ecto migration or from a cron job. | ||
| The migration: | ||
| 1. Truncates the existing `location_data` table (if it exists) | ||
| 2. Creates the table (if needed) | ||
| 3. Inserts fresh data from the `Location` module | ||
| 4. (Re-)creates the dictionary used to read location data from the table | ||
| 5. Creates ALIAS columns in the `events_v2`, `sessions_v2` and `imported_locations` tables so location names are easy to read | ||
| 6. Updates the `location_data` table comment to record the last synced version. | ||
| The dictionary is large enough to cache the whole dataset in memory, which keeps lookups fast. | ||
| This migration is intended to be idempotent: running it multiple times should leave things in the same state as running it once. | ||
| SQL files live in priv/data_migrations/LocationsSync/sql | ||
| """ | ||
| alias Plausible.ClickhouseLocationData | ||
| | ||
| use Plausible.DataMigration, dir: "LocationsSync", repo: Plausible.IngestRepo | ||
| | ||
| @columns [ | ||
| %{ | ||
| table: "events_v2", | ||
| column_name: "country_name", | ||
| type: "country", | ||
| input_column: "country_code" | ||
| }, | ||
| %{ | ||
| table: "events_v2", | ||
| column_name: "region_name", | ||
| type: "subdivision", | ||
| input_column: "subdivision1_code" | ||
| }, | ||
| %{ | ||
| table: "events_v2", | ||
| column_name: "city_name", | ||
| type: "city", | ||
| input_column: "city_geoname_id" | ||
| }, | ||
| %{ | ||
| table: "sessions_v2", | ||
| column_name: "country_name", | ||
| type: "country", | ||
| input_column: "country_code" | ||
| }, | ||
| %{ | ||
| table: "sessions_v2", | ||
| column_name: "region_name", | ||
| type: "subdivision", | ||
| input_column: "subdivision1_code" | ||
| }, | ||
| %{ | ||
| table: "sessions_v2", | ||
| column_name: "city_name", | ||
| type: "city", | ||
| input_column: "city_geoname_id" | ||
| }, | ||
| %{ | ||
| table: "imported_locations", | ||
| column_name: "country_name", | ||
| type: "country", | ||
| input_column: "country" | ||
| }, | ||
| %{ | ||
| table: "imported_locations", | ||
| column_name: "region_name", | ||
| type: "subdivision", | ||
| input_column: "region" | ||
| }, | ||
| %{ | ||
| table: "imported_locations", | ||
| column_name: "city_name", | ||
| type: "city", | ||
| input_column: "city" | ||
| } | ||
| ] | ||
| | ||
| def out_of_date?() do | ||
| case run_sql("get-location-data-table-comment") do | ||
| {:ok, %{rows: [[stored_version]]}} -> stored_version != Location.version() | ||
| _ -> true | ||
| end | ||
| end | ||
| | ||
| def run() do | ||
| cluster? = Plausible.MigrationUtils.clustered_table?("sessions_v2") | ||
| | ||
| {:ok, _} = run_sql("truncate-location-data-table", cluster?: cluster?) | ||
| {:ok, _} = run_sql("create-location-data-table", cluster?: cluster?) | ||
| | ||
| countries = | ||
| Location.Country.all() | ||
| \|> Enum.map(fn %Location.Country{alpha_2: alpha_2, name: name} -> | ||
| %{type: "country", id: alpha_2, name: name} | ||
| end) | ||
| | ||
| subdivisions = | ||
| Location.Subdivision.all() | ||
| \|> Enum.map(fn %Location.Subdivision{code: code, name: name} -> | ||
| %{type: "subdivision", id: code, name: name} | ||
| end) | ||
| | ||
| cities = | ||
| Location.City.all() | ||
| \|> Enum.map(fn %Location.City{id: id, name: name} -> | ||
| %{type: "city", id: Integer.to_string(id), name: name} | ||
| end) | ||
| | ||
| insert_data = Enum.concat([countries, subdivisions, cities]) | ||
| @repo.insert_all(ClickhouseLocationData, insert_data) | ||
| | ||
| {:ok, _} = | ||
| run_sql("update-location-data-dictionary", | ||
| cluster?: cluster?, | ||
| dictionary_connection_params: Plausible.MigrationUtils.dictionary_connection_params() | ||
| ) | ||
| | ||
| for column <- @columns do | ||
| {:ok, _} = | ||
| run_sql("add-alias-column", | ||
| cluster?: cluster?, | ||
| table: column.table, | ||
| column_name: column.column_name, | ||
| type: column.type, | ||
| input_column: column.input_column | ||
| ) | ||
| end | ||
| | ||
| {:ok, _} = | ||
| run_sql("update-location-data-table-comment", | ||
| cluster?: cluster?, | ||
| version: Location.version() | ||
| ) | ||
| end | ||
| end | ||
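
Two decisions inside `LocationsSync`, building the `location_data` rows and checking the stored version, can be sketched without a database. In this hypothetical, dependency-free version, plain tuples stand in for the `Location.Country`, `Location.Subdivision` and `Location.City` structs, and `out_of_date?/2` receives the `run_sql` result and the current version as arguments instead of calling `run_sql/1` and `Location.version/0` itself.

```elixir
defmodule LocationsSyncSketch do
  @moduledoc """
  Hypothetical, dependency-free sketch of two steps in LocationsSync:
  building `location_data` rows and deciding whether a resync is needed.
  """

  # Tuples stand in for Location.Country, Location.Subdivision and
  # Location.City structs: {alpha_2, name}, {code, name}, {id, name}.
  def build_rows(countries, subdivisions, cities) do
    country_rows =
      Enum.map(countries, fn {alpha_2, name} -> %{type: "country", id: alpha_2, name: name} end)

    subdivision_rows =
      Enum.map(subdivisions, fn {code, name} -> %{type: "subdivision", id: code, name: name} end)

    # City ids are integers, but location_data.id is a String column.
    city_rows =
      Enum.map(cities, fn {id, name} -> %{type: "city", id: Integer.to_string(id), name: name} end)

    Enum.concat([country_rows, subdivision_rows, city_rows])
  end

  # Same decision as out_of_date?/0: only a single stored comment equal to
  # the current version counts as up to date.
  def out_of_date?({:ok, %{rows: [[stored_version]]}}, current_version),
    do: stored_version != current_version

  # No row (table missing) or a failed query: resync.
  def out_of_date?(_result, _current_version), do: true
end

# Hypothetical sample data.
rows =
  LocationsSyncSketch.build_rows(
    [{"EE", "Estonia"}],
    [{"XX-01", "Example Region"}],
    [{12345, "Example City"}]
  )

IO.inspect(length(rows))
```

Because the table comment is only written as the last step of `run/0`, a sync that fails partway leaves the old (or empty) version in place, so the next check still reports the data as out of date and the sync is retried.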
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
| defmodule Plausible.Workers.LocationsSync do | ||
| @moduledoc false | ||
| | ||
| use Plausible.Repo | ||
| use Oban.Worker, queue: :locations_sync | ||
| | ||
| @impl Oban.Worker | ||
| def perform(_job) do | ||
| if Plausible.DataMigration.LocationsSync.out_of_date?() do | ||
| Plausible.DataMigration.LocationsSync.run() | ||
| end | ||
| | ||
| :ok | ||
| end | ||
| end | ||
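
The worker only resyncs when `out_of_date?/0` reports a version change, so it is cheap to schedule daily. A sketch of how it might be wired up with Oban's cron plugin follows; the config location, the repo, the `@daily` schedule and the queue concurrency of 1 are all assumptions, not taken from this PR.

```elixir
import Config

# Hypothetical wiring; the real config file, repo and schedule may differ.
config :plausible, Oban,
  repo: Plausible.Repo,
  # One job at a time, so two syncs never truncate and refill location_data concurrently.
  queues: [locations_sync: 1],
  plugins: [
    # Runs once a day; LocationsSync.perform/1 exits early unless out_of_date?/0 is true.
    {Oban.Plugins.Cron, crontab: [{"@daily", Plausible.Workers.LocationsSync}]}
  ]
```

Keeping the queue at a concurrency of 1 matters here because `run/0` truncates `location_data` before refilling it.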

5 changes: 5 additions & 0 deletions
priv/data_migrations/LocationsSync/sql/add-alias-column.sql.eex
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| ALTER TABLE <%= @table %> | ||
| <%= if @cluster? do %>ON CLUSTER '{cluster}'<% end %> | ||
| ADD COLUMN IF NOT EXISTS | ||
| <%= @column_name %> String | ||
| ALIAS dictGet('location_data_dict', 'name', tuple('<%= @type %>', <%= @input_column %>)) |
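
The `@`-prefixed placeholders suggest these `.sql.eex` files are rendered with EEx assigns. Assuming that, the sketch below inlines a copy of `add-alias-column.sql.eex` and renders it with Elixir's built-in `EEx.eval_string/2` for the `events_v2.country_name` entry from `@columns`. How `run_sql` in `Plausible.DataMigration` actually loads and renders templates is not part of this diff.

```elixir
# Copy of add-alias-column.sql.eex, inlined so the example is self-contained.
template = """
ALTER TABLE <%= @table %>
<%= if @cluster? do %>ON CLUSTER '{cluster}'<% end %>
ADD COLUMN IF NOT EXISTS
<%= @column_name %> String
ALIAS dictGet('location_data_dict', 'name', tuple('<%= @type %>', <%= @input_column %>))
"""

# Renders the template for one @columns entry; cluster? toggles ON CLUSTER.
render = fn cluster? ->
  EEx.eval_string(template,
    assigns: [
      table: "events_v2",
      cluster?: cluster?,
      column_name: "country_name",
      type: "country",
      input_column: "country_code"
    ]
  )
end

IO.puts(render.(false))
```

With `cluster?` set to true, the same template also emits `ON CLUSTER '{cluster}'`, which is how each statement in this migration is applied to every node of a clustered setup.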

13 changes: 13 additions & 0 deletions
priv/data_migrations/LocationsSync/sql/create-location-data-table.sql.eex
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| CREATE TABLE IF NOT EXISTS location_data <%= if @cluster? do %>ON CLUSTER '{cluster}'<% end %> | ||
| ( | ||
| `type` LowCardinality(String), | ||
| `id` String, | ||
| `name` String | ||
| ) | ||
| <%= if @cluster? do %> | ||
| ENGINE = ReplicatedMergeTree('/clickhouse/{cluster}/tables/{shard}/plausible_prod/location_data', '{replica}') | ||
| <% else %> | ||
| ENGINE = MergeTree() | ||
| <% end %> | ||
| ORDER BY (type, id) | ||
| SETTINGS index_granularity = 128 |

1 change: 1 addition & 0 deletions
priv/data_migrations/LocationsSync/sql/get-location-data-table-comment.sql.eex
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| select comment from system.tables where database = currentDatabase() and table = 'location_data' |

1 change: 1 addition & 0 deletions
priv/data_migrations/LocationsSync/sql/truncate-location-data-table.sql.eex
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| TRUNCATE TABLE IF EXISTS location_data <%= if @cluster? do %>ON CLUSTER '{cluster}'<% end %> |

11 changes: 11 additions & 0 deletions
priv/data_migrations/LocationsSync/sql/update-location-data-dictionary.sql.eex
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| CREATE OR REPLACE DICTIONARY location_data_dict | ||
| <%= if @cluster? do %>ON CLUSTER '{cluster}'<% end %> | ||
| ( | ||
| `type` String, | ||
| `id` String, | ||
| `name` String | ||
| ) | ||
| PRIMARY KEY type, id | ||
| SOURCE(CLICKHOUSE(TABLE location_data <%= @dictionary_connection_params %>)) | ||
| LIFETIME(0) | ||
| LAYOUT(complex_key_cache(size_in_cells 500000)) |

3 changes: 3 additions & 0 deletions
priv/data_migrations/LocationsSync/sql/update-location-data-table-comment.sql.eex
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| ALTER TABLE location_data | ||
| <%= if @cluster? do %>ON CLUSTER '{cluster}'<% end %> | ||
| MODIFY COMMENT '<%= @version %>' |

18 changes: 18 additions & 0 deletions
priv/ingest_repo/migrations/20240709181437_populate_location_data.exs
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| defmodule Plausible.IngestRepo.Migrations.PopulateLocationData do | ||
| use Ecto.Migration | ||
| | ||
| def up do | ||
| try do | ||
| Location.load_all() | ||
| rescue | ||
| # Already loaded | ||
| ArgumentError -> nil | ||
| end | ||
| | ||
| Plausible.DataMigration.LocationsSync.run() | ||
| end | ||
| | ||
| def down do | ||
| raise "Irreversible" | ||
| end | ||
| end | ||
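
`PopulateLocationData` treats an `ArgumentError` from `Location.load_all()` as "already loaded". Whether that error comes from re-creating a named ETS table is an assumption here. The hypothetical `LoadOnceSketch` below shows the pattern: creating a named ETS table whose name is already taken fails with `badarg`, which Elixir rescues as `ArgumentError` and the helper reports as already loaded.

```elixir
defmodule LoadOnceSketch do
  @moduledoc """
  Hypothetical load-once helper mirroring the rescue in PopulateLocationData.
  """

  @table :location_sketch_table

  # The first call creates and fills a named ETS table. A second call fails in
  # :ets.new/2 because the name is taken; the badarg is rescued as ArgumentError.
  def load_all do
    :ets.new(@table, [:named_table, :set, :public])
    :ets.insert(@table, {{"country", "EE"}, "Estonia"})
    :loaded
  rescue
    ArgumentError -> :already_loaded
  end
end

IO.inspect(LoadOnceSketch.load_all())
IO.inspect(LoadOnceSketch.load_all())
```

Rescuing `ArgumentError` broadly can also hide unrelated failures; checking whether `:ets.whereis/1` returns `:undefined` before creating the table is a narrower alternative.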