Add 2024 AVs, exemptions, equalizer, and CPI #63
base: 2024-data-update
This reverts commit 128edb9.
…qualizer-av-and-exemptions
| # Remove footer lines that do not contain any data | ||
| filter( | ||
| !str_detect( | ||
| vals, | ||
| regex("printed by the authority|ptax-115", ignore_case = TRUE) | ||
| ) | ||
| ) %>% |
This footer appears to be new as of 2025. See here to examine it: https://tax.illinois.gov/content/dam/soi/en/web/tax/localgovernments/property/documents/cpihistory.pdf
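For anyone reviewing without the PDF open, here is a minimal sketch of what the footer filter does. It uses base R's `grepl()` as a stand-in for the `str_detect()` call in the diff, and the sample lines in `vals` are made up for illustration; they are not copied from the actual document.

```r
# Hypothetical lines as they might come out of the PDF text extraction.
# The numbers are placeholders, not real CPI values.
vals <- c(
  "2023 100.0 1.0",
  "2024 101.0 1.0",
  "Printed by the Authority of the State of Illinois",
  "PTAX-115 footer text"
)

# Flag lines matching either footer phrase, ignoring case. This mirrors
# regex("printed by the authority|ptax-115", ignore_case = TRUE).
is_footer <- grepl(
  "printed by the authority|ptax-115",
  vals,
  ignore.case = TRUE
)

# Keep only the lines that are not footers
data_lines <- vals[!is_footer]
print(data_lines)
```

Matching case-insensitively means the filter should keep working if the capitalization of the footer changes in a future version of the PDF.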
| # Start and end years of data to query, inclusive. | ||
| # Set these to the same value if you want to update only one year of data | ||
| start_year <- 2006 | ||
| end_year <- 2024 |
Adding these in so that we have an easy way of skipping prior years of data whenever we do an update. This is particularly useful right now because I don't have access to the AS400 mirror, which is required in order to reproduce pre-2024 data. At some point we should get that set up, but I don't want it to block us right now.
| # 2023. These values come from the legacy CCAO database, which mirrors the | ||
| # county mainframe. | ||
| # Only query this data if we are pulling data for years up to 2023 | ||
| if (start_year <= 2023) { |
There are a few conditional branches in this file that split depending on whether we're ingesting data for 2023 and earlier or for 2024 and later. I considered creating a new file dedicated exclusively to manipulating data from 2024 onward, since it feels to me like this file will get very messy very fast if they substantially change the data model again in the future (causing us to need to introduce further conditional branches based on year). For now, however, modifying this file feels like the simpler path, and I expect it's easier to review anyway.
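To make the branching easier to reason about, here is a rough sketch of how the requested year range could be split into a legacy portion and a 2024-onward portion. The helper `split_years()` and its `last_legacy_year` argument are hypothetical names for illustration and do not exist in the script.

```r
# Start and end years of data to query, inclusive (values from the diff)
start_year <- 2006
end_year <- 2024

# Hypothetical helper: split a year range at the last year served by the
# legacy database (2023, per the condition `start_year <= 2023`).
split_years <- function(start_year, end_year, last_legacy_year = 2023) {
  stopifnot(start_year <= end_year)
  years <- seq(start_year, end_year)
  list(
    legacy = years[years <= last_legacy_year],
    current = years[years > last_legacy_year]
  )
}

ranges <- split_years(start_year, end_year)

# Each source is only queried when its part of the range is non-empty
if (length(ranges$legacy) > 0) {
  message("Would query legacy data for ", length(ranges$legacy), " year(s)")
}
if (length(ranges$current) > 0) {
  message("Would query 2024+ data for ", length(ranges$current), " year(s)")
}
```

Setting `start_year` and `end_year` both to 2024 leaves the legacy portion empty, which is the case that matters when the AS400 mirror is unavailable.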
| # This exemption is new in 2024 and does not exist in the legacy data | ||
| exe_vet_dis_100 = 0L |
This is the one change I made to the query for 2023 and earlier.
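A small sketch of why the placeholder column helps, assuming the legacy and new results are eventually stacked into one table. The data frames and values below are hypothetical, and base R's `rbind()` stands in for whatever binding step the script actually uses.

```r
# Hypothetical legacy result, which has no 100% disabled veteran exemption
legacy <- data.frame(
  pin = "A",
  year = 2023L,
  exe_vet_dis_ge70 = 5000L
)

# Add the new exemption as an integer zero so the schemas line up
legacy$exe_vet_dis_100 <- 0L

# Hypothetical 2024 result, which does carry the new exemption
current <- data.frame(
  pin = "B",
  year = 2024L,
  exe_vet_dis_ge70 = 0L,
  exe_vet_dis_100 = 7000L
)

# With matching column names and types, the two frames stack cleanly
combined <- rbind(legacy, current)
print(combined)
```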
| pin_exe_vetdis_athena <- dbGetQuery( | ||
| ccaoathena, | ||
| glue_sql(" | ||
| WITH long AS ( | ||
| SELECT | ||
| det.parid AS pin, | ||
| det.taxyr AS year, | ||
| CASE | ||
| WHEN | ||
| det.excode IN ('DV1', 'C-DV1', 'DV0', 'C-DV0', 'DV-1') | ||
| THEN 'exe_vet_dis_lt50' | ||
| WHEN det.excode IN ('DV2', 'C-DV2', 'DV-2') THEN 'exe_vet_dis_50_69' | ||
| WHEN det.excode IN ('DV3', 'DV3-M', 'DV-3') THEN 'exe_vet_dis_ge70' | ||
| WHEN det.excode IN ('DV4', 'DV4-M', 'DV-4') THEN 'exe_vet_dis_100' | ||
| END AS exe_name, | ||
| COALESCE(cast(det.apother AS INT), 0) AS exe_amount | ||
| FROM iasworld.exdet AS det | ||
| INNER JOIN iasworld.exadmn AS admn | ||
| ON det.parid = admn.parid | ||
| AND det.caseno = admn.caseno | ||
| AND det.taxyr = admn.taxyr | ||
| AND det.excode = admn.excode | ||
| AND admn.cur = 'Y' | ||
| AND admn.deactivat IS NULL | ||
| AND admn.exstat = 'A' | ||
| AND (admn.user126 IS NULL OR admn.user126 = 'N') | ||
| INNER JOIN iasworld.excode AS code | ||
| ON det.excode = code.excode | ||
| AND det.taxyr = code.taxyr | ||
| AND code.cur = 'Y' | ||
| AND code.deactivat IS NULL | ||
| WHERE det.cur = 'Y' | ||
| AND det.deactivat IS NULL | ||
| AND det.excode IN ( | ||
| 'DV1', 'C-DV1', 'DV0', 'C-DV0', 'DV-1', | ||
| 'DV2', 'C-DV2', 'DV-2', | ||
| 'DV3', 'DV3-M', 'DV-3', | ||
| 'DV4', 'DV4-M', 'DV-4' | ||
| ) | ||
| AND det.taxyr >= '2024' | ||
| AND det.taxyr <= '{end_year}' | ||
| ) | ||
| SELECT | ||
| pin, | ||
| year, | ||
| CAST( | ||
| SUM( | ||
| CASE WHEN exe_name = 'exe_vet_dis_lt50' THEN exe_amount ELSE 0 END | ||
| ) | ||
| AS INT) AS exe_vet_dis_lt50, | ||
| CAST( | ||
| SUM( | ||
| CASE WHEN exe_name = 'exe_vet_dis_50_69' THEN exe_amount ELSE 0 END | ||
| ) | ||
| AS INT) AS exe_vet_dis_50_69, | ||
| CAST( | ||
| SUM( | ||
| CASE WHEN exe_name = 'exe_vet_dis_ge70' THEN exe_amount ELSE 0 END | ||
| ) | ||
| AS INT) AS exe_vet_dis_ge70, | ||
| CAST( | ||
| SUM( | ||
| CASE WHEN exe_name = 'exe_vet_dis_100' THEN exe_amount ELSE 0 END | ||
| ) | ||
| AS INT) AS exe_vet_dis_100 | ||
| FROM long | ||
| GROUP BY pin, year | ||
| ", .con = ccaoathena) | ||
| ) %>% |
Much of this query logic duplicates the logic in default.vw_pin_exe. However, that view does not provide a method for filtering by CofE (ccao-data/data-architecture#962), which is important in this context. As such, I've decided to duplicate the logic for now, and we can clean this up later once we incorporate CofE flags into the data lake.
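For reviewers who want to check the aggregation step without running Athena, here is a rough base-R sketch of the same long-to-wide pivot that the outer `SELECT` performs with `SUM(CASE WHEN ... ELSE 0 END)`. The rows in `long` are made-up test data, and the sketch covers only the pivot, not the joins or the CofE filtering.

```r
# Hypothetical output of the `long` CTE: one row per exemption detail record
long <- data.frame(
  pin = c("A", "A", "A", "B"),
  year = c("2024", "2024", "2024", "2024"),
  exe_name = c(
    "exe_vet_dis_lt50", "exe_vet_dis_lt50",
    "exe_vet_dis_100", "exe_vet_dis_ge70"
  ),
  exe_amount = c(1000L, 500L, 2000L, 3000L),
  stringsAsFactors = FALSE
)

exe_cols <- c(
  "exe_vet_dis_lt50", "exe_vet_dis_50_69",
  "exe_vet_dis_ge70", "exe_vet_dis_100"
)

# Equivalent of CASE WHEN exe_name = '<col>' THEN exe_amount ELSE 0 END:
# one column per exemption type, zero where the row is a different type
for (col in exe_cols) {
  long[[col]] <- ifelse(long$exe_name == col, long$exe_amount, 0L)
}

# Equivalent of SUM(...) with GROUP BY pin, year
wide <- aggregate(long[exe_cols], by = long[c("pin", "year")], FUN = sum)
print(wide)
```

Every PIN-year ends up with all four exemption columns, and types that never appear for a PIN come out as zero rather than missing.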
| # and Cook Central. We have to load each parquet file individually instead of | ||
| # loading them all as a Dataset because some issue with the file metadata causes | ||
| # an esoteric error when geoarrow tries to collect the files as a Dataset |
Neither Billy nor I know why this started breaking recently. However, it prevents us from using geoarrow_collect_sf(), so I refactored to load each Parquet file individually instead. It's an ugly solution, but it works.
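The general shape of the workaround is sketched below: read each file on its own, then bind the pieces. To keep the example runnable without arrow or geoarrow installed, it uses temporary CSV files and `read.csv()` as a stand-in for the real Parquet reader; `load_files_individually()` is a hypothetical helper, not a function from the script.

```r
# Hypothetical helper: read every file separately with `read_fn`, then
# row-bind the results, instead of opening the directory as one dataset.
load_files_individually <- function(paths, read_fn) {
  pieces <- lapply(paths, read_fn)
  do.call(rbind, pieces)
}

# Stand-in data: two small CSV files in a temporary directory. In the real
# script these would be Parquet files read with a Parquet-aware function.
tmp_dir <- tempfile("parts_")
dir.create(tmp_dir)
write.csv(
  data.frame(pin = "A", year = 2024),
  file.path(tmp_dir, "part-1.csv"),
  row.names = FALSE
)
write.csv(
  data.frame(pin = "B", year = 2024),
  file.path(tmp_dir, "part-2.csv"),
  row.names = FALSE
)

paths <- sort(list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE))
combined <- load_files_individually(paths, read.csv)
print(combined)
```

Reading file by file avoids the step that reconciles metadata across files, which is presumably where the Dataset-based approach fails, at the cost of a slower and less elegant load.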
This PR tweaks a few `data-raw` scripts to add 2024 data to the `pin`, `cpi`, and `eq_factor` tables. I have already used this code to load the corresponding files into the testing bucket on S3.

The most complicated of these changes relates to the `pin` table, whose data source needs to change in 2024 following the Clerk's migration from the AS400 to iasWorld as their source-of-truth database. Rather than pull AV and exemption data from a SQL server mirror of the AS400, as we used to do, we now pull these data from a flat file stored in S3. In future years, we may pull this data from iasWorld directly, so I did a little bit of QC work to check the flat file against iasWorld; they mostly match up, though there remain a few thousand rows with discrepancies that I couldn't track down. (See EI issue 395, which will investigate these discrepancies in more detail.)

Connects #59.
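For context, a QC comparison along the lines described above could look roughly like the following. This is only a sketch on made-up data: the data frames, the `av` column, and the `status` labels are hypothetical, and it does not reproduce the actual checks run for this PR.

```r
# Hypothetical extracts from the two sources, keyed by PIN and year
flat_file <- data.frame(
  pin = c("A", "B", "C"),
  year = 2024L,
  av = c(100L, 200L, 300L)
)
iasworld <- data.frame(
  pin = c("A", "B", "D"),
  year = 2024L,
  av = c(100L, 250L, 400L)
)

# Full outer join so rows missing from either side are retained
compared <- merge(
  flat_file, iasworld,
  by = c("pin", "year"),
  all = TRUE,
  suffixes = c("_flat", "_ias")
)

# Classify each PIN-year as matching, mismatched, or missing on one side
compared$status <- ifelse(
  is.na(compared$av_flat), "missing_in_flat_file",
  ifelse(
    is.na(compared$av_ias), "missing_in_iasworld",
    ifelse(compared$av_flat == compared$av_ias, "match", "mismatch")
  )
)

discrepancies <- compared[compared$status != "match", ]
print(discrepancies)
```

Separating rows that are missing from one source from rows whose values differ makes it easier to tell whether a discrepancy comes from coverage or from the values themselves.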