@kyrasturgill kyrasturgill commented Jan 15, 2026

This PR adds the raw data for the 2024 tax code rate, agency rate, and TIF reports. It also adjusts the data-raw ingest scripts to account for changes to the 2024 data structure.

Most changes to the ingest scripts are slight adjustments to the code that renames fields. This code was originally written on the assumption that the fields selected from the report files exist across all years (with each year stored in a separate file). The introduction of new fields in the 2024 files led to errors in rename_with(): when the field did not exist, the renaming function still returned a vector of length 1 rather than 0. The workaround is to use rep() inside rename_with(): rename_with(~rep("agency_name", length(.x)), any_of(c("authority_name"))), which returns an empty vector if there is no field named authority_name.
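
A minimal sketch of the workaround on toy data frames. The helper standardize_names() and the sample values are hypothetical and are not taken from the ingest scripts:

```r
library(dplyr)

# Toy stand-ins for two report years (made-up values, not Clerk data)
report_2023 <- tibble(agency_num = "A1", agency_name = "Example Agency")
report_2024 <- tibble(agency_num = "A1", authority_name = "Example Agency")

# Hypothetical helper: rename authority_name to agency_name when it is present
standardize_names <- function(df) {
  df %>%
    rename_with(
      # rep() makes the output length match the number of selected columns,
      # so zero selected columns yields a zero-length character vector
      ~ rep("agency_name", length(.x)),
      any_of(c("authority_name"))
    )
}

names_2023 <- names(standardize_names(report_2023))
names_2024 <- names(standardize_names(report_2024))
```

Both calls should leave the data frame with the columns agency_num and agency_name, whether or not authority_name was present in the input.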

Other changes remove fields that are no longer present in the 2024 reports and create the field fund_type_num to account for the 2024 change in fund number structure: fund numbers are now 6 digits rather than 3. fund_type_num is the first 3 digits of fund_num, which should be consistent across all years. Pre-2024 fund_num values are also now padded with trailing zeros. Because 2024 reports funds at a more detailed level than prior years, any time-trend analysis of funds should use fund_type_num.
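
A rough illustration of how the two fund fields relate, assuming fund_num is stored as a character string. The helper pad_fund_num() and the fund numbers below are made up for the example:

```r
# Toy fund numbers (illustrative only, not real Clerk data)
funds <- data.frame(
  year = c(2023, 2024, 2024),
  fund_num = c("003", "003001", "003002"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: right-pad with zeros up to the 6-digit 2024 width
pad_fund_num <- function(x, width = 6) {
  paste0(x, strrep("0", pmax(width - nchar(x), 0)))
}

funds$fund_num <- pad_fund_num(funds$fund_num)

# The first 3 digits identify the fund type in every year
funds$fund_type_num <- substr(funds$fund_num, 1, 3)
```

In this toy case the 2023 fund becomes 003000, and all three rows share fund_type_num 003, which is the level a time-trend analysis would group on.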

Something I have not added to the data yet is agency_num_legacy or authority_num. I realized that we would need to alter prior years' agency_num and agency_name to align with the revised 2024 agency_num. To avoid altering source data, we could simply add an agency crosswalk to the db that connects the new agency number, the legacy number, and the authority number. It would be available to users who want to analyze agency extensions or agency rates over time.

Lastly, this PR also brings in a new TIF data source, pin_tif_distribution, which is derived from the Clerk's TIF PIN list report.

@kyrasturgill kyrasturgill changed the base branch from master to 2024-data-update January 15, 2026 22:39

jeancochrane commented Jan 16, 2026

Thanks for this! Very helpful work. A few clarifying questions:

  1. It seems that our general preference is to try to keep the schema of our data tables as close as possible to the schema of the underlying data that we pull from the Clerk. However, we're deviating from that pattern in this PR by adding agency.fund_type_num and pin_tif_distribution.in_transit_tif. Do we really want to add these fields at the ingest stage? Or do we instead want to derive them on the fly and return them in the lookup_*() functions, to keep them separate from the underlying data model? I don't have super strong opinions here, and ultimately I trust you as a frequent PTAXSIM user to decide which of these approaches would be most convenient for the user.

  2. I'm uncertain about this comment:

> Something I have not added to the data yet is agency_num_legacy or authority_num. I realized that we would need to alter prior years' agency_num and agency_name to align with the revised 2024 agency_num.

I thought that agency_num_legacy mapped cleanly to prior years' agency_num, in which case we could just fill agency_num_legacy for pre-2024 years with the value of agency_num in that year. I also thought authority_num didn't really matter because it's redundant with agency_num for 2024, so we might as well bring it in now. But am I misunderstanding? These fields are kinda confusing, so I'm happy to jump on a call if it would be faster to explain.

@kyrasturgill

Here are my thoughts on your questions:

  1. I see your point and am happy to omit the in_transit_tif field, because it's not really that helpful to have in the table and we can build it into the lookup_pin() function. I do think that fund_type_num could be helpful for anyone who has a use case for analyzing fund levies over time. The lookup_agency() function doesn't return fund-level data, so we wouldn't be able to build that new field into the function output, which is why I was thinking it should be somewhere in the db. Maybe it could just live in the agency_fund_info table - it seems capped_fund_ind is a helper field we derive ourselves so I don't think it would totally break with past practice to include our own derived field there.
  2. For agency_num_legacy and authority_num, I'm very much confusing myself and need to talk it through. For now I'm going to add those two fields as we have them in the draft proposed schema, and then we can discuss whether we want to make changes.

jeancochrane commented Jan 21, 2026

> Maybe it could just live in the agency_fund_info table - it seems capped_fund_ind is a helper field we derive ourselves so I don't think it would totally break with past practice to include our own derived field there.

That makes sense to me! If I understand the data model correctly, it strikes me that agency_fund_info is fundamentally a derived table, in that it represents our interpretation of the discrete entities that are present in the raw agency_fund table. From that perspective, including further derived information in agency_fund_info seems reasonable.

@kyrasturgill

@jeancochrane, this is ready for your review!
Changes include:

  • I added the fields to the agency_info table that tie the now rolled-up agencies to their current "parent" agency. Let me know if you want to talk through my thinking here.
  • Additionally, I removed the separate "overlap" county fields, summing them into a single, consistent cty_overlap_eav field so that pre-2024 years align with the 2024 structure.
  • Previously PTAXSIM renamed the field cty_overall_eav to cty_total_eav and removed the original cty_total_eav field. There seem to be discrepancies between these two fields only for agencies that fund particular bonds. My guess is that there is some special rule for these bonds limiting the tax base to only a portion of the parent taxing agency. In the 2024 agency report, the field OverallEAV is no longer present, and after reviewing past tax rate reports I'm more confident that the TotalEAV field reflects the EAV available to this specific agency/fund, whereas OverallEAV represented taxable EAV within the entire jurisdiction. The proposed code now uses cty_total_eav for all years. (One thing I haven't yet addressed: it seems that this field didn't exist or was blank in the 2013 report, so currently the value is 0 for cty_total_eav in 2013, which we should probably replace with cty_overall_eav just for that year?)
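
A minimal sketch of that 2013 fallback, assuming both cty_total_eav and cty_overall_eav are still available at the point in the ingest where the fix would be applied. The data frame and its values are made up for illustration:

```r
# Toy agency rows (illustrative values only)
agency <- data.frame(
  year = c(2013, 2014),
  cty_total_eav = c(0, 120),
  cty_overall_eav = c(100, 150)
)

# For 2013 only, fall back to cty_overall_eav where cty_total_eav is 0
use_overall <- agency$year == 2013 & agency$cty_total_eav == 0
agency$cty_total_eav[use_overall] <- agency$cty_overall_eav[use_overall]
```

Restricting the fallback to 2013 leaves later years untouched, so a genuine 0 in another year would not be overwritten.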
