Implement Urban Mental Health Model Option 2: LULC Inputs #2273
base: feature/urban-health-model
Conversation
…c ndvi map table and calculate avg ndvi per lucode and reclassify lulc tif natcap#2142
… ndvi column or raster natcap#2142
| task_name="calculate delta ndvi" # change in nature exposure | ||
| buffer_alt_dependencies.append(mask_alt_ndvi_task) | ||
| kernel_task = task_graph.add_task( |
I'd recommend hiding whitespace when reviewing this, as most (all?) of the changes between kernel_task and zonal stats are just unindenting
emilyanndavis
left a comment
Looks good!
| id="lulc_to_ndvi_csv", | ||
| path="intermediate/lulc_to_ndvi_map.csv", | ||
| about=gettext( | ||
| "Table giving mean NDVI by LULC codes, with excluded LULC " |
| "Table giving mean NDVI by LULC codes, with excluded LULC " | |
| "Table giving mean NDVI by LULC code, with excluded LULC " |
| unique_lucodes, inverse_indices = numpy.unique( | ||
| masked_lulc.astype(numpy.int64), return_inverse=True) | ||
| sums = numpy.bincount(inverse_indices, | ||
| weights=masked_ndvi.astype(numpy.float32)) | ||
| counts = numpy.bincount(inverse_indices) | ||
| means = sums / counts | ||
| mean_ndvi_by_lulc_dict = {lucode: mean for lucode, mean in zip( | ||
| unique_lucodes, means)} |
Wow, very clever solution! I've never used bincount before, so I had to go to the docs and puzzle over this for a bit before I could grasp exactly what's going on here. Not sure if it would be worth adding a comment to help explain it up front? On the one hand, a gloss might have helped me (though I probably would have looked it up in the docs as well); on the other hand, it already seems pretty clear for people who (unlike me) are fluent in numpy. Up to you. Just thought I'd mention it on the off-chance you were already considering adding a comment here.
Yes, great idea! It is definitely not completely intuitive so I'll add several comments
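For reference, a minimal, self-contained sketch of the bincount pattern under discussion, with the explanatory comments written out (the array values here are hypothetical stand-ins for the masked LULC/NDVI blocks):

import numpy

# Hypothetical flattened, nodata-masked blocks of LULC codes and NDVI values.
masked_lulc = numpy.array([1, 2, 2, 5, 1, 5], dtype=numpy.int64)
masked_ndvi = numpy.array([0.1, 0.4, 0.6, 0.8, 0.3, 0.6], dtype=numpy.float32)

# return_inverse maps each pixel to the index of its code in unique_lucodes,
# e.g. unique_lucodes = [1, 2, 5] and inverse_indices = [0, 1, 1, 2, 0, 2].
unique_lucodes, inverse_indices = numpy.unique(masked_lulc, return_inverse=True)

# bincount with weights sums the NDVI values sharing each inverse index;
# bincount without weights counts the pixels per code.
sums = numpy.bincount(inverse_indices, weights=masked_ndvi)
counts = numpy.bincount(inverse_indices)
means = sums / counts  # mean NDVI per unique LULC code

mean_ndvi_by_lulc_dict = dict(zip(unique_lucodes.tolist(), means.tolist()))
# approximately {1: 0.2, 2: 0.5, 5: 0.7}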
dcdenu4
left a comment
Hey @claire-simpson, I didn't make it all the way through, but I wanted to provide some initial feedback in a timely manner!
| for input_raster, resample_method in raster_to_method_dict.items(): | ||
| if args[input_raster]: | ||
| input_align_list.append(args[input_raster]) | ||
| output_align_list.append(file_registry[input_raster+'_aligned']) |
We've generally adopted f-strings for string concatenation. I think it's technically "faster" and can be more readable than using the plus operator.
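For example, a sketch of the suggested change on the line above:

# plus-operator concatenation (current)
output_align_list.append(file_registry[input_raster + '_aligned'])
# f-string equivalent (suggested)
output_align_list.append(file_registry[f'{input_raster}_aligned'])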
| args=(args['lulc_attr_csv'], | ||
| file_registry['lulc_to_ndvi_csv'], | ||
| file_registry['lulc_base_aligned'], # Note: won't get used if `ndvi` column in `lulc_attr_csv`` | ||
| file_registry['ndvi_base_aligned'] #TODO: will this get set to None if used doesn't input ndvi_base? |
Is this still a TODO to work through for this PR or later?
Oops, no, I just forgot to delete this. I determined that for args, if the user doesn't enter an optional input, it gets set to None, but the value of file_registry[key] is the file path regardless of whether it's actually used/created (so, not set to None).
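A tiny illustration of that distinction (the keys and path here are hypothetical):

# Optional input left blank by the user: the args value is None.
args = {'ndvi_base': None}
# The file registry entry is always a path, whether or not the file is
# ever actually created or used.
file_registry = {'ndvi_base_aligned': 'intermediate/ndvi_base_aligned.tif'}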
| for lu, ndvi, exclude in zip(codes, ndvi_means, excludes): | ||
| if bool(exclude): | ||
| value_map[lu] = FLOAT32_NODATA | ||
| elif numpy.isfinite(lu): |
What happens if this fails and there is a positive infinity or NaN value? Given the ModelSpec validation for this, should we be worried about it here?
This is probably unnecessary. I think I added the isfinite check because I was finding that pandas was reading in an empty line at the end of the LULC CSV, but I couldn't figure out why at the time, and now I can't reproduce that behavior.
| elif base_ndvi_path: | ||
| LOGGER.info("Using NDVI raster to calculate mean NDVI by LULC class.") | ||
| lulc_dict = {lu: ex for lu, ex in zip(codes, excludes) | ||
| if numpy.isfinite(lu)} |
This is used somewhere else too, but is it necessary to check numpy.isfinite here?
Same as above: I think I added the isfinite check because I was finding that pandas was reading in an empty line at the end of the LULC CSV.
| Dict containing the mean NDVI values for each LULC class | ||
| """ | ||
| # TODO: iterblocks? |
Yep, we'll want to handle this a little differently to avoid loading the entire raster into memory with raster_to_numpy_array. A common way is using iterblocks and keeping a running average as you iterate. Habitat Quality does something like this where it counts the number of unique pixel values. But it could be altered to keep a running average too.
invest/src/natcap/invest/habitat_quality.py
Line 1119 in 98de13b
| def _raster_pixel_count(raster_path_band): |
Ok thanks I've implemented this!
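For illustration, a rough sketch of the iterblocks running-sum/count pattern described above, assuming the LULC and NDVI rasters are already aligned and have ordinary (non-NaN) nodata values; the function name and signature are made up for this example:

import collections

import numpy
import pygeoprocessing
from osgeo import gdal


def mean_ndvi_by_lucode(lulc_path, ndvi_path, lulc_nodata, ndvi_nodata):
    """Keep running NDVI sums/counts per lucode, one block at a time."""
    sums = collections.defaultdict(float)
    counts = collections.defaultdict(int)
    ndvi_raster = gdal.OpenEx(ndvi_path, gdal.OF_RASTER)
    ndvi_band = ndvi_raster.GetRasterBand(1)
    for offset, lulc_block in pygeoprocessing.iterblocks((lulc_path, 1)):
        # Read the matching window from the (aligned) NDVI raster.
        ndvi_block = ndvi_band.ReadAsArray(**offset)
        valid = (lulc_block != lulc_nodata) & (ndvi_block != ndvi_nodata)
        if not valid.any():
            continue  # nothing usable in this block
        for lucode in numpy.unique(lulc_block[valid]):
            lucode_mask = valid & (lulc_block == lucode)
            sums[lucode] += float(ndvi_block[lucode_mask].sum())
            counts[lucode] += int(lucode_mask.sum())
    ndvi_band = None
    ndvi_raster = None
    return {lucode: sums[lucode] / counts[lucode] for lucode in sums}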
| data_type=int, | ||
| units=None, | ||
| required="scenario=='lulc'", | ||
| # Allow lulc_alt for masking if using scenario 3 |
Are the scenarios numbered or would it be helpful to use the name descriptor instead?
dcdenu4
left a comment
Hey @claire-simpson! I took another pass and had some comments and suggestions.
| func=build_lulc_ndvi_table, | ||
| args=(args['lulc_attr_csv'], | ||
| file_registry['lulc_to_ndvi_csv'], | ||
| file_registry['lulc_base_aligned'], # Note: won't get used if `ndvi` column in `lulc_attr_csv`` |
| file_registry['lulc_base_aligned'], # Note: won't get used if `ndvi` column in `lulc_attr_csv`` | |
| file_registry['lulc_base_aligned'], # Note: won't get used if `ndvi` column in `lulc_attr_csv` |
| args=(file_registry['lulc_base_aligned'], | ||
| file_registry['lulc_to_ndvi_csv'], | ||
| file_registry['ndvi_base_aligned_masked']), | ||
| # this outputs raster to use as new NDVI to use going forward |
This comment's wording is a bit unclear, and it may be unnecessary?
| file_registry['ndvi_base_aligned_masked'])["datatype"], | ||
| 'target_nodata': pygeoprocessing.get_raster_info( | ||
| file_registry['ndvi_base_aligned_masked'])["nodata"][0]}, | ||
| dependent_task_list=buffer_base_dependencies+[kernel_task], |
Just a nitpick that this list concatenation has no spaces while line 794 has a space :)
| None | ||
| """ | ||
| lulc_df = pandas.read_csv(lulc_attr_table) |
I think there are some benefits to opening input tables using the MODEL_SPEC.get_input function. Here's an example from the carbon model.
invest/src/natcap/invest/carbon.py
Line 419 in 98de13b
| carbon_pool_df = MODEL_SPEC.get_input( |
This also might guard against the behavior you were seeing with trailing empty lines, etc!
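For illustration only, a sketch of what that might look like here, modeled on the carbon.py example; the get_validated_dataframe call and the 'lulc_attr_csv' key are assumptions and may not match the actual spec API:

# Assumed pattern based on carbon.py; method and key names are not verified.
lulc_df = MODEL_SPEC.get_input(
    'lulc_attr_csv').get_validated_dataframe(args['lulc_attr_csv'])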
| create_lulc_ndvi_csv_task = task_graph.add_task( | ||
| func=build_lulc_ndvi_table, | ||
| args=(args['lulc_attr_csv'], |
I commented in the function as well, but the MODEL_SPEC has some handy get_input methods that allow us to open CSV tables in a consistent manner. I gave an example from the carbon model below.
| # return_inverse returns the indices of unique array | ||
| # e.g., for lulc array [1, 2, 2, 5]: unique_lucodes = [1, 2, 5] | ||
| # and inverse_indices = [0, 1, 1, 2] | ||
| unique_lucodes, inverse_indices = numpy.unique( |
numpy.unique also has a return_counts parameter that might be a little more straightforward than using bincount?
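A small sketch of how return_counts could slot in; a weighted bincount is still needed for the numerator, so the inverse indices are kept (array values are hypothetical):

import numpy

masked_lulc = numpy.array([1, 2, 2, 5, 1, 5])
masked_ndvi = numpy.array([0.1, 0.4, 0.6, 0.8, 0.3, 0.6])

# return_counts gives per-code pixel counts directly, replacing one bincount.
unique_lucodes, inverse_indices, block_counts = numpy.unique(
    masked_lulc, return_inverse=True, return_counts=True)
block_sums = numpy.bincount(inverse_indices, weights=masked_ndvi)
means = block_sums / block_counts  # approximately [0.2, 0.5, 0.7]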
| block_counts = numpy.bincount(inverse_indices) | ||
| # accumulate into global sums/counts | ||
| for lucode, sum, c in zip(unique_lucodes, block_sums, block_counts): |
I think the bincount solution is pretty neat and not one that I would have thought of. For the sake of sharing ideas, since we ultimately need to iterate over the unique codes, another solution would be kind of our typical one:
for lucode in unique_lucodes:
    if lucode == nodata:
        continue
    ndvi_sum = numpy.sum(ndvi[lulc == lucode])
    sum[lucode] = ndvi_sum
    # etc...
One positive of this approach is that we might be able to reduce some of the above code that flattens the array and creates the nodata masks?
Although checking nodata blocks and continuing if there's no relevant data seems like a good thing to keep!
Using return_counts in np.unique is a good idea! However, I do think the approach of iterating over the unique codes would be slower because, for each lucode, you're scanning the whole block to build the masks and compute the sum. So I guess the tradeoff is the current approach being faster but the approach you proposed being more readable. What do you think?
| target_nodata = FLOAT32_NODATA | ||
| # Create lucode: ndvi dict | ||
| lulc_df = pandas.read_csv(mean_ndvi_by_lulc_csv, index_col='lucode') |
For non-model-spec input CSVs, there is a utils function for opening CSVs that does some sanitization. It could be helpful even though this CSV is curated by the model.
invest/src/natcap/invest/utils.py
Line 293 in 98de13b
| def read_csv_to_dataframe(path, **kwargs): |
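For reference, a sketch of that utility in place of the direct pandas call; whether index_col is forwarded through **kwargs to pandas.read_csv is an assumption here:

from natcap.invest import utils

# read_csv_to_dataframe applies some sanitization (e.g. column-name handling)
# before returning a DataFrame; index_col is assumed to pass through to pandas.
lulc_df = utils.read_csv_to_dataframe(mean_ndvi_by_lulc_csv, index_col='lucode')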
| # raise error if user enters lulc_attr_csv without 'ndvi' column and also | ||
| # doesn't provide base_ndvi raster | ||
| if args['scenario'] == 'lulc' and args.get('lulc_attr_csv'): | ||
| lulc_df = pandas.read_csv(args['lulc_attr_csv']) |
Another use case where using the MODEL_SPEC helper functions to open this could be nice.
Description
Implements the LULC-based option (scenario 2) in the Urban Mental Health model and unifies NDVI preprocessing across scenarios.

Adds a workflow for LULC reclassification to NDVI:
- Calculates mean NDVI (from `ndvi_base`) by `lucode` if needed
- Sets excluded LULC classes to `NODATA`
- Writes `lulc_to_ndvi_map.csv` to store the final mapping used

Other changes:
- Unifies the NDVI (or `lulc`) --> buffer-mean --> delta_NDVI pipeline
- Adds a validation error if the `ndvi` column and `base_ndvi` both aren't provided for option 2

Fixes #2142
Checklist