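Models can overfit when training samples are spatially adjacent: neighboring pixels tend to be highly correlated, so a random split puts near-duplicates of the training pixels into the validation fold and the validation score looks better than it should.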
One way to mitigate this is to choose a pixel block size when extracting training folds, so that all pixels in the same local block are assigned to the same fold.
During cross-validation and model selection, the model is then scored on blocks it was not trained on, which favors models that predict well outside the immediate neighborhood of the training data.
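
The block assignment described above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the implementation used here: the function name `block_fold_ids` and its parameters (`rows`, `cols`, `block_size`, `n_folds`, `seed`) are hypothetical, pixels are assumed to be addressed by integer row/column indices, and blocks are assumed to be square and dealt to folds at random.

```python
import numpy as np


def block_fold_ids(rows, cols, block_size, n_folds, seed=0):
    """Return one fold index per pixel; pixels in the same
    block_size x block_size block always share a fold.

    All names here are hypothetical and chosen for illustration.
    """
    rows = np.asarray(rows)
    cols = np.asarray(cols)

    # Integer block coordinates of each pixel.
    blocks = np.stack([rows // block_size, cols // block_size], axis=1)

    # Map each distinct (block_row, block_col) pair to a single block id.
    _, block_id = np.unique(blocks, axis=0, return_inverse=True)
    block_id = block_id.ravel()  # shape differs across NumPy versions
    n_blocks = int(block_id.max()) + 1

    # Shuffle the blocks, then deal them round-robin into folds so that
    # fold sizes (counted in blocks) differ by at most one.
    rng = np.random.default_rng(seed)
    fold_of_block = np.empty(n_blocks, dtype=int)
    fold_of_block[rng.permutation(n_blocks)] = np.arange(n_blocks) % n_folds

    # Every pixel inherits the fold of its block.
    return fold_of_block[block_id]


# Usage: an 8 x 8 image split into four 4 x 4 blocks and two folds.
rows, cols = np.indices((8, 8)).reshape(2, -1)
folds = block_fold_ids(rows, cols, block_size=4, n_folds=2)
```

A larger `block_size` increases the distance between training and validation pixels but leaves fewer blocks to distribute across folds, so the value is a trade-off that depends on how far spatial correlation extends in the data.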