Return the actual error instead of ErrInvalidType #36
Conversation
inflight.go
Outdated
// We need to iterate through the values that WE want from this call. The batch
// could contain hundrdreds of IDs, but we might only want a few of them.
Typos: we, hundreds
Ah, thank you for spotting this! The WE was intentional but _hundrdreds_ was not. I'll update!
Thanks! I'll do that and report back.

Have tested locally and looks good. Didn't have time to try in production today but will do so tomorrow.

Sorry, this will take longer! I want to familiarize myself with the cache internals a bit so I can understand this change before deploying to prod.
inflight.go
Outdated
	// It could be only cached records here, if we we're able
	// to get some of the IDs from the distributed storage.
	if call.err != nil && !errors.Is(call.err, ErrOnlyCachedRecords) {
		return response, call.err
	}

	if errors.Is(call.err, ErrOnlyCachedRecords) {
		err = ErrOnlyCachedRecords
	}
}

return response, err
Same here
Note
I know you are moving code, so the code is not yours, but as you are refactoring, it should be reviewed and improved.
There is something strange.
	if call.err != nil && !errors.Is(call.err, ErrOnlyCachedRecords) {
		return response, call.err
	}
	if errors.Is(call.err, ErrOnlyCachedRecords) {
		err = ErrOnlyCachedRecords
	}
}
return response, err

Let's try to check all cases.
if call.err is nil, the first and second if are ignored, so we return err
Then, if call.err is ErrOnlyCachedRecords, the first if is ignored, the second sets err to ErrOnlyCachedRecords, but here we could have returned call.err
Last case, if call.err is not nil, but not ErrOnlyCachedRecords, the call.err is returned
So call.err is returned whenever present.
But then, this code could be simplified to this, no?
	// It could be only cached records here, if we we're able
	// to get some of the IDs from the distributed storage.
	if call.err != nil && !errors.Is(call.err, ErrOnlyCachedRecords) {
		return response, call.err
	}
	if errors.Is(call.err, ErrOnlyCachedRecords) {
		err = ErrOnlyCachedRecords
	}
}
return response, err

Suggested change:

	// It could be only cached records here, if we we're able
	// to get some of the IDs from the distributed storage.
	if call.err != nil {
		return response, call.err
	}
}
return response, err
It might be simpler but it's not functionally equivalent. The original code intentionally handles errors of type ErrOnlyCachedRecords differently from other errors. Specifically, it does the following:
- For non‑ErrOnlyCachedRecords errors: It immediately returns the error, short‑circuits the loop, and doesn’t process further calls.
- For ErrOnlyCachedRecords errors: It doesn’t return immediately; instead, it records that error (by setting err = ErrOnlyCachedRecords) and continues processing the rest of the batch. This way, even if one call returns ErrOnlyCachedRecords, the loop still gathers IDs from other calls. If later another call yields a non‑ErrOnlyCachedRecords error, that error would take precedence and be returned immediately.
The original logic will only short‑circuit the function when a “real” error occurs, while still accumulating partial responses when the only issue is cached‑only records.
However, one could argue that the block could be simplified like this:
if call.err != nil {
	if !errors.Is(call.err, ErrOnlyCachedRecords) {
		return response, call.err
	}
	err = ErrOnlyCachedRecords
}
I got it. It's indeed non-obvious.
I would write it like this to make sure anything wrapped in the error is not lost then:
if call.err != nil {
	if !errors.Is(call.err, ErrOnlyCachedRecords) {
		return response, call.err
	}
	err = call.err
}

Also, your reply was valuable, you should document it somehow.
Please note, one solution could be to use a switch statement
switch {
case call.err == nil:
	// nothing to do
case errors.Is(call.err, ErrOnlyCachedRecords):
	// this error might get returned if there is no other error while reading the rows, it doesn't block gathering other items
	err = call.err
default:
	// anything else is blocking
	return response, call.err
}
The original logic will only short‑circuit the function when a “real” error occurs, while still accumulating partial responses when the only issue is cached‑only records.
Another question: Why do we want to short-circuit in any case? If call A had e.g. a 500 response error, why not still process call B?
To answer my own question, I think it could get complicated:
- Call A: Error X
- Call B: Success
- Call C: Error Y
In this scenario, how should callAndCacheBatch respond, with Error X or Error Y?
So I think it makes sense to "fail fast" like you do here.
Hi! Yeah, there are multiple places like this where choosing the "right thing" to do is quite complex, and I don’t want to flood the API with custom errors that clients would then have to handle with large switch cases. My approach has been to keep it as simple as possible until someone opens an issue and explicitly requests more flexibility.
However, since we’ve discussed this code quite a bit, I went ahead and simplified it, as well as added some comments:
for call, callIDs := range callIDs {
	call.Wait()

	// We need to iterate through the values that WE want from this call. The batch
	// could contain hundreds of IDs, but we might only want a few of them.
	for _, id := range callIDs {
		if v, ok := call.val[id]; ok {
			response[id] = v
		}
	}

	// This handles the scenario where we either don't get an error, or are
	// using the distributed storage option and are able to get some records
	// while the request to the underlying data source fails. In the latter
	// case, we'll continue to accumulate partial responses as long as the only
	// issue is cached-only records.
	if err == nil || errors.Is(call.err, ErrOnlyCachedRecords) {
		err = call.err
		continue
	}

	// For any other kind of error, we'll short‑circuit the function and return.
	return response, call.err
}
This version is way cleaner 👍
ernstwi left a comment
I've now worked my way through the code and have a few questions.
I've also added "Assertion:" here and there just to test my understanding. In these cases where there is no question, I'd appreciate it if you'd just give a thumbs up if my understanding is correct :)
if err != nil && !errors.Is(err, ErrOnlyCachedRecords) {
	if len(cachedRecords) > 0 {
		maps.Copy(cachedRecords, response)
		return cachedRecords, errors.Join(ErrOnlyCachedRecords, err)
	}
	return cachedRecords, err
}

maps.Copy(cachedRecords, response)
return cachedRecords, err
}
Assertion: This change makes sure that if we got an error but also some new records (i.e. one of two requests succeeded), the new entries are added to the response.
Could it be written more clearly, like this?
if err != nil && !errors.Is(err, ErrOnlyCachedRecords) && len(cachedRecords) > 0 {
	err = errors.Join(ErrOnlyCachedRecords, err)
}
maps.Copy(cachedRecords, response)
return cachedRecords, err
I added a comment to explain the logic:
if err != nil && !errors.Is(err, ErrOnlyCachedRecords) {
	// At this point, the call for the IDs that we didn't have in the cache
	// has failed. However, these IDs could have been picked from multiple
	// in-flight requests. Hence, we'll check if we're able to add any of these
	// IDs to the cached records before returning.
	if len(cachedRecords) > 0 {
		maps.Copy(cachedRecords, response)
		return cachedRecords, errors.Join(ErrOnlyCachedRecords, err)
	}
}
maps.Copy(cachedRecords, response)
return cachedRecords, err
Assertion: The changes to this file make sure that wrap and wrapBatch don't throw away the return value of fetchFn even if it returns an error, as long as the return value has the correct concrete type.
inflight.go
Outdated
valueAsT, valueIsAssignableToT := any(response).(T)
if !valueIsAssignableToT {
	call.err = ErrInvalidType
	return
}
call.val = valueAsT
Assertion: No functional difference here, just moved.
inflight.go
Outdated
opts.call.val[id] = v
if err == nil || errors.Is(err, errOnlyDistributedRecords) {
	c.Set(opts.keyFn(id), v)
}
Here we set call.val to v if it is a valid T, but we only update the cache if there is no "real" (non-errOnlyDistributedRecords) error.
This will mean that this function call will return v but subsequent calls will not. Correct? Why do we want this?
Hi, you mentioned here that this remains a question. However, the code was modified when I cleaned up the generics for the batch operations, so the function no longer handles V types. If you still have a question, could you rephrase it to fit the current version of the code?
Hi, absolutely. My question is about this block in makeBatchCall; the current state on this branch is:
for id, record := range response {
	opts.call.val[id] = record
	if err == nil || errors.Is(err, errOnlyDistributedRecords) {
		c.Set(opts.keyFn(id), record)
	}
}
This is changed from this code on main:
for id, record := range response {
	v, ok := any(record).(T)
	if !ok {
		c.log.Error("sturdyc: invalid type for ID:" + id)
		continue
	}
	c.Set(opts.keyFn(id), v)
	opts.call.val[id] = v
}
The thing that looks weird to me in the new code is that you have split up opts.call.val[id] = record and c.Set(opts.keyFn(id), record).
So that opts.call.val[id] = record is always done, but c.Set(opts.keyFn(id), record) is only done in some cases (if there is no significant error).
Doesn't this mean that for a request that errors, the first call to GetOrFetchBatch will get the record anyway, but subsequent calls will not? I don't get why these two things are separated, shouldn't we either return the record (opts.call.val[id] = record) and save it in the cache, or neither?
Hi again! Ah, I see what you mean now. Sorry for misunderstanding your original question - I thought it was about the types.
This change is essentially to achieve the functionality I described in the PR. Currently, if you run that example code from the PR description on main, you would get the following output (values are discarded):
❯ go run main.go
2025/03/12 10:59:37 sturdyc: invalid response type
2025/03/12 10:59:37
2025/03/12 10:59:37 something went wrong with the batch call too

But with these changes, we'll return the actual values along with the error:
2025/03/12 11:03:43 something went wrong
2025/03/12 11:03:43 Error getting order status
2025/03/12 11:03:43 something went wrong with the batch call too
2025/03/12 11:03:43 Error getting order status for ID: 1
2025/03/12 11:03:43 Error getting order status for ID: 2
2025/03/12 11:03:43 Error getting order status for ID: 3
Since the call errored, we won’t store the values in the cache. That said, this introduces a bit of additional complexity for something that could perhaps be seen as an edge case or maybe functionality that nobody really needs.
However, I thought of scenarios where an underlying data source returns partial response errors. In such cases, the partial data we do receive could still be useful, so it doesn’t feel right to discard it automatically. One could argue that if it's a partial response, we should store the successfully retrieved IDs in the cache and only retry the missing ones. However, the problem is that we can't know what these errors represent for the consumer. Storing values in the cache when the function returns an error could lead to unexpected behavior, which might not be desirable.
The TLDR of this is essentially: Here are the values you gave us from your fetch function, discard them or use them if you like. If you want them to be kept in the cache, signal to us that the call was successful by making the fetch function return a nil error.
I added some comments to the code that hopefully explains this:
for id, record := range response {
	// We never want to discard values from the fetch functions, even if they
	// return an error. Instead, we'll pass them to the user along with any
	// errors and let them decide what to do.
	opts.call.val[id] = record
	// However, we'll only write them to the cache if the fetchFunction returned a non-nil error.
	if err == nil || errors.Is(err, errOnlyDistributedRecords) {
		c.Set(opts.keyFn(id), record)
	}
}
Great, thanks! That explanation makes a lot of sense 👍
The new changes look good to me and nice to see some new tests :) Only real question I have remaining is this

Great! It would be awesome if you could test these changes by installing a version of
I'll put a comment in that thread

Deploying this branch to production now, will let you know how our metrics look after running for a while 👍
The metrics are looking good, but I found an issue with the following:

package main
import (
"context"
"log"
"time"
"github.com/viccon/sturdyc"
)
type apiClient struct {
cache *sturdyc.Client[any]
}
func newAPIClient(c *sturdyc.Client[any]) *apiClient {
return &apiClient{cache: c}
}
func (a *apiClient) OrderStatus(ctx context.Context, id string) (string, error) {
fetchFn := func(_ context.Context) (string, error) {
return "Error getting order status", sturdyc.ErrNotFound // <- CHANGED
}
return sturdyc.GetOrFetch(ctx, a.cache, id, fetchFn)
}
func main() {
capacity := 10000
numShards := 10
ttl := 2 * time.Hour
evictionPercentage := 10
minRefreshDelay := time.Second
maxRefreshDelay := time.Second * 2
synchronousRefreshDelay := time.Second * 30
retryBaseDelay := time.Millisecond * 10
// Create a new cache client with the specified configuration.
cacheClient := sturdyc.New[any](capacity, numShards, ttl, evictionPercentage,
sturdyc.WithEarlyRefreshes(minRefreshDelay, maxRefreshDelay, synchronousRefreshDelay, retryBaseDelay),
sturdyc.WithRefreshCoalescing(10, time.Second*15),
sturdyc.WithMissingRecordStorage(), // <- ADDED
)
api := newAPIClient(cacheClient)
if res, err := api.OrderStatus(context.Background(), "1"); err != nil {
log.Println(err)
log.Println(res)
}
}

We get:

I think what we would expect is:

Looks like this can be fixed by adding
Hi, ah, this actually makes a lot of sense! I saw the question mark after you linked to the line where it could be fixed, so I thought I'd add an explanation of why this change fixes the issue.
Great that you spotted this and I've pushed a fix! It also made me reflect a bit on the fact that now, with the removal of the
}()

response, err := fn(ctx)
call.val = response
This change is great and I agree with it.
I followed the valuable discussion that led to do this
But I think the changes introduced by faee623 should be covered by tests.
Adding such a change is great, but it requires making sure any future changes won't break this behavior.
See also #36 (comment)
Hi, yeah I agree that we should get test coverage for this!
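For reference, a minimal sketch of what such a test could look like (hypothetical test, not part of this PR; the constructor arguments simply mirror the examples in this thread):

func TestGetOrFetchReturnsValueAlongsideError(t *testing.T) {
	cacheClient := sturdyc.New[string](1000, 10, time.Minute, 10)

	wantErr := errors.New("boom")
	fetchFn := func(_ context.Context) (string, error) {
		// The fetch function fails but still produces a value.
		return "partial value", wantErr
	}

	val, err := sturdyc.GetOrFetch(context.Background(), cacheClient, "id-1", fetchFn)
	if !errors.Is(err, wantErr) {
		t.Fatalf("expected the fetch function's error, got %v", err)
	}
	if val != "partial value" {
		t.Fatalf("expected the fetch function's value to be preserved, got %q", val)
	}
}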
	// We never want to discard values from the fetch functions, even if they
	// return an error. Instead, we'll pass them to the user along with any
	// errors and let them decide what to do.
	opts.call.val[id] = record
	// However, we'll only write them to the cache if the fetchFunction returned a non-nil error.
	if err == nil || errors.Is(err, errOnlyDistributedRecords) {
		c.Set(opts.keyFn(id), record)
	}
}
Same as #36 (comment)
Here too, I think the changes introduced by f9684a3 should be covered by tests.
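As a starting point, a rough sketch of such a test (hypothetical, not part of this PR; it assumes the batch API takes ids, a key function, and a batch fetch function returning a map, and that values returned alongside an error are surfaced but not cached):

func TestBatchValuesReturnedButNotCachedOnError(t *testing.T) {
	cacheClient := sturdyc.New[string](1000, 10, time.Minute, 10)
	keyFn := cacheClient.BatchKeyFn("order-status")
	ids := []string{"1", "2", "3"}

	fetchFn := func(_ context.Context, ids []string) (map[string]string, error) {
		res := make(map[string]string, len(ids))
		for _, id := range ids {
			res[id] = "status-" + id
		}
		// The batch call fails but still produces a value for every ID.
		return res, errors.New("boom")
	}

	res, err := sturdyc.GetOrFetchBatch(context.Background(), cacheClient, ids, keyFn, fetchFn)
	if err == nil {
		t.Fatal("expected an error from the fetch function")
	}
	if len(res) != len(ids) {
		t.Fatalf("expected %d records to be returned despite the error, got %d", len(ids), len(res))
	}
	// The failed call's values should not have been written to the cache.
	if _, ok := cacheClient.Get(keyFn("1")); ok {
		t.Fatal("expected the failed batch's values to be absent from the cache")
	}
}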
Nice, thanks for the fix + explanation 🙏 I'll push these new commits to prod today, but think we're good to go with the PR now? 🙂 A big thanks from all of us at the office for your work, we really appreciate it. P.S. Sorry for slow response – I don't work on Fridays, so didn't read until today.
	// return an error. Instead, we'll pass them to the user along with any
	// errors and let them decide what to do.
	opts.call.val[id] = record
	// However, we'll only write them to the cache if the fetchFunction returned a non-nil error.
Is there a typo in this comment? We only write them to the cache if the fetchFunction didn't return a non-nil error.
(except for errOnlyDistributedRecords)
I'm closing this PR as #40 was branched from here and includes improvements for additional errors.
Overview
As pointed out by @ernstwi here, the GetOrFetch function always returns ErrInvalidType errors instead of the actual error. In addition to this, it also throws away the value and returns a zero value of the type instead.

As an example, running this code on main:
Produces the following output:
Running the same code again, with the changes introduced by this PR, produces the following:
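The original example code and its output aren't reproduced here, but a rough, hypothetical sketch of the kind of call being described (adapted from the snippet ernstwi posted earlier in this thread; the error message is a placeholder):

package main

import (
	"context"
	"errors"
	"log"
	"time"

	"github.com/viccon/sturdyc"
)

func main() {
	cacheClient := sturdyc.New[any](10000, 10, 2*time.Hour, 10)

	fetchFn := func(_ context.Context) (string, error) {
		// The fetch function returns a value together with an error.
		return "Error getting order status", errors.New("something went wrong")
	}

	res, err := sturdyc.GetOrFetch(context.Background(), cacheClient, "1", fetchFn)
	// On main this reports ErrInvalidType and a zero value; with this PR it
	// reports the fetch function's actual error and value.
	log.Println(err)
	log.Println(res)
}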