Conversation
|
@kraftpunk97 can you push a commit that shows how to vectorize some parts of this code (either to this branch, or to another)? |
jpsamaroo
left a comment
There was a problem hiding this comment.
Just a quick semantic pass over the PR; I haven't had a chance to sit down and work through the math against the paper.
|
The math looks good to me. I'm more concerned about the order in which the statements are executed. I suspect that the sequence of the parts in paper isn't necessarily the sequence in which they are to be executed. @merckxiaan knows better here. |
|
AFAIU, the methods section doesn't explain the parts in order. I've tinkered a bit with the ordering and I believe (I could be wrong though!) that, at each line, the necessary variables have already been computed/updated. |
|
I'll gladly take it on. I'll re-read the paper to ensure that statements are executed in the proper order. |
|
Okay, I think an update is due at this point. I've been looking at the code shared by the DeepMind team, and dear god, is that codebase a nightmare to navigate. But I've been staring at it long enough to make sense of the basic structure of the DNC. The statements in are executed in the expected order (update usage(dynamic memory allocation) --> write --> update linkage memory --> read), so yay. I think all we need to do now is to calculate the gates that we then use in the forward pass (which should be easy) and this PR should then be good to merge. It also wouldn't hurt to reconsider the fields stored in the |
|
@merckxiaan Looking at the DeepMind code, they assume multiple read/write heads, while the code we have is for multiple read heads, but only a single write heads. Just so we are in the clear, would you like to keep it that way, or should we accommodate multiple write heads as well? I leave that decision to you. |
69412d9 to
cf8b1c0
Compare
|
Nice work! |
|
Okay, so the way things are right now, we're in a pickle, because Zygote doesn't allow for mutation of arrays, and that means we have to overcome a few challenges, since we're dealing with a lot arrays here. The forward pass is complete from what I can tell. But the gradients can't be properly calculated right now. If I haven't already uploaded the commits with the current progress, I'll do so in the next 48 hours. |
|
Is it possible for this code to be efficient on the Zygote mutate branch? Or is it possible to use the buffer mechanism that Zygote has for mutation? Either way, do what you think is best. Thankfully this model isn't ginormous by default, so we shouldn't run into problems with too many allocations from allocating new arrays every pass. |
I'm not aware of any "mutate" branch in Zygote's repository. Please clarify @jpsamaroo. |
By @merckxiaan