Skip to content

initial DNC forward pass.#6

Open
jpsamaroo wants to merge 12 commits intodnc-cleanroomfrom
jm/dnc_forward_pass
Open

initial DNC forward pass.#6
jpsamaroo wants to merge 12 commits intodnc-cleanroomfrom
jm/dnc_forward_pass

Conversation

@jpsamaroo
Copy link
Owner

By @merckxiaan

@jpsamaroo
Copy link
Owner Author

@kraftpunk97 can you push a commit that shows how to vectorize some parts of this code (either to this branch, or to another)?

Copy link
Owner Author

@jpsamaroo jpsamaroo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick semantic pass over the PR; I haven't had a chance to sit down and work through the math against the paper.

@kraftpunk97-zz
Copy link
Collaborator

The math looks good to me. I'm more concerned about the order in which the statements are executed. I suspect that the sequence of the parts in paper isn't necessarily the sequence in which they are to be executed. @merckxiaan knows better here.

@jumerckx
Copy link
Collaborator

AFAIU, the methods section doesn't explain the parts in order. I've tinkered a bit with the ordering and I believe (I could be wrong though!) that, at each line, the necessary variables have already been computed/updated.
Variable/function naming is one of my weak spots, for example, I haven't made a distinction between variables at t-1 and t. Perhaps that would clear things up and further illuminate problems.

@kraftpunk97-zz
Copy link
Collaborator

I'll gladly take it on. I'll re-read the paper to ensure that statements are executed in the proper order.

@kraftpunk97-zz
Copy link
Collaborator

kraftpunk97-zz commented Aug 23, 2019

Okay, I think an update is due at this point. I've been looking at the code shared by the DeepMind team, and dear god, is that codebase a nightmare to navigate. But I've been staring at it long enough to make sense of the basic structure of the DNC. The statements in are executed in the expected order (update usage(dynamic memory allocation) --> write --> update linkage memory --> read), so yay. I think all we need to do now is to calculate the gates that we then use in the forward pass (which should be easy) and this PR should then be good to merge. It also wouldn't hurt to reconsider the fields stored in the DNC struct.

@kraftpunk97-zz
Copy link
Collaborator

@merckxiaan Looking at the DeepMind code, they assume multiple read/write heads, while the code we have is for multiple read heads, but only a single write heads. Just so we are in the clear, would you like to keep it that way, or should we accommodate multiple write heads as well? I leave that decision to you.

@jumerckx
Copy link
Collaborator

Nice work!
If the official code base can accomodate multiple write heads then perhaps that's a good idea to include as well.

@kraftpunk97-zz
Copy link
Collaborator

Okay, so the way things are right now, we're in a pickle, because Zygote doesn't allow for mutation of arrays, and that means we have to overcome a few challenges, since we're dealing with a lot arrays here. The forward pass is complete from what I can tell. But the gradients can't be properly calculated right now. If I haven't already uploaded the commits with the current progress, I'll do so in the next 48 hours.

@jpsamaroo
Copy link
Owner Author

Is it possible for this code to be efficient on the Zygote mutate branch? Or is it possible to use the buffer mechanism that Zygote has for mutation?

Either way, do what you think is best. Thankfully this model isn't ginormous by default, so we shouldn't run into problems with too many allocations from allocating new arrays every pass.

@kraftpunk97-zz
Copy link
Collaborator

Is it possible for this code to be efficient on the Zygote mutate branch? Or is it possible to use the buffer mechanism that Zygote has for mutation?

I'm not aware of any "mutate" branch in Zygote's repository. Please clarify @jpsamaroo.

@jpsamaroo
Copy link
Owner Author

FluxML/Zygote.jl#75

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants