Thanks for your code.
It helped me understand BiDAF in detail.
However, the model's performance does not improve during training.
After every epoch, the evaluation metric is exactly the same.
Looking further, I found that the gradients being applied by the optimizer are very small,
on the order of 10^-3 to 10^-8.
I can't figure out what's wrong.
Otherwise, your code seems clear and easy to understand.
So, what might be causing this problem?