Support attention in decoder

Reference: https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html