Hello, may I ask you some questions about the training process?
I have modified the SR to 24kHz and HOP_SIZE to 300, which results in a 80Hz spectrum feature for input. I used my own dataset for training, and the training curve is like follows:

VQ loss is increasing, but the accuracy is at around 75%.
Is this a normal situtation?
In fact, I want to use this model for an unsupervised phone loss, but the input size is fixed. Thus, I also want to know, will the phonetic discrimination performance still be good, for other input with arbitrary length?
Thank you.