In the normal case, the speaker and language of the input speech are the same as those of the training set, and the emotion is neutral.
Corresponding models, from left to right: proposed model, MFCC-BLSTM, ground truth.
In this case, the speaker or language of the input speech differs from those in the training set.
Corresponding models, from left to right: proposed model, MFCC-BLSTM, ground truth.
The speaker and language of the input speech are the same as those of the training set. The emotion is neutral.
Corresponding models: left, proposed model without energy; right, proposed model with energy.
In the normal case, the speaker and language of the input speech are the same as those of the training set. The neutral-emotion case can be found above.
Corresponding models, from left to right: proposed model, MFCC-BLSTM, ground truth.