AUTOMATIC ACCIACCATURA PREDICTION FOR CHINESE SINGING VOICE SYNTHESIS

Submitted to Interspeech 2022

Weiya You, Shaohuan Zhou, Shiyin Kang, Yuren You, Jiankun Hu, Deyi Tuo, Zhiyong Wu

ABSTRACT

The acciaccatura is a type of ornament that is commonly used in both instrumental playing and singing, and its flexible use makes singing more expressive. However, to the best of our knowledge, there has been no research on analyzing or predicting the acciaccatura. In this paper, we analyze a Chinese music score dataset with acciaccatura annotations. Based on the analysis, we identify two factors that affect the acciaccatura, namely duration and pitch, and use them as features. After embedding, these features are fed into BiLSTM-CRF models, which perform well in named entity recognition (NER), to predict the acciaccatura position and pitch. Finally, ABX tests verify that music scores containing the model’s predicted acciaccaturas allow the singing voice synthesis model to synthesize more beautiful songs.
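In a BiLSTM-CRF tagger, the BiLSTM assigns a score to every candidate tag at every note, and the CRF layer then selects the single highest-scoring tag sequence using Viterbi decoding. The Python sketch below illustrates only that decoding step. The two-tag scheme (0 = no acciaccatura, 1 = acciaccatura before this note), the function name `crf_viterbi`, and all score values are hypothetical and are not taken from the paper's model. The paper's models also predict acciaccatura pitch, which this two-tag sketch does not cover.

```python
from typing import List

import numpy as np


def crf_viterbi(emissions: np.ndarray, transitions: np.ndarray) -> List[int]:
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions:   (T, K) per-note tag scores, e.g. produced by a BiLSTM.
    transitions: (K, K) score for moving from tag i (row) to tag j (column).
    """
    T, K = emissions.shape
    score = emissions[0].copy()            # best path score ending in each tag at note 0
    backptr = np.zeros((T, K), dtype=int)  # best predecessor tag for each note and tag
    for t in range(1, T):
        # cand[i, j]: best path ending in tag i at t-1, then i -> j, plus emission of j at t
        cand = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # Trace back from the best final tag to recover the full sequence.
    best = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]


# Hypothetical scores for a 3-note phrase (tags: 0 = none, 1 = acciaccatura).
emissions = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 0.5]])
transitions = np.array([[0.0, 0.0], [0.0, -2.0]])  # penalize tag 1 -> tag 1
print(crf_viterbi(emissions, transitions))  # [0, 1, 0]
```

Here, the negative 1 -> 1 transition score discourages acciaccaturas on two consecutive notes. This constraint was chosen only for illustration, since a trained CRF would learn its transition scores from the annotated data.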

ACCIACCATURA PREDICTION

Our goal is to predict the acciaccatura position and acciaccatura pitch from the original score. The score with the predicted acciaccaturas is then used to synthesize a more beautiful song.
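The results below group test sentences into five cases by comparing, for each sentence, the set of notes that carry a predicted acciaccatura against the set that carries a labeled one. The Python sketch below shows one way to compute this grouping. It assumes our reading of the case definitions: for example, "fewer" means the predicted positions are a strict subset of the labeled ones. The function names and the set-of-note-indices representation are hypothetical, and acciaccatura pitch is ignored.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def classify_sentence(labeled: Set[int], predicted: Set[int]) -> str:
    """Assign one sentence to a result case by comparing acciaccatura positions.

    Both arguments are sets of note indices carrying an acciaccatura
    (hypothetical representation; acciaccatura pitch is ignored here).
    """
    if predicted == labeled:
        return "same"      # exact positional match
    if predicted < labeled:
        return "fewer"     # strict subset: some labels missed, nothing extra
    if predicted > labeled:
        return "more"      # strict superset: extra predictions, no label missed
    if len(predicted) == len(labeled):
        return "shifted"   # same count, but at least one position differs
    return "other"         # different counts, neither a subset nor a superset


def case_percentages(pairs: Iterable[Tuple[Set[int], Set[int]]]) -> Dict[str, float]:
    """Percentage of sentences in each case, given (labeled, predicted) pairs."""
    counts = Counter(classify_sentence(lab, pred) for lab, pred in pairs)
    total = sum(counts.values())
    return {case: 100.0 * n / total for case, n in counts.items()}
```

For example, `classify_sentence({2, 5}, {3, 5})` returns `"shifted"`, while `classify_sentence({2, 5}, {3})` returns `"other"`.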

PREDICTED ACCIACCATURA POSITIONS MATCH THE LABELS

In 69.61% of sentences, the predicted acciaccatura positions exactly match the labeled positions.

No | Audio (labeled score) | Audio (without acciaccatura) | Audio (predicted score)
1 | 007027_label | 007027_no | 007027_pred
2 | 015039_label | 015039_no | 015039_pred
3 | 077023_label | 077023_no | 077023_pred
4 | 095020_label | 095020_no | 095020_pred
5 | 187001_label | 187001_no | 187001_pred
6 | 242023_label | 242023_no | 242023_pred
7 | 059004_label | 059004_no | 059004_pred

FEWER PREDICTED ACCIACCATURAS THAN LABELED

In 16.99% of sentences, the prediction misses some of the labeled acciaccaturas and adds none.

No | Audio (labeled score) | Audio (without acciaccatura) | Audio (predicted score)
1 | 082005_label | 082005_no | 082005_pred
2 | 436007_label | 436007_no | 436007_pred
3 | 018007_label | 018007_no | 018007_pred
4 | 103011_label | 103011_no | 103011_pred
5 | 241037_label | 241037_no | 241037_pred

MORE PREDICTED ACCIACCATURAS THAN LABELED

In 10.01% of sentences, the model predicts additional acciaccaturas beyond the labeled ones.

No | Audio (labeled score) | Audio (without acciaccatura) | Audio (predicted score)
1 | 156011_label | 156011_no | 156011_pred
2 | 029006_label | 029006_no | 029006_pred
3 | 052019_label | 052019_no | 052019_pred
4 | 439000_label | 439000_no | 439000_pred

SAME NUMBER OF ACCIACCATURAS, SHIFTED POSITIONS

In 2.42% of sentences, the prediction contains the same number of acciaccaturas as the labels, but some are at shifted positions.

No | Audio (labeled score) | Audio (without acciaccatura) | Audio (predicted score)
1 | 076024_label | 076024_no | 076024_pred
2 | 130040_label | 130040_no | 130040_pred
3 | 244042_label | 244042_no | 244042_pred

OTHER CASES

The remaining 0.957% of sentences fall into other cases.

No | Audio (labeled score) | Audio (without acciaccatura) | Audio (predicted score)
1 | 003000_label | 003000_no | 003000_pred