The reason our model works as well as it does is that we split the task of note detection across two stacks of neural networks: one stack is trained to detect only onset frames (the first few frames of every note) and one stack is trained to detect every frame where a note is active. Previous models used only a single stack, but we found that by separating out the onset detection task we were able to achieve much higher accuracy.

We use the output from the onset detector in two ways: we feed the raw output of that detector into the frame detector as an additional input, and we also restrict the final output of the model to start new notes only when the onset detector is confident that a note onset is in that frame.

Our loss function is the sum of two cross-entropy losses: one from the onset side and one from the frame side. Within the frame-based loss term, we apply a weighting to encourage accuracy at the start of each note. Because the weight vector assigns higher weights to the early frames of notes, the model is incentivized to predict the beginnings of notes accurately, thus preserving the most important musical events of the piece.

The figure below illustrates the importance of restricting model output based on the onset detector. The first image shows the results from the frame and onset detectors: blue indicates frame prediction, red indicates onset prediction, and magenta indicates overlap between frame and onset predictions. There are several examples of notes that either last for only a few frames or that reactivate briefly after being active for a while. The second image shows the frame results after being restricted by the onset detector. Most of the notes that were active for only a few frames did not have a corresponding onset detection and were removed. Cases where a note briefly reactivated after being active for a while were also removed because a second onset for that note was not detected.
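The weighting inside the frame loss can be sketched as a weighted per-frame binary cross-entropy. This is a minimal illustration, not the paper's exact schedule: the function name, the `onset_weight` value, and the linear ramp over the first `ramp_len` frames of each labeled note are all assumptions made for the example.

```python
import numpy as np

def weighted_frame_loss(frame_probs, frame_labels, onset_weight=5.0, ramp_len=3):
    """Weighted binary cross-entropy over a (num_frames, num_pitches) roll.

    The first ramp_len frames of each labeled note get weights that decay
    linearly from onset_weight down toward 1; all other frames get weight 1.
    The ramp shape and onset_weight are illustrative choices.
    """
    eps = 1e-7
    num_frames, num_pitches = frame_labels.shape
    weights = np.ones_like(frame_labels, dtype=float)
    for pitch in range(num_pitches):
        for t in range(num_frames):
            # A note starts where a label turns on after being off.
            if frame_labels[t, pitch] and (t == 0 or not frame_labels[t - 1, pitch]):
                for k in range(ramp_len):
                    if t + k < num_frames and frame_labels[t + k, pitch]:
                        weights[t + k, pitch] = onset_weight - k * (onset_weight - 1) / ramp_len
    ce = -(frame_labels * np.log(frame_probs + eps)
           + (1 - frame_labels) * np.log(1 - frame_probs + eps))
    return float(np.mean(weights * ce))
```

With this weighting, a misprediction on a note's first frame costs several times more than the same misprediction on a later frame, which is the incentive described above.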
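The onset-based restriction shown in the figure can be sketched as a simple decoding pass over the two probability rolls. This is a hedged sketch, not the released implementation: the function name, the shared 0.5 threshold, and the exact start/stop rule are assumptions for illustration.

```python
import numpy as np

def decode_notes(frame_probs, onset_probs, threshold=0.5):
    """Keep frame activations only for notes that begin on a confident onset.

    frame_probs, onset_probs: (num_frames, num_pitches) arrays of probabilities.
    Returns a boolean piano roll of the same shape.
    """
    frames = frame_probs > threshold
    onsets = onset_probs > threshold
    active = np.zeros_like(frames, dtype=bool)
    num_frames, num_pitches = frames.shape
    for pitch in range(num_pitches):
        note_on = False
        for t in range(num_frames):
            if onsets[t, pitch]:
                note_on = True          # a new note may start only on an onset
            elif not frames[t, pitch]:
                note_on = False         # note ends when the frame prediction stops
            active[t, pitch] = note_on and frames[t, pitch]
    return active
```

A few-frame blip or a reactivation without a second detected onset never sets `note_on`, so it is dropped from the output, matching the cleanup visible in the second image.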
The examples above are a good illustration of the performance of our system. There are definitely some mistakes, but it does a good job of capturing harmony, melody, and even rhythm. More metrics and details are available in our paper.

We're able to achieve a new state of the art by using CNNs and LSTMs to predict pitch onset events and then using those predictions to condition framewise pitch predictions. You can try out our model with your own piano recordings in your browser by visiting Piano Scribe or the Onsets and Frames Colab Notebook. We've also made the source code available on GitHub for both Python and JavaScript. More technical details are available in our paper on arXiv: Onsets and Frames: Dual-Objective Piano Transcription.
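The conditioning step, feeding the onset stack's raw output into the frame stack as an additional input, amounts to a per-frame concatenation. The shapes below (88 pitches, 229 spectrogram bins) are illustrative assumptions, not necessarily the paper's exact dimensions:

```python
import numpy as np

# Illustrative shapes: frames of audio features in, one probability per pitch out.
num_frames, num_pitches, num_features = 100, 88, 229

acoustic_features = np.random.rand(num_frames, num_features)
onset_probs = np.random.rand(num_frames, num_pitches)  # raw output of the onset stack

# The frame detector sees its acoustic features plus the onset predictions.
frame_input = np.concatenate([acoustic_features, onset_probs], axis=1)
```

The frame stack can then learn to trust (or discount) the onset stack's opinion at each frame, rather than inferring onsets from the spectrogram alone.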