
Conversation

avan06 commented Aug 7, 2025

When converting music with basic-pitch, the output is truncated at the end for unknown reasons. The issue becomes more noticeable as the audio length increases. After repeated testing, I found that simply appending 20 seconds of silence to the input audio in the `predict` function of `basic_pitch/inference.py` prevents the issue from occurring.

Since basic-pitch trims trailing silence before outputting, the added 20 seconds of silence does not make the output longer.


Modified the `predict` function in `inference.py` to always append 20 seconds of silence to the input audio before running inference.

This prevents the model from incorrectly truncating the tail end of the audio, which was happening on long, continuous files due to CNN edge effects.
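A minimal sketch of the change described above (`append_silence` is a hypothetical helper name; it assumes the audio has already been loaded inside `predict` as a 1-D float array at sample rate `sr`):

```python
import numpy as np

def append_silence(audio: np.ndarray, sr: int, seconds: float = 20.0) -> np.ndarray:
    """Append `seconds` of silence so any tail truncation falls inside the padding."""
    silence = np.zeros(int(sr * seconds), dtype=audio.dtype)
    return np.concatenate([audio, silence])
```

Zero-padding only moves the region affected by edge effects into disposable silence; the transcribed content itself is unchanged.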
@hyperc54 (Collaborator)

Hi!
Thanks for raising this issue and starting the investigation.

It's good to know that adding 20 s of silence at the end seems to mitigate the issue, although I think we'd like to find the root cause before making any changes here, and I'm happy to help with that!

Could you share more details to help reproduce the issue (e.g. the audio file tested, the command run, etc.)?

avan06 (Author) commented Aug 15, 2025

Hi,

Sure, no problem. I can provide the details to reproduce this issue.

1. The music source I used is the following YouTube video, which I downloaded and converted to FLAC. Its length is 38:11.
   https://www.youtube.com/watch?v=JkWeyX7Hquc

2. Below is the script I used for testing:

```python
from basic_pitch.inference import predict
from basic_pitch import ICASSP_2022_MODEL_PATH

audio_path = "(GB)ONI V 隠忍を継ぐ者⧸Oni V: Innin wo Tsugumono-Soundtrack [JkWeyX7Hquc].flac"

model_output, midi_data, note_events = predict(
    audio_path,
    ICASSP_2022_MODEL_PATH,
    onset_threshold=0.55,
    frame_threshold=0.25,
    minimum_note_length=100,
    minimum_frequency=50,
    maximum_frequency=3000,
)

midi_data.write("ONI V.mid")
```

3. The result after running basic_pitch `predict` has a length of 37:53.
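A quick way to confirm the gap (a sketch, assuming librosa >= 0.10 and pretty_midi are installed; paths as in the script above):

```python
import librosa
import pretty_midi

audio_path = "(GB)ONI V 隠忍を継ぐ者⧸Oni V: Innin wo Tsugumono-Soundtrack [JkWeyX7Hquc].flac"

audio_seconds = librosa.get_duration(path=audio_path)              # ~2291 s (38:11)
midi_seconds = pretty_midi.PrettyMIDI("ONI V.mid").get_end_time()  # ~2273 s (37:53)
print(f"audio ends at {audio_seconds:.1f} s; last MIDI event at {midi_seconds:.1f} s")
```

Here the MIDI ends about 18 seconds before the audio does.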

Please help confirm, thank you.


By the way, below is the log from my execution:

```
>python test.py
WARNING:root:Coremltools is not installed. If you plan to use a CoreML Saved Model, reinstall basic-pitch with `pip install 'basic-pitch[coreml]'`
WARNING:root:tflite-runtime is not installed. If you plan to use a TFLite Model, reinstall basic-pitch with `pip install 'basic-pitch tflite-runtime'` or `pip install 'basic-pitch[tf]'`
WARNING:root:Tensorflow is not installed. If you plan to use a TF Saved Model, reinstall basic-pitch with `pip install 'basic-pitch[tf]'`
Predicting MIDI for (GB)ONI V 隠忍を継ぐ者⧸Oni V: Innin wo Tsugumono-Soundtrack [JkWeyX7Hquc].flac...
```

@hyperc54 (Collaborator)

Hi @avan06,

Sorry for the delay in answering.
I was able to find the root cause of the issue you're highlighting and have tentatively fixed it in this PR. If the team approves of the fix, I would suggest we close your PR.

Let me know if you would like to co-author the commits in the other PR I created, as a thank-you for your contribution in finding the bug!

avan06 (Author) commented Oct 29, 2025

Hi @hyperc54,
No problem on my side — as long as the issue is resolved, that’s what matters. Many thanks to the team for your hard work!

avan06 closed this Oct 29, 2025