Sure, that demonstrates the issue.
The problem is the use of `train_test_split(X, yy, test_size=0.2, ...)`: this assumes independent samples, which is violated for this dataset because some samples come from the same source audio files. The easiest (and completely acceptable) fix is to use one fold as the validation set, one fold as the test set, and the remaining folds for training.
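A fold-based split might look like the sketch below. It assumes the standard `UrbanSound8K.csv` metadata file, which has a `fold` column with values 1-10; the toy frame here stands in for it so the example runs on its own.

```python
import pandas as pd

# Toy stand-in for the UrbanSound8K metadata; in practice you would use:
# meta = pd.read_csv("UrbanSound8K/metadata/UrbanSound8K.csv")
meta = pd.DataFrame({
    "slice_file_name": [f"clip{i}.wav" for i in range(10)],
    "fold": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
})

test_set  = meta[meta["fold"] == 10]   # one fold held out for testing
val_set   = meta[meta["fold"] == 9]    # one fold for validation
train_set = meta[meta["fold"] <= 8]    # remaining folds for training

# No clips from the same source recording cross the split, because the
# official folds already group clips by their source audio file.
```

Since the splits are defined by the `fold` column rather than by random row shuffling, clips cut from the same recording can never end up on both sides of the train/test boundary.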
This problem is unfortunately quite common even in academic papers using this dataset, even though the authors warn about it.
EDIT: There is one more issue with the UrbanSound8K folds: their difficulty varies considerably. So one should ideally report performance across all ten folds (mean/std or a boxplot). But this is a minor issue compared to the data leakage.
PS: Nice use of the Comet.ml platform here, collaborating online on improving the experimental setup :)
Hey jononor — we've updated the post to split the training and test sets based on the folds. Good catch and thanks again for reporting this. Some of the experiments in the project will still have the old code, but the blog post will reflect this new train/test split.