Peter Widhalm, M. Leodolter, N. Brändle:
"Into the Wild - Avoiding Pitfalls in the Evaluation of Travel Activity Classifiers";
in: "Human Activity Sensing. Corpus and Applications",
Most submissions to the 2018 Sussex-Huawei Locomotion-Transportation (SHL) recognition challenge strongly overestimated the performance of their algorithms relative to the performance achieved on the challenge evaluation data. Similarly, recent studies on smartphone-based trip data collection promise accurate and detailed recognition of various modes of transportation, but in field tests the available techniques do not appear to live up to these expectations. In this chapter we experimentally demonstrate potential sources of upward scoring bias in the evaluation of travel activity classifiers. Our results show that (1) performance measures such as accuracy and the average class-wise F1 score are sensitive to class prevalence, which can vary strongly across sub-populations, (2) cross-validation with random train/test splits or a large number of folds can easily introduce dependencies between training and test data and is therefore not suitable for revealing overfitting, and (3) splitting the data into disjoint subsets for training and testing does not always reveal model overfitting caused by a lack of variation in the data.
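Finding (1) can be illustrated with a small calculation. The following is a minimal sketch and not the authors' experiment: it assumes a hypothetical two-class classifier ("walk" vs. "bus") whose per-class behaviour P(predicted | true) is fixed, and computes the expected accuracy and average class-wise F1 score under two different class prevalences. The function name, the class labels, and all numbers are illustrative assumptions.

```python
def expected_scores(cond_confusion, prevalence):
    """Expected accuracy and average class-wise F1 score.

    cond_confusion[i][j] = P(predicted = j | true = i), fixed for the classifier.
    prevalence[i]        = P(true = i), which depends on the evaluated population.
    """
    n = len(prevalence)
    # Joint distribution of (true class, predicted class).
    joint = [[prevalence[i] * cond_confusion[i][j] for j in range(n)]
             for i in range(n)]
    accuracy = sum(joint[i][i] for i in range(n))

    f1_scores = []
    for c in range(n):
        true_positive = joint[c][c]
        predicted_as_c = sum(joint[i][c] for i in range(n))
        actually_c = prevalence[c]
        denominator = predicted_as_c + actually_c
        # F1 = 2 * TP / (number predicted as c + number actually c)
        f1_scores.append(2 * true_positive / denominator if denominator > 0 else 0.0)
    return accuracy, sum(f1_scores) / n


# Hypothetical classifier: "walk" is recognised 95% of the time, "bus" 70%.
COND = [[0.95, 0.05],
        [0.30, 0.70]]

# Same classifier, two hypothetical populations with different class mixes.
acc_a, f1_a = expected_scores(COND, [0.9, 0.1])  # mostly walking
acc_b, f1_b = expected_scores(COND, [0.5, 0.5])  # balanced

print(f"mostly walking: accuracy={acc_a:.3f}, mean F1={f1_a:.3f}")
print(f"balanced:       accuracy={acc_b:.3f}, mean F1={f1_b:.3f}")
```

Although the classifier itself is unchanged, both scores differ between the two populations, so a score measured on one sub-population need not carry over to another with a different class mix.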
"Official" electronic version of the publication (according to its Digital Object Identifier - DOI)
Generated from the publication database of the AIT Austrian Institute of Technology.