Mining Nearby Repairs that Improve Machine Learning Pipeline Performance


We frame the task of improving predictive performance of an existing machine learning pipeline by performing a small modification as an analogue to automated program repair. In this setting, the existence of a similar pipeline with better performance, the modification that delivers that improvement, and the task of automatically generating and applying that modification are the analogues of bug, patch, and automated program repair, respectively. We develop a system, Janus, that mines repair rules from a large corpus of pipelines, an approach conceptually similar to learning patches from code corpora. Our experiments show Janus can improve performance in 16%-42% of the test pipelines in our experiments, outperforming baseline approaches in 7 of the 9 datasets in our evaluation.

Under submission