Learning Repair Rules for Machine Learning Pipelines from AutoML Search Traces


We frame the task of improving the predictive performance of an existing machine learning pipeline through a small modification as an analogue of automated program repair. In this setting, the existence of a similar pipeline with better performance corresponds to a bug, the modification that delivers that improvement corresponds to a patch, and the task of automatically generating and applying that modification corresponds to automated program repair. We develop a system, Janus, that learns repair rules from a corpus of pipelines produced as a by-product of an automated machine learning (AutoML) search, an approach conceptually similar to learning patches from code corpora. Our experiments show that Janus improves pipelines more often than a baseline approach, achieves comparable improvements when both succeed, and produces pipelines that are closer to the original input pipelines.

Under submission