I’m a researcher working at the intersection of machine learning and software engineering. I currently work as a staff software engineer in Google’s DevAI team, where we build machine learning systems to make Google developers more productive.

Previously, I was a senior researcher on the PROSE team at Microsoft, where I worked on developing state-of-the-art program synthesis technologies to make software development more accessible, productive, and fun.

I received a PhD in Computer Science (CS) from MIT under the supervision of Martin Rinard, an MS in CS from NYU working with Dennis Shasha, and a BA in Economics from the University of Pennsylvania. I’m originally from Costa Rica.

Random blog posts

Publications

[1] Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Carina Negreanu, Gust Verbruggen 2025. DataVinci: Learning syntactic and semantic string repairs. to appear SIGMOD 2025. (2025).

[2] Pat Rondon, Renyao Wei, José Cambronero, Jürgen Cito, Aaron Sun, Siddhant Sanyam, Michele Tufano, Satish Chandra 2025. Evaluating agent-based program repair at Google. arXiv preprint arXiv:2501.07531 (to appear ICSE SEIP 2025). (2025).

[3] Haoyu Dong, Jianbo Zhao, Yuzhang Tian, Junyu Xiong, Mengyu Zhou, Yun Lin, José Cambronero, Yeye He, Shi Han, Dongmei Zhang 2024. Encoding spreadsheets for large language models. Proceedings of the 2024 conference on empirical methods in natural language processing (Miami, Florida, USA, Nov. 2024), 20728–20748.

[4] Usneek Singh, José Cambronero, Sumit Gulwani, Aditya Kanade, Anirudh Khatry, Vu Le, Mukul Singh, Gust Verbruggen 2024. An empirical study of validating synthetic data for formula generation. arXiv preprint arXiv:2407.10657 (to appear NAACL Findings 2025). (2024).

[5] Shraddha Barke, Christian Poelitz, Carina Suzana Negreanu, Benjamin Zorn, José Cambronero, Andrew D Gordon, Vu Le, Elnaz Nouri, Nadia Polikarpova, Advait Sarkar, others 2024. Solving data-centric tasks using large language models. NAACL 2024. (2024).

[6] Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Carina Negreanu, Gust Verbruggen 2023. CodeFusion: A pre-trained diffusion model for code generation. Proceedings of the 2023 conference on empirical methods in natural language processing (Singapore, Dec. 2023), 11697–11708.

[7] Mukul Singh, José Cambronero Sánchez, Sumit Gulwani, Vu Le, Carina Negreanu, Mohammad Raza, Gust Verbruggen 2023. Cornet: Learning table formatting rules by example. Proc. VLDB Endow. 16, 10 (Jun. 2023), 2632–2644. DOI:https://doi.org/10.14778/3603581.3603600.

[8] Andrew D Gordon, Carina Negreanu, José Cambronero, Rasika Chakravarthy, Ian Drosos, Hao Fang, Bhaskar Mitra, Hannah Richardson, Advait Sarkar, Stephanie Simmons, others 2023. Co-audit: Tools to help humans double-check AI-generated content. arXiv preprint arXiv:2310.01297 (PLATEAU 2024). (2023).

[9] Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Gust Verbruggen 2023. EmFore: Online learning of email folder classification rules. Proceedings of the 32nd ACM international conference on information and knowledge management (New York, NY, USA, 2023), 2280–2290.

[10] Harshit Joshi, Abishai Ebenezer, José Cambronero, Sumit Gulwani, Aditya Kanade, Vu Le, Ivan Radiček, Gust Verbruggen 2023. FLAME: A small language model for spreadsheet formulas. arXiv preprint arXiv:2301.13779 (AAAI 2024). (2023).

[11] José Cambronero, Sumit Gulwani, Vu Le, Daniel Perelman, Arjun Radhakrishna, Clint Simon, Ashish Tiwari 2023. FlashFill++: Scaling programming by example by cutting to the chase. Proceedings of the ACM on Programming Languages. 7, POPL (2023), 952–981.

[12] Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Carina Negreanu, Elnaz Nouri, Mohammad Raza, Gust Verbruggen 2023. FormaT5: Abstention and examples for conditional table formatting with natural language. VLDB 2024. (2023).

[13] Tung Phung, José Cambronero, Sumit Gulwani, Tobias Kohn, Rupak Majumdar, Adish Singla, Gustavo Soares 2023. Generating high-precision feedback for programming syntax errors using large language models. EDM 2023. (2023).

[14] Tung Phung, Victor-Alexandru Pădurean, José Cambronero, Sumit Gulwani, Tobias Kohn, Rupak Majumdar, Adish Singla, Gustavo Soares 2023. Generative AI for programming education: Benchmarking ChatGPT, GPT-4, and human tutors. Proceedings of the 2023 ACM conference on international computing education research - volume 2 (New York, NY, USA, 2023), 41–42.

[15] Harshit Joshi, José Cambronero Sanchez, Sumit Gulwani, Vu Le, Ivan Radiček, Gust Verbruggen 2023. Repair is nearly generation: Multilingual program repair with LLMs. Proceedings of the thirty-seventh AAAI conference on artificial intelligence and thirty-fifth conference on innovative applications of artificial intelligence and thirteenth symposium on educational advances in artificial intelligence (2023).

[16] Ananya Singha, José Cambronero, Sumit Gulwani, Vu Le, Chris Parnin 2023. Tabular representation, noisy operators, and impacts on table structure understanding tasks in LLMs. arXiv preprint arXiv:2310.10358 (Table Representation Learning at NeurIPS 2023). (2023).

[17] Rohan Bavishi, Harshit Joshi, José Cambronero, Anna Fariha, Sumit Gulwani, Vu Le, Ivan Radiček, Ashish Tiwari 2022. Neurosymbolic repair for low-code formula languages. Proc. ACM Program. Lang. 6, OOPSLA2 (Oct. 2022). DOI:https://doi.org/10.1145/3563327.

[18] Bram Wasti, José Pablo Cambronero, Benoit Steiner, Hugh Leather, Aleksandar Zlateski 2022. LoopStack: A lightweight tensor algebra compiler stack. arXiv preprint arXiv:2205.00618. (2022).

[19] Jialu Zhang, José Cambronero, Sumit Gulwani, Vu Le, Ruzica Piskac, Gustavo Soares, Gust Verbruggen 2022. Repairing bugs in Python assignments using large language models. OOPSLA 2024. (2022).

[20] Limor Appelbaum, Alexandra Berg, Jose Pablo Cambronero, Thurston Hou Yeen Dang, Charles Chuan Jin, Lori Zhang, Steven Kundrot, Matvey Palchuk, Laura A Evans, Irving D Kaplan, others 2021. Development of a pancreatic cancer prediction model using a multinational medical records database. American Society of Clinical Oncology.

[21] Fatjon Zogaj, José Pablo Cambronero, Martin C Rinard, Jürgen Cito 2021. Doing more with less: Characterizing dataset downsampling for AutoML. Proceedings of the VLDB Endowment. 14, 11 (2021), 2059–2072.

[22] Thurston HY Dang, Jose P Cambronero, Martin C Rinard 2021. Inferring drop-in binary parsers from program executions. arXiv preprint arXiv:2104.09669. (2021).

[23] Malavika Samak, Jose Pablo Cambronero, Martin C Rinard 2021. Searching for replacement classes. arXiv preprint arXiv:2110.05638. (2021).

[24] José P Cambronero, Jürgen Cito, Martin C Rinard 2020. AMS: Generating AutoML search spaces from weak specifications. Proceedings of the 28th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering (2020), 763–774.

[25] Limor Appelbaum, Jose Pablo Cambronero, Karla Pollick, George Silva, Jennifer P Stevens, Harvey J Mamon, Irving D Kaplan, Martin Rinard 2020. Development and validation of a pancreatic cancer prediction model from electronic health records using machine learning. American Society of Clinical Oncology.

[26] Limor Appelbaum, José P Cambronero, Jennifer P Stevens, Steven Horng, Karla Pollick, George Silva, Sebastien Haneuse, Gail Piatkowski, Nordine Benhaga, Stacey Duey, others 2020. Development and validation of a pancreatic cancer risk model for the general population using electronic health records: An observational study. European Journal of Cancer. 143, (2020), 19–30.

[27] José P Cambronero, Thurston HY Dang, Nikos Vasilakis, Jiasi Shen, Jerry Wu, Martin C Rinard 2019. Active learning for software engineering. Proceedings of the 2019 ACM SIGPLAN international symposium on new ideas, new paradigms, and reflections on programming and software (2019), 62–78.

[28] José P Cambronero, Martin C Rinard 2019. AL: Autogenerating supervised learning programs. Proceedings of the ACM on Programming Languages. 3, OOPSLA (2019), 1–28.

[29] José Pablo Cambronero, Jiasi Shen, Jürgen Cito, Elena Glassman, Martin Rinard 2019. Characterizing developer use of automatically generated patches. 2019 IEEE symposium on visual languages and human-centric computing (VL/HCC) (2019), 181–185.

[30] Jose Cambronero, Hongyu Li, Seohyun Kim, Koushik Sen, Satish Chandra 2019. When deep learning met code search. Proceedings of the 2019 27th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering (2019), 964–974.

[31] Jose Cambronero, Phillip Stanley-Marbell, Martin Rinard 2018. Incremental color quantization for color-vision-deficient observers using mobile gaming data. arXiv preprint arXiv:1803.08420. (2018).

[32] José Cambronero, John K Feser, Micah J Smith, Samuel Madden 2017. Query optimization for dynamic imputation. Proceedings of the VLDB Endowment. 10, 11 (2017), 1310–1321.