Rohan Bavishi

Applied Scientist
Amazon AGI

rohan.bavishi95bots-gotta-get-smarter-than-this@its-really-not-that-hardinspired-from-rohan-padhyegmail.com
rbavishibots-gotta-get-smarter-than-this@its-really-not-that-hardinspired-from-rohan-padhyecs.berkeley.edu
@code_monet
rbavishi
Google Scholar
CV

Hi, I’m an Applied Scientist at Amazon AGI working on foundation models and agents. Previously, I was a research scientist at Adept AI Labs working on agents as well. I completed my Ph.D. at the University of California, Berkeley, where I was advised by Koushik Sen. I was also a member of the Programming Systems group. My research focused on designing tools and techniques for improving the productivity of programmers, specifically data scientists. Check out the project descriptions for DataButler, VizSmith, Gauss, and AutoPandas below for a primer on the kinds of problems my theis tackled.

I’ve also researched techniques for automatic program repair in my internships with the PROSE team at MSR and the Software Systems Innovation Group at Fujitsu Research of America. At MSR, we developed techniques for fixing syntax errors in formula languages such as Excel and PowerApps. See the LaMirage project below for details. At Fujitsu, I worked on a system Phoenix, that leverages the rich development history of open-source projects on Github to automatically learn generic strategies to repair static analysis violations, such as those reported by Findbugs. The techniques behind Phoenix were published in FSE 2019.

I obtained my bachelor’s degree in computer science from the Indian Institute of Technology, Kanpur. My undergraduate research was focused on developing more precise bug localization techniques and was published in OOPSLA 2016. Here we combined model-checking techniques with soft invariants learned from regression tests to obtain better precision in reasoning about possibly faulty lines of code.

Feel free to contact me via email, or any of the platforms above. Cheers!

News

14-Sep-2022
Come see our latest demo!
06-Sep-2022
Started full time at Adept AI Labs!
12-Aug-2022
Filed my dissertation and officially graduated!
01-Jul-2022
Our paper on formula repair for low-code applications is conditionally accepted at OOPSLA 2022!
10-Apr-2022
I’m joining Adept AI Labs full-time as a research scientist!
01-Sep-2021
Gauss accepted at OOPSLA 2021!
07-Jul-2021
VizSmith accepted at ASE 2021!
24-May-2021
I will be interning with the PROSE team at Microsoft for Summer 2021.
29-Apr-2021
Passed my qualifying exam and now I’m officially a PhD candidate!
29-Oct-2020
I gave a tutorial on Gauss at Rise Camp 2020. Video is available here.
02-Feb-2020
Our tool paper submission for Phoenix was accepted to ICSE 2020 Demo track!
11-Dec-2019
Gave a talk on AutoPandas at Facebook
25-Oct-2019
Presented AutoPandas at OOPSLA 2019
17-Oct-2019
I gave a tutorial on using the Atlas framework to build synthesizers (part of AutoPandas) at Rise Camp 2019. Video is available here
27-Sep-2019
Had loads of fun chatting about AutoPandas at Fujitsu.
05-Sep-2019
I passed my preliminary examination at UC Berkeley!
30-Aug-2019
AutoPandas accepted at OOPSLA 2019!
01-Jul-2019
AutoPandas featured on Oreilly.
24-May-2019
Phoenix accepted at FSE 2019!
more...
less...

Projects

DataButler / Datana
Rich Natural-Language Interfaces for Data Science Code using Large Language Models

Recent advances in NLP, specifically the advent of large language models, have revolutionized program synthesis research. These models can output human-like given a textual context, which can be either natural language or existing/incomplete code.

While such advancements have opened up many exciting possibilities in using natural language as a modality for synthesis, it may be lacking when it comes to data science. What if a data scientist does not know how to express something in natural language that the models will pick up on? What if they do not know what is possible? What if they do not know what to do in the first place?

We combine code mining techniques and the code summarization ability of language models to build an autocompleting code search engine that allows scientists to use a handful of keywords to explore various possibilities along with previews. It also recommends next steps to perform when starting with a blank slate. Stay tuned for a demo!

Formula Repair for Excel/PowerApps
Combining the best of symbolic enumeration and language models for fast, precise repair

Excel is used by millions of people everyday. Maybe you’ve used it too. Have you ever written an Excel formula? If yes, chances are you’ve made some silly mistakes along the way - missing parentheses, forgetting a comma, leaving out an int-to-string transform etc. You may have also realized that Excel does not necessarily provide the most useful feedback regarding these errors - it does not point out the error location accurately every time, and often the error message is misleading. Can we do better?

In collaboration with the PROSE team at Microsoft, I worked on developing a formula repair technology that automatically suggests repairs for a faulty formula. Classical search-based repair techniques enumerate all possible repairs by leveraging the formal language specification. But these techniques often return too many repairs, and can be slow. We leverage the power of language models to bias the search towards likely error locations, and also rank the repairs by how similar it is to human-written formulas. Stay tuned for a release!

VizSmith
Synthesizing Visualization Code from Text Queries

Ever spent hours making plots using matplotlib or other visualization libraries? Such tools, although powerful, present a steep learning curve. Developers often end up searching on StackOverflow and copying and adapting code from answers. However this is non-trivial as understanding visualization code in the context of another data-set can be difficult and time-consuming.

VizSmith alleviates these issues by accepting both the data to visualize as well as a text query describing the visualization. VizSmith then seaches a database of visualization code snippets automatically mined from Kaggle to find the best fit and returns the produced visualizations along with readable, ready-to-use code. A manuscript is currently under submission. Try out VizSmith below!

Demo Video

Demo

Gauss
Synthesizing Table Transformation Programs by Leveraging User Interaction

Plain I/O examples for synthesizing table transformations (see AutoPandas below) lose out on readily available information, such as computational relationships between input and output. Also such I/O tables can be cumbersome to provide!

Gauss alleviates this issue by offering a special UI that the user can use to construct the I/O example which allows Gauss to record precise information about the inputs and output. Gauss then employs novel graph-based reasoning to vastly improve both synthesis time and reduce the burden of the user by allowing them to provide partial input-output examples while still returning correct solutions. A manuscript is currently under review. Try out Gauss below!

Demo Video

Demo

AutoPandas
Synthesizing Table Transformation Programs using Machine Learning

Pandas is a hugely popular Python library for table manipulation. However its size and complexity can be daunting to beginners.

AutoPandas helps automate coding in Pandas by generating Pandas code given an input table and the output table that should be produced. AutoPandas encodes the input-output table as a graph and leverages the recent advancements in graph neural networks to find the correct Pandas program. Links to the paper and code can be found in the OOPSLA 2019 paper. Try out AutoPandas below!

Demo Video

Demo

Phoenix
Learning to Repair Static Analysis Violations by Analyzing OSS

Static analysis tools help catch bugs in programs without having to execute them. Are they actually used? A 2017 Coverity Scan report estimates that 600k out of 1.1 million identified defects over 4600 OSS projects were fixed. However, there is still a large barrier to adoption because the defects have to be manually investigated, confirmed and fixed. Static analysis tools also report a large number of false positives.

Phoenix solves this problem by mining commit histories of large open source projects on Github for patches to static analysis violations reported by FindBugs. It then employs a novel program synthesis algorithm to generalize the patches into reusable repair templates or strategies which it then uses to fix new, unseen violations. Check out the FSE 2019 paper and the demo video below!

Tool Demo Video

Publications Dissertations

Tools and Techniques for Building Programming Assistants for Data Analysis
Rohan Bavishi
Ph.D. Thesis, UC Berkeley, 2022
Link
Neural-Backed Generators for Program Synthesis
Rohan Bavishi
Master's Thesis, UC Berkeley, 2019
Link

Conference Publications

VizSmith: Automated Visualization Synthesis by Mining Data-Science Notebooks
Rohan Bavishi, Shadaj Laddad, Hiroaki Yoshida, Mukul R. Prasad, and Koushik Sen
Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE) 2021
PDF
Video
Demo
Gauss: Program Synthesis by Reasoning over Graphs
Rohan Bavishi, Caroline Lemieux, Koushik Sen, and Ion Stoica
ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA) 2021
DOI Link
PDF
Video
Demo
AutoPandas: Neural-Backed Generators for Program Synthesis
Rohan Bavishi, Caroline Lemieux, Roy Fox, Koushik Sen, and Ion Stoica
ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA) 2019
DOI Link
PDF
Slides
Video
Code
Demo
Phoenix: Data-Driven Synthesis of Repairs for Static Analysis Violations
Rohan Bavishi, Hiroaki Yoshida, and Mukul R. Prasad
ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE) 2019
DOI Link
PDF
Data
* Code not available due to proprietary reasons
To Be Precise : Regression Aware Debugging
Rohan Bavishi, Awanish Pandey, and Subhajit Roy
ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA) 2016
DOI Link
PDF
Slides
Code

Tool Papers

Phoenix: A Tool for Automated Data-Driven Synthesis of Repairs for Static Analysis Violations
Hiroaki Yoshida, Rohan Bavishi, Keisuke Hotta, Yusuke Nemoto, Mukul R. Prasad, and Shinji Kikuchi
ACM/IEEE 42nd International Conference on Software Engineering (ICSE) 2020
DOI Link
PDF
Video
* Code not available due to proprietary reasons

Workshop Publications

Neural Inference of API Functions from Input–Output Examples
Rohan Bavishi, Caroline Lemieux, Neel Kant, Roy Fox, Koushik Sen, and Ion Stoica
Workshop on ML for Systems at NeurIPS 2018
PDF
Slides
Regression Aware Debugging for Mobile Applications
Rohan Bavishi, Awanish Pandey, and Subhajit Roy
International Workshop on Mobile Development (Mobile!) 2016
DOI Link
PDF

arXiv

Context2Name: A Deep Learning-Based Approach to Infer Natural Variable Names from Usage Contexts
Rohan Bavishi, Michael Pradel, and Koushik Sen
2018
Link
PDF
Code

Patents

Data-Driven Synthesis of Fix Patterns
Rohan Bavishi, Hiroaki Yoshida, and Mukul R. Prasad
2020
Link