Rohan Bavishi

PhD Candidate
University of California, Berkeley
Department of Computer Science
Google Scholar

Hi, I’m a fourth-year PhD student at the University of California, Berkeley, advised by Koushik Sen. I am a member of the Programming Systems group. My research focuses on designing tools and techniques for improving the productivity of programmers, specifically data scientists.

A first step in this direction was AutoPandas, a program-synthesis system for the Pandas data-science library in the Python ecosystem that works from input-output examples. AutoPandas boosts productivity by letting users skip documentation lookups and develop pipelines quickly. The work was published at OOPSLA 2019 and has also been covered in online blogs. I am currently working on leveraging the rich source code available on platforms such as Kaggle to power synthesis, recommendation, and search engines for data transformation and visualization code (see the VizSmith project).

Prior to this, I interned at Fujitsu Laboratories of America, where I worked on Phoenix, a system that leverages the rich development history of open-source projects on GitHub to automatically learn generic strategies for repairing static analysis violations, such as those reported by FindBugs. The techniques behind Phoenix were published at FSE 2019.

I obtained my bachelor’s degree in computer science from the Indian Institute of Technology, Kanpur. My undergraduate research focused on developing more precise bug localization techniques and was published at OOPSLA 2016. In that work, we combined model-checking techniques with soft invariants learned from regression tests to reason more precisely about possibly faulty lines of code.

Feel free to contact me via email or any of the platforms above. Cheers!

Synthesizing Visualization Code from Text Queries

Ever spent hours making plots using matplotlib or other visualization libraries? Such tools, although powerful, present a steep learning curve. Developers often end up searching StackOverflow and then copying and adapting code from answers. However, this is non-trivial, as understanding visualization code written for another dataset can be difficult and time-consuming.

VizSmith alleviates these issues by accepting both the data to visualize and a text query describing the visualization. VizSmith then searches a database of visualization code snippets automatically mined from Kaggle to find the best fit and returns the produced visualizations along with readable, ready-to-use code. A manuscript is currently under submission. Try out VizSmith below!

Synthesizing Table Transformation Programs by Leveraging User Interaction

Plain I/O examples for synthesizing table transformations (see AutoPandas below) lose out on readily available information, such as computational relationships between input and output. Also, such I/O tables can be cumbersome to provide!

Gauss alleviates this issue by offering a special UI that the user can use to construct the I/O example, which allows Gauss to record precise information about the inputs and output. Gauss then employs novel graph-based reasoning to vastly improve synthesis time. It also reduces the burden on the user by allowing them to provide partial input-output examples while still returning correct solutions. A manuscript is currently under review. Try out Gauss below!

Synthesizing Table Transformation Programs using Machine Learning

Pandas is a hugely popular Python library for table manipulation. However, its size and complexity can be daunting to beginners.

AutoPandas helps automate coding in Pandas by generating Pandas code from an input table and the output table that should be produced. It encodes the input-output tables as a graph and leverages recent advances in graph neural networks to find the correct Pandas program. Links to the paper and code can be found in the OOPSLA 2019 paper. Try out AutoPandas below!
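
To give a rough feel for what encoding an input-output example as a graph could look like, here is a minimal sketch in plain Python. It is an illustration under assumptions, not AutoPandas's actual encoding: the function name `encode_io_as_graph`, the node naming, and the two edge types ("adjacent" and "equal") are all hypothetical choices made for this example, and tables are modelled as lists of rows rather than Pandas dataframes.

```python
from itertools import product


def encode_io_as_graph(input_table, output_table):
    """Build a toy graph relating cells of an input and an output table.

    Hypothetical encoding for illustration only.
    Nodes are (table_name, row, col) tuples mapped to their cell value.
    Edges are (source, target, kind) triples, where kind is:
      - "adjacent": neighbouring cells within the same table
      - "equal":    an input cell and an output cell holding the same value
    """
    tables = {"in": input_table, "out": output_table}
    nodes = {}
    for name, table in tables.items():
        for r, row in enumerate(table):
            for c, value in enumerate(row):
                nodes[(name, r, c)] = value

    edges = []
    # Adjacency edges inside each table (right and down neighbours).
    for (name, r, c) in nodes:
        for dr, dc in ((0, 1), (1, 0)):
            neighbour = (name, r + dr, c + dc)
            if neighbour in nodes:
                edges.append(((name, r, c), neighbour, "adjacent"))

    # Equality edges between input cells and output cells.
    in_cells = [n for n in nodes if n[0] == "in"]
    out_cells = [n for n in nodes if n[0] == "out"]
    for a, b in product(in_cells, out_cells):
        if nodes[a] == nodes[b]:
            edges.append((a, b, "equal"))
    return nodes, edges


# Made-up example: the output table is the transpose of the input table.
input_table = [[1, 2],
               [3, 4]]
output_table = [[1, 3],
                [2, 4]]

nodes, edges = encode_io_as_graph(input_table, output_table)
equal_edges = [(a, b) for a, b, kind in edges if kind == "equal"]
for a, b in equal_edges:
    print(a, "->", b)
```

The "equal" edges show where each input value ends up in the output, which is the kind of structural hint a learned model could plausibly use to rank candidate programs (here, a transpose).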

Learning to Repair Static Analysis Violations by Analyzing OSS

Static analysis tools help catch bugs in programs without having to execute them. Are they actually used? A 2017 Coverity Scan report estimates that 600k out of 1.1 million identified defects over 4600 OSS projects were fixed. However, there is still a large barrier to adoption because the defects have to be manually investigated, confirmed and fixed. Static analysis tools also report a large number of false positives.

Phoenix solves this problem by mining the commit histories of large open-source projects on GitHub for patches to static analysis violations reported by FindBugs. It then employs a novel program synthesis algorithm to generalize the patches into reusable repair templates, or strategies, which it then uses to fix new, unseen violations. Check out the FSE 2019 paper and the demo video below!

Publications

Conference Publications | Tool Papers | Workshop Publications | arXiv | Dissertations | Patents