Karan Vombatkere

PhD Candidate in Computer Science | Boston University

MCS 105A, Department of Computer Science
111 Cummington Mall, Boston, MA 02215

Email: username @ domain
where username = kvombat , domain =



I am a first-year Computer Science PhD student at Boston University, advised by Dr. Evimaria Terzi. I am a part of the Massive Data & Algorithms (MiDAS) research group at BU, and my research is primarily focused on data mining algorithms. My current research is on designing efficient approximation algorithms for submodular optimization problems, such as the Team Formation/Max-k Coverage problem.

My research interests are shaped by the diversity of my past experiences: I am excited about all things statistical, algorithmic and game theoretic. I am interested in studying about optimality and efficient strategies in environments of incomplete and imperfect information. I am curious about using machine learning and data-driven methods to formulate a better understanding of social systems. I am also interested in using data distributions to explain real-world phenomenon.

Before starting my PhD I worked for 3 years as a Data Scientist at IBM, where I developed machine learning solutions from the ground-up on exciting projects at the intersection of business and tech. I developed a bid-price optimization engine in Python, where I used hierarchical clustering and logistic model trees to build a microservice to handle real-time pricing requests. I also helped automate pre-surgical authorization for patients using NLP on clinical data. Prior to IBM I was at the University of Rochester, where I studied Physics and Electrical & Computer Engineering as an undergraduate, and explored a variety of topics such as computer audition (with Prof. Zhiyao Duan), computational physics and game theory. I then pursued a masters degree in Data Science with a focus in computational and statistical methods, and worked with Prof. Jiebo Luo.

I play tennis and soccer, and enjoy outdoor activities such as hiking, skiing and biking. I also like to challenge myself with chess, poker and other strategy games. I am an eager home cook and I love good food - do let me know if you're looking for good food around Boston/Cambridge!


Team Formation: Balancing Task Coverage and Maximum Expert Load

Develop efficient approximation algorithms for the team formation problem, with the goal of balancing task coverage and the maximum expert load.

Mining travel and weather patterns to understand the spread of COVID-19

I collaborated with Prof. Jiebo Luo to research the spread of COVID-19 along political lines in US States using transportation patterns and weather data. I wrote code to cluster and analyze population mobility features across Red and Blue states (as per the 2016 presidential election) and discovered significant correlation patterns between the way people travel and the COVID-19 infection growth rate from March to July 2020.

SBP-BRiMS 2021.

Displaying automatic lyrics for streaming audio

I was awarded a research fellowship by Xerox to conduct research under the guidance of Prof. Zhiyao Duan in the Audio Information Research Lab at the University of Rochester during the summer of 2016. The primary purpose of my research was to design a computational system that could follow live musical performances and display pre-encoded lyrics in real-time. I implemented a multi-threaded audio framework with a shared synchronous buffer to handle simultaneous recording and processing. I extracted chroma feature vectors from the waveform audio data and then used online dynamic time warping to align the live audio stream with a pre-annotated version of the audio based on their harmonic similarity, and display lyrics in real-time.

IEEE Signal Processing Magazine, Jan 2017


Boston University | Boston, MA

Doctoral Student

Computer Science PhD candidate advised by Dr. Evimaria Terzi.

IBM | Cambridge, MA

Data Scientist

At IBM, I designed real-world solutions and developed ML models from the ground up on exciting projects at the intersection of business and tech.
Under the guidance of Dr. Mark Freeman, I developed a novel algorithmic method in Python for business-to-business bid price optimization, and successfully deployed it as a microservice for a large telecommunications company.
I also worked with Watson Health to develop an NLP solution to automate pre-surgical authorization. My contributions included building a rules engine in Python and implementing algorithms to extract natural language features from patient clinical data.

University of Rochester | Rochester, NY

Teaching Assistant

I was awarded the Citation for Achievement in College Leadership in recognition of outstanding undergraduate teaching and research commitment. I led workshops, recitations and labs for groups of 15-20 students for the following courses across the Physics, Electrical & Computer Engineering and Math departments:

  • ECE 231: Applied Electromagnetics
  • ECE 111: Circuit Analysis; ECE 112: Digital Logic
  • MTH 161: Calculus I; MTH 162: Calculus II
  • PHY 113: Mechanics; PHY 122: Electromagnetism
  • AST 105 & AST 106: Introductory Astronomy

Research Fellow

I was awarded a research fellowship by Xerox to conduct research under the guidance of Prof. Zhiyao Duan in the Audio Information Research Lab during the summer of 2016. The primary purpose of my research was to design a computational system that could follow live musical performances and display pre-encoded lyrics in real-time.


Settlers of Catan AI Framework

Settlers of Catan boardgame built in Python. The goal of this project is to implement full multiplayer game functionality and use MCTS and reinforcement learning to build an AI player that can effectively explore-exploit heuristic strategies.

Xenou, Konstantia, Georgios Chalkiadakis, and Stergos Afantenos. "Deep Reinforcement Learning in Strategic Board Game Environments." European Conference on Multi-Agent Systems. Springer, Cham, 2018.
Gendre, Quentin, and Tomoyuki Kaneko. "Playing Catan with Cross-Dimensional Neural Network." International Conference on Neural Information Processing. Springer, Cham, 2020.

Tennis Player Performance Prediction

Under the guidance of Prof. Jiebo Luo, I used efficient feature extraction on historical tennis match data, combined with a machine learning implementation in Python to predict the likelihood of professional tennis player success with 80% accuracy. I implemented code to create player-specific statistical feature sets aggregated from individual match data, and used neural network and logistic regression classification models to categorize player success.

Adversarial Search: Ultimate Tic Tac Toe

An Ultimate Tic Tac Toe framework in Java, with an implementation of adversarial search using MiniMax with Alpha-Beta pruning. Ultimate Tic Tac Toe comprises nine 3x3 Tic Tac Toe boards, and the goal is to win 3 boards. I also developed a heuristic AI player, which was tested to beat a control player in 99 out of 100 games.

German WWII Enigma Machine

A complete implementation of the Enigma Machine in Python. I use an object-oriented framework to design the plugboard, reflector and rotor set, allowing for full encryption and decryption functionality. I also implement code to crack the Enigma cipher, using a known-plaintext attack methodology.

Non-Linear Dynamics of the Damped and Driven Pendulum

I investigated the non-linear dynamics of the damped and driven pendulum and developed a theoretical framework for the system. I then computationally solved the classical mechanics problem using Mathematica to discover regions of chaotic and non-chaotic motion.

Augmented Audio Reality: Binaural Headphones

Designed, built and tested binaural headphones with real-time recording and filtering capability, with a 12ms latency. Modeled the head-related tranfer function (HRTF) using a Neumann head microphone, and implemented the FFT algorithm in C++ to enable real-time audio filtering.

Brownian Motion Stock Price Evolution

A statistical framework in Python to predict stock price evolution using geometric Brownian motion. The model was tested to have under 5% error using Monte Carlo simulations on 2 years of historical Nike stock prices.

Reference: Dmouj, Abdelmoula. "Stock price modelling: Theory and Practice." Masters Degree Thesis, Vrije Universiteit (2006).


Boston University

Aug 2021 - Present

Doctor of Philosophy in Computer Science

Focus in Machine Learning and Data Mining

University of Rochester

Aug 2017 - May 2018

Master of Science in Data Science

Focus in Computational & Statistical Methods
Relevant coursework: Artificial Intelligence, Statistical Learning, Data Mining, Database Systems, Cryptography

University of Rochester

Aug 2013 - May 2017

Bachelor of Science in Electrical & Computer Engineering
Bachelor of Arts in Physics

Highest Distinction, Magna Cum Laude
Relevant coursework: Computational Physics, Data Structures & Algorithms, Robot Control, Game Theory, Classical Mechanics, Linear Algebra, Signal Processing