Kiran Karkera

Email / LinkedIn / GitHub


As a Data Scientist, I am enthusiastic about using Machine Learning and Natural Language Processing (NLP) tools to build data products that solve challenging problems customers face today. I have implemented solutions that tackle Machine Learning at scale, where the data is high in volume, velocity and variety.

As an Applied Machine Learning engineer, I have wide exposure to both statistical and machine learning techniques. I’ve participated in several global Machine Learning competitions, and I have been placed among the top 1% of data scientists at (as of Dec 2013).

I’m passionate about Clojure and I have contributed to Open Source Machine Learning libraries in Clojure.

I am the author of the book Building Probabilistic Graphical Models in Python, a gentle introduction to Graphical Models and their applications in Machine Learning and NLP.

I have filed two patents that use machine learning approaches to improve the efficiency of Femtocell devices and Small cell Device Management servers.

I’m a programmer with 16 years of product development experience, having served as a developer, team lead and architect in start-ups and product companies.

I have built products in varied domains such as Ad-tech, Customer Experience, and Network and Device Management. I have also worked with many cross-functional and globally diverse teams.

Open Source Contributions

I have contributed modules/algorithms to the following projects:

  • Machine Learning in Spark using Clojure :- sparkling
  • Clojure’s Weka wrapper :- clj-ml
  • Clojure wrapper for Word2Vec :- clojure-word2vec
  • Elsner-Charniak Chat thread disentanglement :- jakkur


Data Scientist at Datacraft Sciences (May 2016 - present)

Senior Data Scientist at Eyeota (Jan 2016 - Mar 2016)

At Eyeota, I designed and implemented Machine Learning systems to solve these problems:

  • In-stream prediction of audience segments on a data stream that is high in both volume and velocity (billions of data points daily; more than 1.5 billion users monthly).
  • Tools for ad-hoc data analysis.

Our stack consisted of Clojure and Apache Spark, and we open-sourced a module that facilitates building Machine Learning models on Apache Spark using Clojure.

Lead Data Scientist at Bridgei2i Analytics (July 2014 - Dec 2015)

I wore two hats at Bridgei2i. As Lead Data Scientist, I designed solutions for the data products described below, using appropriate tools from Machine Learning (ML) and Natural Language Processing (NLP). As an engineer, I architected, developed and fine-tuned implementations of ML and NLP algorithms.

I led the Data Products team in building the following solutions:

  • Extrack is a tool that helps customers manage their Customer Experience. Extrack analyzes unstructured datasets such as customer support forums, emails and customer surveys. It provides actionable insights using techniques such as unsupervised and semi-supervised Topic Models, search, and keyword and sentiment extraction.
  • S-Reco is a recommendation engine for the consumer product goods industry. It helps salespersons increase revenue by recommending the SKUs that a store manager is likely to be interested in. I led the implementation of a ‘white box’ Recommendation Engine using Apache Spark that scales to generating monthly recommendations for more than a million stores countrywide.
  • Lead Scoring Engine is a customizable platform that helps prioritize marketing-generated leads (from platforms such as Eloqua and Marketo) so that the sales team can focus on the most critical leads. The engine combines lead data with external data sources and scores leads using a toolbox of Machine Learning classifiers. It provides customers with ‘white box’ models that give insights into the factors that drive Lead Conversion.

Methods and Tools:

Topic Models, Text clustering, Apache Solr, Apache Spark, Scikit-learn, classifiers such as Random Forest, SVM, Neural Networks

Senior Technical Specialist at Alcatel-Lucent (Aug 2009 - Feb 2014)

I was part of the Femtocell Device Management product team. The product enables operators to provision, configure, update and manage their Femtocell (and other TR-069-compliant) devices in the network, and is part of the Home Device Manager suite of solutions, which manages more than 200 million devices worldwide.

Achievements in Device Management team:

  • Architected and implemented features in multiple product releases.
  • Filed 2 patents that used Machine Learning approaches to improve Femtocell device and Device Management server efficiency.
  • Implemented VMware-based cloud integration for product builds.
  • Top 10 finalist in ‘Ideaz Central’, an internal competition that rewards innovation at Alcatel-Lucent.

Lead Engineer at Sasken Communications (May 2004 - September 2009)

System Engineer at Alopa Networks, a telecom-focused startup (2000 - 2003)


Reducing Energy Consumption of Small cell devices (filed)

India Patent Application 3532/DEL/2013 (co-inventor: Rakesh Chella)

Auto Configuration Servers (filed)

India Patent Application 803/DEL/2014



  • Machine Learning
  • Probabilistic Graphical Models
  • Neural Networks
  • Data Analysis
  • Natural Language Processing

CDAC Diploma in Advanced Computing 1999 - 2000

Bharati Vidyapeeth, University of Mumbai

BE, Telecommunications, 1995 - 1999


Whitewater Kayaking, Climbing, Hindustani classical music