KAUSHIK KUMAR

I'm working late... "cause I'm a Data Scientist"


about me

Hi! I'm Kaushik Kumar, a Data Scientist and a Master’s student in Data Science at the University of Arizona. My fascination with AI began back in school when I first learned how closely it mirrors the human brain. That curiosity took me to a research internship at the National University of Singapore, where I explored deep learning. In my final year of undergrad, I lost my father in a car accident caused by a faulty vehicle, an event that deepened my resolve to work on safety-driven AI. That journey led me to Johnson Electric, where I built AI agents and end-of-line testing systems to ensure no faulty components left production.

Outside of AI and data, football is my first love. I'm a trained footballer, and beyond the competition, it’s the clarity and freedom the game brings me that I cherish most. I’m also disciplined and committed, qualities reflected in my achievement as a national Karate champion. I truly believe learning doesn’t happen only in classrooms or through code; it comes from all the little experiences in life. I'm always curious, always exploring, and constantly looking for new ways to grow.

email

kaushikkumar.208@gmail.com

linkedin

Kaushik Kumar

education

2023-2025

Master of Science

Data Science



University of Arizona

Arizona, USA

Coursework: Cloud Computing, Applied NLP, Advanced Machine Learning, Neural Networks, MBA Advanced Field Projects, SQL/NoSQL Databases, Data Visualization and Analytics, Data Mining/Discovery, Foundations of Data Science, Data Ethics

2019-2023

Bachelor of Engineering

Computer Science Engineering



Vels Institute of Science, Technology and Advanced Studies

Pallavaram, Chennai, India

Coursework: Programming for Problem Solving, Adv. Calculus and Complex Analysis, Adv. Python Programming, Computer Communication Networks, Machine Perception with Cognition, Machine Learning, Deep Learning, Computer Vision, Natural Language Processing, Robotics, Operating Systems, Database Management Systems, Software Engineering, Data Structures and Algorithms, Computer Organization and Architecture, Genetic Algorithms, Mobile Application Development

professional experience

  • Jan 2025 - Present

    Co-Founder & Chief Technology Officer (CTO)



    Kamuit

    Remote

    As Co-Founder and CTO of Kamuit, I drive the technical vision and execution of our detour-aware, community-first ridesharing platform, which turns everyday commutes into safe, efficient, shared journeys. I architect and implement our driver assignment, detour-aware matching, and fallback logic using H3 (Uber's Hexagonal Hierarchical Spatial Index), the Alonso-Mora algorithm, FastAPI, PostgreSQL + PostGIS, and Google Maps/Places APIs, balancing scalability with real-time performance. I designed and deployed multi-layered matching algorithms that combine spatial filtering, polyline overlap, and ETA-based detour scoring to minimize driver inconvenience while maximizing match success rates. I lead the development of robust fallback and cancellation mechanisms with queue-based retries, timeouts, and fault-tolerant recovery, ensuring reliability even under driver rejections, GPS failures, or API outages. I oversee integration of Stripe Connect for cost-sharing payments, secure one-time login flows, and trust/safety modules that enforce driver verification and community reputation. Collaborating with engineers and stakeholders, I ensure metrics-driven development, tracking detour scores, match success rate, system latency, and CO₂ offset to guide continuous improvement and investor-ready demonstrations.
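A minimal sketch of the detour-scoring idea, not Kamuit's production code: it assumes straight-line (haversine) distance as a stand-in for the ETA-based score, and the names `detour_km`, `best_driver`, and `max_detour_km` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def detour_km(driver_start, driver_end, pickup, dropoff):
    """Extra distance the driver covers by serving the rider (assumed metric)."""
    direct = haversine_km(driver_start, driver_end)
    with_rider = (
        haversine_km(driver_start, pickup)
        + haversine_km(pickup, dropoff)
        + haversine_km(dropoff, driver_end)
    )
    return with_rider - direct


def best_driver(drivers, pickup, dropoff, max_detour_km=5.0):
    """Return (driver_id, detour) for the smallest detour under the cap, else None."""
    candidates = []
    for driver_id, (start, end) in drivers.items():
        d = detour_km(start, end, pickup, dropoff)
        if d <= max_detour_km:
            candidates.append((d, driver_id))
    if not candidates:
        return None
    d, driver_id = min(candidates)
    return driver_id, d
```

A rider whose pickup and dropoff lie on a driver's existing route yields a detour near zero, so that driver wins over one heading in a different direction. A real system would swap the haversine calls for routed travel times.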

  • March 2024 - Present

    Graduate Research Assistant



    ACT Lab, University of Arizona

    Tucson, AZ, USA

    At the Adaptive Control Technologies (ACT) Lab, I developed ShieldNN-AM, a dual-agent reinforcement learning framework that enhanced safety in autonomous driving by introducing adaptive safety margins to Control Barrier Functions (CBFs). I implemented predictive safety filters in the CARLA simulator, achieving a zero collision rate while improving track completion from 94% to 98% and boosting speed efficiency by 20%. I engineered a multi-component reward system combining safety, performance, and efficiency signals, and optimized GPU training pipelines to reduce training time. Through theoretical analysis and simulation experiments, I demonstrated how adaptive margins preserve provable safety guarantees while enabling aggressive-but-safe performance.
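To illustrate how a safety margin enters a Control Barrier Function filter, here is a toy one-dimensional sketch. It is not ShieldNN-AM: it assumes a vehicle approaching an obstacle with dynamics `distance' = -v`, a barrier `h = distance - margin`, and a margin passed in as a plain argument where a learned agent would supply it. The names `cbf_filter` and `simulate` are hypothetical.

```python
def cbf_filter(v_desired, distance, margin, alpha=1.0):
    """Clip the desired speed so the barrier h = distance - margin stays non-negative.

    Assumed model: distance' = -v, so the CBF condition h' + alpha * h >= 0
    reduces to v <= alpha * (distance - margin).
    """
    v_max = alpha * (distance - margin)
    return max(0.0, min(v_desired, v_max))


def simulate(margin, v_desired=10.0, distance=50.0, dt=0.05, steps=400):
    """Drive toward the obstacle with the filter active; return the final distance."""
    d = distance
    for _ in range(steps):
        v = cbf_filter(v_desired, d, margin)
        d -= v * dt  # forward-Euler step of distance' = -v
    return d
```

Far from the obstacle the filter passes the desired speed through unchanged; close to it the allowed speed shrinks in proportion to the remaining slack, so the vehicle never crosses the margin. A smaller margin lets the vehicle hold speed longer, which is the trade-off an adaptive margin tunes.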


  • May 2024 - Present

    Graduate Tutor


    SALT Center, University of Arizona

    Tucson, AZ, USA

    At the SALT Center, I tutored 50+ undergraduate students in Computer Science, Information Management, and Information Technology courses. I created student-centered teaching strategies that built problem-solving confidence by equipping students with tools for debugging, time management, and independent learning. This role strengthened my ability to translate complex technical concepts into clear, practical explanations, while fostering resilience and growth in students navigating rigorous STEM coursework.

  • August 2024 - December 2025

    Data Engineer and Analyst Extern



    Banner Health

    Phoenix, AZ, USA

    At Banner Health, I worked with a cross-functional team to deliver a data-driven strategy for hospital expansion and senior health equity across Arizona, Colorado, and Utah. I analyzed Medicare inpatient discharge data, chronic disease prevalence, and social determinants of health (SDOH) to identify service gaps and quantify community needs. My modeling revealed that housing insecurity was the strongest predictor of chronic disease burden, while hotspot mapping flagged high-growth counties like Pinal (AZ) and El Paso (CO) as prime expansion opportunities. I synthesized findings into interactive heatmaps, risk indices, and regression models, enabling Banner to target underserved populations such as dual-eligible seniors and rural communities with elevated diabetes and depression rates. The final recommendations balanced capacity growth in core markets (Phoenix, Tucson) with equity-focused interventions (community health workers, mobile clinics, behavioral health integration), directly aligning expansion with mission-driven community impact.


  • June 2024 - August 2024

    Software Development Intern (GenAI)



    Selector AI

    Santa Clara, CA, USA

    At Selector AI, I advanced the Logminer observability platform, which ingests machine logs through rsyslog, Promtail, and Kafka, and surfaces insights in Grafana dashboards for real-time monitoring. I replaced the system’s Random Forest classifier with BM25S, a modern sparse retrieval algorithm, enabling orders-of-magnitude faster and more reproducible log event classification across variants. This involved benchmarking multiple BM25 variants, implementing eager sparse scoring, and optimizing throughput so the pipeline could handle millions of daily log events with low latency. I upgraded the Logminer codebase from Python 3.8 to 3.12, modernizing UDF normalization, inference, and NER training modules, while ensuring compatibility with Kafka-based distributed log streams. Working closely with Selector’s observability engineers, I strengthened the platform’s ability to correlate logs with metrics and alerts, empowering enterprise customers to identify anomalies, trace root causes, and reduce mean time to resolution (MTTR) within their infrastructure.
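The retrieval-as-classification idea can be sketched with a plain Okapi BM25 scorer. This is not the BM25S library or Logminer code: it scores at query time rather than eagerly at index time, uses whitespace tokenization, and the class name, template strings, and labels are all assumptions for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


class BM25:
    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(d)) for d in docs]
        self.lengths = [sum(c.values()) for c in self.docs]
        self.avg_len = sum(self.lengths) / len(self.docs)
        n = len(self.docs)
        # Document frequency: number of documents containing each term.
        df = Counter(term for c in self.docs for term in c)
        # Lucene-style IDF, which stays positive for every term.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        """BM25 score of document i for the query."""
        tf, length = self.docs[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            f = tf[term]
            norm = f + self.k1 * (1 - self.b + self.b * length / self.avg_len)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def classify(self, query, labels):
        """Return the label of the highest-scoring template."""
        scores = [self.score(query, i) for i in range(len(self.docs))]
        best = max(range(len(scores)), key=scores.__getitem__)
        return labels[best]


templates = [
    "interface down link failure",
    "bgp neighbor session reset",
    "disk usage high threshold exceeded",
]
labels = ["link", "bgp", "disk"]
index = BM25(templates)
print(index.classify("BGP neighbor 10.0.0.1 session reset", labels))
```

Because scoring is a deterministic function of term counts, the same log line always maps to the same label, which is one reason a sparse scorer can be easier to reproduce than a trained classifier.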

  • June 2023 - July 2024

    Data Scientist

       

    Johnson Electric

    Chennai, TN, India

    As a Data Scientist at Johnson Electric, I designed and deployed AI-powered solutions that transformed manufacturing productivity, quality assurance, and enterprise AI adoption. My most impactful contribution was developing a Kalman Filter–based model, coupled with SQL-driven data pipelines, to predict mass outflow in Tesla AGP Water Pumps. Integrated into End-of-Line (EOL) testing via Dockerized deployments on VMs, this solution reduced testing time by 72% for 80W pumps and 50% for 50W pumps—delivering monthly cost savings of $74K. I also improved actuator defect detection by building an ensemble model (Gradient Boosting + Random Forest) that analyzed vibration frequency data in real time, achieving 93% accuracy and reducing test cycle times by 83%. Results were surfaced via PyQt5 GUIs and Power BI dashboards, enabling data-driven decisions on the factory floor. For anomaly detection, I designed Power BI dashboards with SQL + statistical divergence methods (JSD, KL), achieving zero false positives in real-time monitoring. I further optimized leakage testing of ITMS Gen2 pumps by applying curve fitting, SVR, and skew-normal modeling, cutting test times from 160s to 45s with a 0.93 correlation coefficient. Beyond manufacturing, I spearheaded enterprise AI initiatives, including PLM Teamcenter chatbots powered by Streamlit, Azure Blob Storage, Microsoft Graph API, Marqo AI, Solr, and Azure OpenAI GPT models. These tools automated internal knowledge retrieval, integrated with SharePoint, and featured feedback loops stored in PostgreSQL for reliability and security. I also advanced predictive defect analysis via SNADE AI, leveraging hybrid Gradient Boosting and Random Forest methods to classify actuators in 8s cycles, and worked on synthetic data generation using GANs to improve anomaly detection across production lines. 
    Through these efforts, I delivered measurable gains in manufacturing efficiency, defect prevention, predictive maintenance, and digitalization, establishing scalable AI pipelines that bridged research innovation with enterprise-grade deployment.
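For readers unfamiliar with the divergence measures mentioned above, here is a small sketch of KL and Jensen-Shannon divergence used as a drift check between a baseline histogram and a recent window. The `is_anomalous` helper, the bin counts, and the 0.1 threshold are hypothetical, not the values used at Johnson Electric.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; bins with p_i == 0 contribute nothing.

    Raises ZeroDivisionError if q_i == 0 where p_i > 0, which is why the
    symmetric, always-finite JSD below is often preferred for monitoring.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def normalize(counts):
    """Turn raw bin counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def is_anomalous(baseline_counts, window_counts, threshold=0.1):
    """Flag a window whose distribution drifts past the (assumed) threshold."""
    return js_divergence(normalize(baseline_counts), normalize(window_counts)) > threshold
```

A window with counts `[12, 78, 10]` against a baseline of `[10, 80, 10]` stays well under the threshold, while `[70, 20, 10]` exceeds it. In practice the threshold would be calibrated on historical data to hit a target false-positive rate.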


  • Dec 2022 - Jan 2023

    Data Science Researcher



    National University of Singapore

    Singapore

    In a research collaboration with the National University of Singapore (NUS), I led a project on Language-Invariant Hate Speech and Gender Bias Detection, developing an AI-powered content moderation system capable of identifying abusive speech across multiple languages. I combined an n-gram–based language identification module with feature extraction and classification using mBERT (Multilingual BERT) and GRU, fine-tuning hyperparameters and leveraging attention mechanisms to achieve high accuracy while minimizing false positives. I processed and cleaned large multilingual datasets, designed a scalable GPU-based training workflow, and implemented a robust pipeline for cross-lingual generalization. The research outcomes were later adapted into a business case for Hewlett Packard Enterprise (HPE), exploring applications in automated content regulation and ethical AI systems. Beyond model development, I played a critical role in deployment: building a Flask-based RESTful API, integrating it with a React frontend for real-time moderation, and deploying the system on a Microsoft Azure VM instance for scalability testing. As Team Leader, I coordinated a team of six multinational researchers, led rigorous evaluation studies, and presented results to HPE executives—demonstrating the system’s feasibility for enterprise adoption, responsible AI governance, and commercial deployment.

  • Jan 2022 - Mar 2022

    Full Stack Developer Intern

       

    Suvidha Foundation

    Remote, India

    As a Full Stack Developer Intern, I built a comprehensive university management system with end-to-end ownership of design, development, and deployment. On the frontend, I developed responsive interfaces using JavaScript, while implementing server-side logic in PHP for secure data handling and business workflows. For deployment, I integrated Heroku for cloud hosting and Apache XAMPP for local testing and development, ensuring smooth transitions from development to production environments. Security was prioritized through JWT-based authentication and RESTful APIs for robust and secure data exchange. I also automated CI/CD pipelines and managed version control via Git, streamlining deployments and improving reliability. My contributions to the project’s functionality, scalability, and automation earned me the company’s “Golden Intern” award for outstanding performance.


Technical Expertise

Programming & Development

Languages: Python, SQL, JavaScript, PHP, Solidity, R, Bash

Frameworks: TensorFlow, PyTorch, Scikit-learn, Keras, OpenCV, FastAPI, spaCy, ONNX

Web Development: Flask, React, Streamlit, RESTful APIs, PyQt5, Microservices

APIs & Integration: JWT, OAuth2, Microsoft Graph API, LangChain, HuggingFace, Google Maps/Places APIs, Stripe Connect, OpenAI API, Google Gemini API, Anthropic Claude API

Data Science & ML

ML Models: Ensemble Methods, CNNs, UNet, GANs, mBERT, RLHF, Kalman Filters, SVR, GRU, DeepDFA, LTL, RoBERTa, DeBERTa, ALBERT, ELECTRA, T5, GPT Models

Techniques: Time-series forecasting, Statistical Modeling, Curve fitting, Skew-normal modeling, JSD, KL Divergence, Supervised PCA, Pattern Recognition

Domains: Anomaly detection, NLP, Computer Vision, Predictive maintenance, Synthetic data generation, Hyperparameter Tuning

NLP Expertise: Text Preprocessing, Tokenization, Stemming/Lemmatization, Named Entity Recognition (NER), POS Tagging, Sentiment Analysis, Text Classification, Text Summarization, Intent Detection, Language Identification, Cross-lingual Transfer

RAG & LLM Integration: Retrieval-Augmented Generation (RAG), Vector Embeddings, Semantic Search, Document Chunking, Context Window Management, Prompt Engineering, Baseline Testing, Multi-Model Comparison, API Rate Limiting, Cost Optimization

Specialized: Statistical and Physics-based Models, Control Barrier Functions (CBFs), Reinforcement Learning, ShieldNN-AM, CARLA Simulator, NN Pruning, A/B Testing

Emerging AI: Context Engineering, Agentic AI, Multimodal AI, LLM Agents, Fine-tuning (LoRA, QLoRA, PEFT), Few-Shot Learning, Zero-Shot Learning, Explainable AI (XAI)

Reinforcement Learning: PPO, SAC, TD3, IMPALA, Rainbow DQN, Multi-Agent RL, Hierarchical RL, Offline RL

Cloud & MLOps

Cloud Platforms: Microsoft Azure (VMs, Blob Storage, Synapse, Data Lake, ML, DevOps, Data-Studio, Timeseries), AWS (EC2, S3, Lambda, RDS), Google Cloud Platform (GCP), Heroku, DigitalOcean (Droplets, Spaces, VPC, VPS, Managed Databases)

MLOps & Pipelines: Docker, Kubernetes, Apache Airflow, Databricks, CI/CD, Parallel Pipelines, Job Scheduling, MLflow, Apache Spark, Azure ML, AWS SageMaker, Google Cloud AI Platform

Monitoring & Observability: Kafka, Promtail, rsyslog, Loki, Grafana, BM25S, Prometheus, OpenTelemetry

Deployment: Dockerized deployments, VM management, Azure VM instances, NGINX, systemd

Data Management

Databases: SQL, NoSQL, MongoDB, Chroma DB, FAISS, PostgreSQL, PostGIS, Vector Databases

Visualization & BI: Power BI, Tableau, Excel, Interactive heatmaps, Matplotlib, Seaborn, Plotly, Dash, Streamlit, Gradio

ETL & Processing: PySpark, Dask, ETL pipelines, SQL-driven data pipelines, Spark

Search & Retrieval: Marqo AI, Solr, BM25 variants, Sparse retrieval algorithms, RAG tools, Embedding Models, Semantic Search, Document Retrieval, Knowledge Graphs

Additional Tools & Platforms

Version Control: Git, GitHub, Azure DevOps, GitHub Actions

Development: Jupyter, VS Code, PyCharm, MATLAB, Simulink, MetaMask/Truffle (Ethereum), NLTK, Pandas, NumPy, spaCy, Transformers, Haystack

Infrastructure: NVIDIA GPU clusters, Apache XAMPP, TorchVision, CUDA, Edge Computing (ONNX Runtime, TensorRT, Mobile AI, IoT deployments)

Enterprise Systems: PLM Teamcenter, SharePoint, Azure OpenAI GPT models

Manufacturing: End-of-Line (EOL) testing, SNADE AI, Tesla AGP Water Pumps, Ford Motors

Research: CARLA Simulator, ShieldNN-AM, Adaptive Control Technologies, OpenAI Gym

RL Frameworks: TF-Agents, PyTorch-RL, Stable Baselines3, Ray RLlib, Dopamine

Blockchain: Ethereum, Solidity, Truffle, MetaMask, Smart Contracts

AI Engineering: Knowledge Architecture, Domain Adaptation, Persona Context, Temporal Awareness, RAG Pipeline Development, LLM Evaluation, Model Benchmarking

Publications: NCRTCI'23 Conference Paper, ICLR 2025 Conference Paper (under review)

Projects

Beyond the Code

My Extracurriculars

When I’m not making machines learn stuff, I’m probably working out, gaming, watching football or getting carried away with new AI accomplishments.

Recommendation Letters

Letters of recommendation from esteemed professors and academic mentors

Dr. ETH Roman Pawel Klis

Professor & Sr. Engineering Data Scientist, ETH Zurich

ETH Zurich, Switzerland

Recommendation letter from Professor Dr. ETH Roman Pawel Klis at ETH Zurich, one of the world's leading technical universities.

Dr. K. Kalaivani

Head of Computer Science and Engineering

Academic Institution, Vels University

Academic recommendation from Dr. K. Kalaivani highlighting research capabilities and academic excellence.

Dr. A. Rajesh

Professor & Research Supervisor

Research Lab, Vels University

Research-focused recommendation from Dr. A. Rajesh emphasizing technical skills and research contributions.

Dr. A. Packialatha

Professor & Academic Advisor

Academic Institution, Vels University

Comprehensive recommendation from Dr. A. Packialatha covering academic performance and professional potential.

contact me

Kaushik Kumar

email

kaushikkumar.208@gmail.com