Senior Data Engineer at IBM · Bengaluru, India · Open to remote-first roles
4+ years designing and operating petabyte-scale Azure Databricks Lakehouse platforms at IBM for global enterprise clients. Fast-tracked from fresher to Senior Data Engineer in under 3 years by building infrastructure that creates lasting, measurable value.
I also build production-grade AI systems from scratch, not API wrappers. My self-built multi-agent platform (LangGraph, RAG, vector memory) runs 5 specialized autonomous agents on a config-driven prompt architecture designed to reduce hallucinations and token cost.
DataOps Observability Platform — Designed and built a unified monitoring system from scratch for 200+ pipelines across SAP Data Intelligence, Azure Data Factory, and Databricks. Centralized into a Power BI dashboard with automated Logic App alerting. Cut daily manual monitoring by 80% (1.0 FTE to 0.2 FTE). Adopted as the account-wide standard.
FinOps Optimization — Migrated short-duration pipelines from always-on interactive compute to ephemeral Job Compute nodes, and applied strategic Z-Ordering on SHA256 key columns across Delta tables. Delivered ~50% reduction in Databricks compute cost and ~30% reduction in storage spend — yielding €1,000+/month in recurring savings.
Enterprise AI Assistants — Independently built three domain-specialized AI assistants on the IBM Consulting Advantage platform (Databricks Transform Expert, User Story Generator, RAG-powered Operations Helper). Secured IBM Data Service Line-wide adoption. Recovered 50-60 engineering hours per month across the practice.
Aria — Multi-Agent AI Platform — Self-engineered production AI platform using LangGraph. YAML config-driven engine injects tailored prompts and tool sets per action at runtime. Async Plan-Execute-Evaluate-Respond ReAct graph with a self-evaluation node scoring output quality across 5 vectors. Hybrid sqlite-vec relational and semantic memory for long-horizon retrieval. Runs 5 specialized agents on cost-effective models via OpenRouter.
Data Corruption Recovery — Recovered 200M to 2B row production tables using Delta Lake time-travel and targeted partition-level reprocessing. Saved 70% compute cost and 30% engineering effort vs. a full-table rerun while keeping downstream SLA timelines intact.
Data platform
Azure Databricks · PySpark · Delta Lake · Unity Catalog · Azure Data Factory · SAP Data Intelligence · Medallion Architecture · Azure DevOps
Languages
Python · SQL · PySpark
AI / GenAI
LangGraph · RAG · Multi-Agent Systems · Prompt Engineering · sqlite-vec · OpenRouter · Docker
Analytics
Microsoft Fabric · Power BI · Azure Monitor · Azure Logic Apps
| Credential | Issuer | Valid until |
|---|---|---|
| Databricks Certified Data Engineer Professional | Databricks | Jan 2027 |
| Microsoft Certified: Azure AI Engineer Associate (AI-102) | Microsoft | Jun 2027 |
| Microsoft Certified: Fabric Analytics Engineer Associate (DP-600) | Microsoft | Apr 2027 |
| Microsoft Certified: Azure Data Engineer Associate (DP-203) | Microsoft | Retired |
- Kafka + Spark Structured Streaming end-to-end pipeline with exactly-once semantics and schema evolution
- dbt transformation layer over Lakehouse Gold
- Terraform provisioning for full-stack data infrastructure
Open to: Senior Data Engineer · Data Platform Engineer · AI Infrastructure Engineer · Bengaluru hybrid / Kochi hybrid / Remote