Available for Consulting · Q3 2026

I Build the Infrastructure That Powers AI.

Sachin Diwakar — Software Engineer @ Microsoft. I design and operate the distributed messaging, cloud and AI infrastructure that quietly moves billions of events behind the products people use every day.

Book Consultation · Hire Me · View Projects

Kafka · RabbitMQ · IBM MQ · Event Streams · NATS · Pulsar · AI Inference · LLM Serving · Kubernetes · OpenShift · gRPC · Vector DB · RAG

About · Why Me

I help teams move from "it works" to "it scales".

For half a decade I've lived inside the hardest parts of distributed systems — message brokers under brutal load, cloud platforms with zero tolerance for downtime, and the AI infrastructure quietly powering the next wave of products.

The story

I started where every great engineer starts: obsessed with fundamentals such as operating systems, networks and data structures. I solved over 1,000 algorithm problems before I ever shipped a feature.

At IBM India Software Labs I spent years on security and mobility platforms — designing message-driven microservice fabrics, identity integrations and secure tunnels for some of the world's largest enterprises.

Today, at Microsoft, I work on the cloud and AI infrastructure that frontier workloads depend on. Same instinct, bigger blast radius.

Microsoft · Software Engineer II

Building cloud and AI infrastructure that runs at planet scale. The substrate beneath modern AI workloads.
- AI Infra
- Distributed Systems
- Cloud Native
- Kubernetes
- Azure

IBM India Software Labs · Software Engineer

Security & Mobility platforms. Distributed messaging, identity, microservice architecture and platform engineering across products serving Fortune-500 customers.
- Java
- Spring Boot
- Kafka
- RabbitMQ
- IBM MQ
- OpenShift
- Go

IIIT Jabalpur

Bachelor of Technology, CSE. 1,000+ algorithm problems. Deep work on operating systems, networking, databases and machine learning.
- DSA
- OS
- Networks
- AI/ML
- Databases

Core Expertise

Where I'm dangerously good.

Deep, hard-won expertise in the systems modern AI products quietly depend on — message buses, distributed platforms and the cloud-native plumbing that keeps everything moving.

AI Infrastructure

Frontier

Inference fleets, model serving, vector stores and the event pipelines that feed them. Building the substrate behind production-grade AI.

Distributed Messaging

Specialty

Kafka, RabbitMQ, IBM MQ, Event Streams, NATS. Designing reliable, ordered, exactly-once pipelines that survive partitions, replays and very bad days.

Distributed Systems

Core

Consensus, replication, partitioning, backpressure — the physics of building correct systems on unreliable networks.

Cloud Platforms

Hyperscale

Azure, AWS, GCP. Cloud-native primitives, hybrid topologies and platforms that hum quietly while serving millions.

Kubernetes & OpenShift

Production

Operators (Go SDK), Helm, multi-tenant clusters, GitOps — shipped and operated for years, not weeks.

Microservice Architecture

300+ services

Java / Spring Boot, Node.js, Go. Event-driven boundaries, clean contracts, REST + gRPC, identity-aware routing.

AI Engineering

Applied

RAG pipelines, embeddings, evaluation harnesses, agent orchestration. Practical AI shipped to real users — not just demos.

Observability

Signal

Prometheus, Grafana, ELK, distributed tracing — the difference between "it works" and "we know exactly why."

Reliability Engineering

SRE

99.99%

Error budgets, SLOs, fault injection and incident response that keeps mission-critical platforms standing.
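
To make the error-budget idea concrete, here is a small arithmetic sketch. The function name and the 30-day window are assumptions chosen for illustration, not a description of any specific platform's SLO policy.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO.

    slo is a fraction (0.9999 means 99.99%); window_days is the
    rolling window over which the SLO is measured (assumed 30 days).
    """
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo)


# A 99.99% SLO over 30 days leaves roughly 4.32 minutes of budget.
budget = round(error_budget_minutes(0.9999, window_days=30), 2)
print(budget)
```

Seen this way, one extra "nine" is a tenfold cut in the time available for incidents and risky rollouts.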

Selected Work

Systems built to scale, survive and stay quiet.

A snapshot of the kind of problems I love — distributed messaging at hyperscale, AI-native platforms, and the cloud-native plumbing in between.

Distributed Messaging

Event Backbone for Enterprise Platforms

Problem

Mission-critical workflows were tightly coupled, hard to evolve and impossible to scale independently.

Architecture

Designed an event-driven backbone on Kafka, RabbitMQ and IBM MQ with idempotent consumers, dead-letter queues, schema contracts and exactly-once semantics.
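
The idempotent-consumer and dead-letter pattern named above can be sketched in a few lines. This is a minimal in-memory illustration, not the production design: `Message`, `IdempotentConsumer`, the `amount` field and the retry count are all hypothetical, and a real system would keep the dedup set and the dead-letter queue in durable storage or on the broker.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    message_id: str  # hypothetical unique ID assigned by the producer
    payload: dict


@dataclass
class IdempotentConsumer:
    max_attempts: int = 3
    processed_ids: set = field(default_factory=set)   # stand-in for a durable dedup store
    dead_letters: list = field(default_factory=list)  # stand-in for a real DLQ
    balance: int = 0  # example side effect that must not be applied twice

    def apply(self, payload: dict) -> None:
        # Business logic: raises before mutating state if the payload is malformed.
        self.balance += int(payload["amount"])

    def handle(self, msg: Message) -> str:
        if msg.message_id in self.processed_ids:
            return "duplicate"  # redelivery or replay: skip the side effect
        for _ in range(self.max_attempts):
            try:
                self.apply(msg.payload)
            except (KeyError, ValueError, TypeError):
                continue  # a real consumer would back off before retrying
            self.processed_ids.add(msg.message_id)
            return "processed"
        # Park the poison message so it does not block the rest of the stream.
        self.dead_letters.append(msg)
        return "dead-lettered"


consumer = IdempotentConsumer()
first = consumer.handle(Message("m-1", {"amount": 10}))     # "processed"
replay = consumer.handle(Message("m-1", {"amount": 10}))    # "duplicate"
poison = consumer.handle(Message("m-2", {"amount": "oops"}))  # "dead-lettered"
```

With at-least-once delivery from the broker, deduplicating on a message ID is one common way to get effectively exactly-once processing of the side effect.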

Impact

Decoupled 100+ services, unlocked horizontal scale, and turned a brittle integration layer into a calm, replay-safe nervous system.

Kafka
RabbitMQ
IBM MQ
Spring Boot
Avro

AI Infrastructure

AI Inference & Serving Platform

Problem

Teams shipping AI features needed low-latency, multi-tenant inference with predictable cost.

Architecture

Autoscaling inference fleet on Kubernetes, gRPC + streaming APIs, model versioning, batching and observability baked into every hop.
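
As an illustration of the batching step, the sketch below groups incoming requests into micro-batches by size and wait time. It is a simplified, offline model under stated assumptions: `micro_batches`, `max_batch` and `max_wait` are hypothetical names, and arrival times are supplied as data rather than read from a clock or a live gRPC stream.

```python
from typing import Iterable, Iterator, List, Tuple


def micro_batches(
    requests: Iterable[Tuple[float, str]],
    max_batch: int,
    max_wait: float,
) -> Iterator[List[str]]:
    """Group (arrival_time_seconds, prompt) pairs into batches.

    A batch is flushed when it reaches max_batch items, or when the next
    request arrives more than max_wait seconds after the batch was opened.
    """
    batch: List[str] = []
    opened_at = 0.0
    for arrival, prompt in requests:
        if batch and arrival - opened_at > max_wait:
            yield batch  # waiting longer would hurt latency
            batch = []
        if not batch:
            opened_at = arrival  # first request opens a new batch
        batch.append(prompt)
        if len(batch) == max_batch:
            yield batch  # batch is full, send it to the model
            batch = []
    if batch:
        yield batch  # flush whatever is left at end of input


arrivals = [(0.00, "a"), (0.01, "b"), (0.02, "c"), (0.50, "d")]
batches = list(micro_batches(arrivals, max_batch=2, max_wait=0.05))
# batches == [["a", "b"], ["c"], ["d"]]
```

The two knobs trade throughput against latency: a larger `max_batch` keeps the accelerator busier, while a smaller `max_wait` bounds how long any single request can sit in the queue.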

Impact

Cut p99 latency significantly while doubling throughput per GPU, and engineers stopped getting paged at 3 AM.

Kubernetes
gRPC
Python
Go
Prometheus

Cloud Native

Hybrid Cloud Connectivity Platform

Problem

Large enterprises needed to bridge sensitive on-prem assets (AD, LDAP, PKI) to a multi-tenant cloud — without compromising security.

Architecture

Modular microservice fabric — 300+ services on a hybrid topology with encrypted tunnels, identity-aware routing (SAML, OAuth2, Kerberos) and zero-trust controls.

Impact

Secure, audited connectivity for thousands of enterprise customers, operated under FedRAMP, SOC 2 and GDPR requirements.

Java
Spring Boot
SAML
OAuth2
LDAP

Platform Engineering

Kubernetes Operators & Self-Healing Platform

Problem

Manual lifecycle for critical services meant slow rollouts, drift and weekend incidents.

Architecture

Custom Operators using the Go Operator SDK, declarative Helm charts, GitOps pipelines (Jenkins + GitHub Actions) and policy-driven rollbacks.

Impact

Hands-off lifecycle and self-healing for the most critical services — fewer pages, safer releases.

Go
Kubernetes
OpenShift
Operator SDK
Helm

Messaging · K8s

Highly-Available Messaging on Kubernetes

Problem

Enterprise messaging needed cloud-native HA, external reachability and strict transport security.

Architecture

Multi-instance queue managers with persistent storage on OpenShift, external routes, mutual TLS and proactive diagnostics.

Impact

Resilient messaging fabric that survives node loss, AZ failures and noisy neighbours — without operator panic.

IBM MQ
Kafka
OpenShift
mTLS
Event Streams

AI · RAG

RAG & Agentic AI Pipelines

Problem

Turning raw enterprise data into trustworthy AI answers — with evaluation, not vibes.

Architecture

Event-driven ingestion, embeddings, vector store, retrieval + re-ranking and agent orchestration — every hop instrumented and evaluated.

Impact

Production-ready AI workflows with measurable answer quality, cost controls and full traceability.

Python
LLMs
Vector DB
Kafka
Eval Harnesses

Work With Me

Let’s build something that scales.

Whether you’re shipping your first event‑driven platform or scaling AI inference to millions of users — here’s how I can help. Reach out and I’ll get back within 24 hours.

Architecture & System Design

For teams designing the next big thing.

Event‑driven & messaging architectures
Microservice decomposition & API design
Scalability, HA & resiliency reviews
Cloud & Kubernetes strategy

Start a Conversation

Most Requested

AI Infrastructure & Inference

From prototype to production‑grade AI.

LLM serving, autoscaling & cost optimization
RAG, embeddings & vector store design
Agentic AI pipelines & evaluation harnesses
GPU infra, observability & SLOs

Book a Discovery Call

Distributed Messaging Deep Dive

When your event backbone needs to survive very bad days.

Kafka / RabbitMQ / IBM MQ / NATS reviews
Exactly‑once, ordering & replay strategies
DLQs, back‑pressure & consumer health
Migration & modernization paths

Request a Review

Production Debugging & SOS

Pager screaming? Latency spiking? Let’s fix it.

Distributed systems root‑cause analysis
Kubernetes & cloud incident triage
Performance & latency investigations
Post‑incident hardening

Get Urgent Help

Mentorship & Interview Prep

For engineers aiming for top product companies.

Microsoft / FAANG interview coaching
System design & LLD walkthroughs
Career roadmap & resume reviews
1:1 long‑term mentorship

Reach Out

Fractional / Advisory Engagements

For founders & CTOs building serious infrastructure.

Technical strategy & architecture leadership
Team mentoring & engineering processes
Due diligence & tech audits
Hands‑on guidance through critical milestones

Let’s Talk

Not sure where to start?

Send a short note about your problem — I’ll reply within 24 hours with how I can help (or who else can).

Email Me Directly · Connect on LinkedIn

Achievements

Numbers that tell a story.

300+ Microservices Architected

1,000+ Algorithm Problems Solved

99.99% Production Availability

Microsoft: Software Engineer · AI & Cloud Infrastructure

Ex-IBM: Security & Mobility · Distributed Platforms

Distributed Messaging: Specialist · Kafka · RabbitMQ · MQ

AI Infrastructure: Inference · RAG · Agentic Systems

Thought Leadership

Writing on infra, AI and scale.

AI Infrastructure

What "Production AI" Actually Means

Inference, evaluation, cost — the things demos never show.

10 min read

Distributed Messaging

Kafka vs RabbitMQ vs MQ — A Practitioner's Map

Pick the right broker for the right shape of problem.

9 min read

Distributed Systems

Backpressure Is a Product Decision

Treat queues like contracts, not buffers.

7 min read

Microservices

300 Microservices Later — What I'd Do Differently

Hard-won lessons from a real enterprise fabric.

8 min read

Cloud Engineering

The Three Latencies You Must Measure

And the one that quietly destroys cloud bills.

6 min read

System Design

Designing for the 99.99th Percentile

Tail latency is the customer experience.

11 min read

Recommendations

Trusted by engineers and leaders.

"Sachin operates at the rare intersection of deep systems expertise and product-grade pragmatism. He fixes things most engineers don't even see."

"If your AI infra is on fire, you want him on the call. Calm, surgical and absurdly fast at root-cause analysis."

"He architected our platform's hardest piece — the kind of system most people are scared to touch — and made it boring to operate."

"Sharp, generous with knowledge, and the only mentor who made distributed systems click for me."

Let's Build

Ready to accelerate your systems?

Whether you need a single architecture review or a long-term advisory engagement — let's talk.

Book Career Consultation · Schedule Architecture Review · Hire for Consulting · Collaborate ↗
