DevOps Engineer

Jan 2023 — Jan 2026 | Bengaluru, India


About

FinTech platform enabling digital EMI and credit-based payments. Customers scan a QR code, get approved for EMI from banks, and complete purchases instantly.

Responsibilities

Cloud and DevOps Engineer. I took ownership of the platform processing 15,000+ daily API requests across 10+ microservices for bank payment integrations. Over 3 years, I built the CI/CD pipelines, migrated the monolith to microservices, set up centralised logging and monitoring from scratch, and ensured PCI-DSS compliance across every deployment. When something broke in production, I was the one who fixed it.

Highlights

  • Led the migration from a single monolithic application (monorepo) with no team ownership to 15+ independently deployable microservices (Java Spring Boot, Node.js, Python, Go), each with its own repo, data, pipeline, and scaling profile. Worked with the business team to identify low-impact APIs first, then extracted services one by one using the Strangler Fig pattern, giving each its own MongoDB collection instead of sharing one giant database. Coordinated every cutover with stakeholders to make sure nothing broke mid-transaction.

  • Took ownership of the Kubernetes platform (AWS EKS) running the entire payment backend for a FinTech processing EMI transactions, serving 15,000+ daily API requests (5M+ annually). Handled cluster setup, version upgrades, and rollouts. Shifted workloads to right-sized node groups, cutting compute costs without affecting the payment SLA.

  • Removed all manual deployments from production. Every Kubernetes deployment used to be a manual kubectl command run against production, with no review, no audit trail, and no rollback path. Introduced Argo CD and moved every deployment through Git, making releases reviewable, approvable, and automatic.

  • Implemented a zone-based deployment strategy (blue/green approach). Deployed each new version to Zone A and exposed it on an internal URL so the QA team could run full test cases and manual production sanity checks. Routed live traffic to the updated zone only after QA sign-off, while Zone B stayed on the previous version for instant rollback. Three years of production releases, zero downtime.

  • Eliminated production outages caused by merchant traffic spikes. External merchants were hitting our payment servers directly with no rate limiting or throttling, and traffic spikes during peak sales events brought production down. Placed AWS API Gateway in front of all merchant-facing APIs with rate limiting, throttling, and request validation. Introduced versioned API surfaces so partners could migrate without backend modifications. Zero production downtime from traffic spikes after that, including a 3x peak load during the next sales event.

  • Cut incident response from hours to under 20 minutes. The support team had no visibility across 15 production services — engineers were SSH-ing into individual servers to read log files during incidents. Led the build of an ELK stack pulling logs from every layer: frontend and backend application services, MongoDB, EKS containers, and payment gateway responses from HDFC, ICICI, and Axis Bank. For the first time, a single transaction could be traced end to end in one place.

  • Owned the Business Continuity Plan and designed an active-active Disaster Recovery (DR) strategy across multiple regions, achieving near-zero RPO and RTO. Both regions served live traffic simultaneously, so there was no failover delay: if one region dropped, the other was already running. Documented recovery procedures and backup strategies, and ran failover tests to verify the setup held under failure conditions. Validated through quarterly BCP/DR drills with zero PCI-DSS audit findings.

  • No security scanning existed in the pipeline. Code went straight from Git to production with no checks. Integrated SonarQube quality gates into Jenkins for SAST at every commit, Trivy for container image scanning, and OWASP ZAP for API testing. Nothing ships without passing all three. Vulnerabilities dropped 55% in the first quarter. Passed every PCI-DSS audit for 3 years.

  • Mentored 3 junior engineers into independent production operators through weekly knowledge-sharing sessions covering DevOps, Docker, CI/CD, and Kubernetes. Two of them were handling production deployments independently within 3 months.