SOFTWARE ENGINEER · BACKEND & DEVELOPER TOOLS

Yash Bhardwaj · Software Engineer

hi, y4sh here.

I build backend systems and developer tools, and make hard debugging feel less lonely.

/ about

I'm a software engineer who builds backend systems and developer tools. Right now I'm a Senior SDE at AMD working on DCAuto, a firmware CI platform. I also maintain Sherlog Holmes, an LLM log-triage assistant my team uses daily.

Before AMD, I worked at Qualcomm and Licious, mostly writing Python and Java services backed by Postgres, MongoDB, and a lot of Redis. BITS Pilani '21. Outside work, I'm at the gym, on a run, or walking by the lake.

Tell me what you're building.

Currently working with

  • Python
  • Flask
  • Celery
  • FAISS
  • Azure
  • MongoDB
  • Docker

/ experience

Senior Software Development Engineer · AMD

SEP 2025 – PRESENT · DCAUTO PLATFORM

DCAuto is AMD's internal platform for firmware CI and regression testing on MI450 data-center GPUs. It orchestrates deployments and validation workloads across racks of physical test systems. At that scale, a single flaky node can stall an entire regression run.

I built the self-check-in and fallback-recovery flows that keep those runs moving. Both are idempotent by design: a system can drop out mid-run, recover, and rejoin without double-counting or corrupting state, and the orchestrator retries instead of failing the whole batch.

I also built Sherlog Holmes (internally DCAutoAI), a five-stage triage system for the 50,000-line logs that GPU firmware validation produces. It uses embeddings and FAISS vector search to locate the failure signal, then asks an LLM to propose a root cause. It ships as Flask APIs plus async Celery services over Azure Service Bus and MongoDB, and posts its results to the firmware team in Microsoft Teams. The team uses it daily.

  • Python
  • Flask
  • Celery
  • Azure
  • MongoDB
  • FAISS
  • LLMs

Full résumé (PDF) ↗

/ projects

SENIOR SDE @ AMD · 2025–

DCAuto

Firmware CI platform for MI450 data-center GPUs. Orchestrates deployments and validation across physical test systems, with idempotent self-check-in and fallback-recovery flows.

  • Python
  • Flask
  • Celery
  • Azure

AMD · LLM SYSTEMS

Sherlog Holmes

Five-stage failure triage for 50,000-line GPU firmware logs. Combines embeddings, FAISS vector search, and LLM root-cause analysis; ships as Flask APIs and Celery workers over Azure Service Bus and MongoDB, with Teams alerts.

  • Embeddings
  • FAISS
  • LLMs
  • MongoDB

SOFTWARE ENGINEER @ QUALCOMM · 2022–25

Qualcomm Software Center

Electron app, built with Node, Angular, and TypeScript, that delivers drivers to OEMs worldwide. Shipped the first macOS universal build (Intel and Apple Silicon) and automated Jenkins and AWS CI/CD, cutting release times by 50%.

  • TypeScript
  • Electron
  • Angular
  • AWS

Tinkering · rebuilding these

Playground · soon
QSC Demo · soon
Turbofan · soon
imgan · soon