Software Engineer · Siemens Digital Industries · NVIDIA · VIT '22

Aryan Gupta

Backend Engineer. Distributed Systems. Problem Solver.

60%
Latency reduction
1M+
Req/day system
30M+
Records migrated live
3
AWS certifications
Latency reduction: 60% · denormalized policy store
Redisson RLock: 5–6% over-provisioning eliminated
Percona pt-osc: 30M+ rows · zero downtime
SNS/SQS sync: DLQs + configurable retries
POST API latency: -20% via observer pattern
Bulk insertion framework: -30% latency · adopted org-wide
NVIDIA DLRM: 0.639 accuracy · Criteo 1TB
AWS Certified: Solutions Architect · Developer · Practitioner
LangChain / LangGraph: agentic systems · in progress
GitHub: 240+ combined stars & forks
What the system logs say

Metrics that
lived in production.

Every metric below was measured on a live system at Siemens processing 1M+ requests a day.

Live · In Production
60%
Latency Slashed
SNS/SQS Event-Driven Policy Sync
Redesigned policy data access with a denormalized store, eliminating a 10K+ ID lookup bottleneck. Result: 60% lower latency on the critical path, in a live 1M+ req/day system.
Live · In Production
5–6%
Over-Provisioning Killed
Distributed Locking — Redisson RLock
Implemented per-asset-ID distributed locking via Redisson RLock, eliminating concurrent over-provisioning races under burst traffic on a 1M+ req/day system.
Executed · Internal
30M+
Zero-Downtime Migration
Live Schema alteration — Percona pt-osc
Led a zero-downtime schema alteration on a 30M+ record MySQL table using Percona pt-osc. Avoided 3+ hours of production downtime with zero data loss.
Cross-Team Adoption
20–30%
POST Latency Crushed
Bulk Insertion + Observer Pipeline
Cut POST API latency 20% with a multi-threaded observer validation pipeline, then another 30% with a reusable bulk insertion framework — adopted across 3+ teams.
Backend Systems & Agentic AI Tracks

Backend brain.
Agentic AI Upskilling.

Four years in production backend systems, and a research lineage that started before that — now pointed at AI agents. Same instinct: understand the system, then build it.

⚙️
Backend Engineer
Siemens · Java · Spring Boot · AWS
  • Distributed Systems
    Redisson RLock, SNS/SQS event-driven sync, DLQs & retries — at 1M+ req/day.
  • Database at Scale
    MySQL, PostgreSQL, Redis, Percona pt-osc for zero-downtime migrations on 30M+ rows.
  • Spring Boot & APIs
    Greenfield services, reusable frameworks (bulk insertion, query filtering), native queries.
  • AWS Cloud
    EC2, S3, RDS, SQS/SNS, ElastiCache, CloudWatch — 3x AWS certified.
  • Concurrency & Performance
    Multi-threaded validation pipelines, observer pattern, JMeter load testing.
🤖
AI / Agentic Systems
NVIDIA · Research · LangChain & LangGraph
  • LangChain / LangGraph
    Currently building agentic workflows — chaining tools, memory, and reasoning loops.
  • Model Context Protocol
    Currently learning the Model Context Protocol (MCP) for agents.
  • Recommendation Systems
    DLRM on Criteo 1TB with Merlin & NVTabular at NVIDIA — 0.639 accuracy, GPU benchmarking.
  • Natural Language Processing
    TF-IDF, word embeddings, attention-based hybrid architectures (F1: 0.88) at IIT Kharagpur.
  • Deep Learning & CNNs
    96–98% accuracy CNN models for sign language and object detection, published research.
Tools & Platforms
Languages
Java · C++ · Python
Cloud & Infrastructure
AWS EC2 · S3 · RDS · SQS / SNS · ElastiCache · CloudWatch · Percona
Databases
MySQL · PostgreSQL · Redis
AI / ML & Agentic
LangChain · LangGraph · Model Context Protocol (MCP) · TensorFlow · NVTabular · Merlin · CNN / NLP
Web & Tooling
Angular · HTML · CSS · JMeter · JUnit · SonarQube · Git
AWS Certifications
Amazon Web Services
Solutions Architect Associate
Certified
Amazon Web Services
Developer Associate
Certified
Amazon Web Services
Cloud Practitioner
Certified
How I got here

From research labs
to production systems.

From building recommendation systems at NVIDIA to optimizing latency on backend services handling 1M+ requests a day — every stop added a layer.

July 2022 – Present
Software Engineer
Siemens Digital Industries
Feb – June 2022
Software Intern
Siemens Digital Industries
May 2021 – Jan 2022
Deep Learning Intern
NVIDIA · Bangalore
Aug 2020 – Oct 2020
Natural Language Processing Intern
IIT Kharagpur
Sept 2020 – Mar 2021
Deep Learning Intern
Tata Communications
2018 – 2022
B.Tech, Information Technology
Vishwakarma Institute of Technology

Software Engineer — Distributed Systems & Backend

Siemens Digital Industries Software, Pune
July 2022 – Present
Current role
Java · Spring Boot · Redisson RLock · AWS SNS/SQS · MySQL · Percona pt-osc
60% latency cut · 30M+ row migration · Redisson distributed locking · 1M+ req/day

Own backend infrastructure for distributed services on Java & Spring Boot — distributed locking via Redisson RLock, denormalized policy data access, SNS/SQS event-driven sync with DLQs, and zero-downtime schema migrations on MySQL using Percona pt-osc.

- Implemented per-asset-ID distributed locks via Redisson RLock on a 1M+ req/day system, eliminating the 5–6% concurrent over-provisioning seen under burst traffic.

- Redesigned policy data access using a denormalized store, removing a 10K+ ID lookup bottleneck and cutting latency by 60%.

- Architected SNS/SQS-based event-driven sync for policy store consistency, with DLQs and configurable retries for fault tolerance.

- Led a zero-downtime schema alteration on a 30M+ record table using Percona pt-osc, avoiding 3 hours of production downtime.

- Reduced POST API latency by 20% via a multi-threaded validation pipeline using the observer pattern for large payloads.
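
As a rough illustration of the per-asset-ID locking idea in the bullets above, here is a minimal in-process sketch. It is not the production code: it swaps Redisson RLock for Python's `threading.Lock` so it runs without Redis, and every name in it (`AssetProvisioner`, `provision`, `capacity_per_asset`, `run_burst`) is hypothetical.

```python
import threading


class AssetProvisioner:
    """In-process stand-in for per-asset-ID locking (all names are hypothetical)."""

    def __init__(self, capacity_per_asset):
        self.capacity = capacity_per_asset
        self.provisioned = {}                  # asset_id -> units handed out so far
        self._locks = {}                       # asset_id -> lock guarding that asset
        self._locks_guard = threading.Lock()   # protects the lock registry itself

    def _lock_for(self, asset_id):
        # Create the per-asset lock exactly once, even under concurrent callers.
        with self._locks_guard:
            return self._locks.setdefault(asset_id, threading.Lock())

    def provision(self, asset_id):
        # The check and the update happen under the same per-asset lock,
        # so two requests for one asset cannot both pass the capacity check.
        with self._lock_for(asset_id):
            used = self.provisioned.get(asset_id, 0)
            if used >= self.capacity:
                return False
            self.provisioned[asset_id] = used + 1
            return True


def run_burst(provisioner, asset_id, requests):
    """Fire `requests` concurrent provisioning calls and count the successes."""
    results = []

    def worker():
        results.append(provisioner.provision(asset_id))

    threads = [threading.Thread(target=worker) for _ in range(requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results.count(True)
```

Locking per asset ID rather than globally means requests for different assets do not block each other. In a distributed setup the lock has to live in a shared store such as Redis, which is the role Redisson RLock plays.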

Software Intern

Siemens Digital Industries Software, Pune
Feb 2022 – June 2022
Angular · Java · JUnit · SonarQube

Developed interactive data grid components with filtering and pagination in Angular. Implemented chart rendering and notification popup features in dialog boxes, resolving 6 UI bugs across the application.

- Developed interactive data grid components with filtering and pagination in Angular.

- Implemented chart rendering and notification popup features in Angular dialog boxes; resolved 6 UI bugs across the application.

- Added JUnit tests and resolved SonarQube-flagged bugs and code smells, improving overall code quality by 15%.

Deep Learning Intern

NVIDIA · Bangalore
May 2021 – Jan 2022
DLRM · Merlin · NVTabular · Criteo 1TB · GPU

Trained DLRM architecture on the Criteo 1TB dataset, achieving 0.639 accuracy over 10K iterations using Merlin and NVTabular. Implemented feature engineering and embedding layers on raw clickstream data.

- Trained a DLRM architecture on the Criteo 1TB dataset, achieving 0.639 accuracy over 10K iterations.

- Implemented feature engineering and embedding layers on raw clickstream data.

- Benchmarked 3 popular recommendation system architectures against DLRM on GPU and published the comparative training statistics as blog posts.

NLP Research Intern

Centre of Excellence in Safety Engineering & Analytics, IIT Kharagpur
Aug 2020 – Oct 2020
NLP · TF-IDF · Word Embeddings · Attention Layers · F1: 0.88

Worked on Natural Language Processing at the Centre of Excellence in Safety Engineering and Analytics, IIT Kharagpur.

- Analyzed construction site catastrophe reports using chunking, TF-IDF vectorization, and word embeddings to classify the causes of accidents and extract the objects involved in fatalities.

- Proposed a hybrid neural network architecture with attention layers that achieved an F1 score of 0.88.

Computer Vision Intern

Corporate Venturing & Innovation Group, Tata Communications
Sept 2020 – Mar 2021
Computer Vision · CNNs · Face Detection · VGG-16 · MobileNetV2

Worked across computer vision, RNNs, and data science — predicting cloud failure on Google Cluster Traces data, and building a compact face-mask detection model optimised for low latency.

- Worked on computer vision, CNNs, and data science at the Corporate Venturing and Innovation Group, Tata Communications.

- Built a compact, low-latency face-mask detection model.

- The proposed model outperformed the VGG-16 and MobileNetV2 architectures.

B.Tech in Information Technology

Vishwakarma Institute of Technology, Pune
2018 – 2022 · CGPA: 8.62
Python · C++ · Data Structures · Deep Learning

Foundation in algorithms, data structures, and systems thinking — the launchpad for everything that followed. Spent the latter half of college alternating between ML research internships and building open-source projects.

- Landed multiple off-campus internships at big tech companies, which shaped how I think about systems at scale from the very beginning.

- Built two open-source projects: Sign Language to Speech and Blind Assistance System.

- Published research papers alongside coursework.

Built & shipped

Built in college.
Still running.

Two open-source projects with real community traction, three published research papers — built before I had a job, still cited and starred today.

Open Source · Live on GitHub

Blind Assistance System

Real-time object detection for visually impaired users using TensorFlow Object Detection API and SSD architecture on a live webcam feed, achieving 98% accuracy. Estimated object proximity using bounding-box area ratio and delivered real-time audio alerts — published in Springer Journal.

TensorFlow · SSD Architecture · Object Detection · Python · Audio Output · Springer Journal
60K
YouTube Views
124
GitHub Forks
95
GitHub Stars
98%
Accuracy
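
The bounding-box area ratio used for proximity in the Blind Assistance System can be sketched roughly as below. This is an illustrative reconstruction, not the project's actual code: the function names, the pixel-coordinate box format `(x_min, y_min, x_max, y_max)`, the 0.4 threshold, and the alert wording are all assumptions.

```python
def box_area_ratio(box, frame_width, frame_height):
    """Fraction of the frame covered by a detection box.

    `box` is assumed to be (x_min, y_min, x_max, y_max) in pixels.
    """
    x_min, y_min, x_max, y_max = box
    box_area = max(0, x_max - x_min) * max(0, y_max - y_min)
    return box_area / (frame_width * frame_height)


def proximity_alert(label, box, frame_width, frame_height, near_threshold=0.4):
    """Return an alert string when the object fills enough of the frame.

    A larger box relative to the frame is treated as a closer object.
    The 0.4 threshold is a hypothetical value chosen for illustration.
    """
    ratio = box_area_ratio(box, frame_width, frame_height)
    if ratio >= near_threshold:
        return f"Warning: {label} very close"
    return None
```

The returned string would then be handed to a text-to-speech step to produce the audio alert. The heuristic needs no depth sensor, at the cost of confusing large distant objects with small nearby ones.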
🤟
Open Source

Sign Language to Speech

CNN-based sign language classifier built with TensorFlow that translates static ASL gestures to speech output. Published in IJNGC.

100
Forks
140
Stars
96%
Accuracy
🏗️
Elsevier Journal

CCNet — Accident Classification

Proposed the CCNet model for construction site accident classification — a deep learning architecture that classifies accident types and root causes from visual site data.

CNN
Architecture
28
Citations
🟢
Intern · NVIDIA

Recommendation Systems — NVIDIA Merlin

Trained DLRM architecture on the Criteo 1TB dataset, achieving 0.639 accuracy over 10K iterations using Merlin and NVTabular. Implemented feature engineering and embedding layers on raw clickstream data.

+30%
Accuracy Improved
3
Blogs
Open to new opportunities

If your systems
need to hold up —

I'm a backend engineer who builds distributed systems hands-on and is now extending that into AI agents. If you're working on something that needs to handle real load, or you're on a team exploring agentic AI, I'd love to hear about it.

Open to conversations with engineers and teams doing interesting work.
