Sr. SRE
Systems Plus Solutions
Full time - 8+ years
- Not Disclosed
- Pune, Maharashtra, India
- Post Date: May 12, 2026
- End Date: Aug 12, 2026
Skills:
- Jenkins
- DevOps
- Azure
- Java
- Python
- Kubernetes
- ITSM
- Ansible
- Terraform
- Splunk
- CI/CD
- GitHub
Job Description:
Responsibilities
- Collaborate with U.S.-based counterparts to define and monitor service SLOs, SLAs, and key performance indicators.
- Lead root cause analysis, blameless postmortems, and reliability improvements across environments.
- Review application code (primarily Java/Spring) to assist in identifying defects and systemic performance issues.
- Automate deployment pipelines, recovery workflows, and runbook processes to minimize manual intervention.
- Build and manage dashboards, alerts, and health checks using tools like Dynatrace, Azure Monitor, Prometheus, and Grafana.
- Contribute to architectural decisions with a lens on performance and operability.
- Guide and mentor offshore team members in incident response and production readiness.
- Participate in 24x7 support rotations aligned with EST coverage expectations.
Qualification & Experience
- 8-10 years of experience in SRE, DevOps, or platform engineering, ideally supporting U.S. enterprise systems.
- Strong hands-on experience with Java/Spring Boot applications, with the ability to assist in code-level troubleshooting.
- Must Have Skills:
  - Cloud & Infrastructure
    - Kubernetes (AKS) — container orchestration and management
    - Docker — containerization
    - Terraform — Infrastructure as Code
    - Ansible — configuration management and provisioning
  - CI/CD & SCM
    - Jenkins / ArgoCD — pipeline design and maintenance
    - GitHub / BitBucket / Azure Repos — source code management
  - Observability & Monitoring
    - Dynatrace — APM and infrastructure monitoring
    - Prometheus & Grafana — metrics and dashboards
    - Splunk / Elasticsearch — log aggregation and analysis
  - Reliability & Operations
    - Incident management and on-call support
    - Root cause analysis (RCA) and postmortem practices
    - SLI / SLO / SLA definition and tracking
    - Performance tuning and capacity planning
  - Scripting
    - Shell, Python, or PowerShell — automation and tooling
- Good to Have Skills:
  - Service Mesh — Istio / Linkerd for traffic management and observability
  - GitOps — ArgoCD
  - Chaos Engineering — tools like Chaos Monkey, LitmusChaos
  - DevSecOps — security scanning in pipelines (Snyk, SonarQube)
  - Distributed Tracing — Jaeger / OpenTelemetry
  - Cloud Certifications — Azure associate or professional level
  - ITSM Tools — PagerDuty, OpsGenie, ServiceNow for alert routing
Salary: Not Disclosed
Role: Engineer
Area of Practice: Development, Cloud Computing
Experience: 8+ years

