
Sr Staff Site Reliability Engineer (SRE) @ Einfochips

Job Description

Position: Sr Staff Site Reliability Engineer (SRE)
We are seeking a Sr Staff Site Reliability Engineer for a long-term position, working USA hours, who brings deep software engineering roots alongside SRE expertise. This individual will help shape and scale the reliability of our global cloud platform, bringing the full-stack perspective of someone who has built and shipped software and now drives reliability from the inside out.
The Role
This is a Senior Staff-level technical leadership role with organization-wide influence. You will define and drive reliability strategy across our multi-cloud infrastructure (AWS and GCP), establish architectural standards, and ensure our backend systems operate with exceptional availability, scalability, and resilience.
You will also collaborate with strategic partners and engineering teams to enable our organization as a cloud-integrated service, leading technical discussions and ensuring secure and reliable integrations.
This is a long-term position for someone who thrives at the intersection of software development and reliability engineering. The ideal candidate has hands-on development experience, understands the complete software delivery lifecycle, and brings an end-to-end systems perspective from code commit to production operation.
What You'll Do
  • Define and drive the organization's SRE strategy across engineering teams.
  • Establish reliability standards, architectural guardrails, and production readiness frameworks.
  • Initiate, participate in, and review architectural changes, leveraging development experience to ensure reliability and operability are built in, not bolted on.
  • Apply SDLC knowledge to reliability decisions: engage early in design and architecture reviews to embed reliability, testability, and operability as first-class requirements.
  • Proactively identify system-wide gaps: continuously assess the platform for reliability blind spots, missing observability, or architectural debt, and drive initiatives to close them without waiting to be asked.
  • Bridge development and SRE teams: translate between engineering intent and operational reality, serving as a technical liaison who can read code, review PRs, and contribute to service-level design decisions.
  • Design and maintain highly available, multi-region, multi-cloud systems.
  • Ensure platform reliability supporting millions of IoT devices globally.
  • Guide engineering teams in building fault-tolerant, scalable microservices and monolithic systems.
  • Define and enforce SLIs, SLOs, and error budgets.
  • Lead architecture reviews and production readiness reviews.
  • Partner with strategic teams to deliver our organization as a cloud-integrated service and support partner integrations.
  • Improve and streamline production release processes.
  • Implement safe deployment strategies (canary, blue/green, progressive delivery).
  • Build CI/CD guardrails to reduce deployment risk and improve reliability.
  • Develop and mature observability strategies across infrastructure and services.
  • Lead high-severity incident response, facilitate blameless postmortems, and drive systemic improvements to prevent recurring issues.
What You Bring
  • 10+ years of combined software engineering and SRE/infrastructure experience, with a clear progression from development into reliability or platform engineering.
  • Deep understanding of the complete Software Development Lifecycle (SDLC), enabling well-informed reliability and design decisions across all phases of software delivery.
  • Strong software development background with hands-on experience building and shipping production software, enabling effective design collaboration, code-level review, and reliability-driven architectural input.
  • End-to-end system comprehension: ability to reason about the full stack, from device/client behavior through the API layer, backend services, data stores, and infrastructure, connecting the dots across teams and domains.
  • Self-directed gap identification: demonstrated initiative in spotting reliability, scalability, or process gaps and driving improvements without needing explicit direction.
  • Collaborative cross-team communication: proven ability to work across engineering, product, and operations teams; comfortable influencing without authority and presenting technical decisions to both technical and non-technical stakeholders.
  • Proven experience operating large-scale distributed systems in production.
  • Strong hands-on expertise with AWS and GCP cloud platforms.
  • Deep experience with Kubernetes in production environments.
  • Advanced knowledge of Terraform, including modular design and infrastructure governance.
  • Strong understanding of distributed systems, networking, and system reliability principles.
  • Experience supporting Java-based monolithic systems and microservices architectures.
  • Proficiency in Python for automation and tooling.
  • Experience with modern observability stacks (Prometheus, Grafana, Datadog, OpenTelemetry, etc.).
  • Strong debugging, incident response, and root cause analysis skills.
  • Security knowledge in transport and identity: working knowledge of SSL/TLS certificate lifecycle management, mutual TLS (mTLS) for service-to-service authentication, cipher suite selection and hardening, and TLS version enforcement across microservices and infrastructure boundaries.
  • Excellent written and verbal communication skills, with experience coordinating across distributed engineering teams, facilitating technical discussions, and driving alignment on reliability decisions.

Qualifications
  • This position is only for the IST evening (3pm to midnight) or IST night (10pm to 7am) shift, on a flexible rotation.
  • Bachelor's degree in computer science or software engineering.
Location: IN-GJ-Ahmedabad, India-Ognaj (eInfochips)
Time Type: Full time
Job Category: Engineering Services

Job Classification

Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time

Contact Details:

Company: Einfochips
Location(s): Indore


Keyskills: Computer science, Engineering services, Automation, Backend, Networking, Debugging, SSL, Distribution system, SDLC, Python


₹ Not Disclosed


Einfochips

eInfochips, an Arrow company, is a leading global provider of product engineering and semiconductor design services. With 500+ products developed and 40M deployments in 140 countries, eInfochips continues to fuel technological innovations in multiple verticals. The company's service offeri...
