
Site Reliability Engineer - Datadog @ HUCON Solutions


Job Description

Role Overview


We are looking for a highly skilled Site Reliability Engineer (SRE) to build, operate, and continuously improve highly available, scalable, and observable platforms running on bare-metal Kubernetes clusters and Google Kubernetes Engine (GKE), monitored with Datadog.


The ideal candidate brings deep Kubernetes expertise, strong cloud-native experience on GCP, and a passion for reliability, automation, and operational excellence. This role works closely with application, platform, and architecture teams to ensure production systems are resilient, secure, and performant at scale.

Experience Range: 4 to 12 years


Locations: Chennai, Hyderabad, Noida and Gurgaon only


Mode of work: Work from office only 

Key Responsibilities

Design, operate, and support Kubernetes platforms across bare-metal clusters and GKE

Ensure high availability, scalability, performance, and reliability of production systems

Implement and manage GitOps-based deployment workflows using tools like Argo CD

Build, maintain, and optimize CI/CD pipelines using tools such as GitHub Actions, Harness, CircleCI, or equivalent

Deploy and manage applications using Helm, including canary and progressive delivery strategies

Hands-on experience with cloud-based monitoring and observability tools, specifically Datadog

Implement comprehensive observability using Prometheus, Grafana, Loki, and Tempo

Proactively monitor systems, troubleshoot incidents, and perform root cause analysis (RCA)

Partner with development teams to improve service reliability, scalability, and operational maturity

Provision and manage cloud infrastructure on Google Cloud Platform (GCP)

Automate infrastructure and platform operations using Infrastructure as Code (IaC) and scripting

Drive continuous improvements in resilience, automation, and operational efficiency


Required Skills & Qualifications

Strong hands-on experience with Kubernetes architecture and administration

Experience managing both bare-metal Kubernetes clusters and Google Kubernetes Engine (GKE)

Solid understanding of Google Cloud Platform (GCP) services and networking concepts

Proven experience with GitOps practices and tools such as Argo CD

Proficiency with CI/CD tools (GitHub Actions, Harness, CircleCI, or similar)

Practical experience with:

  • Helm
  • Canary / progressive deployments

Strong expertise in observability and monitoring:

  • Prometheus
  • Grafana
  • Loki
  • Tempo

Experience with Terraform for infrastructure provisioning

Understanding of modern API technologies such as GraphQL

Familiarity with API management platforms (Apigee Edge, Apigee X)

Knowledge of CDN and edge services (e.g., Akamai)


Good to Have


Working knowledge of Java (Spring Boot) and/or Node.js

Understanding of microservices architecture and service-to-service communication

Experience with Ansible or similar configuration management tools

Exposure to hybrid or multicloud environments

Experience in performance tuning and cost optimization on GCP

Understanding of Kubernetes and cloud security best practices

SRE experience aligned with SLIs, SLOs, and error budgets

Job Classification

Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: Software Development
Role: Data Platform Engineer
Employment Type: Full time

Contact Details:

Company: HUCON Solutions
Location(s): Hyderabad



Keyskills: Kubernetes Cluster, Spring Boot, Java, Site Reliability Engineering, Datadog, GCP Cloud, Platform Development, Ansible, Prometheus, Helm, Grafana


₹ Not Disclosed


HUCON Solutions

Hucon Solutions India Pvt. Ltd. Hucon Solutions is an integrated HR service provider for corporates across India. We are backed by a good ERP and ample experience in HR and related activities. It has helped generate career opportunities for more than a million individuals in India. Hucon Solu...
