Job Description
About Impacteers
Impacteers is building the world's first AI-powered Business-to-Talent (B2T) platform, focused on bridging the gap between talent and opportunity in a faster, smarter, and more purpose-driven way.
At Impacteers, we believe the future of work needs to be both data-driven and human-centred. Our ecosystem helps individuals discover career pathways, build relevant skills, and access meaningful opportunities, while enabling organizations to identify, develop, and retain the right talent more effectively.
We are creating products at the intersection of AI, careers, hiring, and workforce transformation. That means solving meaningful problems across the talent lifecycle, from discovery and assessment to recruitment experience and long-term engagement.
About the Role
We're looking for a Lead DevOps & SRE to join the Engineering Team at Impacteers. In this role, you'll own the reliability, scalability, security, and operational excellence of our entire platform infrastructure.
You'll work closely with engineering, product, and cross-functional teams to build and maintain the infrastructure, CI/CD pipelines, observability systems, and cloud operations that keep our products running smoothly and shipping confidently. This role is ideal for someone with strong infrastructure engineering fundamentals, hands-on cloud and automation experience, a reliability-first mindset, an ownership mentality, and the ability to guide engineers on operational best practices while staying close to execution.
You will play a key role in ensuring our AI-powered career, hiring, and workforce products are deployed reliably, scale efficiently, remain secure, and deliver consistently high availability for users and businesses.
Role Overview
As a Lead DevOps & SRE, you will:
- Own the design, implementation, and continuous improvement of CI/CD pipelines across all products and services
- Architect and manage cloud infrastructure on AWS (or equivalent) including compute, networking, storage, databases, and managed services
- Define and enforce infrastructure-as-code practices using tools such as Terraform, CloudFormation, Ansible, or Pulumi
- Build and maintain observability, monitoring, alerting, and incident response systems to ensure platform reliability and uptime
- Establish and track SLIs, SLOs, and error budgets to drive data-informed reliability decisions
- Lead incident management, root cause analysis, post-mortems, and corrective action follow-through
- Implement security best practices across infrastructure, networking, secrets management, access controls, and compliance
- Manage containerized workloads using Docker and orchestration platforms such as Kubernetes or ECS
- Automate repetitive operational tasks, environment provisioning, scaling, backup, and disaster recovery
- Collaborate with backend and frontend engineers to ensure smooth deployments, rollback strategies, and zero-downtime releases
- Optimize infrastructure costs through rightsizing, reserved capacity planning, and usage monitoring
- Evaluate and introduce DevOps tooling, platforms, and practices that improve developer productivity and release velocity
- Contribute to AI-enabled product infrastructure by supporting GPU workloads, model serving, AI pipeline orchestration, and data platform needs where relevant
- Mentor engineers on DevOps practices, operational hygiene, and reliability culture
Why This Role Might Be for You
- You want to build the infrastructure backbone that powers products creating real career and hiring impact
- You're excited by the opportunity to design and scale cloud-native infrastructure for AI-powered platforms
- You enjoy solving complex operational problems and turning fragile systems into resilient, self-healing platforms
- You like staying hands-on with infrastructure while also guiding engineers and improving team operational maturity
- You want to work with a fast-moving team shaping products across career tech, hiring tech, AI, and workforce transformation
- You're looking for a role where reliability, automation, security, speed, and engineering excellence all matter
Basic Qualifications
- Bachelor's or Master's degree in Computer Science, Engineering, Information Technology, or a related field
- 5-8 years of experience in DevOps engineering, site reliability engineering, cloud infrastructure, or platform engineering
- Prior experience leading or owning infrastructure for production systems serving real users at scale
- Strong hands-on experience with cloud platforms such as AWS, GCP, or Azure
- Deep experience building and maintaining CI/CD pipelines using tools such as Jenkins, GitHub Actions, Bitbucket Pipelines, or ArgoCD
- Strong understanding of containerization (Docker) and container orchestration (Kubernetes, ECS, or similar)
- Experience with infrastructure-as-code tools such as Terraform, CloudFormation, Ansible, or Pulumi
- Strong understanding of networking, load balancing, DNS, SSL/TLS, firewalls, and VPC design
- Experience with monitoring, logging, and observability tools such as Prometheus, Grafana, Datadog, ELK, CloudWatch, or similar
- Comfortable scripting and automating with Bash, Python, or Go
- Strong communication and collaboration skills with a practical problem-solving mindset
- Comfortable working in a fast-paced, evolving environment with ambiguity and changing priorities
- Understanding of how AI-enabled products influence infrastructure requirements, scaling patterns, and operational considerations
Preferred Qualifications
- Experience deploying products built on technologies such as Python, Node.js, PostgreSQL, Redis, and Nginx, or similar application stacks
- Experience operating infrastructure for SaaS platforms
- Strong understanding of secrets management, IAM policies, network security, vulnerability scanning, and compliance frameworks
- Experience with database operations including backup, replication, failover, migration, and performance tuning for PostgreSQL, MongoDB, or similar
- Experience supporting AI/ML infrastructure including GPU instances, model serving (e.g. SageMaker, TorchServe), and data pipeline orchestration (e.g. Airflow, n8n)
- Experience with cost optimization strategies, FinOps practices, and cloud billing analysis
- Familiarity with tools such as Bitbucket, Jira, PagerDuty, Opsgenie, Vault, and collaboration platforms
- Interest in platform engineering, developer experience tooling, and internal developer platforms
- Curiosity about emerging trends in cloud-native architecture, AI infrastructure, and operational excellence
Job Classification
Industry: Education / Training
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time
Contact Details:
Company: ti Steps
Location(s): Chennai
Keyskills:
SRE
Cloud Infrastructure
AWS
CI/CD
load balancing
ELK
Pulumi
DNS
firewalls
Prometheus
networking
Datadog
Grafana
GitHub Actions
Bitbucket Pipelines
Docker
Terraform
Azure
CloudFormation
ArgoCD
Jenkins
GCP
SSL/TLS
CloudWatch
Ansible