Hirelo

Site Reliability Engineer - L2 & L3

Job Location

Bangalore, India

Job Description

Key Responsibilities:

- Incident Management: Provide L2 support for critical production incidents, perform root cause analysis, and implement effective solutions to minimize downtime.
- Automation and Infrastructure as Code (IaC): Develop and maintain automation scripts using Python, Bash, and Go to streamline operational tasks. Implement and manage IaC using Terraform and Ansible to automate infrastructure provisioning and configuration.
- UNIX Systems Administration: Manage and troubleshoot critical applications running in a UNIX environment, ensuring system stability and performance.
- Database Management: Administer and optimize production databases (Postgres, MySQL, Oracle) in both cloud and on-premise environments. Perform database backups, restores, and performance tuning.
- Cloud Infrastructure Management: Design, deploy, and manage infrastructure on AWS and/or Azure cloud platforms. Implement best practices for security, scalability, and cost optimization.
- Containerization and Orchestration: Deploy, manage, and troubleshoot Kubernetes clusters. Ensure high availability and scalability of containerized applications.
- Monitoring and Logging: Implement and maintain monitoring and logging solutions using the ELK stack (Elasticsearch, Logstash, Kibana) to proactively identify and resolve issues.
- Performance Tuning and Optimization: Analyze system performance metrics, identify bottlenecks, and implement solutions to optimize performance.
- Collaboration and Communication: Collaborate with cross-functional teams to resolve issues and implement improvements. Communicate effectively with stakeholders and provide clear and concise documentation.
- On-Call Support: Participate in an on-call rotation to provide 24/7 support for critical systems.
- Documentation: Create and maintain detailed documentation of systems, procedures, and troubleshooting steps.

Required Skills and Experience:

- Experience: 5-8 years of experience as an L2 Site Reliability Engineer, DevOps Engineer, or in a similar role.
- Scripting: Proficiency in scripting languages such as Python, Bash, and Go.
- Infrastructure as Code: Hands-on experience with Terraform and Ansible for infrastructure automation.
- UNIX Systems: Strong experience supporting critical applications in a UNIX environment.
- Database Management: Expertise in managing production databases (Postgres, MySQL, Oracle) in cloud and on-premise environments.
- Cloud Platforms: Extensive experience with AWS and/or Azure cloud environments.
- Containerization: Solid understanding of Kubernetes and containerization technologies.
- Monitoring and Logging: Experience with the ELK stack for monitoring and logging.
- Education: Bachelor's or Master's degree in Computer Science or a related field with 5 years of relevant experience.
- Problem-Solving: Excellent problem-solving and troubleshooting skills.
- Communication: Strong communication and collaboration skills.

Preferred Qualifications:

- Relevant certifications (e.g., AWS Certified DevOps Engineer, Kubernetes Administrator, Oracle Database Administrator).
- Experience with CI/CD pipelines and tools (e.g., Jenkins, GitLab CI).
- Knowledge of networking concepts and protocols (TCP/IP, DNS, HTTP).
- Experience with configuration management tools (e.g., Chef, Puppet).
- Experience with other monitoring tools (Prometheus, Grafana).

(ref:hirist.tech)

Location: Bangalore, IN

Posted Date: 5/9/2025

Contact Information

Contact: Human Resources, Hirelo