Skills: SRE - AWS, Docker, Kubernetes, Apigee, Cassandra, Oracle, PostgreSQL, Jenkins, GitHub, ELK, New Relic, Terraform, Python, Bash
This position is for an SRE Lead in Dallas, TX.
Responsibilities:
- Team Leadership:
- Lead and mentor a team of SREs, ensuring they have the resources and support needed to succeed
- Foster a culture of reliability and continuous improvement within the team
- System Reliability:
- Ensure the availability, performance, and scalability of systems and services
- Develop and implement strategies for monitoring and maintaining system health
- Incident Management:
- Oversee the response to incidents, ensuring quick resolution and minimal downtime
- Conduct post-mortems to identify root causes and prevent future incidents
- Automation and Tooling:
- Develop and maintain automation tools to reduce manual work and improve efficiency
- Implement and manage CI/CD pipelines to streamline deployments
- Collaboration:
- Work closely with development, operations, and product teams to ensure alignment on reliability goals
- Communicate effectively with stakeholders about system performance and reliability
- Risk Management:
- Identify and mitigate potential risks to system reliability
- Implement strategies to handle failures and ensure disaster recovery
- Technical Expertise:
- Experience with:
- Cloud platforms (AWS), containerization technologies (Docker and Kubernetes), API management (Apigee), databases (NoSQL: Cassandra; SQL: Oracle, PostgreSQL, and DB2), and CI/CD (Jenkins, GitHub)
- Other technologies: ELK Stack, APM (New Relic), and infrastructure as code (Terraform)
- Proficiency in scripting languages like Python or Bash
- Problem-Solving:
- Strong analytical skills to diagnose and resolve complex system issues
- Ability to design and implement effective monitoring and alerting systems
- Leadership:
- Proven experience in leading and growing engineering teams
- Excellent communication and collaboration skills
- Automation:
- Expertise in automation tools and practices to reduce manual intervention
- Familiarity with CI/CD processes and tools
- Resilience Engineering:
- Knowledge of best practices in building resilient, self-healing systems