Senior Site Reliability Engineer - AWS
Description:
- Provide leadership, mentoring, and sound judgment as the reliability engineering lead on the team.
- Design and maintain autonomous systems for building, deploying, testing, and operating Filevine products.
- Serve as the authoritative voice of reliability across the full software development lifecycle.
- Monitor, aggregate, dashboard, and alert on software and infrastructure events to ensure visibility and rapid response.
- Continuously improve CI/CD pipelines, automation scripts, playbooks, and tools to streamline operations and reduce resolution time.
- Identify and resolve gaps in system availability, performance, and security while strengthening the overall security posture.
- Document processes, architecture, procedures, and best practices to support team effectiveness.
- Research, adopt, or build reliable tools that improve engineer productivity.
- Collaborate with team members and stakeholders, mentor junior engineers, and participate in a 24/7 on-call rotation for production support and emergency response.
Requirements:
- 8+ years of hands-on technical experience in software engineering, infrastructure, or operations roles, including at least 4 years dedicated to Site Reliability Engineering.
- Strong curiosity, self-motivation, and a continuous learning mindset with proactive enthusiasm for improving systems and processes.
- Strong proficiency in Python, Bash, PowerShell, and other common SRE scripting and tooling technologies.
- Expert-level experience designing, building, and maintaining autonomous systems for build, deployment, testing, monitoring, and operations.
- Hands-on experience with AWS services such as EC2, Kubernetes/EKS, CloudWatch, Lambda, S3, and IAM.
- Proficiency in core SRE skills including monitoring and alerting, incident response, capacity planning, performance optimization, CI/CD enhancement, and reliability best practices.
- Bachelor’s degree in Computer Science, Information Systems, or a related field; equivalent certifications, such as AWS or Google Cloud Professional certifications; or substantial comparable direct work experience.
- Proven track record of independently driving reliability improvements, reducing toil through automation, and supporting highly available, scalable production systems in a fast-paced environment.
Benefits:
- $160,000 - $190,000 base salary.
- Paid time off.
- Comprehensive benefits package.
- Medical, dental, and vision insurance for full-time employees.
- Maternity and paternity leave for full-time employees.
- Short- and long-term disability coverage.
- Opportunity to learn from a dedicated leadership team.
- Top-of-the-line company swag.