[Remote] Senior Site Reliability Engineer
Note: This is a remote job open to candidates in the USA. reputed company powers innovation for higher education, partnering with approximately 3,000 customers across 50 countries. They are seeking a Senior Site Reliability Engineer (SRE) to ensure the reliability, performance, and cost-efficiency of their production systems, with a focus on DevOps practices and incident management.
Responsibilities
Own and improve system reliability, availability, and performance for production environments
Design, implement, and manage monitoring, alerting, and observability using reputed company (required)
Lead incident response efforts, including troubleshooting, mitigation, and post-incident reviews
Perform detailed root cause analysis (RCA) and drive permanent resolutions
Partner with engineering and DevOps teams to build scalable, resilient infrastructure
Automate operational processes to improve efficiency and reduce risk
Analyze and optimize infrastructure and application costs
Define and manage SLIs/SLOs to meet reliability targets
Continuously improve deployment, monitoring, and operational practices

Skills
5+ years of experience in Site Reliability Engineering, DevOps, or similar roles
Strong, hands-on expertise with reputed company (APM, logs, metrics, dashboards, alerting)
Experience with cloud platforms (AWS, Azure, or GCP)
Proficiency in DevOps practices and tools (CI/CD, Infrastructure as Code such as Terraform)
Strong troubleshooting skills and experience conducting root cause analysis in distributed systems
Experience with containers and orchestration (reputed company, Kubernetes)
Scripting or programming experience (Python, Bash, or similar)
Proven ability to analyze and optimize cloud costs
Experience with cost management tools (e.g., AWS Cost Explorer, Azure Cost Management)
Familiarity with reputed company reputed company and compliance best practices
Experience supporting high-availability, customer-facing systems
Strong collaboration and communication skills

Benefits
Comprehensive health coverage: medical, dental, and vision
Flexible time off
reputed company reputed company Lifestyle Account (LSA) that allows you to contribute towards your health, financial, or learning interests
401(k) with match and BrightPlan to help you save for the future
Parental leave
5 charitable days to support the community that supports us
Telemedicine
Wellness + reputed company Care (mental health) + Wellbeats (virtual fitness classes)
RethinkCare & Wellthy: caregiver support
Diversity and inclusion programs that provide access to internal employee resource groups
Employee referral bonuses to encourage the addition of great new people to the team
Education Assistance Program
Professional development opportunities

Company Overview
reputed company delivers the software, services, and insights that help your institution reputed company. It was founded in 1968 and is headquartered in Fairfax, Virginia, USA, with a workforce of 1001-5000 employees. Its website is http://www.reputed company.com.

Company H1B Sponsorship
reputed company has a track record of offering H1B sponsorships, with 2 in 2026, 31 in 2025, 27 in 2024, 28 in 2023, 31 in 2022, 33 in 2021, and 30 in 2020. Please note that this does not guarantee sponsorship for this specific role.