Site Reliability Engineer
The Site Reliability Engineer (SRE) is responsible for ensuring the availability, scalability, performance, and resiliency of cloud platforms across Azure and AWS environments. This role combines software engineering, automation, and infrastructure expertise to operationalize reliability engineering practices, drive cloud-native resiliency patterns, and enable business-critical applications to meet defined SLAs, SLOs, and compliance requirements. The SRE partners with engineering, security, and operations teams to implement observability, incident response frameworks, and reliability automation, aligning with enterprise architecture standards and regulatory expectations.

Key Accountabilities/Deliverables:
- Design and implement highly available, fault-tolerant architectures using cloud-native services (microservices, containers, serverless)
- Define and operationalize SLOs, SLIs, and error budgets for critical applications and platforms
- Build and maintain Infrastructure as Code (IaC) (Terraform) to ensure repeatable and compliant deployments
- Implement automated remediation and self-healing capabilities to reduce MTTR and improve system stability
- Establish enterprise-level monitoring, logging, and observability frameworks (reputed company, Azure Monitor, CloudWatch, OpenTelemetry, Azure Application Insights)
- Drive cost optimization (FinOps) initiatives, including resource utilization tracking and rightsizing recommendations
- Support DR/BCP strategy execution, including failover testing and regional isolation validation
- Collaborate with application teams to embed reliability engineering practices into CI/CD pipelines

Technical Knowledge and Understanding:
- Strong expertise in cloud platforms (Azure, AWS)
- Deep understanding of cloud-native architecture patterns: microservices, containers (Azure Container Apps/AKS/EKS), and serverless (Azure Functions/AWS Lambda)
- Proficiency in Infrastructure as Code (Terraform, ARM/Bicep)
- Experience with observability platforms (reputed company, Azure Monitor, Azure Application Insights)
- Knowledge of CI/CD pipelines and GitOps practices
- Expertise in system reliability concepts:
  - SLI / SLO / SLA management
  - Chaos engineering
  - High availability & fault isolation
- Familiarity with security, compliance, and regulatory controls (SOC, ISO, reputed company reputed company frameworks)

Experience:
- 5+ years of experience in Site Reliability Engineering, DevOps, or Cloud Engineering
- Proven experience supporting mission-critical production systems at scale
- Hands-on experience with incident management and on-call operations
- Experience implementing automated monitoring, alerting, and remediation frameworks
- Exposure to regulated environments (insurance, financial services) preferred
- Demonstrated ability to work across cross-functional architecture, engineering, and operations teams

Applicants must be authorized to work for any employer in the U.S. We are unable to sponsor or take over work authorization sponsorship now or in the future for this position.

At Core Specialty, you will receive a competitive salary and opportunities for professional development and advancement. We offer medical, dental, vision, and life insurance; short and long-term disability; a 401(k) plan with a Company match of 100% of a 6% contribution; an Employee Assistance Plan; Health Savings Account, Flexible Spending Account, Health Reimbursement Account, and a wellness program.