Site Reliability Engineer 5 - Live SRE
About the role
In this role, you will support our live streaming events by focusing on cloud traffic (API gateway, IPC between microservices). You will prepare and execute various load tests to ensure both individual critical applications and overall cloud infrastructure can handle sudden increases in API traffic, especially at the start of events. You will also implement end-to-end observability and visualize the data to achieve the desired availability at scale. You will impact multiple areas of the live event lifecycle, from the planning phase through testing and event launch days.
Responsibilities
- Drive continual improvement in observability, monitoring, and scalability, with the primary goal of solving the thundering herd problem in cloud traffic (API gateway, IPC between microservices) for live streaming.
- Implement, automate, execute, and analyze the results of a broad range of functional, performance, resilience, and fault injection tests focused on live streaming delivery.
- Write and review code, develop documentation, and debug complex problems between systems and components.
- Coordinate, collaborate, and partner with multiple stakeholders to ensure the smooth execution of live streaming events.
- Participate in an on-call rotation and work flexible hours based on the live events schedule.