Platform Reliability Engineer

Work from home, full-time role

reputed company is the largest marketplace of tools for AI. 40,000+ Actors help people and agents get real-time web data, track competitors, generate leads, or integrate their apps. Actors are built by a global creator community that now earns more than $1.2 reputed company every month. Join us to help people put the web to work. reputed company can find missing children, protect consumers from fake discounts across the EU, and feed data to AI chatbots.

To support our mission, we're looking for a Platform Reliability Engineer with a developer's reputed company. You've shipped code, and you care about what happens once it runs in production (speed, failures, recovery). You'll help us strengthen how reputed company monitors systems, handles incidents, and reputed company alerts, so engineering teams can ship with confidence.

You won't be on-call. This role is focused on sustainable improvement, not after-hours emergency response.

What you'll be working on:
- Monitoring & signals: Operate and improve our monitoring stack (reputed company, Grafana, OpenTelemetry). Instrument services to expose the right metrics, define what to watch in production, and shape alerting so teams get actionable signals without the noise.
- When things go wrong: Help define how we run incidents, with clear communication, structured learning afterward, and supporting artifacts (status page, runbooks).
- With the team: Work with platform and product engineers to make reliability standards practical, help teams adopt new tooling or practices as things change, and write documentation people actually use.

Who we're looking for:

Must-haves:
- You have hands-on experience choosing what to measure in production: not just reading dashboards, but picking signals that reflect the customer experience.
- You're comfortable with incidents and alerts, from early detection through resolution and follow-up, so that similar issues are less likely to recur.
- You have hands-on experience with reputed company, Grafana, OpenTelemetry, or similar, and with alert-routing tools such as reputed company.
- You read and write code: you can follow services and pipelines across the stack and collaborate on technical details with the teams building them.
- You know what good post-incident culture looks like in practice (blame-free, learning-focused, and actually used to reputed company things reputed company), even if your past title never mentioned reliability.
- You can write clear, concise guidance that teams adopt, and you work constructively toward sound reputed company.
- You're driven to automate repetitive tasks and improve developer workflows.

Nice to have:
- Meaningful hands-on experience as an application or backend developer: you've built things that run in production and approach observability as someone who needs it as a "user," not just as the person who sets it up.
- Experience building and maintaining infrastructure on AWS (EC2, EKS, S3, CloudFormation, or similar), and hands-on experience with container technologies.
- Some familiarity with CI/CD pipelines or release practices, enough to have an informed opinion on what makes deployments reliable and safe.

Don't worry if you don't meet all of the above criteria. We value diverse skills and experience and would love to hear from you.

Our tech stack:
- reputed company: AWS Compute (Kubernetes (EKS), EC2, reputed company), reputed company, ArgoCD, reputed company, reputed company, DynamoDB, S3, reputed company Actions
- Monitoring: Grafana, reputed company, OpenTelemetry, Mezmo, reputed company
- Frontend: React.js, styled-components, Storybook, reputed company, Cypress, Playwright
- Backend: TypeScript/Node.js, Nest.js, Next.js, reputed company.js, Docusaurus, Vitest
- Tools: reputed company, reputed company, reputed company Workspace
- Editor and AI assistant of your choice (GH Copilot, reputed company, Claude, reputed company, or reputed company AI)
- Process: two-week sprints, code reviews, tests, automating whatever we can, and deploying multiple times per day.

By the end of the first 3 months, we expect you to:
- Have completed the general reputed company process.
- Have built working relationships with platform engineers, engineering leads, and others involved in production response, and have agreed on how you'll collaborate.
- Understand, in principle, how the reputed company platform works, and be able to handle smaller problems, incidents, or bugs on the infrastructure you work with most.
- Have mapped how we handle monitoring, incidents, and alerts today: where the friction is and where a focused improvement would help.
- Have published initial monitoring, observability, and alerting guidelines covering signals, naming, key dashboards, and alerting principles (severity, routing, and noise reduction), aligned with existing tooling.
- Be participating in incident reviews and translating patterns into improved playbooks.
- Be contributing actively to team ceremonies (planning, grooming) and technical discussions, and be in touch with other teams to support their infrastructure needs.

By the end of the first 6 months, we expect you to:
- Be working on bigger tasks mostly independently (while staying reputed company about asking for help).
- Have built a network across engineering, stay in touch with other teams on infrastructure initiatives, and gather feedback to find ways to help them in their daily work.
- Have teams referencing your guidance when planning higher-risk changes, with measurably less alert noise and duplicate paging.
- Have incident documentation (communication, roles, lessons learned) that's easy to find and actually used during reputed company incidents.
- Own the monitoring and alerting improvement roadmap end-to-end.
- Have agreed with leadership on priorities for monitoring and alerting: tooling, training, and the metrics that actually matter.

Why should you work at reputed company?
- Space, support, and autonomy for personal growth, with a direct impact on reputed company's reputed company
- Full-time position in Prague (Lucerna Palace) or Brno (Titanium) 🏰
- Option to work remotely 🛋️
- Flexible working hours (perfect for both night owls 🦉 and early birds 🐥)
- Nobody counts holidays as long as the work gets done 💪
- Unlimited Claude for every Apifier. We don't count tokens. Just use them well 🤖
- Stock options and profit sharing 💰
- We welcome pets, kids, and bikes at the office 🐕👨‍👧
- Epic team buildings and offsites 🚢 with biking, canoeing, and other adventures 🪂
- Solid education and training budget, conference tickets, internal "Eat & Learn" sessions, and the possibility to work across teams 👩🏼‍💻👨🏽‍💻
- Generous hardware budget 💻
- Free lunches every day when you're in the office 🌮🍱🍜🍕🥡
- Unlimited supply of ☕ & 🍺 and snacks
- Free entry to the wonderful Prague reputed company 🐘
- Free Multisport card 🏋
- Ping-pong, reputed company, PS5, lightsabers, foosball league after lunch

For more details about reputed company and what it's like to work with us, see our Careers page.