ELK System Reliability Engineer

Work from home Full-time role Hiring

Job Description

Architecting, deploying, managing, and maintaining highly available and fault-tolerant ELK clusters across diverse environments, encompassing Elasticsearch, Logstash, Kibana, and Beats agents.
Implementing a Fleet-managed, large-scale deployment of Elastic Agents.
Developing and implementing comprehensive monitoring, alerting, and dashboarding strategies using Kibana visualizations and integrated alerting mechanisms to proactively identify and address system anomalies and performance degradations.
Automating routine operational tasks, deployment pipelines, and cluster upgrades through scripting (e.g., Python, Bash) and infrastructure-as-code principles, using tools like Ansible.
Performing in-depth performance tuning and optimization of Elasticsearch indices, query performance, and underlying hardware and other resources to ensure maximum throughput and minimal latency.
Managing the ingestion pipelines, configuring Logstash filters and outputs, and ensuring data moves efficiently from various sources into the Elasticsearch data stores.
Implementing and enforcing robust security measures across the ELK stack, including access control, encryption (TLS/SSL), and regular vulnerability assessments.
Troubleshooting issues across the entire stack, from data sources and ingestion agents through to the Elasticsearch cluster and Kibana, employing systematic diagnostic methodologies.
Collaborating closely with development and operations teams to understand application requirements, optimize data schemas, and facilitate effective log analysis and troubleshooting.
Designing and executing disaster recovery and business continuity plans specifically tailored for the ELK platform, ensuring data and service availability.
Maintaining detailed documentation for system architecture, operational procedures, troubleshooting guides, and configuration standards.

Requirements

Demonstrable, extensive hands-on experience managing large-scale Elasticsearch clusters, including a deep understanding of index management, shard allocation, replication strategies, and cluster health monitoring.
Proven expert-level experience administering and troubleshooting Linux operating systems (e.g., RHEL, Debian), including performance analysis.
Solid foundational knowledge of web applications, their underlying architectures, and how they interact with logging and monitoring systems.
A bachelor's degree in Computer Science, Information Technology, Engineering, or a closely related technical field, or equivalent practical experience.
Relevant industry certifications such as Elastic Certified Engineer, AWS Certified SysOps Administrator, Red Hat Certified Engineer (RHCE), or equivalent validation of core competencies.
A minimum of five to seven years of experience in Site Reliability Engineering, Systems Administration, or DevOps roles with a strong focus on large-scale distributed systems.
Proficiency with essential infrastructure management tools, including configuration management systems (Ansible, Chef, Puppet) and orchestration platforms (OpenShift).
Expertise in scripting languages such as Bash for automation, system administration tasks, and developing operational tooling.
Thorough understanding of networking concepts, including TCP/IP, HTTP/S protocols, DNS, load balancing, and firewall configurations relevant to distributed systems.

Preferred Qualifications

Experience with message queuing technologies like Kafka or RabbitMQ for buffering and decoupling data ingestion processes.
Hands-on experience with container orchestration systems such as OpenShift, including deploying and managing Logstash in containerized environments.
Familiarity with data collection agents other than Beats, such as Fluentd or Vector, and their respective configuration nuances.
Knowledge of distributed tracing systems (e.g., Jaeger, Zipkin) and their potential integration or correlation with ELK data.
Familiarity with CI/CD pipelines and integrating ELK stack deployments and updates into automated release processes.
A strong grasp of system security best practices, including intrusion detection, vulnerability management, and hardening techniques for distributed systems.

Benefits

Competitive salaries and comprehensive health benefits.
Flexible work hours and remote work options.
Professional development and training opportunities.
A supportive and inclusive work environment.
