[Remote] Sys/Cloud Admin/Incident Response Engineer

Work from home Full-time role Hiring

Note: This is a remote position open to candidates in the USA. i4DM is a company that provides Federal agencies with access to highly skilled professionals for complex mission challenges. They are seeking an experienced Sys/Cloud Admin/Incident Response Engineer to support enterprise monitoring operations, incident detection, and response activities for a mission-critical platform within the Department of Veterans Affairs environment.

Responsibilities

  • Administer, monitor, and support cloud and platform services, virtual infrastructure, and hosted applications to maintain system health, availability, and performance
  • Configure, tune, and maintain monitoring, logging, and alerting solutions to improve visibility across infrastructure, applications, and service dependencies
  • Validate alert accuracy, reduce noise, and help ensure operational issues are detected proactively through effective observability practices
  • Perform routine system administration tasks such as environment checks, service restarts, access support, patch coordination, and operational maintenance activities
  • Monitor incident queues and system alerts, perform initial triage, document impact, and execute defined escalation procedures for incidents affecting mission-critical services
  • Participate in major incident response activities, including troubleshooting, log review, coordination with engineering teams, and support for service restoration efforts
  • Follow incident response playbooks, severity models, and communication protocols to support timely resolution and accurate status reporting
  • Document incident timelines, actions taken, recovery steps, and supporting evidence to enable post-incident review and continuous improvement
  • Support coordination during operational events by working across infrastructure, application, DevSecOps, SRE, and service management teams
  • Provide clear, timely updates on incident status, service impact, troubleshooting progress, and recovery actions to internal stakeholders
  • Escalate issues appropriately based on impact, urgency, and established operational procedures
  • Maintain accurate operational records in ticketing, incident, and knowledge management systems
  • Partner with engineers and platform teams to improve dashboards, alerts, runbooks, and operational procedures supporting reliable service delivery
  • Identify recurring operational issues, alert gaps, and system weaknesses, and recommend practical improvements to reduce incident frequency and response time
  • Support automation efforts for routine operational tasks, alert correlation, remediation workflows, and incident response activities where applicable
  • Contribute to post-incident reviews, root cause analysis activities, and implementation of corrective or preventive actions
  • Help maintain operational reporting on incidents, system health, availability, and response metrics to support service-level objectives and operational reviews
  • Ensure incident records, escalation paths, standard operating procedures, and response documentation remain current and usable
  • Support compliance with operational policies, security requirements, and change management practices in cloud and enterprise environments
  • Participate in on-call or after-hours operational support, as required, in a 24x7 mission-driven environment

Skills

  • Bachelor's degree in Information Technology, Computer Science, Engineering, Cybersecurity, or a related field; equivalent relevant experience may be considered
  • 3+ years of experience in systems administration, cloud operations, site reliability, network operations, incident response, or enterprise production support roles
  • Hands-on experience supporting Windows and/or Linux server environments, cloud-hosted infrastructure, and enterprise application platforms
  • Experience with monitoring, logging, and observability tools used to detect, investigate, and troubleshoot service disruptions
  • Working knowledge of incident management processes, ticketing workflows, escalation practices, and service restoration procedures in ITIL-aligned environments
  • Ability to analyze logs, alerts, and system behavior to support troubleshooting and rapid issue resolution
  • Strong written and verbal communication skills, with the ability to document incidents and coordinate effectively across technical and non-technical stakeholders
  • Ability to work in a 24x7, SLA-driven environment and participate in operational response activities under time-sensitive conditions
  • Candidates must be eligible to obtain and maintain a Public Trust clearance
  • Experience supporting VA or other Federal Government environments, including familiarity with operational reporting, service management, and compliance expectations
  • Experience with cloud and platform technologies such as AWS, Azure, Kubernetes, container platforms, virtualization, or hybrid infrastructure
  • Familiarity with enterprise monitoring and observability platforms such as Splunk, Dynatrace, CloudWatch, Azure Monitor, Grafana, or similar tools
  • Experience using scripting or automation tools such as PowerShell, Python, Bash, or infrastructure automation frameworks to streamline operational tasks
  • Exposure to DevSecOps, Site Reliability Engineering (SRE), SAFe Agile, or modern incident response and post-incident review practices
  • Relevant certifications such as AWS Certified SysOps Administrator, Azure Administrator Associate, CompTIA Security+, ITIL Foundation, Splunk certifications, or similar credentials

Company Overview

  • i4DM provides a full range of information technology consulting services to government and commercial clients. It was founded in 2002 and is headquartered in Millersville, Maryland, USA, with a workforce of 51-200 employees. Its website is https://www.i4dm.com.