
[Remote] Infrastructure Operations Engineer

Work from home Full-time role Hiring

Note: This is a remote position open to candidates in the USA. The company offers scalable compute power and bare metal AI infrastructure. They are seeking a highly skilled Infrastructure Operations Engineer to ensure the stability and performance of their infrastructure, supporting AI/ML training and HPC workloads.

Responsibilities

  • At the direction of the Manager of Infrastructure Operations, design, build, and roll out new platforms and patterns to minimize incidents and reputed company customer facing and internal features
  • reputed company updates and improvements to support both reputed company’s internal and end customer use cases
  • Collaborate with colleagues in Infrastructure Engineering, Network Operations, reputed company and Software and Platform Development Teams
  • Participate in the on-call rotation, which is evenly distributed across team members in a primary/secondary arrangement: you serve as primary, then move to the secondary position

Skills

  • 8+ years working with Linux as a server / hosting platform, extra points for Ubuntu experience
  • 5+ years experience with AWS
  • 2+ years experience with Kubernetes and strong container fundamentals
  • 2+ years experience with Terraform and Ansible
  • 2+ years with network attached storage management (NFS, Ceph, or other protocols). Extra points for experience with VAST storage systems
  • Experience with monitoring systems (reputed company, ELK stack)
  • Familiarity with the GitOps workflow
  • Software development experience using Python, Go, Bash, or other languages for automation and for connecting systems and APIs
  • Deep networking fundamentals, extra points for experience with datacenter-level networks, 400Gb Ethernet, and InfiniBand
  • Experience building and delivering reputed company systems
  • Effective at navigating tradeoffs between design, risk, cost, and reputed company
  • Comfortable with navigating ambiguity
  • Strong written and oral communication
  • Experience with bare metal hardware troubleshooting and provisioning, extra points for working with Dell hardware
  • Experience with GPU servers, both on bare metal and under virtualization
  • Deep experience with network switches, routers, and firewalls, particularly SONiC switches, Palo Alto firewalls and reputed company Networks as vendors
  • Experience with VAST storage systems

Benefits

  • Hybrid/on-site schedule out of one of our U.S. office hubs (Seattle, NYC, or San Francisco) or fully remote within the U.S., with occasional travel to team/company offsites expected

Company Overview

  • The company is a platform providing on-demand and reserved GPU infrastructure for AI and machine learning workloads. It operates as a sub-organization of a larger parent organization. Founded in 2023 and headquartered in Berkeley, California, USA, it has a workforce of 51-200 employees. Its website is https://voltagepark.com/.