[Remote] IBM Workload Scheduler Administration / Infrastructure Engineer

Work from home Full-time role Hiring

Note: This is a remote job open to candidates in the USA. Kastech Software Solutions Group is seeking a highly skilled IBM Workload Scheduler Administration / Infrastructure Engineer with 3–5+ years of experience. The role involves managing, maintaining, and optimizing enterprise batch scheduling infrastructure, ensuring high availability and reliable execution of critical business workloads.

Responsibilities

  • IBM Workload Scheduler Administration
      • Administer Production IBM Workload Scheduler (formerly Tivoli Workload Scheduler) environment:
          • 28,000 unique daily jobs
          • Approximately 350,000 daily job runs
          • 44 servers
          • Three additional change-control environments
      • Install, configure, administer, patch, and upgrade IWS components:
          • Master Domain Manager (MDM)
          • Dynamic Agents
          • Dynamic Pools
          • Dynamic Workload Console (DWC)
  • Change Management & Governance
      • Work closely with Product Owners and communicate workstreams through Jira
      • Manage job promotions using a Workload Application Template-based process
      • Perform safety and stability assessments for all job promotions
      • Manage change control across four separate environments
      • Enforce change management standards, policies, and governance
  • Platform Availability & Operations
      • Maintain and continuously improve Production platform uptime target of 99.17% per month
      • Follow SOPs, DevOps practices, and disciplined change-control processes
      • Coordinate platform-impacting communications to a user community of approximately 500 developers and data engineers
      • Support Production infrastructure consisting of:
          • 44 servers
          • MDM, DWC, and Agent environments
  • Troubleshooting & Support
      • Resolve:
          • Complex job failures
          • Performance bottlenecks
          • Agent-related issues
          • Infrastructure-related issues
      • Provide guidance on complex job scheduling designs to less experienced team members
  • Monitoring, Security & Compliance
      • Monitor scheduler platform health and performance
      • Manage database maintenance activities
      • Perform backup, disaster recovery, and monthly failover testing
      • Define and maintain:
          • Security policies
          • User authorizations
          • Authentication for Dynamic Workload Console (DWC)
      • Respond to:
          • Cybersecurity vulnerability assessments
          • PCI compliance audits
          • Other regulatory audit requests
  • Automation & DevOps
      • Design and implement Ansible-based automation solutions
      • Develop self-healing mechanisms to reduce unplanned outages
      • Coordinate with offshore teams performing SOP activities during non-business hours
      • Develop automation scripts using:
          • Python
          • IWS REST APIs

Skills

  • Ability to modernize, implement, install, configure, upgrade, migrate, develop, or design IBM Workload Scheduler (IWS) / IBM Workload Automation (IWA) solutions
  • Support migration activities across pre-production and production environments
  • Participate in knowledge transfer and documentation to enable team self-sufficiency
  • 3–5+ years of dedicated IBM Workload Scheduler administration experience
  • Responsible for managing, maintaining, and optimizing enterprise batch scheduling infrastructure
  • Primary environment hosted on Red Hat Enterprise Linux (RHEL)
  • Strong expertise in: IBM Workload Scheduler (IWS), Linux System Administration, Scripting and Automation
  • Focus on ensuring high availability and reliable execution of critical business workloads
  • Strong experience with IBM Workload Scheduler architecture, especially Dynamic Workload Broker, V10.1+, and high availability of MDMs managing Fault Tolerant Agent and Dynamic Agent architectures
  • Strong conceptual understanding of Master Domain Manager (MDM), Backup MDM (BMDM), Dynamic Workload Console (DWC), Fault Tolerant Agent (FTA), Dynamic Agent (DA)
  • Strong grasp of the conman CLI to monitor and control the production plan and check job, job stream, and resource status
  • Strong grasp of the composer CLI to define, modify, and extract scheduling objects
  • Strong grasp of the planman CLI to control the pre-production plan and GUI mirroring
  • Strong grasp of the lifecycle of the daily production planning process and the phases of JnextPlan/FINAL
  • Proficiency in navigating the DWC web-based GUI to monitor workloads, manage user access security, and define scheduling objects
  • Experience installing IWS components, applying Fix Packs, and Interim Fixes
  • Troubleshooting with logs under TWSDATA/stdlist and adjusting trace levels for netman, batchman, writer, mailman, etc.
  • Strong experience with IBM WebSphere Liberty
  • Strong grasp of reading messages.log, traces.log, FFDC logs
  • Strong grasp of configuring JVM heap sizes
  • Strong grasp of configuring tracing scope, tracing levels, tracing retention
  • Strong experience with Red Hat Enterprise Linux 8+
  • Deep familiarity with bash/shell commands for text processing (for example, grep, awk, sed), file manipulation, and system navigation
  • Ability to manage, start, stop, and troubleshoot systemd services using systemctl and journalctl for IWS agents and MDM
  • Experience managing user accounts, groups, and service accounts, with deep knowledge of Linux file permissions (chmod, chown, ACL on local filesystems and NFS)
  • Ability to monitor system performance using tools like top, htop, vmstat, iostat, and sar to troubleshoot bottlenecks and platform unresponsiveness
  • Understanding of Logical Volume Manager (LVM) and filesystem usage
  • Checking TCP port availability, firewall rules (firewalld/iptables), and connectivity between MDM and Dynamic Agents using netstat, ss, ping, curl, etc.
  • Managing SSL/TLS certificates, private keystores, public truststores, and working with Certificate Authority
  • Strong experience with scripting (Bash Shell, Python, etc.) for automation
  • Understanding of networking principles
  • Understanding of basic Oracle database administration, enough to troubleshoot with DBAs and establish when an issue lies in Oracle
  • Understanding of basic SQL to query job metadata
  • Understanding of checking database connectivity
  • Understanding of AWS cloud infrastructure
  • Experience using a secrets manager (CyberArk PPM, HashiCorp Vault, or similar)

Company Overview

  • Kastech Software Solutions Group, incorporated in 2007 and headquartered in Richmond, Texas, is a leading global IT services and consulting company delivering technology-driven solutions to organizations across industries. It was founded in 2008, and is headquartered in Houston, Texas, USA, with a workforce of 1001-5000 employees. Its website is https://www.kastechssg.com.

Company H1B Sponsorship

  • Kastech Software Solutions Group has a track record of offering H1B sponsorships, with 13 in 2026, 94 in 2025, 65 in 2024, 101 in 2023, 124 in 2022, 171 in 2021, 119 in 2020. Please note that this does not guarantee sponsorship for this specific role.