GPU Kernel Developer – AI/ML

Work from home | Full-time role | Hiring

Role Summary

We are seeking expert-level GPU Software Engineers to support a high-visibility platform initiative within the Maya program, focused on building software tooling on top of a custom compiler and SDK. The role involves developing, optimizing, and porting GPU kernels and AI workloads to a specialized hardware platform. This is a critical and time-sensitive engagement with immediate onboarding expectations and long-term roadmap alignment (~18 months).

Key Responsibilities

Develop GPU kernels for specialized hardware platforms using PyTorch/Triton frameworks
Build software solutions leveraging custom compiler and SDK capabilities
Design and implement kernel-level optimizations to control hardware execution behavior
Port open-source AI/ML models to custom SDK environments
Port and adapt high-performance computing benchmarks and stress workloads, such as:
    Linpack (High Performance Linpack)
    BERT/benchmark-style workloads (referred to as “Babu bench”)
Develop stress testing and validation workloads aligned to hardware behaviour and platform validation
Support testing and stress testing of current and next-generation hardware platforms
Collaborate closely with platform architects and compiler teams to enhance system capabilities

Core Technical Skills (Must-Have)

Programming & Frameworks

Python
C/C++ (systems-level programming)
PyTorch
Triton (Triton language / kernel development)

GPU & Systems Expertise

GPU kernel development (mandatory and critical)
Strong understanding of GPU architecture and compute optimization
Experience with compiler-based optimizations / runtime execution layers
Experience with custom SDKs or hardware abstraction layers

Performance & Workloads

Experience in:
    GEMM kernel development (matrix multiplication kernels)
    Porting ML models to new hardware platforms
    Performance tuning and stress testing at system level

Nice-to-Have Skills

Experience working with custom silicon / hardware platforms
Exposure to high-performance computing (HPC) workloads
Familiarity with:
    Linpack benchmarks
    AI workload benchmarking tools
Experience in compiler optimization ecosystems

Engagement Model & Structure

Number of roles: 3 developers (initial hiring may start with 2)
Location flexibility: Onsite / Offshore / Hybrid mix allowed
Timeline: Immediate start required
Duration: ~18 months program duration with phased platform evolution

Key Differentiators (Critical Expectation)

This is NOT a DevOps / support / debugging role
Requires deep hands-on engineering expertise in:
    Kernel programming
    GPU workloads
    ML framework internals
Candidates must demonstrate build-level competence, not just theoretical knowledge
