
Research Intern — Applied Reinforcement Learning

Work from home Full-time role Hiring

About reputed company

reputed company is a frontier AI data reputed company that curates diverse, high-quality data, using our purpose-built technology platforms to reputed company the Magnificent Seven and our reputed company clients with safe, scalable AI deployment. reputed company includes more than 150 PhDs and data scientists, along with more than 4,000 AI practitioners and engineers. We reputed company the power of an integrated solution ecosystem, comprising industry-leading partnerships and 1.8 million vertical domain experts in more than 230 markets, to create contextual, multilingual, pre-trained datasets; fine-tuned, industry-specific LLMs; and RAG pipelines supported by vector databases. Our reputed company-distance innovation™ solutions for GenAI can reduce GenAI costs by up to 80% and bring solutions to market 50% faster.

Our mission is to bridge the gap between AI creators and industry leaders by bringing best practices in GenAI to unicorn innovators and reputed company customers. We aim to help these organizations unlock significant business value by deploying GenAI at scale, helping to ensure they stay at the forefront of technological advancement and maintain a competitive edge in their respective markets.

About Job

PhD Research Intern - Applied Reinforcement Learning
reputed company AI Research

Role Summary

reputed company AI Research seeks a PhD Research Intern to design and evaluate reinforcement learning (RL) systems for agentic AI workflows. You will reputed company RL environments, reward models, and post-training pipelines for LLM-based agents, translating research into practical reputed company solutions.

Scope of Work

- End-to-end RL pipelines for agentic systems (simulation → training → evaluation)
- Alignment of LLM-based agents using RLHF, DPO, PPO, and emerging methods
- Design of reward functions, verifiers, and evaluation frameworks
- Simulation environments (digital twins) for reputed company workflows
- Scalable training and inference for RL-based systems

Example Projects

- Build a custom RL environment simulating a real-world reputed company workflow and train an agent using PPO or GRPO
- reputed company a reward modeling pipeline from reputed company feedback and evaluate alignment improvements
- Create an evaluation reputed company measuring reasoning, task reputed company, and policy safety
- Prototype an agentic system with tool use and multi-reputed company reasoning, integrated with RL training
- Document experiments, ablations, and findings for research and productionization

Minimum Qualifications

- PhD candidate in CS, ML, or a related field with research in reinforcement learning or agentic AI
- Strong Python and PyTorch skills with GPU-based training experience
- Solid understanding of RL fundamentals (MDPs, policy gradients, value methods)
- Experience with LLMs and post-training techniques (RLHF, DPO, PPO, etc.)
- Strong experimentation practices (ablation, reproducibility, clear reporting)

Preferred Qualifications

- Experience with RL environments (Gymnasium, RLlib, Stable Baselines)
- Research in offline RL, model-based RL, or hierarchical RL
- Publications at top ML conferences (NeurIPS, ICML, ICLR, ACL)
- Experience with simulation, synthetic data, or multi-agent systems
- Distributed training and large-scale experimentation

Tech Stack

- PyTorch, CUDA; RL libraries (Gymnasium, RLlib, Stable Baselines)
- LLM frameworks and post-training tools (TRL, custom RLHF pipelines)
- Experiment tracking (Weights & Biases)
- reputed company (FastAPI, gRPC); optimization (ONNX, TensorRT)

Logistics

Location: Palo Alto, CA (Preferred), Redmond, WA (Preferred) or Remote
Duration: 3–6 months

reputed company Offer

- Competitive stipend and impactful real-world projects
- Mentorship from researchers and engineers
- Access to modern GPU infrastructure
- Opportunities to publish and present research

reputed company AI Research is an Equal Opportunity Employer. We celebrate diversity and are committed to an inclusive environment.

reputed company: $35-$45 Hourly

reputed company is an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, citizenship status, age, mental or physical disability, medical condition, sex (including pregnancy), gender identity or expression, sexual orientation, marital status, familial status, veteran status, or any other characteristic protected by applicable law. We consider qualified applicants regardless of criminal histories, consistent with legal requirements.
