Software Architect, Agent Evaluation & Core Framework

Work from home Full-time role Hiring

About Datagrid

Forget everything you know about AI assistants. At Datagrid, we’re building AI agents that actually do the work. We’re a team of passionate, hard-working builders, thinkers, and problem-solvers who are genuinely excited about what we do. Our mission is to supercharge the workday by turning complex data and tedious workflows into simple, automated actions. It’s an incredibly exciting time to join us: we’re growing fast, expanding our platform’s capabilities, and partnering with enterprise customers who want to 10x their teams’ output. We thrive on collaboration and are looking for people who are ready to make a tangible impact. If you want to be part of a team that’s not just talking about the future of AI but actively creating it, you’ve come to the right place.

Our Values

At Datagrid, our values guide how we work, build, and grow together.

  • Act with Purpose: Everything we do is tied to our mission. You’ll see the impact of your work as we move quickly to solve meaningful problems for our customers.
  • Own the Outcome: We believe in true ownership. You’ll take responsibility for your projects and see them through to success, empowered to make decisions that drive real results.
  • Clarity without Ego: We value honesty, transparency, and trust. You can expect and provide direct feedback in an environment where candor sharpens our ideas and strengthens our team.
  • Creativity with Purpose: Innovation is central to our culture. Your creative thinking will be valued and directed toward solving real-world challenges and creating lasting impact.

About the role

Datagrid Agents operate where our customers work: across Teams, Slack, and even SMS. Agents make multistep plans, leverage vectorized data from 100+ sources, use tools like Docusign, and manipulate the Datagrid app. The Software Architect, Agent Evaluation & Core Framework, is crucial because we cannot manually test the vast array of agent interactions and capabilities.

You will own and drive the extension of our evaluation harness to provide actionable reports on agent regressions and improvements, directly impacting strategic direction and customer experience. A key part of this will be incorporating the best open-source benchmarks into our evaluation set and figuring out how to agentically generate evaluations that are representative of customer use cases. As you become established, you will also have the opportunity to make fundamental changes to the Core Framework to improve the way Agents reason, use tools, and collaborate with humans.

What you'll do:

  • Work closely with an ex-Googler who built Gemini evals to create a harness for evaluating Agent performance, make that harness available for both local development and CI/CD pipelines, and set up alerts when Agents misbehave.
  • Influence and contribute to the extension of Datagrid’s Agentic capabilities.
  • Choose the best open-source or closed-source components to build out the testing infrastructure.
  • Integrate publicly available benchmarks such as RAGBench into the testing system.
  • Grant subject matter experts the ability to add to the test library using customer queries, manually authored cases, and synthetically generated questions.
  • Expose evaluation performance via alerts and dashboards.

What we're looking for:

  • Proven track record of building test harnesses for Chat Agents from 0 ⇒ 1.
  • 10+ years of B2B software engineering experience.
  • Ability to write effective LLM prompts without assistance.
  • Proficiency with Node.js and server-side frameworks such as NestJS or Next.js.
  • Familiarity with JavaScript frameworks such as React or AngularJS.
  • Experience with databases such as Weaviate and BigQuery.
  • Experience working with GCP or similar cloud providers.

Who we're looking for:

  • Experience with any LLM evaluation platform (Galileo, Arize, LangSmith, Orq)
  • Background in B2B SaaS automation tools
  • Contributions to open-source AI projects or published research
  • Familiarity with prompt engineering or model evaluation

Pay Range and Benefits

  • Salary Range: $200,000 - $240,000
  • Generous equity compensation
  • Flexible vacation/time-off policy
  • All U.S. federal holidays observed, plus an additional company-wide Week of Rest in December
  • Competitive benefits package - 100% premium coverage for employees and generous coverage for dependents
  • Work-from-home stipend to support your ideal setup
  • 401(k) plan

The base pay range target for the role seniority described in this job description is between $200,000 and $240,000. Final offer amounts depend on multiple factors such as candidate experience and expertise, geographic location, total compensation, and market data. In addition to cash pay, full-time regular positions are eligible for equity, 401(k), health benefits, and other benefits; some of these benefits may be available for part-time or temporary positions.
