Senior SQA Engineer (LLM) Remote Pakistan

Work from home Full-time role Hiring

Design and own the end-to-end QA strategy for the Conversational Banking Platform, covering functional, regression, performance, security, and AI-specific evaluation.
Build and maintain golden datasets, eval suites, and LLM-as-judge frameworks to validate conversational quality across intents, languages, and tenants.
Define the tenant onboarding QA gate, the certification checklist every new business unit must pass before going live.
Establish regression strategies for prompt changes, model upgrades, retrieval index updates, and guardrail policy changes.
Use Langfuse traces to drive evaluation: mine production failures, convert them into test cases, and close the loop with engineering.
Test NeMo Guardrails configurations against jailbreaks, prompt injection, off-topic drift, and false-positive over-blocking.
Validate governance and compliance behaviors: data residency, PII handling, regulated-product disclosures, and off-limits topics.
Build automated test harnesses for Spring AI services, including tool-calling validation, RAG groundedness, and integration with Cosmos DB and MongoDB data layers.
Partner with the Platform team on quality metrics, SLOs, and the platform eval scorecard.
Coach feature engineers and tenant teams on writing their own evals, making platform-grade quality self-service over time.

Tech Stack

AI/Application: Spring AI, Java/Spring Boot
Data: Cosmos DB (vector and operational), MongoDB
Observability and evaluation: Langfuse
Governance and safety: NVIDIA NeMo Guardrails
CI/CD: standard enterprise pipelines with automated quality gates

Required Experience and Skills

6+ years in software QA, with at least 1–2 years testing LLM-based, RAG, or conversational AI systems in production.
Hands-on experience with LLM observability and evaluation tools such as Langfuse, LangSmith, Arize, or Phoenix.
Working knowledge of eval frameworks such as Ragas, DeepEval, Promptfoo, or TruLens, including metrics like faithfulness, groundedness, answer relevance, and context precision.
Practical understanding of how to test non-deterministic systems: golden datasets, semantic similarity, LLM-as-judge, and statistical regression detection.
Experience testing guardrail or policy frameworks (NeMo Guardrails, Guardrails AI, or similar).
Solid foundation in API testing, automation frameworks (e.g., pytest, JUnit, Karate, RestAssured), and CI/CD integration.
Familiarity with Spring and Spring Boot applications and JVM-based services.
Comfortable writing queries against NoSQL stores (MongoDB, Cosmos DB) for test data setup and trace inspection.
Strong written communication: able to produce clear test plans, defect reports, and tenant readiness assessments.

Good to Have

Experience in banking, financial services, or another regulated industry.
Exposure to multi-tenant platforms: understanding how shared infrastructure changes the testing problem.
Familiarity with red-teaming, adversarial prompt testing, and prompt injection defense.
Working knowledge of vector databases, embedding models, and retrieval evaluation.
Experience with multi-language conversational systems.
Performance and load testing experience for AI workloads (token throughput, latency percentiles, cost per conversation).
Contributions to open-source eval or AI testing tooling.
Experience working with compliance, risk, or audit teams on AI assurance.
