LLM Reliability Engineer

Passionfruit

Software Engineering, Data Science

Remote

Posted on May 21, 2026

About Passionfruit

Passionfruit is reshaping how marketing teams work. We started as a marketplace connecting brands with specialist marketers, and we're now building PIP, an AI-native platform that's changing how marketing operations actually get done.

We're not building generic AI tools and hoping marketers find them useful. We're building with marketing teams, because that's the only way to make something that genuinely works.

About PIP

PIP is our AI-native workspace for modern marketing operations. It centralises context, insights, and workflows into one intelligent platform - helping teams analyse performance, automate reporting, and extract meaningful answers from their data in minutes rather than days.

We ship fast, iterate based on real feedback, and scale deliberately. As we onboard larger enterprise customers and deepen LLM-driven workflows, reliability is the single biggest lever for trust and retention.

The Role

We're hiring an LLM Reliability Engineer to own the reliability layer of PIP - ensuring our AI systems consistently deliver useful, trustworthy, and production-ready experiences for users.

This role is focused on a newer and increasingly important challenge: understanding whether the AI is actually helping users achieve what they need, identifying where trust breaks down, and improving reliability before issues become churn.

You'll sit close to real user sessions, LLM traces, and production workflows - spotting silent failures, inconsistent outputs, frustrating responses, and workflows that stall. You'll work across observability, evaluations, and product feedback loops to improve the quality and resilience of our AI systems over time.

You'll be the person who notices that our top user this week was also our angriest user, and does something about it.

What You'll Do

  • Monitor LLM analytics, error tracking, and session replays (PostHog, Arize, AppSignal, Langfuse) to spot user frustration and silent failures before they're reported
  • Set up and tune sentiment analysis, error-clustering, anomaly alerts, and reliability dashboards so the team gets early signals on quality issues rather than learning about them weeks later
  • Run structured evaluations on AI outputs - consistency, accuracy, usefulness, hallucination rates, and task completion - across our agent library and core workflows
  • Build and maintain eval datasets and regression testing systems for prompts, retrieval pipelines, and agent behaviours
  • Partner with product and engineering to translate observed user friction into concrete prompt improvements, orchestration fixes, retrieval improvements, or product changes
  • Investigate edge cases and production failures across LLM workflows, identifying root causes and reliability gaps
  • Own a weekly view of platform reliability: what broke, what frustrated users, what trends are worsening, and what's actually been fixed

What We're Looking For

  • 4+ years in QA, Reliability Engineering, AI Operations, or a similar quality-focused role
  • A genuine instinct for edge cases - you find the things others didn't think to test
  • Comfort working with LLM-powered products and a strong appetite to go deeper on evals, tracing, observability, and reliability tooling
  • Familiarity with concepts like hallucination detection, prompt regression testing, response evaluation, and agent reliability
  • Strong attention to detail paired with the ability to communicate issues clearly and concisely to engineers
  • Ability to work independently - you'll help define what reliability means here - while collaborating closely with engineering, product, and CS
  • A product-focused mindset: you care about whether users trust and successfully use the system, not just whether outputs technically succeed
  • Comfort in a fast-moving environment where requirements evolve weekly

Nice to Have

  • Hands-on experience with PostHog, Arize, Langfuse, LangSmith, Helicone, or similar LLM observability tools
  • Familiarity with prompt engineering, RAG systems, or agent orchestration frameworks
  • Prior software development experience (a plus, not mandatory)
  • Exposure to Elixir/LiveView or similar
  • Background working with data platforms, analytics tools, or AI-native products

Working Hours and Location

  • Location: London, Chancery Lane (3 days per week in-office)
  • Hours: 9am–6pm

For more information or to apply for this role, please contact jess@usepassionfruit.com