LLM Reliability Engineer

Passionfruit

Software Engineering, Data Science

Remote

Posted on May 21, 2026

About Passionfruit

Passionfruit is reshaping how marketing teams work. We started as a marketplace connecting brands with specialist marketers, and we're now building PIP, an AI-native platform that's changing how marketing operations actually get done.

We're not building generic AI tools and hoping marketers find them useful. We're building with marketing teams, because that's the only way to make something that genuinely works.

About PIP

PIP is our AI-native workspace for modern marketing operations. It centralises context, insights, and workflows into one intelligent platform - helping teams analyse performance, automate reporting, and extract meaningful answers from their data in minutes rather than days.

We ship fast, iterate based on real feedback, and scale deliberately. As we onboard larger enterprise customers and deepen LLM-driven workflows, reliability is the single biggest lever for trust and retention.

The Role

We're hiring an LLM Reliability Engineer to own the reliability layer of PIP - ensuring our AI systems consistently deliver useful, trustworthy, and production-ready experiences for users.

This role is focused on a newer and increasingly important challenge: understanding whether the AI is actually helping users achieve what they need, identifying where trust breaks down, and improving reliability before issues become churn.

You'll sit close to real user sessions, LLM traces, and production workflows - spotting silent failures, inconsistent outputs, frustrating responses, and workflows that stall. You'll work across observability, evaluations, and product feedback loops to improve the quality and resilience of our AI systems over time.

You'll be the person who notices that our top user this week was also our angriest user, and does something about it.

What You'll Do

Monitor LLM analytics, error tracking, and session replays (PostHog, Arize, AppSignal, Langfuse) to spot user frustration and silent failures before they're reported
Set up and tune sentiment analysis, error-clustering, anomaly alerts, and reliability dashboards so the team gets early signals on quality issues rather than learning about them weeks later
Run structured evaluations on AI outputs - consistency, accuracy, usefulness, hallucination rates, and task completion - across our agent library and core workflows
Build and maintain eval datasets and regression testing systems for prompts, retrieval pipelines, and agent behaviours
Partner with product and engineering to translate observed user friction into concrete prompt improvements, orchestration fixes, retrieval improvements, or product changes
Investigate edge cases and production failures across LLM workflows, identifying root causes and reliability gaps
Own a weekly view of platform reliability: what broke, what frustrated users, what trends are worsening, and what's actually been fixed

What We're Looking For

4+ years in QA, Reliability Engineering, AI Operations, or a similar quality-focused role
A genuine instinct for edge cases - you find the things others didn't think to test
Comfort working with LLM-powered products and a strong appetite to go deeper on evals, tracing, observability, and reliability tooling
Familiarity with concepts like hallucination detection, prompt regression testing, response evaluation, and agent reliability
Strong attention to detail, paired with the ability to communicate issues clearly and concisely to engineers
Ability to work independently - you'll help define what reliability means here - while collaborating closely with engineering, product, and CS
A product-focused mindset: you care about whether users trust and successfully use the system, not just whether outputs technically succeed
Comfort in a fast-moving environment where requirements evolve weekly

Nice to Have

Hands-on experience with PostHog, Arize, Langfuse, LangSmith, Helicone, or similar LLM observability tools
Familiarity with prompt engineering, RAG systems, or agent orchestration frameworks
Prior software development experience (a plus, not mandatory)
Exposure to Elixir/LiveView or similar
Background working with data platforms, analytics tools, or AI-native products

Working Hours and Location

Location: London, Chancery Lane (3 days per week in-office)
Hours: 9am–6pm

For more information, or to apply for this role, please contact jess@usepassionfruit.com
