Leftover state is one of the easiest ways to make a Playwright suite look flaky when the real problem is test data drift. A test passes locally, then fails in CI because the user already exists, the cart still has items, the record was soft-deleted, or the background job has not finished cleaning up the last run. When that happens repeatedly, engineers start rerunning builds instead of fixing the underlying cause.

The fix is not just “delete the data at the end.” Real Playwright test data cleanup needs a plan for seeding, isolation, teardown, and failure recovery. If you build those patterns into your suite, you get repeatable CI runs, smaller debugging loops, and far fewer tests that depend on hidden state from a previous scenario.

What repeatable actually means in test automation

Repeatability in browser testing is more than “the same test passed twice.” In practice, it means:

  • A test can run in any order without changing another test’s outcome.
  • A test can retry without leaving junk behind.
  • A full suite can run on a fresh environment or an already-used environment and still behave predictably.
  • Failures are traceable to the app or test logic, not to leftover fixtures.

That lines up with the broader goals of test automation and continuous integration: fast feedback, low manual intervention, and stable signal. Browser tests are especially sensitive because they often cross UI, API, database, cache, and async job boundaries.

If a test depends on “whatever state the app happened to have,” it is not really isolated. It is borrowing confidence from previous runs.

Start by classifying the data your tests touch

Before writing cleanup code, separate your test data into a few buckets.

1. Ephemeral UI data

These are records created only for the current test, such as a draft order, temporary project, uploaded file, or invitation.

Typical cleanup pattern: delete the record through an API call or database teardown after the test.

2. Shared seeded data

This is baseline data that multiple tests can rely on, such as a default organization, a standard admin account, or a known product catalog.

Typical cleanup pattern: do not mutate it directly. Reset it to a known seed between runs or clone it per test.

3. Environment data

This includes feature flags, email inboxes, rate limits, third-party sandbox state, and jobs already in queues.

Typical cleanup pattern: clear or namespace it before the suite starts, and verify the environment is actually clean.

4. External side effects

Think emails, webhooks, payment intents, object storage uploads, search index entries, and analytics events.

Typical cleanup pattern: isolate by namespace, use disposable sandboxes, or provide explicit delete endpoints.

Once you know which category you are dealing with, the cleanup strategy becomes much clearer.

Prefer isolation over cleanup whenever you can

Cleanup is a safety net. Isolation is the primary defense.

The easiest flaky tests to prevent are the ones caused by data collisions: give each test its own unique identifiers. For example, use a unique email, project name, or tenant ID on every run. That avoids clashes with data left by earlier tests and makes parallel execution safer.

A simple convention looks like this:

import { test, expect } from '@playwright/test';

// Timestamp plus random hex keeps addresses unique across runs and parallel workers.
function uniqueEmail(): string {
  return `qa+${Date.now()}-${Math.random().toString(16).slice(2)}@example.com`;
}

test('creates a new account', async ({ page }) => {
  const email = uniqueEmail();
  await page.goto('/signup');
  await page.getByLabel('Email').fill(email);
  await page.getByLabel('Password').fill('TempPass123!');
  await page.getByRole('button', { name: 'Create account' }).click();
  await expect(page.getByText('Welcome')).toBeVisible();
});

This does not solve everything, but it prevents the classic “already exists” failure and gives every test a clear ownership boundary.

Good isolation patterns

  • Unique IDs, emails, and resource names per test
  • Per-test browser context storage state
  • Per-worker tenants, accounts, or namespaces
  • Ephemeral databases or schema-per-worker setups
  • Sandboxed third-party accounts for integration-heavy flows

When isolation is not enough

You still need cleanup if tests create expensive or quota-limited resources, if a backend process keeps running after the test ends, or if data lives longer than the test process.

Seed test data in layers, not all at once

A lot of suites fail because seeding is either too thin or too broad. A thin seed leaves tests to build too much state themselves. A broad seed makes every environment setup slow and brittle.

A better approach is layered seeding:

  1. Base seed for stable reference data, like roles, plans, permissions, or catalog entries.
  2. Scenario seed for a specific suite or test group, like an organization with three users and two open tickets.
  3. Test-local seed for the exact data created by one test.

This keeps your seed files smaller and easier to reason about. It also reduces the chance that one test depends on a giant fixture nobody fully understands.
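To make the layering concrete, here is a minimal in-memory sketch. `SeedStore` and `prepareTest` are hypothetical names, and the store stands in for your real seed API or database. The base and scenario layers are inserted only once per scope, while the test-local layer is always created fresh and returned so teardown knows exactly what to delete.

```typescript
// Hypothetical in-memory stand-in for a seed API or database; keys are illustrative.
type Layer = 'base' | 'scenario' | 'test';
type SeedRow = { layer: Layer; key: string };

class SeedStore {
  readonly rows: SeedRow[] = [];
  private seeded = new Set<string>();

  // Base and scenario layers: insert once per scope, skip on later calls.
  seedOnce(layer: Layer, scope: string, keys: string[]): void {
    const marker = `${layer}:${scope}`;
    if (this.seeded.has(marker)) return;
    this.seeded.add(marker);
    for (const key of keys) this.rows.push({ layer, key });
  }

  // Test-local layer: always insert fresh rows and return them for teardown.
  seedForTest(keys: string[]): SeedRow[] {
    const created = keys.map((key): SeedRow => ({ layer: 'test', key }));
    this.rows.push(...created);
    return created;
  }
}

// Hypothetical entry point a test or fixture calls before it starts.
function prepareTest(store: SeedStore, suite: string, testId: string): SeedRow[] {
  // 1. Base seed: stable reference data shared by the whole environment.
  store.seedOnce('base', 'env', ['role:admin', 'role:member', 'plan:free']);
  // 2. Scenario seed: one organization with its users and tickets per suite.
  store.seedOnce('scenario', suite, [`org:${suite}`, `user:${suite}-1`, `ticket:${suite}-1`]);
  // 3. Test-local seed: only the record this one test needs.
  return store.seedForTest([`draft:${testId}`]);
}
```

In a Playwright suite, the "once" layers map naturally onto `globalSetup` or worker-scoped fixtures, and the test-local layer onto a regular test-scoped fixture.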

Seed through APIs when possible

If your app already has admin or internal APIs for creating entities, use them instead of clicking through the UI for setup. UI setup is valuable for end-to-end coverage, but it is a poor choice for expensive preconditions.

A common pattern is to create state via API, then validate via UI:

import { test, expect } from '@playwright/test';
test('shows an invited member in the team list', async ({ page, request }) => {
  const response = await request.post('/api/test-support/teams', {
    data: { name: 'QA Team', members: ['sam@example.com'] }
  });
  expect(response.ok()).toBeTruthy();

  await page.goto('/teams/qa-team');
  await expect(page.getByText('sam@example.com')).toBeVisible();
});

That pattern gives you fast setup and realistic UI coverage without making your suite depend on fragile, slow prep steps.

Build cleanup into the test shape, not just the after hook

Many Playwright suites rely on afterEach or afterAll and assume that is enough. It helps, but it is not a complete strategy.

Use try/finally for critical cleanup

If a test creates data that could break later tests, clean it up in a finally block. This is especially useful for long scenario tests where multiple steps can fail.

import { test, expect } from '@playwright/test';

test('creates and removes a project', async ({ page, request }) => {
  let projectId: string | undefined;

  try {
    const res = await request.post('/api/projects', {
      data: { name: `qa-project-${Date.now()}` },
    });
    expect(res.ok()).toBeTruthy();
    projectId = (await res.json()).id;

    await page.goto(`/projects/${projectId}`);
    await expect(page.getByRole('heading', { name: /project/i })).toBeVisible();
  } finally {
    // Runs even when an assertion above fails, so the project never leaks.
    if (projectId) {
      await request.delete(`/api/projects/${projectId}`);
    }
  }
});

This is boring code, but boring is good. Cleanup should be predictable.

Make teardown idempotent

Your delete logic should tolerate already-deleted records, missing records, and partially-created data. If DELETE /api/projects/:id returns 404 because the test already removed it, that should not fail the suite.

That idempotence matters in retries, parallel runs, and interrupted CI jobs.
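A minimal sketch of that idempotence, assuming your API answers 404 (or 410) for records that no longer exist. `classifyDeleteStatus`, `deleteIdempotent`, and `DeleteFn` are hypothetical names, and the transport is injected so the helper does not depend on a particular HTTP client.

```typescript
// Outcomes a teardown step can report. 'already-gone' counts as success.
type CleanupOutcome = 'deleted' | 'already-gone';

// Map the HTTP status of a DELETE call to a cleanup outcome.
// Assumption: the API uses 404 or 410 for records that are already gone.
function classifyDeleteStatus(status: number): CleanupOutcome {
  if (status >= 200 && status < 300) return 'deleted';
  if (status === 404 || status === 410) return 'already-gone';
  // Anything else (500, 403, ...) is a real failure worth surfacing.
  throw new Error(`Unexpected cleanup status: HTTP ${status}`);
}

// Hypothetical transport: issue a DELETE and resolve with the status code.
type DeleteFn = (path: string) => Promise<number>;

async function deleteIdempotent(del: DeleteFn, path: string): Promise<CleanupOutcome> {
  return classifyDeleteStatus(await del(path));
}
```

With Playwright's `request` fixture, a matching transport is `(path) => request.delete(path).then((res) => res.status())`.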

Keep teardown close to setup

If a helper creates data, it should usually return enough information to clean it up. Avoid “global cleanup scripts” that have to guess what the test created. Guessing is how important records get deleted by accident, or never deleted at all.
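One way to keep teardown next to setup is a small registry that every creation helper writes to. This is a sketch, not a Playwright API: `CleanupTracker` and `createProject` are hypothetical, and the `create`/`remove` callbacks stand in for your own API calls. Cleanups run in reverse creation order, so child records go before their parents, and one failed delete does not stop the rest.

```typescript
type Cleanup = () => Promise<void>;

class CleanupTracker {
  private stack: { label: string; run: Cleanup }[] = [];

  // Creation helpers call this right after a resource exists.
  register(label: string, run: Cleanup): void {
    this.stack.push({ label, run });
  }

  // Undo in reverse creation order; collect failures instead of stopping,
  // so one broken delete does not strand everything created before it.
  async runAll(): Promise<string[]> {
    const failures: string[] = [];
    while (this.stack.length > 0) {
      const { label, run } = this.stack.pop()!;
      try {
        await run();
      } catch (err) {
        failures.push(`${label}: ${String(err)}`);
      }
    }
    return failures;
  }
}

// Hypothetical helper: creates a project and registers its exact teardown.
async function createProject(
  tracker: CleanupTracker,
  create: (name: string) => Promise<string>,
  remove: (id: string) => Promise<void>,
  name: string,
): Promise<string> {
  const id = await create(name);
  tracker.register(`project ${id}`, () => remove(id));
  return id;
}
```

In Playwright you could expose the tracker as a custom fixture through `test.extend`, calling `runAll()` after `use()` so cleanup runs on both passing and failing tests.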

Use namespaces for anything shared across runs

Some data is intentionally shared, especially in staging-like environments. In that case, isolate by namespace instead of pretending the environment is disposable.

Good namespace choices:

  • runId from CI
  • branch name plus build number
  • worker index for parallel runs
  • a UUID stored on the tenant or organization record

Examples:

  • qa-ci-10482
  • feature-login-fix-18
  • worker-3

Then filter everything through that namespace, including database rows, email addresses, uploaded files, object storage paths, and webhook payloads.

If your suite can run in parallel, every shared resource should answer one question quickly: “Which run owns this?”
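A small helper can turn CI metadata into one namespace string and derive resource names from it, which makes that ownership question trivial to answer. The input fields, the `qa-` prefix, and the helper names are assumptions for illustration; in Playwright the worker index is available as `testInfo.workerIndex`, and the run ID or branch comes from whatever your CI system exposes.

```typescript
// Hypothetical inputs; runId is preferred, branch plus build number is the fallback.
interface NamespaceParts {
  runId?: string;
  branch?: string;
  buildNumber?: string;
  workerIndex: number;
}

// Lowercase and replace anything outside [a-z0-9-] so the namespace is safe
// in emails, storage paths, and URLs.
function slug(value: string): string {
  return value.toLowerCase().replace(/[^a-z0-9-]+/g, '-').replace(/^-+|-+$/g, '');
}

function buildNamespace(p: NamespaceParts): string {
  const run = p.runId
    ? `ci-${slug(p.runId)}`
    : `${slug(p.branch ?? 'local')}-${slug(p.buildNumber ?? 'dev')}`;
  return `qa-${run}-worker-${p.workerIndex}`;
}

// Resource names carry the namespace, so any record reveals its owner.
function ownedEmail(ns: string, label: string): string {
  return `${ns}+${slug(label)}@example.com`;
}

// The trailing delimiter stops "worker-1" from matching "worker-10".
function ownsResource(ns: string, name: string): boolean {
  return name.startsWith(`${ns}-`) || name.startsWith(`${ns}+`);
}
```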

Handle retry-safe cleanup for flaky CI jobs

Retries are useful, but they can also hide state problems. If a failed test leaves a resource behind, a rerun might fail in a different place, or worse, pass and leave the underlying issue unresolved.

A retry-safe suite should do three things:

  1. Recreate its own preconditions on every attempt.
  2. Treat missing cleanup targets as normal.
  3. Detect and delete its own leftovers before starting.

Example, cleanup before setup

This is especially helpful when tests use predictable identifiers.

import { test } from '@playwright/test';

test.beforeEach(async ({ request }) => {
  // Remove leftovers from a previous attempt, scoped to this worker only.
  await request.delete('/api/test-support/reset?scope=current-worker');
});

That endpoint should reset only the namespace owned by the current worker, not the whole environment. In a shared CI environment, broad reset endpoints are a common source of accidental damage.
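Behind such an endpoint, the key property is that the scope is explicit and narrow. A minimal in-memory sketch, assuming each record stores the namespace that created it (`OwnedRecord` and `resetNamespace` are hypothetical names): it removes only matching records, refuses an empty or wildcard scope, and is safe to call twice, which is exactly what a retry needs.

```typescript
// Assumption: every test-created record is tagged with its owning namespace.
interface OwnedRecord {
  id: string;
  namespace: string;
}

// Delete only records owned by `namespace`. Guard against empty or wildcard
// scopes so a bad parameter cannot wipe the shared environment.
function resetNamespace(
  records: OwnedRecord[],
  namespace: string,
): { kept: OwnedRecord[]; removed: number } {
  if (namespace.trim() === '' || namespace.includes('*')) {
    throw new Error(`Refusing to reset an unscoped namespace: "${namespace}"`);
  }
  const kept = records.filter((r) => r.namespace !== namespace);
  return { kept, removed: records.length - kept.length };
}
```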

Don’t confuse UI resets with real data resets

Clicking a “Reset” button in the app may feel like cleanup, but it is often just another UI interaction. It might be subject to the same page load problems, permission problems, or asynchronous delays as the test itself.

Use UI reset flows when they are part of the product behavior you want to validate. Use API or direct data-layer cleanup when the goal is maintaining a repeatable test environment.

A practical rule of thumb:

  • UI cleanup for validating user-facing workflows
  • API cleanup for fast and reliable teardown
  • DB cleanup for full control, when your team owns the tradeoff

Database cleanup patterns that actually scale

If your organization allows direct database access in test environments, you can get strong repeatability. The trick is to avoid making the database a secret dependency that only one engineer understands.

1. Transaction rollback for isolated tests

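For unit-like integration tests, wrap each test in a transaction and roll it back afterward. This is fast, but it only works when the test and the backend share the same transaction boundary. In browser tests they usually do not: the app server handles requests on its own database connections, so it never sees the test's uncommitted transaction, and whatever it commits falls outside the test's rollback.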

2. Truncate and reseed between suites

This works well for full end-to-end environments where you can afford a reset between runs. Be careful with foreign keys, background jobs, and test parallelism.
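Foreign keys are usually the first thing to bite. If you clear data table by table, children must go before their parents. A sketch, assuming you already have a map from each table to the tables it references (for example, read from your database's schema catalog); `deletionOrder` is a hypothetical helper that returns a child-first order and rejects cycles.

```typescript
// Assumed input: table -> tables it references through foreign keys.
type ForeignKeys = Record<string, string[]>;

// Depth-first search that visits referenced (parent) tables first, giving a
// parents-first order; reversing it yields a safe child-first delete order.
function deletionOrder(fks: ForeignKeys): string[] {
  const tables = new Set<string>(Object.keys(fks));
  for (const parents of Object.values(fks)) parents.forEach((t) => tables.add(t));

  const parentsFirst: string[] = [];
  const state = new Map<string, 'visiting' | 'done'>();

  const visit = (table: string): void => {
    const s = state.get(table);
    if (s === 'done') return;
    if (s === 'visiting') throw new Error(`Foreign key cycle involving ${table}`);
    state.set(table, 'visiting');
    for (const parent of fks[table] ?? []) {
      if (parent !== table) visit(parent); // self-references do not affect order
    }
    state.set(table, 'done');
    parentsFirst.push(table);
  };

  Array.from(tables).sort().forEach(visit);
  return parentsFirst.reverse();
}
```

Cycles cannot be ordered this way; in PostgreSQL, `TRUNCATE ... CASCADE` or deferred constraints are the usual way out.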

3. Schema-per-worker

Each Playwright worker gets its own database schema. This is one of the cleanest ways to support parallel runs, because workers cannot step on each other’s rows.
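A minimal sketch of the per-worker setup, assuming PostgreSQL and an app that can be pointed at a schema, for example through `search_path` on its connections. `workerSchemaName` and `schemaResetSql` are hypothetical helpers that only build SQL strings; executing them, then running your migrations and base seed, is left to your own database client.

```typescript
// Derive the schema name only from a validated worker index, never from
// free-form input, so it is safe to interpolate into SQL.
function workerSchemaName(workerIndex: number, prefix = 'e2e'): string {
  if (!Number.isInteger(workerIndex) || workerIndex < 0) {
    throw new Error(`Invalid worker index: ${workerIndex}`);
  }
  return `${prefix}_w${workerIndex}`;
}

// Statements to recreate an empty schema for one worker. Note that SET
// search_path affects only the session that runs it; the app's own
// connections need the same setting through their connection options.
function schemaResetSql(workerIndex: number): string[] {
  const schema = `"${workerSchemaName(workerIndex)}"`;
  return [
    `DROP SCHEMA IF EXISTS ${schema} CASCADE;`,
    `CREATE SCHEMA ${schema};`,
    `SET search_path TO ${schema};`,
  ];
}
```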

4. Tenant-per-test or tenant-per-worker

Multi-tenant apps often benefit from a dedicated tenant namespace per worker. It keeps business logic realistic while preventing collisions.

Watch for hidden state outside the database

A suite can still be non-repeatable even if the tables are clean.

Common hidden state sources:

  • browser localStorage and cookies
  • server-side caches
  • search indexes
  • message queues
  • email inboxes
  • file storage buckets
  • rate limits or API quotas
  • feature flag assignments

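Playwright Test already creates a fresh browser context for each test when you use the built-in page fixture, so client-side leftovers usually come from contexts you share manually or from a saved storageState file that carries stale cookies and localStorage. If you create contexts yourself, create one per test and close it when the test ends.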

import { test } from '@playwright/test';

test('starts from a clean session', async ({ browser }) => {
  // A new context starts with no cookies or localStorage from earlier tests.
  const context = await browser.newContext();
  try {
    const page = await context.newPage();
    await page.goto('/');
  } finally {
    await context.close();
  }
});

Fresh contexts are not enough if the backend still thinks the user is logged in, but they remove a whole category of client-side leftovers.

Make cleanup visible in CI logs

When teardown fails silently, the next test run pays the price. Log enough to answer these questions:

  • What resource was created?
  • Which test or worker created it?
  • Was deletion attempted?
  • Did the cleanup endpoint return success or a handled not-found?

A lightweight test-support API can make this much easier. If you build internal helpers for test data, consider a small admin surface that can list and delete resources by test namespace. That saves engineers from hunting down the data manually after a failure.
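A minimal sketch of one structured log line per cleanup attempt, covering the four questions listed above. The field names, the `test-data-cleanup` event label, and `cleanupLogLine` are assumptions; adapt them to whatever your CI log search expects.

```typescript
// Possible outcomes for one cleanup attempt; 'skipped' means no delete was tried.
type CleanupResult = 'deleted' | 'already-gone' | 'failed' | 'skipped';

// One JSON object per line is easy to grep in CI output and to parse later.
function cleanupLogLine(
  resource: string,
  owner: string,
  result: CleanupResult,
  detail?: string,
): string {
  return JSON.stringify({
    event: 'test-data-cleanup',
    resource, // what was created, e.g. "project:123"
    owner, // test title or namespace, e.g. "worker-3"
    attempted: result !== 'skipped', // was deletion attempted?
    result, // success, handled not-found, or failure
    ...(detail !== undefined ? { detail } : {}), // error text or HTTP status
  });
}
```

Printing the line with `console.log` during teardown is enough for most CI systems; Playwright's `testInfo.attach` is an option if you want it stored alongside the test report.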

A practical cleanup checklist for Playwright suites

Use this as a baseline before you blame the app for flakiness:

  • Create unique data for each test wherever possible
  • Seed stable reference data once per environment or worker
  • Use API-level setup instead of UI setup for expensive preconditions
  • Return IDs from helpers so teardown can delete the exact resources created
  • Make delete operations idempotent
  • Use namespaces, not shared names, for parallel CI runs
  • Reset cookies, localStorage, and storage state between tests
  • Clear emails, queues, files, and feature flags if they affect the scenario
  • Keep cleanup close to setup in the same test or helper
  • Log test-owned data so failures can be diagnosed quickly

When a low-code workflow can help

Not every team wants to hand-maintain every setup and teardown path in code. In some organizations, a more visual workflow is easier to keep current, especially when QA and product need to inspect or edit the steps later.

That is where a tool like Endtest can be useful as a complementary option. It uses agentic AI and editable, step-based test workflows, which can make data setup and maintenance more approachable for teams that prefer to adjust test steps without editing source code directly. Its self-healing tests also aim to reduce locator-related breakage, which is a different problem from test data cleanup, but often shows up in the same flaky-test conversation.

The important distinction is that no tool removes the need for good data strategy. Whether you are writing Playwright code or maintaining step-based tests, the suite still needs clear ownership of its data, namespace boundaries, and teardown rules.

Choosing the right cleanup strategy for your team

Use this decision guide:

If your tests are mostly UI flows with modest data needs

  • Prefer unique data and API teardown
  • Keep browser contexts fresh
  • Avoid database-level complexity unless you need it

If your suite runs in parallel at CI scale

  • Assign per-worker namespaces or schemas
  • Reset shared state before the run
  • Make teardown idempotent and fast

If your app integrates with email, queues, billing, or storage

  • Add cleanup hooks for every external side effect
  • Use sandbox accounts or disposable endpoints
  • Track the created resources explicitly

If failures often happen after retries

  • Focus on setup and cleanup symmetry
  • Ensure every attempt starts from a known namespace
  • Verify cleanup on both success and failure paths

Final thought

Playwright test data cleanup is less about one magic afterEach and more about designing repeatable ownership of state. The best suites do not assume the world is clean; they make the world clean enough for the test they are about to run.

If you isolate well, seed carefully, and tear down predictably, CI gets quieter and debugging gets faster. That is the difference between a suite that only works on a good day and one the whole team can trust.