June 16, 2026
Why Browser Tests Pass in Local Dev but Fail in CI: The Hidden Environment Drift Checklist
A practical checklist for environment drift causes behind browser tests that pass locally but fail in CI, including fonts, timezone, viewport, caching, and browser versions.
If you have spent any time with browser automation, you have probably seen this pattern: the test is green on a laptop, green on a teammate’s laptop, and red in CI for reasons that feel annoyingly specific. The app did not “randomly” break. More often, the test and the environment drifted apart in ways that are invisible until a runner, a container, or a different browser build exposes them.
This checklist is for the boring differences that cause expensive failures. Not the dramatic app bugs, but the quiet environment gaps: fonts, timezone, viewport defaults, cache behavior, browser versions, and execution timing. If your team keeps asking why browser tests fail in CI but pass locally, this is the trail to follow.
Browser automation and test automation both rely on the same basic idea: a repeatable script can observe and assert behavior in a controlled environment. Continuous integration is where that idea gets tested against reality, because CI rarely looks exactly like a developer machine. For background on the broader concepts, see software testing, test automation, and continuous integration.
What environment drift actually means
Environment drift is any difference between where a test was authored and where it is executed. Some drift is expected, like different operating systems. Some is accidental, like a CI container missing a font package or a browser auto-updating on a developer machine.
In browser testing, drift shows up as:
- Different rendering, because fonts or GPU settings changed
- Different timing, because the CI runner is slower or more contended
- Different date and time behavior, because timezone and locale differ
- Different browser behavior, because versions and channels are not aligned
- Different storage state, because cookies, localStorage, service workers, and caches are not equivalent
- Different viewport and device assumptions, because the test runs headless at one size locally and another in CI
The most frustrating CI failures are often not flaky tests in the abstract. They are deterministic tests running in a different world.
The practical goal is not to eliminate every difference. It is to know which differences matter to your suite, then remove, pin, or neutralize them.
Start with a quick triage order
When a browser test passes locally and fails in CI, do not start by rewriting assertions. Start by checking the environment in this order:
- Browser version and channel
- Viewport and device emulation
- Timezone, locale, and language settings
- Fonts and rendering dependencies
- Cache, cookies, and persistent storage
- CPU, memory, and parallelism pressure
- Network shape, including mocks and service workers
- Test data and reset behavior
That order matters because the early items often explain the later ones. A browser version mismatch can change layout behavior, and a viewport mismatch can make a “visible” element disappear behind responsive CSS. If you fix timing before you fix browser parity, you can spend hours tuning waits around the wrong root cause.
Checklist: browser tests fail in CI but pass locally
Use this as a working checklist, not a one-time audit. A lot of teams run through it once, then forget to keep the CI image and local tooling in sync.
1) Pin the browser version, do not trust “latest”
Browser automation is sensitive to version changes. A local machine may auto-update Chrome, while CI uses a fixed container image or an older package cache. That mismatch can change:
- Layout and paint timing
- Media query behavior
- Focus handling
- File upload dialogs and clipboard permissions
- Shadow DOM and accessibility tree behavior
- Headless mode quirks
What to check:
- Is the same browser major version used locally and in CI?
- Is your test runner using the system browser or a bundled browser?
- Are you mixing stable, beta, and canary channels across environments?
What to do:
- Pin browser versions where possible
- Document the version in the repo or build image
- Make local execution use the same browser channel as CI for the relevant suite
Example with Playwright, which installs the browser builds that match the installed Playwright version, so pinning the Playwright package in your lockfile also pins the browsers:
    npx playwright install --with-deps
    npx playwright test
If you are using Selenium, you should also think about driver compatibility, because browser and driver mismatches can fail in ways that look like app bugs but are really infrastructure issues.
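One way to make the version check concrete is to compare the major version reported in each environment. The following is a minimal sketch, assuming you collect the version strings yourself (for example by logging `browser.version()` in Playwright in both places). The helper names `majorVersion` and `sameMajor` are hypothetical and not part of any library.

```typescript
// Extract the major version from a bare version such as "126.0.6478.55"
// or a "Name/version" token such as "Chrome/126.0.6478.55".
// Not intended for full user agent strings. Returns null when nothing matches.
function majorVersion(versionString: string): number | null {
  const match = versionString.match(/(\d+)(?:\.\d+)+/);
  return match ? Number(match[1]) : null;
}

// True only when both strings parse and share the same major version.
function sameMajor(local: string, ci: string): boolean {
  const a = majorVersion(local);
  const b = majorVersion(ci);
  return a !== null && b !== null && a === b;
}

// Same major, different patch build: treated as aligned.
console.log(sameMajor('126.0.6478.55', 'Chrome/126.0.6478.126'));
// Different major: a likely source of drift.
console.log(sameMajor('127.0.6533.17', '126.0.6478.126'));
```

A check like this could run as the first step of a suite and fail fast, which is cheaper than discovering the mismatch through a layout assertion.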
2) Match viewport defaults exactly
A surprising number of “CI-only” failures are actually responsive layout failures. A test written on a 1440 pixel wide laptop may never notice that the CI runner is using a default viewport that triggers a tablet or mobile layout.
Common symptoms:
- Buttons disappear into a hamburger menu
- Sticky headers cover click targets
- Text wraps and moves the assertion target
- Elements shift position between screenshot and click
- Mobile-specific overlays block interaction
What to check:
- Local browser window size
- Headless default viewport size
- Device scale factor
- Whether the runner is truly headless or just running in a minimized window
What to do:
- Set viewport explicitly in the test config
- Use the same device profile if the suite targets a device class
- Avoid relying on the browser window’s default dimensions
Playwright example:
    import { defineConfig } from '@playwright/test';

    export default defineConfig({
      use: {
        viewport: { width: 1440, height: 900 },
      },
    });
If a test only passes on one viewport, that is usually not a test stability problem. It is a product behavior problem, and the test is doing its job.
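To see why an unset viewport matters, consider a small sketch with made-up breakpoints. Playwright's default viewport is 1280x720, so an app whose desktop layout starts at a wider breakpoint renders a different layout under the default than it does at 1440 pixels wide. The breakpoint values and the `layoutFor` helper are assumptions for illustration; substitute your app's real media queries.

```typescript
type Layout = 'mobile' | 'tablet' | 'desktop';

// Hypothetical breakpoints, not taken from any real app.
const TABLET_MIN_WIDTH = 768;
const DESKTOP_MIN_WIDTH = 1366;

// Map a viewport width to the layout the responsive CSS would select.
function layoutFor(viewportWidth: number): Layout {
  if (viewportWidth >= DESKTOP_MIN_WIDTH) return 'desktop';
  if (viewportWidth >= TABLET_MIN_WIDTH) return 'tablet';
  return 'mobile';
}

console.log(layoutFor(1440)); // width used while authoring the test
console.log(layoutFor(1280)); // Playwright's default viewport width
```

With these assumed breakpoints, the authoring width lands in the desktop layout and the default width lands in the tablet layout, which is enough to move a button into a collapsed menu.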
3) Align timezone and locale
Timezone drift creates some of the most misleading failures because the UI may look correct while the assertions are not. Dates, cutoff logic, relative time labels, and business rules tied to local midnight are common culprits.
Common symptoms:
- “Today” or “tomorrow” labels change unexpectedly
- Date pickers select the wrong day
- Scheduled events appear one day earlier or later
- Snapshot tests shift because locale formatting differs
- API-backed UI values disagree with frontend-rendered values
What to check:
- System timezone in CI
- Browser timezone emulation
- Locale and language headers
- Date formatting libraries reading local machine defaults
What to do:
- Force a known timezone in CI and local runs
- Make tests avoid clock-sensitive assertions when possible
- Freeze time in test data when business logic depends on date boundaries
Playwright supports timezone and locale configuration:
    use: {
      timezoneId: 'UTC',
      locale: 'en-US',
    }
A good rule: if the feature is timezone-sensitive, the test should say so explicitly. Hidden assumptions are where flakiness breeds.
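The "one day earlier or later" symptom is easy to reproduce outside a browser. The sketch below uses the standard `Intl.DateTimeFormat` API to show that a single instant falls on different calendar dates depending on the timezone. The helper name `calendarDate` and the sample instant are chosen for illustration only.

```typescript
// Format an instant as YYYY-MM-DD in the given IANA timezone.
// formatToParts is used so the result does not depend on locale-specific ordering.
function calendarDate(instant: Date, timeZone: string): string {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
  }).formatToParts(instant);
  const get = (type: string): string =>
    parts.find((p) => p.type === type)?.value ?? '';
  return `${get('year')}-${get('month')}-${get('day')}`;
}

// One instant, late in the evening UTC.
const instant = new Date('2026-06-16T23:30:00Z');

console.log(calendarDate(instant, 'UTC'));                 // 2026-06-16
console.log(calendarDate(instant, 'Asia/Tokyo'));          // 2026-06-17
console.log(calendarDate(instant, 'America/Los_Angeles')); // 2026-06-16
```

A test that asserts on a "today" label passes or fails here purely on where it runs, which is why forcing one timezone in both environments removes the ambiguity.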
4) Check fonts, rendering libraries, and OS packages
Fonts are one of the most underappreciated sources of rendering drift. CI containers often lack the same font families installed on developer laptops, which changes text width, wrapping, and sometimes even element height.
Common symptoms:
- Text wraps differently in CI screenshots
- Buttons or badges expand and push content
- Baseline alignment differs enough to affect pixel-based assertions
- Headings take up more vertical space and hide nearby elements
What to check:
- Which fonts are installed in the CI image?
- Is font fallback happening?
- Are you comparing screenshots across different OS families?
- Does the application rely on custom web fonts that load slowly in CI?
What to do:
- Install the same font packages in CI that your app expects
- Prefer DOM assertions over pixel-perfect screenshots unless visual diffs are the goal
- Wait for font loading when the test depends on final layout
A useful habit is to inspect the failed screenshot and compare the actual layout metrics, not just the pixels. If the text wraps in CI but not locally, the issue is often font substitution, not a broken selector.
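A rough back-of-the-envelope model shows how little it takes for font substitution to change wrapping. The sketch below assumes every character has the same average width, which real fonts do not, and ignores word boundaries. The `estimatedLines` helper and the pixel values are hypothetical numbers picked for illustration.

```typescript
// Simplified wrapping model: total text width divided by container width.
// Real layout engines wrap at word boundaries, so treat this as an estimate.
function estimatedLines(
  charCount: number,
  avgCharWidthPx: number,
  containerWidthPx: number,
): number {
  const textWidthPx = charCount * avgCharWidthPx;
  return Math.max(1, Math.ceil(textWidthPx / containerWidthPx));
}

// A 40-character label in a 300px container.
console.log(estimatedLines(40, 7, 300)); // intended font, 280px of text: 1 line
console.log(estimatedLines(40, 8, 300)); // wider fallback font, 320px of text: 2 lines
```

One extra pixel per character is enough to add a line in this model, and an extra line is enough to push a nearby click target out of position.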
5) Neutralize cache and persistent state differences
A local test can accidentally pass because the browser has warm caches, existing cookies, or storage state from a previous run. CI runners, by contrast, often start from clean containers, which is good, until the test secretly depended on persisted state.
Common symptoms:
- Service workers serving stale content in one environment but not the other
- Auth tests passing on a developer machine because cookies already exist
- Feature flags not matching because local storage is reused
- Asset loading behavior changing after cache misses
- Tests passing after the first run but failing on a clean machine
What to check:
- Does the test reuse browser context state?
- Are cookies or localStorage persisted between runs?
- Does a service worker alter network responses?
- Is the suite cleaning data between tests or only between jobs?
What to do:
- Run critical suites in a fresh browser context every time
- Clear storage explicitly when the test depends on a clean state
- Make test setup idempotent
- If you depend on a warm cache, say so and isolate that case
Example idea for a clean Playwright context:
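    import { chromium } from '@playwright/test';
    const browser = await chromium.launch();
    // A new context starts with no cookies, localStorage, or cache from earlier runs.
    const context = await browser.newContext();
    const page = await context.newPage();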
If a login test only works with existing cookies, it is not really a login test. It is a session reuse test.
6) Compare headless and headed behavior
Headless execution is often the default in CI, but local development may use headed mode. That difference can expose timing, scrolling, and focus issues.
Common symptoms:
- Clicks work headed but not headless
- Hover menus behave differently
- Smooth scrolling changes timing enough to break assertions
- Elements are “visible” to the DOM but not interactable in the viewport
What to check:
- Are local runs using headed mode while CI uses headless?
- Are animations enabled in one environment and disabled in another?
- Does focus management depend on visible browser chrome?
What to do:
- Reproduce locally in headless mode
- Run the same browser flags in both environments
- Consider disabling animations in test mode when UI motion is irrelevant to the feature
A lot of flaky interaction tests are really visibility and scroll state problems. Headless mode just makes them easier to see.
7) Watch CPU, memory, and parallelism pressure
CI runners are usually shared, throttled, or optimized for throughput instead of interactivity. That changes timing in ways a laptop rarely experiences.
Common symptoms:
- Waits that are barely sufficient locally time out in CI
- Animations take longer and leave elements in transition states
- Multiple browser workers compete for CPU and memory
- Web apps with heavy client-side rendering miss interaction windows
What to check:
- How many tests run in parallel?
- Does CI use a smaller runner than your workstation?
- Are browser workers shared with build steps?
- Are timeouts tuned for local convenience rather than CI reality?
What to do:
- Keep CI timeouts realistic, but not so large they hide regressions
- Reduce parallelism if the runner is resource constrained
- Separate build and browser test stages when contention is high
- Measure where tests spend time: setup, navigation, rendering, or assertions
If a test only passes when the machine is quiet, it is not stable enough for a shared pipeline.
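Measuring where a test spends its time does not need special tooling. The following is a minimal sketch of a phase timer, assuming you wrap each phase of a test yourself. The names `timed`, `timings`, and `slowestPhase` are hypothetical helpers, not part of Playwright or any other runner.

```typescript
interface PhaseTiming {
  label: string;
  ms: number;
}

// Collected timings for the current run.
const timings: PhaseTiming[] = [];

// Run one phase, record how long it took (even if it throws),
// and return the phase's own result.
async function timed<T>(label: string, phase: () => Promise<T> | T): Promise<T> {
  const start = Date.now();
  try {
    return await phase();
  } finally {
    timings.push({ label, ms: Date.now() - start });
  }
}

// Return the slowest recorded phase, or undefined when nothing was recorded.
function slowestPhase(records: PhaseTiming[]): PhaseTiming | undefined {
  return records.reduce<PhaseTiming | undefined>(
    (worst, t) => (worst === undefined || t.ms > worst.ms ? t : worst),
    undefined,
  );
}
```

Inside a test this might look like `await timed('navigation', () => page.goto('/'))`. Comparing the recorded phases from a local run and a CI run shows whether the extra time sits in setup, navigation, or rendering, which tells you whether to tune a timeout or reduce contention.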
8) Eliminate hidden network assumptions
Local dev environments often have fast connections, cached DNS, authenticated proxies, or backend services running on the same machine. CI may have slower network access, different DNS resolution, or mocked services that behave slightly differently.
Common symptoms:
- API responses arrive later than the test expects
- Third-party assets block rendering in CI
- Auth redirects behave differently behind proxies
- Network intercepts in local runs do not match CI traffic
What to check:
- Are backend services mocked the same way in both places?
- Does CI have access to all required domains?
- Are network retries hiding flaky upstream behavior locally?
- Are service workers or intercepts masking real requests?
What to do:
- Make the network contract explicit in test setup
- Stub external dependencies consistently
- Do not rely on undocumented local proxy rules
- Record and replay only where it is appropriate and maintainable
When debugging, compare the actual request logs from both environments. A “click did nothing” complaint is often a request that never fired, or one that returned a different payload than expected.
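Comparing request logs by eye gets tedious quickly, so a small diff helper can help. The sketch below assumes you have already captured requests from each environment into plain records, for example from a `page.on('response', ...)` listener in Playwright. The `RequestRecord` shape and the `diffRequests` helper are hypothetical.

```typescript
interface RequestRecord {
  method: string;
  url: string;
  status: number;
}

interface RequestDiff {
  missingInCi: string[];    // fired locally, never seen in CI
  extraInCi: string[];      // seen in CI only
  statusMismatch: string[]; // seen in both, different status code
}

const requestKey = (r: RequestRecord): string => `${r.method} ${r.url}`;

// Compare two request logs by method and URL, then by status code.
function diffRequests(local: RequestRecord[], ci: RequestRecord[]): RequestDiff {
  const localKeys = local.map(requestKey);
  const ciKeys = ci.map(requestKey);
  return {
    missingInCi: localKeys.filter((k) => ciKeys.indexOf(k) === -1),
    extraInCi: ciKeys.filter((k) => localKeys.indexOf(k) === -1),
    statusMismatch: local
      .filter((l) => ci.some((c) => requestKey(c) === requestKey(l) && c.status !== l.status))
      .map(requestKey),
  };
}
```

A non-empty `missingInCi` points at a request that never fired, while `statusMismatch` points at a backend, proxy, or stub that answers differently in CI.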
9) Reproduce CI locally instead of guessing
The fastest way to end environment drift arguments is to run the closest possible version of CI on a developer machine.
What to check:
- Can you run the same container image locally?
- Can you mimic the same browser, timezone, locale, and user?
- Does the failure still occur with the same CPU and memory limits?
What to do:
- Use the CI container or image locally when possible
- Run the same test command, not a custom variant
- Keep the local and CI entry points aligned
GitHub Actions, for example, makes it straightforward to define the job environment in code. A simple workflow should look more like a reproducible system and less like a magic box:
    name: browser-tests
    on: [push, pull_request]
    jobs:
      test:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-node@v4
            with:
              node-version: 20
          - run: npm ci
          - run: npx playwright install --with-deps
          - run: npm test
If local reproduction is impossible, that is a signal to simplify the CI setup or document the environment more explicitly.
A practical drift audit you can run this week
If you need a lightweight process, start with this audit on your most flaky browser suite:
Environment parity audit
- Compare browser versions between local and CI
- Compare viewport defaults and device emulation settings
- Compare timezone, locale, and language settings
- Compare installed fonts and OS packages
- Compare headless versus headed execution flags
- Compare storage reset behavior: cookies, localStorage, and sessionStorage
- Compare parallelism and resource allocation
- Compare network stubs, proxies, and service worker behavior
- Compare test data setup and teardown
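The parity audit above can be partly automated by recording a small fingerprint in each environment and diffing the two. This is a minimal sketch, assuming you gather the values yourself at the start of a run. The `EnvFingerprint` shape, the `driftedKeys` helper, and the sample values are all hypothetical.

```typescript
type EnvFingerprint = Record<string, string | number | boolean>;

// Return the sorted list of keys whose values differ between the two
// environments, including keys present in only one of them.
function driftedKeys(local: EnvFingerprint, ci: EnvFingerprint): string[] {
  const keys = Object.keys(local).concat(Object.keys(ci));
  const unique = keys.filter((k, i) => keys.indexOf(k) === i);
  return unique.filter((k) => local[k] !== ci[k]).sort();
}

// Sample values for illustration only.
const localEnv: EnvFingerprint = {
  browserMajor: 127,
  viewport: '1440x900',
  timezone: 'Europe/Berlin',
  locale: 'en-US',
  headless: false,
};
const ciEnv: EnvFingerprint = {
  browserMajor: 126,
  viewport: '1280x720',
  timezone: 'UTC',
  locale: 'en-US',
  headless: true,
};

console.log(driftedKeys(localEnv, ciEnv));
// [ 'browserMajor', 'headless', 'timezone', 'viewport' ]
```

Attaching the fingerprint to every failure artifact makes the comparison available without rerunning anything.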
Test design audit
- Are assertions tied to visible text that can wrap or reflow?
- Are selectors stable, or do they depend on layout-specific structure?
- Do waits observe real state changes, not arbitrary sleep values?
- Does the test encode assumptions about date, time, or locale?
- Does the test pass because of warm state rather than setup?
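For the question about waits, the difference between observing state and sleeping can be sketched in a few lines. Playwright's `expect(locator)` assertions already retry until a timeout, so a helper like this is mainly useful for conditions outside the page, such as a record appearing in a test backend. The `pollUntil` name and its default values are assumptions for illustration.

```typescript
// Poll `condition` until it returns true or the timeout elapses.
// Resolves to true on success and false on timeout, so the caller
// decides how to fail. The condition is always checked at least once.
async function pollUntil(
  condition: () => boolean | Promise<boolean>,
  timeoutMs = 5000,
  intervalMs = 50,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await condition()) return true;
    if (Date.now() >= deadline) return false;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Unlike a fixed sleep, this returns as soon as the state is reached on a fast machine and keeps waiting up to the limit on a slow CI runner, so the timeout becomes a ceiling rather than a cost paid on every run.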
Pipeline audit
- Is the CI image versioned and updated deliberately?
- Do local dev scripts use the same browser runner as CI?
- Is the failure artifact captured with enough context to compare runs?
- Can a developer re-run the same job setup locally within minutes?
If you cannot explain the environment in one page, you probably cannot trust it to be consistent.
When to fix the test, when to fix the environment
Not every CI failure means the environment is wrong. Sometimes the test is brittle. The trick is deciding which side to change.
Fix the environment when:
- Browser versions differ unintentionally
- Fonts, timezone, or viewport are inconsistent
- The test only fails because CI is missing a dependency or package
- The application behavior should be the same, but the runner is not
Fix the test when:
- It assumes too much about timing
- It depends on exact pixel positions without good reason
- It relies on state left behind by previous tests
- It fails under normal responsive behavior
- It uses selectors that break when minor markup changes happen
A good test suite is not just green; it is explainable. If a test cannot survive a clean run in the right environment, the first job is to make that environment visible and reproducible.
A simple rule for teams
Treat a browser test that fails only in CI like a configuration bug until proven otherwise. That mindset keeps teams from wasting time patching around unstable assertions when the real issue is environment drift.
For QA managers, this means standardizing runner images and documenting parity expectations. For SDETs, it means building tests that declare their assumptions, instead of inheriting them accidentally. For DevOps engineers, it means versioning the runtime with the same care you apply to application dependencies. For frontend engineers, it means understanding that a layout bug can be hidden until CI runs with a different browser, font stack, or viewport.
Closing checklist
Before you label a browser suite flaky, verify these basics:
- Same browser version locally and in CI
- Same viewport, device scale, and headless mode
- Same timezone and locale
- Same fonts and OS dependencies
- Same cache and storage reset behavior
- Same resource constraints, or at least known ones
- Same network stubs, proxies, and backend data shape
- Same test command and setup path
When those pieces line up, genuine product bugs become easier to spot, and fake flakiness becomes easier to remove. That is the real payoff of test parity: fewer mysteries, fewer reruns, and less time spent arguing with a pipeline that was never running the same world you were.