Running the regression tests

Three standalone harnesses ship in tools/regression-tests/ and validate the public HTTP surface, recurring-cron behaviour, and end-to-end load performance. Each one runs against any live PerfLocale install (local, staging, or production) and needs only WP-CLI plus standard shell tools (curl, awk, xargs).

The three harnesses live alongside the existing tools/concurrency-tests/ suite (which covers internal locking + race conditions) and are excluded from the wp.org-distributed zip by .distignore — they’re developer / operator tooling, not runtime code.

Harness | What it catches | Runtime
rest-contracts.php | API auth-bypass regressions, response-shape breakage, missing permission_callback on new endpoints | ~5s per site
cron-timing.php | Schedule registration drift, handler unwiring, dead feature-flag toggles, multisite per-blog scoping | ~10s per site
load-test.sh | Status-code regressions under concurrent load, end-to-end latency drift, 5xx spikes, p95 degradation | ~2-5min per scenario

Prerequisites

  • A working WP-CLI install pointing at the site (wp --info should print the WordPress version).
  • curl + awk + xargs (standard on every Linux / macOS host).
  • For multisite: an admin user (uid 1 on most installs).

Nothing else is required: no PHP testing framework, no Composer, no separate MySQL service, no Docker. The harnesses use wp eval-file to run inside the live WP runtime, so they see the same database state that any real request would see.

REST API contract regression — rest-contracts.php

Uses rest_do_request() to invoke endpoints in-process. No HTTP overhead, no nonce dance, no auth cookies. Covers 27 routes across 7 controllers:

  • ConfigController — public bootstrap endpoint (when edge_integration_enabled is on), ETag revalidation, 304 Not Modified flow.
  • LanguagesController — public list, admin-gated create/update/delete, response shape (id, slug, name, is_default required keys), invalid-slug rejection.
  • JobsController — admin-gated list + get + delete + cancel, unknown-uuid 404.
  • TranslationsController — type+id endpoint, bad-type rejection, negative-id rejection.
  • MachineTranslateController — write endpoint, auth gating, missing-required-field rejection.
  • StringsController + TranslationMemoryController — read endpoints.
  • Sweeps: unknown route → 404, unsupported method → 405, every plugin route declares a permission_callback (except the WP-auto-registered namespace index).
cd /path/to/wordpress
wp eval-file wp-content/plugins/perflocale/tools/regression-tests/rest-contracts.php --user=1

Expected output: each section prints ok N - description for every passing assertion, then a final === SUMMARY: N checks, 0 failures === line. Any not ok line is a real regression to investigate.
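The pass/fail rule described above can be turned into a small gate for scripts or CI. This is a minimal sketch, not part of the shipped tooling: check_harness_output is a hypothetical helper name, and the patterns assume the output looks exactly as described here (lines beginning with "not ok" for failures, one "=== SUMMARY: N checks, 0 failures ===" line on success).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the plugin): reads harness output on
# stdin and fails if any assertion failed or the zero-failure summary is absent.
check_harness_output() {
  local output failures
  output=$(cat)

  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  failures=$(printf '%s\n' "$output" | grep -c '^[[:space:]]*not ok' || true)
  if [ "$failures" -gt 0 ]; then
    echo "FAIL: $failures failing assertion(s)"
    return 1
  fi

  # Assumed summary format, taken from the description above.
  if ! printf '%s\n' "$output" |
      grep -Eq '^[[:space:]]*=== SUMMARY: [0-9]+ checks, 0 failures ===[[:space:]]*$'; then
    echo "FAIL: no zero-failure summary line found"
    return 1
  fi

  echo "PASS"
}
```

Usage would be along the lines of `wp eval-file wp-content/plugins/perflocale/tools/regression-tests/rest-contracts.php --user=1 | check_harness_output`. Requiring the summary line as well as the absence of "not ok" lines also catches a run that aborted before finishing.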

Multisite: the harness automatically iterates over the first two blogs via switch_to_blog(), so a single invocation covers main + a subsite.

When to run:

  • Before merging any PR that touches src/Api/.
  • After upgrading WordPress (catches WP-side REST API behaviour changes).
  • After adding a new REST endpoint — verify it shows up in the “every route has a permission_callback” sweep.

Recurring-cron regression — cron-timing.php

For each scheduled hook the plugin registers — perflocale_jobs_gc, perflocale_jobs_watchdog, perflocale_lock_cleanup, perflocale_mt_quality_score — verifies:

  • Schedule is registered after Bootstrap::ensure_recurring_schedules() fires (engine-agnostic: checks Action Scheduler and WP-Cron).
  • Handler is wired ($wp_filter entry exists for the hook).
  • Force-firing the hook via do_action() produces the documented side effect — e.g. planted-then-pruned 90-day-old job; planted-then-reaped expired lock row.
  • Toggling the relevant feature flag schedules / unschedules its hook (mt_quality_score_enabled is opt-in; flipping it should make the schedule appear / disappear).
  • Schedule timestamps are within the recurring-interval window (rejects clock-skew, drift, or never-scheduled).
  • On multisite: schedules are correctly scoped to the current blog id via the args.
wp eval-file wp-content/plugins/perflocale/tools/regression-tests/cron-timing.php --user=1

Note on dev sites: if a local site has been dormant for a while, Action Scheduler actions may be pending but overdue (4-5 days late) because nothing has triggered the AS queue runner. The harness treats “pending but overdue” as scheduled-ok — AS will fire them as soon as traffic resumes. Only schedules that are missing entirely are flagged.
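For a quick manual look at the "missing entirely" case, the hook names listed above can be checked against whatever the site currently has scheduled. The sketch below is an illustration only: missing_hooks is a hypothetical helper, and it only sees hooks supplied on stdin. Feeding it from wp cron event list covers WP-Cron but not Action Scheduler, so on an Action Scheduler install every hook may show as missing here while the real harness reports it as scheduled.

```shell
#!/usr/bin/env bash
# Hook names taken from the list at the top of this section.
EXPECTED_HOOKS="perflocale_jobs_gc perflocale_jobs_watchdog perflocale_lock_cleanup perflocale_mt_quality_score"

# Hypothetical helper: reads scheduled hook names on stdin (one per line),
# prints each expected hook that is absent, and returns non-zero if any is.
missing_hooks() {
  local scheduled hook rc=0
  scheduled=$(cat)
  for hook in $EXPECTED_HOOKS; do
    # -F fixed string, -x whole-line match, -q quiet.
    if ! printf '%s\n' "$scheduled" | grep -Fxq "$hook"; then
      echo "missing: $hook"
      rc=1
    fi
  done
  return "$rc"
}
```

One way to feed it is `wp cron event list --field=hook | missing_hooks`. Because mt_quality_score_enabled is opt-in, perflocale_mt_quality_score is expected to be reported as missing whenever that flag is off.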

End-to-end load test — load-test.sh

Bash + curl parallel-request runner. Fires N concurrent HTTP requests per scenario via xargs -P, captures end-to-end timing via curl -w '%{time_total}', computes min / p50 / p95 / p99 / max / avg from the timing samples. No external dependencies — not even k6, Apache Bench, or PHP.
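To make the statistics step concrete, here is a minimal sketch of deriving those figures from a list of curl time_total samples using only sort and awk. It is not the script's actual code: timing_summary is a hypothetical name, and the nearest-rank percentile definition is an assumption, so the real script may round or interpolate differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads one time_total sample (seconds) per line on stdin
# and prints min / p50 / p95 / p99 / max / avg.
# LC_ALL=C keeps "." as the decimal separator regardless of the host locale.
timing_summary() {
  LC_ALL=C sort -n | LC_ALL=C awk '
    { v[NR] = $1; sum += $1 }

    # Nearest-rank percentile: the sample at position ceil(p/100 * NR).
    function pct(p,   idx) {
      idx = int((p * NR + 99) / 100)
      if (idx < 1) idx = 1
      return v[idx]
    }

    END {
      if (NR == 0) { print "no samples"; exit 1 }
      printf "min=%.3fs  p50=%.3fs  p95=%.3fs  p99=%.3fs  max=%.3fs  avg=%.3fs\n",
             v[1], pct(50), pct(95), pct(99), v[NR], sum / NR
    }'
}
```

Each sample can come from a call such as `curl -s -o /dev/null -w '%{time_total}\n' "$url"`, with the collected lines piped into timing_summary. With small sample counts the upper percentiles collapse onto the maximum, which is why p99 and max often match in short runs.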

Four scenarios, all GET:

  1. / — default-language home (the most-trafficked URL).
  2. /de/ — translated-language home (URL-routing hot path; 301 if the language isn’t configured, which is healthy).
  3. /wp-json/perflocale/v1/config — REST config endpoint (cacheable via ETag; 404 if edge_integration_enabled is off, which is also healthy).
  4. /wp-json/perflocale/v1/languages — REST list endpoint (small payload, hot read path).
# Quick smoke (1000 total requests, ~30s wall time)
tools/regression-tests/load-test.sh --site=test --concurrency=10 --requests=100

# Heavier sustained load (10,000 total, ~3-5min)
tools/regression-tests/load-test.sh --site=mutest --concurrency=25 --requests=500

# Add new sites to the SITE_URL map at the top of the script if your
# test hostnames differ from the defaults.

Reading the output

Each scenario prints status-code counts and percentile timings:

── Scenario: Default-language home
   Samples: 100  Status codes: 200=100
   min=0.094s  p50=0.378s  p95=0.441s  p99=0.472s  max=0.472s  avg=0.367s

The harness flags:

  • 5xx server errors: always investigate. These indicate a real plugin bug or a stack failure.
  • 4xx client errors: expected when a route is gated behind a setting (e.g., /config is 404 unless edge_integration_enabled=true). Check whether the gate is intentional.
  • p95 above 1 second: investigate. This usually points to either a real plugin perf regression or a stack-level limit (single PHP-FPM worker serialising requests, slow MySQL, cold OPcache).
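The three rules above amount to a small classifier over the per-scenario numbers. The following is a sketch under assumptions rather than the script's own logic: flag_scenario is a hypothetical name, the first argument mirrors the "code=count" pairs shown in the sample output, and the second is the p95 as a bare number of seconds.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Usage: flag_scenario "<code=count ...>" <p95_seconds>
flag_scenario() {
  local codes=$1 p95=$2 pair code flagged=0

  for pair in $codes; do
    code=${pair%%=*}          # strip "=count", leaving the status code
    case $code in
      5??) echo "INVESTIGATE: $pair (5xx server errors)"; flagged=1 ;;
      4??) echo "CHECK GATE: $pair (4xx, may be an intentional setting gate)"; flagged=1 ;;
    esac
  done

  # Shell arithmetic is integer-only, so the float comparison is done in awk.
  if awk -v p="$p95" 'BEGIN { exit !(p + 0 > 1.0) }'; then
    echo "INVESTIGATE: p95=${p95}s exceeds 1s"
    flagged=1
  fi

  [ "$flagged" -eq 0 ] && echo "OK"
  return 0
}
```

For the sample scenario shown earlier, `flag_scenario "200=100" 0.441` prints OK. Redirects such as the 301 on /de/ fall through the case statement on purpose, since the scenario list treats them as healthy.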

Interpreting dev-environment numbers

On a local dev stack (Local-by-Flywheel, MAMP, Lando, etc.) the load test usually surfaces stack limits, not plugin perf. A typical pattern: home page p50 = 7ms (page-cache hit) but REST endpoints p50 = 2s (single PHP-FPM worker serialising all 10 concurrent requests). On production with a worker pool of 10-50, the same scenarios stay sub-100ms.

To get a meaningful baseline on your own host:

  • Run with concurrency ≤ your PHP-FPM pm.max_children setting (or whatever the equivalent is in your stack).
  • Warm caches first — do one throwaway run, then capture the second.
  • Compare relative changes across PerfLocale versions, not absolute numbers across hosts.
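Comparing relative changes comes down to a percentage delta between two runs on the same host. A minimal sketch, with pct_change as a hypothetical helper name and both arguments given as bare seconds (for example the p95 of the same scenario under two PerfLocale versions):

```shell
#!/usr/bin/env bash
# Hypothetical helper. Usage: pct_change <baseline_seconds> <candidate_seconds>
# Prints the signed percentage change of the candidate relative to the baseline.
pct_change() {
  LC_ALL=C awk -v base="$1" -v cand="$2" 'BEGIN {
    if (base + 0 <= 0) { print "baseline must be > 0"; exit 1 }
    printf "%+.1f%%\n", (cand - base) / base * 100
  }'
}
```

For example, `pct_change 0.400 0.460` reports a 15% slowdown. What counts as an acceptable delta is a judgment call for your own stack; run-to-run noise on a warm cache is a reasonable floor.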

CI integration

The rest-contracts.php and cron-timing.php harnesses are fast enough to run on every PR. The plugin’s existing .github/workflows/ci.yml covers syntax-lint + PHPCS + POT generation + the WordPress Plugin Check across WP 6.4-latest × PHP 8.1-8.2; adding the regression harnesses would need a MySQL service + a WP install step but is otherwise straightforward:

# Sketch of the CI job for the REST + cron harnesses
- uses: actions/checkout@v4
- name: Set up MySQL
  run: |
    sudo systemctl start mysql
    mysql -uroot -proot -e "CREATE DATABASE wp_test;"
- name: Bootstrap WordPress + plugin
  run: |
    wp core download --path=/tmp/wp
    cd /tmp/wp
    wp config create --dbname=wp_test --dbuser=root --dbpass=root --dbhost=127.0.0.1
    # Placeholder admin address; any valid email works for a throwaway test site.
    wp core install --url=http://localhost --title=Test --admin_user=admin --admin_email=admin@example.com --admin_password=p
    ln -s "$GITHUB_WORKSPACE" wp-content/plugins/perflocale
    wp plugin activate perflocale
- name: Run REST contract regression
  working-directory: /tmp/wp
  run: wp eval-file wp-content/plugins/perflocale/tools/regression-tests/rest-contracts.php --user=1
- name: Run cron timing regression
  working-directory: /tmp/wp
  run: wp eval-file wp-content/plugins/perflocale/tools/regression-tests/cron-timing.php --user=1

The load test is harder to fit into CI — meaningful numbers need a steady-state stack, and CI runners are noisy. Run it locally before tagging a release.

  • Reliability & Circuit Breakers: the runtime guarantees the regression harnesses verify.
  • WP-CLI Commands: wp perflocale addon doctor, wp perflocale jobs list, etc. for operator-side ad-hoc inspection.
  • tools/concurrency-tests/ in the plugin repo: the older suite that exercises internal locking + race conditions via pcntl_fork. It has a heavier setup (needs burst1k.sh + scenario files) and a longer runtime, and complements the three harnesses above.