Running the regression tests
Three standalone harnesses ship in tools/regression-tests/ and validate the public HTTP surface, recurring-cron behaviour, and end-to-end load performance. Each one runs against any live PerfLocale install (local, staging, or production) and needs only WP-CLI plus standard shell tools (curl, awk, xargs).
The three harnesses live alongside the existing tools/concurrency-tests/ suite (which covers internal locking + race conditions) and are excluded from the wp.org-distributed zip by .distignore — they’re developer / operator tooling, not runtime code.
| Harness | What it catches | Runtime |
|---|---|---|
| rest-contracts.php | API auth-bypass regressions, response-shape breakage, missing permission_callback on new endpoints | ~5s per site |
| cron-timing.php | Schedule registration drift, handler unwiring, dead feature-flag toggles, multisite per-blog scoping | ~10s per site |
| load-test.sh | Status-code regressions under concurrent load, end-to-end latency drift, 5xx spikes, p95 degradation | ~2-5min per scenario |
Prerequisites
- A working WP-CLI install pointing at the site (wp core version should print the WordPress version).
- curl + awk + xargs (standard on every Linux / macOS host).
- For multisite: an admin user (uid 1 on most installs).
No PHP testing framework, no Composer, no MySQL service, no Docker. The harnesses use wp eval-file to run inside the live WP runtime, so they see the same database state any real request would see.
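
The prerequisite list above can be checked before a run with a small preflight. This is a minimal sketch: check_prereqs is a hypothetical helper, not something the plugin ships.

```shell
#!/usr/bin/env bash
# Preflight sketch: confirm the named tools are on PATH.
# check_prereqs is a hypothetical helper, not part of the plugin.
check_prereqs() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  # 0 when every tool was found, 1 otherwise.
  return "$missing"
}
```

A typical call would be check_prereqs wp curl awk xargs, run from the WordPress root before invoking any harness.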
REST API contract regression — rest-contracts.php
Uses rest_do_request() to invoke endpoints in-process. No HTTP overhead, no nonce dance, no auth cookies. Covers 27 routes across 7 controllers:
- ConfigController: public bootstrap endpoint (when edge_integration_enabled is on), ETag revalidation, 304 Not Modified flow.
- LanguagesController: public list, admin-gated create/update/delete, response shape (id, slug, name, is_default required keys), invalid-slug rejection.
- JobsController: admin-gated list + get + delete + cancel, unknown-uuid 404.
- TranslationsController: type+id endpoint, bad-type rejection, negative-id rejection.
- MachineTranslateController: write endpoint, auth gating, missing-required-field rejection.
- StringsController + TranslationMemoryController: read endpoints.
- Sweeps: unknown route → 404, unsupported method → 405, every plugin route declares a permission_callback (except the WP-auto-registered namespace index).
cd /path/to/wordpress
wp eval-file wp-content/plugins/perflocale/tools/regression-tests/rest-contracts.php --user=1

Expected output: each section prints ok N - description for every passing assertion, then a final === SUMMARY: N checks, 0 failures === line. Any not ok line is a real regression to investigate.
Multisite: the harness automatically iterates over the first two blogs via switch_to_blog(), so a single invocation covers main + a subsite.
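
When the harness runs from a script, the output format described above can be checked mechanically. The sketch below assumes only the two documented patterns (not ok lines and the final SUMMARY line); check_harness_output is a hypothetical helper, and whether the harness itself sets a non-zero exit code on failure is not stated here.

```shell
#!/usr/bin/env bash
# check_harness_output (hypothetical): reads harness output on stdin.
# Returns 0 only if there is no "not ok" line and a summary line
# reporting 0 failures is present.
check_harness_output() {
  local out
  out=$(cat)
  # Echo failing assertions to stderr so they stay visible.
  if printf '%s\n' "$out" | grep -E '^[[:space:]]*not ok' >&2; then
    return 1
  fi
  # Require the clean summary line documented above.
  printf '%s\n' "$out" | grep -Eq '=== SUMMARY: [0-9]+ checks, 0 failures ==='
}
```

Usage would look like piping the wp eval-file command above into check_harness_output and acting on its exit status.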
When to run:
- Before merging any PR that touches src/Api/.
- After upgrading WordPress (catches WP-side REST API behaviour changes).
- After adding a new REST endpoint, to verify it shows up in the “every route has a permission_callback” sweep.
Recurring-cron regression — cron-timing.php
For each scheduled hook the plugin registers — perflocale_jobs_gc, perflocale_jobs_watchdog, perflocale_lock_cleanup, perflocale_mt_quality_score — verifies:
- Schedule is registered after Bootstrap::ensure_recurring_schedules() fires (engine-agnostic: checks Action Scheduler and WP-Cron).
- Handler is wired ($wp_filter entry exists for the hook).
- Force-firing the hook via do_action() produces the documented side effect, e.g. planted-then-pruned 90-day-old job; planted-then-reaped expired lock row.
- Toggling the relevant feature flag schedules / unschedules its hook (mt_quality_score_enabled is opt-in; flipping it should make the schedule appear / disappear).
- Schedule timestamps are within the recurring-interval window (rejects clock-skew, drift, or never-scheduled).
- On multisite: schedules are correctly scoped to the current blog id via the args.
wp eval-file wp-content/plugins/perflocale/tools/regression-tests/cron-timing.php --user=1

Note on dev sites: if a local site has been dormant for a while, Action Scheduler actions may be pending but overdue (4-5 days late) because nothing has triggered the AS queue runner. The harness treats “pending but overdue” as scheduled-ok, since AS will fire them as soon as traffic resumes. Only schedules that are missing entirely are flagged.
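
For a quick manual look between harness runs, the four hook names can be compared against a list of scheduled hooks. This is a rough sketch, not what cron-timing.php does: missing_hooks is a hypothetical helper, and feeding it from wp cron event list --field=hook only sees the WP-Cron engine, so hooks scheduled through Action Scheduler would show as missing.

```shell
#!/usr/bin/env bash
# Hook names as listed in this section.
EXPECTED_HOOKS="perflocale_jobs_gc perflocale_jobs_watchdog perflocale_lock_cleanup perflocale_mt_quality_score"

# missing_hooks (hypothetical): reads one scheduled hook name per line
# on stdin and prints every expected hook that is absent.
missing_hooks() {
  local seen hook rc=0
  seen=$(cat)
  for hook in $EXPECTED_HOOKS; do
    if ! printf '%s\n' "$seen" | grep -qx "$hook"; then
      echo "$hook"
      rc=1
    fi
  done
  return "$rc"
}
```

For example: wp cron event list --field=hook | missing_hooks. Because mt_quality_score_enabled is opt-in, perflocale_mt_quality_score appearing in the output is expected while the flag is off.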
End-to-end load test — load-test.sh
Bash + curl parallel-request runner. Fires N concurrent HTTP requests per scenario via xargs -P, captures end-to-end timing via curl -w '%{time_total}', computes min / p50 / p95 / p99 / max / avg from the timing samples. No external dependencies — not even k6, Apache Bench, or PHP.
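
The technique described above can be sketched in a few lines. This is an illustration under assumptions, not the contents of load-test.sh: run_scenario and summarise are hypothetical names, and the nearest-rank percentile used here may differ from the method the script actually uses.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; the real load-test.sh may be structured differently.

# run_scenario URL CONCURRENCY TOTAL
# Fires TOTAL GET requests, CONCURRENCY at a time, and prints one
# "<status> <seconds>" line per request.
run_scenario() {
  local url=$1 concurrency=$2 total=$3
  seq "$total" | LC_ALL=C xargs -P "$concurrency" -I{} \
    curl -s -o /dev/null -w '%{http_code} %{time_total}\n' "$url"
}

# summarise: reads one timing (seconds) per line on stdin and prints
# min / p50 / p95 / p99 / max / avg using nearest-rank percentiles.
summarise() {
  LC_ALL=C sort -n | LC_ALL=C awk '
    { v[NR] = $1; sum += $1 }
    # Nearest rank: index = ceil(p * N / 100), clamped to at least 1.
    function pct(p,    i) {
      i = int((p * NR + 99) / 100)
      if (i < 1) i = 1
      return v[i]
    }
    END {
      if (NR == 0) { print "no samples"; exit 1 }
      printf "min=%.3fs p50=%.3fs p95=%.3fs p99=%.3fs max=%.3fs avg=%.3fs\n",
        v[1], pct(50), pct(95), pct(99), v[NR], sum / NR
    }'
}
```

One way to combine them: run_scenario http://localhost/ 10 100 | awk '{ print $2 }' | summarise. Pinning LC_ALL=C keeps the decimal separator a dot so that sort -n and awk agree on the numbers.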
Four scenarios, all GET:
- /: default-language home (the most-trafficked URL).
- /de/: translated-language home (URL-routing hot path; 301 if the language isn’t configured, which is healthy).
- /wp-json/perflocale/v1/config: REST config endpoint (cacheable via ETag; 404 if edge_integration_enabled is off, which is also healthy).
- /wp-json/perflocale/v1/languages: REST list endpoint (small payload, hot read path).
# Quick smoke (1000 total requests, ~30s wall time)
tools/regression-tests/load-test.sh --site=test --concurrency=10 --requests=100
# Heavier sustained load (10,000 total, ~3-5min)
tools/regression-tests/load-test.sh --site=mutest --concurrency=25 --requests=500
# Add new sites to the SITE_URL map at the top of the script if your
# test hostnames differ from the defaults.

Reading the output
Each scenario prints status-code counts and percentile timings:
── Scenario: Default-language home
Samples: 100 Status codes: 200=100
min=0.094s p50=0.378s p95=0.441s p99=0.472s max=0.472s avg=0.367s

The harness flags:
- ✗ 5xx server errors: always investigate. Real plugin bug or stack failure.
- ⚠ 4xx client errors: expected when a route is gated behind a setting (e.g., /config is 404 unless edge_integration_enabled=true). Check whether the gate is intentional.
- ⚠ p95 above 1 second: investigate. Usually points to either a real plugin perf regression or a stack-level limit (single PHP-FPM worker serialising requests, slow MySQL, cold OPcache).
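
The three rules above can be expressed as a small classifier, which is handy when post-processing saved output. A minimal sketch: flag_scenario is a hypothetical helper, and returning non-zero only for 5xx is an assumption of this sketch, not documented behaviour of load-test.sh.

```shell
#!/usr/bin/env bash
# flag_scenario P95_SECONDS CODE=COUNT [CODE=COUNT ...]   (hypothetical)
# Prints one line per triggered rule. Returns 1 only when a 5xx was seen
# (an assumption of this sketch).
flag_scenario() {
  local p95=$1; shift
  local pair code count rc=0
  for pair in "$@"; do
    code=${pair%%=*}
    count=${pair#*=}
    [ "$count" -gt 0 ] || continue
    case $code in
      5??) echo "✗ ${count}x ${code}: server error, always investigate"; rc=1 ;;
      4??) echo "⚠ ${count}x ${code}: check whether the gate is intentional" ;;
    esac
  done
  # awk handles the floating-point comparison that plain sh cannot.
  if awk -v p="$p95" 'BEGIN { exit !(p > 1.0) }'; then
    echo "⚠ p95 ${p95}s is above 1 second"
  fi
  return "$rc"
}
```

For the sample scenario above, flag_scenario 0.441 200=100 prints nothing and returns 0.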
Interpreting dev-environment numbers
On a local dev stack (Local-by-Flywheel, MAMP, Lando, etc.) the load test usually surfaces stack limits, not plugin perf. A typical pattern: home page p50 = 7ms (page-cache hit) but REST endpoints p50 = 2s (single PHP-FPM worker serialising all 10 concurrent requests). On production with a worker pool of 10-50, the same scenarios stay sub-100ms.
To get a meaningful baseline on your own host:
- Run with concurrency ≤ your PHP-FPM pm.max_children setting (or whatever the equivalent is in your stack).
- Warm caches first: do one throwaway run, then capture the second.
- Compare relative changes across PerfLocale versions, not absolute numbers across hosts.
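
The first rule can be automated by reading pm.max_children from the PHP-FPM pool file and capping the requested concurrency. This is a sketch under assumptions: both function names are hypothetical, and the pool file location varies by stack (for example /etc/php/8.2/fpm/pool.d/www.conf on many Debian-style hosts).

```shell
#!/usr/bin/env bash
# fpm_max_children POOL_CONF (hypothetical)
# Prints the active pm.max_children value; lines commented out with ";"
# are ignored because the pattern is anchored at the start of the line.
fpm_max_children() {
  awk -F= '
    /^[[:space:]]*pm\.max_children[[:space:]]*=/ { v = $2 + 0 }
    END { if (!v) exit 1; print v }' "$1"
}

# safe_concurrency WANTED POOL_CONF (hypothetical)
# Prints WANTED capped at pm.max_children; falls back to WANTED when the
# setting cannot be read.
safe_concurrency() {
  local wanted=$1 limit
  limit=$(fpm_max_children "$2") || { echo "$wanted"; return; }
  if [ "$wanted" -gt "$limit" ]; then
    echo "$limit"
  else
    echo "$wanted"
  fi
}
```

A run could then pass --concurrency="$(safe_concurrency 25 /path/to/pool.conf)" to load-test.sh, with one throwaway run first to warm caches.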
CI integration
The rest-contracts.php and cron-timing.php harnesses are fast enough to run on every PR. The plugin’s existing .github/workflows/ci.yml covers syntax-lint + PHPCS + POT generation + the WordPress Plugin Check across WP 6.4-latest × PHP 8.1-8.2; adding the regression harnesses would need a MySQL service + a WP install step but is otherwise straightforward:
# Sketch of the CI job for the REST + cron harnesses
- uses: actions/checkout@v4
- name: Set up MySQL
  run: |
    sudo systemctl start mysql
    mysql -uroot -proot -e "CREATE DATABASE wp_test;"
- name: Bootstrap WordPress + plugin
  run: |
    wp core download --path=/tmp/wp
    cd /tmp/wp
    wp config create --dbname=wp_test --dbuser=root --dbpass=root --dbhost=127.0.0.1
    wp core install --url=http://localhost --title=Test --admin_user=admin --admin_email=admin@example.com --admin_password=p
    ln -s "$GITHUB_WORKSPACE" wp-content/plugins/perflocale
    wp plugin activate perflocale
- name: Run REST contract regression
  working-directory: /tmp/wp
  run: wp eval-file wp-content/plugins/perflocale/tools/regression-tests/rest-contracts.php --user=1
- name: Run cron timing regression
  working-directory: /tmp/wp
  run: wp eval-file wp-content/plugins/perflocale/tools/regression-tests/cron-timing.php --user=1

The load test is harder to fit into CI: meaningful numbers need a steady-state stack, and CI runners are noisy. Run it locally before tagging a release.
Related test infrastructure
- Reliability & Circuit Breakers: the runtime guarantees the regression harnesses verify.
- WP-CLI Commands: wp perflocale addon doctor, wp perflocale jobs list, etc. for operator-side ad-hoc inspection.
- tools/concurrency-tests/ in the plugin repo: the older suite that exercises internal locking + race conditions via pcntl_fork. Heavier setup (needs burst1k.sh + scenario files), longer runtime, complements the three harnesses above.