Skip to content

feat(cache): shared two-tier (L1+L2) read-through cache for singleton config#15066

Draft
valentijnscholten wants to merge 8 commits into
DefectDojo:devfrom
valentijnscholten:feature/unified-settings-cache
Draft

feat(cache): shared two-tier (L1+L2) read-through cache for singleton config#15066
valentijnscholten wants to merge 8 commits into
DefectDojo:devfrom
valentijnscholten:feature/unified-settings-cache

Conversation

@valentijnscholten

@valentijnscholten valentijnscholten commented Jun 23, 2026

Copy link
Copy Markdown
Member

What

Adds a small in-process (L1) read-through cache for global, low-cardinality singleton config (System_Settings & friends): each request/task reads a singleton from the DB at most once instead of on every access.

In processes that handle many findings this repeated reading was a real cost — a hash_code recalculation of 35,000 findings went from 24s to 12s just by memoizing the singleton read per request/task.

The cache stores plain dicts, not pickled model instances.

End result:

  • Authorization queries: cached per request (and thus per user) with @cache_for_request — a small pre-existing in-memory thread-local, proven in OS for a while. Reduces per-request query counts.

  • System settings singletons: cached in-process (L1) with a short TTL, reset at every request and task boundary. Obvious savings in multi-finding processing such as imports, hash_code calc, tag inheritance, dedupe. Now also works in celery tasks — the per-task reset keeps a reused worker from serving stale config.

New primitive — dojo/caching.py

  • dojo_settings_cache(key) — read-through over a per-thread in-process L1 cache. Resolves L1 → getter; a None result is treated as "no value" and is never cached.
  • L1 is reset at each request start (middleware) and each task start (celery), so it is effectively request/task-scoped: a reused worker/uwsgi thread never serves a value cached during a prior request/task (e.g. a since-changed System_Settings).
  • invalidate_dojo_settings_cache(key) drops this thread's L1 entry (called on save).
  • model_to_cache_dict / cache_dict_to_model — store/rebuild plain dicts, never pickled model instances.

Single tier, configured by one env (→ settings.dist.py):

  • DD_SETTINGS_CACHE_L1_TTL (seconds, default 30, -1 makes the decorator a pass-through).

Why there is no shared/L2 (Redis) tier

An earlier iteration layered a shared django.core.cache (Redis) L2 under L1. It was removed:

  • Not needed for correctness. The one hard requirement is that a celery worker sees current config (e.g. enable_deduplication) promptly. The per-task L1 reset already guarantees that by re-reading the DB each task — a shared L2 + cross-process invalidation is not required.
  • Cost outweighed benefit for these singletons. L2 brought a Redis dependency, pickled-vs-dict storage with .v2 version-keying, cross-process invalidation, and Redis-down fragility. The headline win (avoiding repeated reads inside a bulk loop) comes from L1, not L2.
  • Trade-off: one small indexed DB read per singleton per request/task instead of amortising across processes via Redis — fine for low-cardinality, signal-invalidated config.

DD_CACHE_URL and the Redis default django.core.cache backend remain for general framework caching; the singleton cache no longer reads or writes it.

System_Settings reads through L1

System_Settings.objects.get() is served from L1 as a dict rebuilt into a read-only instance per call. Write paths (objects.get(no_cache=True)) are unchanged. Busted on save via an import-time @receiver(post_save) so it fires in requests, Celery, commands and tests. The legacy per-request manager middleware is slimmed to just L1-reset + the load-error banner and renamed DojoSettingsManagerMiddleware.

Tests

  • unittests/test_caching.py covers L1 memoization, -1 pass-through, None-not-cached, and invalidation.
  • Query-count tests (test_metrics_queries, test_false_positive_history_logic, test_tag_inheritance_perf, test_importers_performance) pin the cache via @override_settings(SETTINGS_CACHE_L1_TTL=30) so counts are deterministic and reflect one DB read per request/task; counts were updated where the per-task reset legitimately changes them (e.g. eager imports re-read per task, as a real async worker would).

No behavior change on non-cache paths. No new dependencies (RedisCache, if used for the default backend, is built into Django).

@github-actions github-actions Bot added settings_changes Needs changes to settings.py based on changes in settings.dist.py included in this PR unittests labels Jun 23, 2026
@valentijnscholten valentijnscholten force-pushed the feature/unified-settings-cache branch from 4ee0039 to b7ba078 Compare June 24, 2026 09:58
… config

- new dojo/caching.py: dojo_settings_cache(key, timeout) read-through over an
  in-process L1 tier (off|thread|process, TTL'd) on top of django.core.cache (L2).
  Resolves L1 -> L2 -> getter; None is never cached. invalidate_dojo_settings_cache;
  model_to_cache_dict/cache_dict_to_model (store dicts, not pickled instances).
  Configurable: DD_SETTINGS_CACHE_ENABLED / DD_CACHE_L1_MODE / DD_CACHE_L1_TTL /
  DD_CACHE_L2_TTL (-1 disables a tier).
- System_Settings read path served through the cache (dict, rebuilt per call);
  write paths (no_cache=True) unchanged; busted on save via post_save receiver.
- disabled in unit tests (compose overrides) so assertNumQueries counts stay
  deterministic; the cache itself is covered by unittests/test_caching.py.

No new dependencies; no behavior change on non-cache paths.
@valentijnscholten valentijnscholten force-pushed the feature/unified-settings-cache branch from b7ba078 to 322bf6f Compare June 24, 2026 11:09
…validation)

The L1+L2 singleton cache (dojo/caching.py) needs its L2 tier to be a
cross-process store so cache invalidation (e.g. on System_Settings save in
uwsgi) propagates to celery workers. Without it Django defaults to per-process
LocMemCache, so a worker keeps serving a stale System_Settings (e.g.
enable_deduplication=False) and async dedup silently no-ops -> dedupe/
close_old_findings_dedupe/questionnaire integration tests fail.

- Add DD_CACHE_URL env; when set, configure django.core.cache RedisCache.
- Default DD_CACHE_URL=redis://valkey:6379/1 in compose (uwsgi/celery/beat).
- Mount source into the integration celeryworker so it runs the same code as
  uwsgi (enables local reproduction of worker-side cache behavior).
Sharing defectdojo_media_integration_tests between uwsgi and celeryworker
races on mkdir media/threat at startup (file exists), flaking the worker
container. The mount was only a local-repro aid and is unnecessary in CI,
where the worker runs the image built from PR source. Revert to the original
worker definition; the Redis L2 fix is unaffected.
Unit tests run with no Redis (DD_SETTINGS_CACHE_L2_TTL=-1) but inherited
DD_CACHE_URL from base compose, so System_Settings post_save invalidation
called cache.delete against a nonexistent valkey -> rest-framework setUpClass
ConnectionError. invalidate_dojo_settings_cache now skips cache.delete when
L2 is disabled (consistent with the read path, which already skips L2). Also
blank DD_CACHE_URL in the unit-test overrides so the backend is LocMemCache.
Adds test_invalidate_skips_l2_when_disabled.
…banner

The decorator (dojo.caching) now owns all System_Settings caching, so the
middleware no longer keeps its own per-request thread-local instance. It only:
(a) resets the request-scoped L1 tier at request start, (b) surfaces a DB-read
error for the banner. Renamed DojoSytemSettingsMiddleware ->
DojoSettingsManagerMiddleware (it governs the shared settings cache lifecycle,
not just System_Settings). Manager.get() reads straight through the cache.

To keep assertNumQueries deterministic without the thread-local, unit tests now
run with L1 ON (request/test-scoped) and L2 OFF instead of both disabled: L1 is
reset per request (middleware) and per test (dojo_test_case setUp via the new
refresh_system_settings_cache helper, replacing middleware.load()). This mirrors
the old thread-local's per-request memoization. Rewrote the middleware tests for
the new contract (cache via decorator, instance rebuilt per call, L1 reset, error
banner).
The L1 settings cache is reset per celery task (DojoAsyncTask), so eager-mode
tests now faithfully mirror real async workers: each task re-reads
System_Settings (from L2) instead of sharing one request-wide read as the old
middleware thread-local did. assertNumQueries baselines shift accordingly:
- Single cached reads now save a query (metrics 27->25/41->40, fp_history 7->6).
- Bulk create/import where many per-finding tasks run inline read once per task
  (tag_inheritance create_100 3124->3224; zap import/reimport +3-4; importer
  performance +2-8). In production these run in workers, off the request path.
Counts verified locally against the CI unit-test cache config (L1 on, L2 off).
The query-count assertions must measure in-process (L1) behavior deterministically,
independent of whether a shared L2 (Redis) cache is configured/warm in the run
environment. Relying on the unit-test compose env (DD_SETTINGS_CACHE_L2_TTL=-1)
left the counts env-dependent (e.g. running via run-unittest in dev mode, or the
manage.py-test perf path, would see L2 on). Add an explicit
@override_settings(SETTINGS_CACHE_L1_TTL=30, SETTINGS_CACHE_L2_TTL=-1) on the
count-asserting classes so they are self-contained. No count change (compose
already disabled L2 in CI).
The singleton settings cache (dojo/caching.py) becomes a single in-process L1
tier; the shared django.core.cache (L2) tier is removed. Cross-process/worker
freshness is provided entirely by resetting L1 at each request and task boundary
(middleware + DojoAsyncTask), so each request/task reads a singleton from the DB
at most once and never serves a value cached during a prior request/task.

Why no L2 for settings singletons:
- The dedupe correctness fix only needs each task to see current config, which
  the per-task L1 reset already guarantees by re-reading the DB -- a shared L2 +
  cross-process invalidation is not required for correctness.
- L2 added real cost and complexity for little gain on these tiny, hot
  singletons: a Redis dependency, pickled-vs-dict storage and version-keying,
  cross-process invalidation, and Redis-down fragility -- while the headline win
  (avoiding repeated reads within a bulk loop) comes from L1, not L2.
- Trade-off: one small indexed DB read per singleton per request/task instead of
  amortising across processes via Redis. Acceptable for low-cardinality config.

Removes SETTINGS_CACHE_L2_TTL and the L2 read/set/delete paths. DD_CACHE_URL and
the Redis default django cache backend stay (used by other framework caching).
Unit-test overrides drop the now-defunct L2 TTL var. test_caching rewritten to
the L1-only contract.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docker settings_changes Needs changes to settings.py based on changes in settings.dist.py included in this PR unittests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant