Skip to content

(feat): per-IOC locking#165

Open
anderslindho wants to merge 6 commits into
masterfrom
feat/per-ioc-locking
Open

(feat): per-IOC locking#165
anderslindho wants to merge 6 commits into
masterfrom
feat/per-ioc-locking

Conversation

@anderslindho
Copy link
Copy Markdown
Contributor

No description provided.

Previously abort() returned the error on CancelledError and raised a
new one on other failures, leaving the chain errored so any queued
commit (notably the disconnect transaction) was never executed —
IOCs could stay Active in CF after an upload failure.

Co-authored-by: Sky Brewer <sky.brewer@ess.eu>
Comment thread server/tests/unit/test_recast.py
Comment thread server/recceiver/cf/processor.py Outdated
Comment thread server/tests/unit/cf/test_processor.py Outdated
Comment thread server/recceiver/recast.py
Comment thread server/recceiver/cf/processor.py Outdated
Comment thread server/recceiver/application.py Outdated
Copy link
Copy Markdown
Contributor

@jacomago jacomago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we can get better testing for this.

I'm thinking we could make the multi-ioc container tests could have maxActive = 2, since there are 4 iocs.
Would also be nice to have an IOC with more channels than the commitMax which is defaulted to 5000, could simply set it to 1 and make sure tests work?

Comment thread server/recceiver/cf/processor.py
Comment thread server/recceiver/cf/processor.py
Comment thread server/recceiver/application.py Outdated
anderslindho and others added 4 commits May 18, 2026 12:53
The single global DeferredLock serialised all IOC commits behind one
queue. Under load, the lock depth determined how long IOCs waited to
be marked active or inactive, and one stuck commit blocked every other.
Per-IOC locks let different IOCs commit in parallel while keeping
same-IOC transactions serialised.

_state_lock (threading.Lock) guards the shared iocs/channel_ioc_ids
dicts against concurrent thread writes. The CF push receives a targeted
snapshot of the channels relevant to this commit (records being deleted
plus the IOC's current channel set) so snapshot cost scales with the
commit rather than total channel count.

_ioc_channels tracks which channels belong to each IOC for O(1)
disconnect cost; registration deduplicates to prevent double-counting.

stopService drains all in-flight per-IOC locks before running the
clean_on_stop sweep, preventing Active pushes from racing the sweep.

The commit path moves to @inlineCallbacks so retry waits use
task.deferLater — no sleeping worker threads between attempts.

Also introduces CFUpdateAbortedError to distinguish exhausted CF
retries from cancellation, and fixes chain_error to surface unexpected
exceptions rather than swallowing them.

Co-authored-by: Sky Brewer <sky.brewer@ess.eu>
deferToThread is monkeypatched to return synchronously-resolved
Deferreds so the full _commit_with_lock callback chain can be
exercised without a reactor. The prepare_result tuple matches the
(ioc_info, record_info_by_name, records_to_delete, iocs_snap,
ciids_snap) signature of _prepare_commit; the push phase is
controlled via _push_to_cf_async. Covers lock identity, iocid
routing, prune lifecycle, and all four chain_error paths.

Co-authored-by: Sky Brewer <sky.brewer@ess.eu>
The global lock is gone; update demo.conf to reflect that a stuck
pushAlwaysRetry now only holds that IOC's lock rather than blocking all
commits, fix an incorrect default value comment, and correct a
long-standing typo in the same pass.

Also rename single-letter CastFactory identifiers (P/P2 → proto/waiting,
addr → _addr) to satisfy linter, and correct the _stop_service_with_lock
docstring which still claimed the lifecycle lock prevents concurrent
commits (commits now use per-IOC locks and are drained explicitly before
the clean_on_stop sweep).
application.py unconditionally overwrote session.trlimit with the config
value, which defaulted to 0 (no limit), silently negating the trlimit=5000
class default added in 89be20c. Reference CollectionSession.trlimit directly
so the default cannot silently drift if the class default ever changes,
and add a regression test.
@anderslindho anderslindho force-pushed the feat/per-ioc-locking branch from 538b445 to c51af3e Compare May 18, 2026 10:57
Twisted's default thread pool size is 10. With maxActive IOC commits
in-flight, each holding a pool thread for the CF HTTP call, the pool
becomes the bottleneck before maxActive connections are ever fully
utilised. Set the pool to maxActive + 10 (headroom for clean_service
and other thread users) so throughput scales with the configured
connection limit rather than the thread pool default.
@anderslindho anderslindho force-pushed the feat/per-ioc-locking branch from c51af3e to d35922d Compare May 19, 2026 07:33
@sonarqubecloud
Copy link
Copy Markdown

Copy link
Copy Markdown
Contributor

@jacomago jacomago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still think we should have more integration tests where the ioc count is purposefully above the configuration maxActive etc. Maybe even just alter the configuraiton for the multi-ioc tests to have maxActive =1 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants