ARTEMIS-6008 Prevent CriticalAnalyzer from killing the broker while failing over by sijonelis · Pull Request #6533 · apache/artemis

sijonelis · 2026-06-22T10:16:36Z

issues.apache.org/jira/browse/ARTEMIS-3664 fixed an issue with Critical Analyzer killing the broker during slow startup.

However, during live/backup pair failover when the backup node is in the STARTED state, the Critical Analyzer would still evaluate the node as unhealthy under large journal/slow network disk conditions and kill it.

This change proposes to switch the Critical Analyzer to LOG mode while the backup node is activating.

I debated between adding a new ServerState and adding a boolean flag (as done in this PR). In the end I went with the flag, since this change is a minor bug fix and should not warrant the significantly wider change of adding a new ServerState. However, I would gladly do that if asked by the maintainers.

jbertram · 2026-06-22T14:44:09Z

I think instead of checking the server state you could just check ActiveMQServer#isActive and that would work for both the primary and the backup use-cases and you wouldn't need to add another field. What do you think?
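
To make the suggestion concrete, here is a minimal, self-contained sketch of the decision it implies: while the server is not yet active, a critical failure is only logged, and the configured policy applies once activation has finished. The `Policy`, `Outcome`, and `decide` names are hypothetical stand-ins, not Artemis classes, and treating SHUTDOWN the same way as HALT is an assumption rather than something stated in this thread.

```java
/**
 * Illustrative model only. Policy, Outcome and decide() are hypothetical
 * names invented for this sketch; they are not part of the Artemis API.
 */
public class CriticalActionSketch {

   /** Configured reaction to an unresponsive critical component. */
   public enum Policy { HALT, SHUTDOWN, LOG }

   /** What the sketch actually does when the analyzer fires. */
   public enum Outcome { HALT, SHUTDOWN, LOG_ONLY }

   /**
    * Assumption: any destructive policy is downgraded to logging while the
    * server has not finished activating (startup or failover alike).
    */
   public static Outcome decide(Policy policy, boolean serverActive) {
      if (!serverActive) {
         return Outcome.LOG_ONLY;
      }
      switch (policy) {
         case HALT:
            return Outcome.HALT;
         case SHUTDOWN:
            return Outcome.SHUTDOWN;
         default:
            return Outcome.LOG_ONLY;
      }
   }

   public static void main(String[] args) {
      // A backup that is still activating is spared; an active server is not.
      System.out.println(decide(Policy.HALT, false));
      System.out.println(decide(Policy.HALT, true));
   }
}
```

Keying the decision on a single "is the server active" question, rather than on a specific lifecycle state, is what lets one check cover both a slow primary startup and a slow backup activation.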

clebertsuconic · 2026-06-23T02:20:25Z

Can you rebase your commit, without a merge commit?

Basically:

git pull --rebase upstream main

And squash your commits into a single one?

sijonelis · 2026-06-23T02:22:25Z

I think instead of checking the server state you could just check ActiveMQServer#isActive and that would work for both the primary and the backup use-cases and you wouldn't need to add another field.

This is a great suggestion @jbertram. I have tried playing around with uglier stuff (such as adding a new ServerState) before settling on the new variable as the "least invasive" approach, but your suggestion is exactly what I was looking for all along.

Adopted it and also rebased the commit.

Note that I had to use reflection to access the latch in the test, but I saw this approach used elsewhere in the project's tests, so it shouldn't be an issue to use it here as well.

jbertram · 2026-06-24T02:49:30Z

      final CriticalAnalyzerPolicy criticalAnalyzerPolicy = configuration.getCriticalAnalyzerPolicy();
      CriticalAction criticalAction = switch (criticalAnalyzerPolicy) {
         case HALT -> criticalComponent -> {
-            if (ActiveMQServerImpl.this.state == SERVER_STATE.STARTING) {


I think it would be more straightforward to just change all the checks of state to use isActive() instead of adding a new method that does the same.

Fair enough, done!

jbertram · 2026-06-24T03:38:10Z

Technically speaking reflection will work here. However, plumbing already exists to test this in a more robust way, e.g.:

   private void testTooLongToStart(CriticalAnalyzerPolicy policy) throws Exception {
      try (AssertionLoggerHandler loggerHandler = new AssertionLoggerHandler()) {
         ConfigurationImpl configuration = new ConfigurationImpl();
         configuration.setCriticalAnalyzerPolicy(policy);
         configuration.setCriticalAnalyzer(true);
         configuration.setPersistenceEnabled(false);
         ActiveMQServerImpl server = new ActiveMQServerImpl(configuration);
         addServer(server);
         CountDownLatch latch = new CountDownLatch(1);
         server.registerActivateCallback(new ActivateCallback() {
            @Override
            public void preActivate() {
               try {
                  latch.await();
               } catch (InterruptedException e) {
                  throw new RuntimeException(e);
               }
            }
         });
         CompletableFuture.runAsync(() -> {
            try {
               server.start();
            } catch (Exception e) {
               e.printStackTrace();
            }
         });
         Wait.waitFor(() -> server.getCriticalAnalyzer() != null);
         CriticalAnalyzerAccessor.fireActions(server.getCriticalAnalyzer(), new CriticalComponentImpl(server.getCriticalAnalyzer(), 2));
         assertTrue(loggerHandler.findText("AMQ224116"));
         assertFalse(server.isActive()); // should not be changed
         latch.countDown();
         server.stop();
      }
   }

By starting the broker async and forcing it to stall using a custom activation callback we don't need any reflection and we test the actual condition rather than an approximation.

Agree. Updated the test
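
The stall-with-a-latch technique from the review comment above can be tried in isolation with nothing but `java.util.concurrent`. The sketch below is a hedged illustration: `FakeServer` and its pre-activation hook are hypothetical stand-ins for the broker and its activation callback, and the second latch (`reachedCallback`) is an addition of this sketch so the check only runs once the start thread is known to be parked.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class StalledActivationSketch {

   /** Hypothetical stand-in for a broker; not an Artemis class. */
   public static class FakeServer {
      private final Runnable preActivate;
      private volatile boolean active;

      public FakeServer(Runnable preActivate) {
         this.preActivate = preActivate;
      }

      public void start() {
         preActivate.run(); // blocks for as long as the test holds the latch
         active = true;
      }

      public boolean isActive() {
         return active;
      }
   }

   /** Returns {active while stalled, active after release}. */
   public static boolean[] run() {
      CountDownLatch reachedCallback = new CountDownLatch(1);
      CountDownLatch release = new CountDownLatch(1);

      FakeServer server = new FakeServer(() -> {
         reachedCallback.countDown();
         try {
            release.await();
         } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
         }
      });

      // Start asynchronously so the calling thread stays free to inspect state.
      CompletableFuture<Void> startup = CompletableFuture.runAsync(server::start);

      try {
         if (!reachedCallback.await(5, TimeUnit.SECONDS)) {
            throw new IllegalStateException("start thread never reached the hook");
         }
      } catch (InterruptedException e) {
         Thread.currentThread().interrupt();
         throw new RuntimeException(e);
      }

      // This is the window in which a real test would fire the critical action.
      boolean whileStalled = server.isActive();

      release.countDown();
      startup.join(); // wait for start() to finish before the final check
      return new boolean[] {whileStalled, server.isActive()};
   }

   public static void main(String[] args) {
      boolean[] result = run();
      System.out.println("active while stalled: " + result[0]);
      System.out.println("active after release: " + result[1]);
   }
}
```

Because the stall happens inside the real start path rather than through a field poked by reflection, the assertion observes the same "started but not yet active" window that the fix is meant to protect.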

jbertram · 2026-06-24T13:37:41Z

Nice work @sijonelis. Thanks for the PR!

ViliusS · 2026-06-24T14:06:49Z

We talked with @sijonelis about this PR offline. We wanted to include one more change before merging. The following log message is now technically not 100% correct:

@Message(id = 224116, value = "The component {0} is not responsive during start up. The Server may be taking too long to start", format = Message.Format.MESSAGE_FORMAT)

maybe change it to

@Message(id = 224116, value = "The component {0} is not responsive. The Server may be taking too long to load", format = Message.Format.MESSAGE_FORMAT)

@jbertram do you think it's worth opening additional PR for that?

jbertram · 2026-06-24T17:25:59Z

@jbertram do you think it's worth opening additional PR for that?

Sure. Do you plan on sending a PR?

ViliusS · 2026-06-24T18:13:41Z

Yes, @sijonelis or I will do that.

sijonelis changed the title ~~Artemis 6008~~ ARTEMIS-6008 Prevent CriticalAnalyzer from killing the broker while failing over Jun 22, 2026

sijonelis force-pushed the ARTEMIS-6008 branch from 783cbee to 5ec680f June 23, 2026 02:28

ARTEMIS-6008 Prevent CriticalAnalyzer from killing the broker while activating after cluster primary is killed

66294f5

sijonelis force-pushed the ARTEMIS-6008 branch from 5ec680f to 66294f5 June 23, 2026 03:43

jbertram reviewed Jun 24, 2026

jbertram requested changes Jun 24, 2026

sijonelis added 2 commits June 24, 2026 13:06

ARTEMIS-6008 Update the check to be in place rather than use a helper method

ebcc9f5

ARTEMIS-6008 Update test case

0f51369

sijonelis requested a review from jbertram June 24, 2026 05:23

jbertram merged commit 6d91acd into apache:main Jun 24, 2026
6 checks passed

jbertram merged 3 commits into apache:main from sijonelis:ARTEMIS-6008
