-
Notifications
You must be signed in to change notification settings - Fork 81
fix: fixed flaky/failing acceptance tests #3777
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: fixed flaky/failing acceptance tests #3777
Conversation
Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>
🎉 Snyk checks have passed. No issues have been found so far.✅ security/snyk check is complete. No issues have been found. (View Details) ✅ license/snyk check is complete. No issues have been found. (View Details) |
e09476b
to
23c5fb7
Compare
Test Results 21 files ±0 260 suites ±0 34m 58s ⏱️ +17s For more details on these failures, see this check. Results for commit 38b18bc. ± Comparison against base commit bcd473b. ♻️ This comment has been updated with latest results. |
Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>
23c5fb7
to
1199e06
Compare
await relay.pollForValidTransactionReceipt(createChildTx.hash); | ||
|
||
// added delay to wait for balances properly settled on MN | ||
await Utils.wait(1500); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this a double wait? If createChildTx
is finished, doesn't it mean that balances are already settled on the MN?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Luis, thanks for the question. Without the extra wait there still occasional balance mismatches on production networks. I believe that once a transaction reaches consensus, the balances are updated in the Merkle tree almost instantly however for the Mirror Node to reflect those updates, there’s still a slight delay. Without the wait it should work fine, but given the busy traffic on production networks, it seems the Mirror Node sometimes takes longer to catch up, leading the flakyness we're dealing with. That's why I added the extra wait to stablize the results. Not ideal but flexible for different production environemnt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification, really helpful. Maybe we should expand the comment to include a bit of this explanation?
Also, we don't remove the flakiness entirely, this is just a mitigation. What if we poll the MN to ensure the mentioned transaction is visible from it? This way, we are 100% sure to have solved the problem. I remember seeing this mechanism in tests in the Smart Contract repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you’re right—a comment would definitely help.
As for the extra wait doesn’t eliminate the flakiness entirely, You’re correct that it’s more of a mitigation to improve stability. If the production environment needs more than 1500 ms, the issue can still occur, though that’s rare—but still possible.
Regarding polling the Mirror Node to confirm the transaction is visible, that’s actually what’s happening under the hood with await relay.pollForValidTransactionReceipt(initialFundsTx.hash)
, which essentially calls eth_getTransactionReceipt
to fetch the receipt from the Mirror Node.
Maybe we should consider polling the Mirror Node for updated balances instead? Kinda feel like I have done this before but it was quite cumbersome somehow but let me see what I can do.
You're right we should aim for a more robust fix rather than just a mitigation. Will push in an update for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Synced offline with @acuarica and discovered that the flakiness was caused by a cached value of the latest block number. The new solution implements a polling mechanism to ensure the block is correctly updated after contract execution, resulting in stable and consistent outcomes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, going to approve it.
Signed-off-by: Logan Nguyen <logan.nguyen@swirldslabs.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice stuff!
for (let i = 0; i < 5; i++) { | ||
if (blockNumAfterCreateChildTx > blockNumBeforeCreateChildTx) { | ||
break; | ||
} | ||
await Utils.wait(1500); | ||
blockNumAfterCreateChildTx = await await relay.call(RelayCalls.ETH_ENDPOINTS.ETH_BLOCK_NUMBER, [], requestId); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple of things: remove double await
, include console.log
have feedback when block number is properly updated, move the blockNumAfterCreateChildTx
assignment to have it only once.
for (let i = 0; i < 5; i++) { | |
if (blockNumAfterCreateChildTx > blockNumBeforeCreateChildTx) { | |
break; | |
} | |
await Utils.wait(1500); | |
blockNumAfterCreateChildTx = await await relay.call(RelayCalls.ETH_ENDPOINTS.ETH_BLOCK_NUMBER, [], requestId); | |
} | |
for (let i = 0; i < 5; i++) { | |
blockNumAfterCreateChildTx = await relay.call(RelayCalls.ETH_ENDPOINTS.ETH_BLOCK_NUMBER, [], requestId); | |
if (blockNumAfterCreateChildTx > blockNumBeforeCreateChildTx) { | |
console.log("Block number updated succesfully") | |
break; | |
} | |
await Utils.wait(1500); | |
} |
@@ -72,7 +72,7 @@ describe('@api-batch-2 RPC Server Acceptance Tests', function () { | |||
const FEE_SCHEDULE_FILE_CONTENT_UPDATED = | |||
'0a280a0a08541a061a0440a8953a0a0a08061a061a0440889d2d0a0a08071a061a0440b0b63c120208011200'; // Eth gas = 953000 | |||
|
|||
let blockNumberAtStartOfTests = 0; | |||
let blockNumAfterCreateChildTx = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: move this declaration closer to the loop
Description:
"eth_getBalance" with block number in the last 15 minutes for account that has performed contract deploys/calls
test in batch2 due to race condition during initializationRelated issue(s):
Fixes #3665
Fixes #3666
Notes for reviewer:
Checklist