feat(ckbtc): bump limit on concurrent withdrawals #4804

Open · mducroux wants to merge 5 commits into master

Conversation

@mducroux (Contributor) commented Apr 14, 2025

XC-322 (ckBTC): Bump limit on concurrent withdrawals (1,000 -> 5,000).

These changes allow the ckBTC minter to process up to 5,000 concurrent BTC withdrawal requests, up from 1,000. Additionally, this PR adds latency metrics for the sign_with_ecdsa management call (covering both the success and failure cases) used in the withdrawal process, which will help track the behavior of tECDSA during load spikes.
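
For context, a minimal self-contained sketch of the measurement pattern described above: time the management call and record the latency separately for the success and failure cases. Apart from ic_cdk::api::time() (the IC system time, in nanoseconds), every name below (MetricsResult, observe_latency, the timed wrapper) is an illustrative assumption, not the minter's actual API.

// Sketch only: measure how long an async call takes and record the
// observation bucketed by outcome (success vs. failure).
#[derive(Copy, Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum MetricsResult {
    Ok,
    Err,
}

// Hypothetical recording hook; the real PR feeds a latency histogram.
fn observe_latency(_outcome: MetricsResult, _elapsed_ns: u64) {}

pub async fn timed<F, Fut, T, E>(call: F) -> Result<T, E>
where
    F: FnOnce() -> Fut,
    Fut: std::future::Future<Output = Result<T, E>>,
{
    let start_ns = ic_cdk::api::time(); // nanoseconds since the Unix epoch
    let result = call().await;
    let elapsed_ns = ic_cdk::api::time().saturating_sub(start_ns);
    let outcome = if result.is_ok() {
        MetricsResult::Ok
    } else {
        MetricsResult::Err
    };
    observe_latency(outcome, elapsed_ns);
    result
}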

@mducroux mducroux requested a review from a team as a code owner April 14, 2025 09:51
@github-actions github-actions bot added the feat label Apr 14, 2025
@lpahlavi (Contributor) left a comment

Thanks for the PR @mducroux, overall LGTM! Just one minor comment from my side.

Contributor left a comment

nit: I would add a failure test case here

Contributor Author left a comment

Good idea, I've added it to the unit tests in rs/bitcoin/ckbtc/minter/src/updates/tests.rs.
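
For illustration, a failure-path test along these lines could exercise the concurrency limit itself. The sketch below is self-contained, but EnqueueError and enqueue are hypothetical stand-ins, not the minter's actual test code.

// Minimal sketch of a failure-path test for the concurrency limit.
const MAX_CONCURRENT_PENDING_REQUESTS: usize = 5_000;

#[derive(Debug, PartialEq)]
enum EnqueueError {
    TooManyConcurrentRequests,
}

fn enqueue(pending: &mut Vec<u64>, request: u64) -> Result<(), EnqueueError> {
    if pending.len() >= MAX_CONCURRENT_PENDING_REQUESTS {
        return Err(EnqueueError::TooManyConcurrentRequests);
    }
    pending.push(request);
    Ok(())
}

#[test]
fn should_fail_when_pending_limit_is_reached() {
    // Fill the queue to the limit, then expect the next request to be rejected.
    let mut pending: Vec<u64> = (0..MAX_CONCURRENT_PENDING_REQUESTS as u64).collect();
    assert_eq!(
        enqueue(&mut pending, u64::MAX),
        Err(EnqueueError::TooManyConcurrentRequests)
    );
}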

@mducroux mducroux marked this pull request as draft April 14, 2025 12:58
@gregorydemay gregorydemay requested review from a team, ninegua and gregorydemay April 15, 2025 07:10
@gregorydemay (Member) left a comment

Thanks @mducroux for this PR! Mostly minor comments; the main one relates to the kind of buckets we need to measure the latency of tECDSA.

@@ -328,6 +328,9 @@ pub async fn sign_with_ecdsa(
sign_with_ecdsa, EcdsaCurve, EcdsaKeyId, SignWithEcdsaArgument,
};

// Record start time of method execution for metrics
Member left a comment

nit: I would remove that comment as I don't think it brings much (one can just check where that variable is used)

Suggested change
// Record start time of method execution for metrics

thread_local! {
    pub static GET_UTXOS_CLIENT_CALLS: Cell<u64> = Cell::default();
    pub static GET_UTXOS_MINTER_CALLS: Cell<u64> = Cell::default();
    pub static UPDATE_CALL_LATENCY: RefCell<BTreeMap<NumUtxoPages, LatencyHistogram>> = RefCell::default();
    pub static GET_UTXOS_CALL_LATENCY: RefCell<BTreeMap<(NumUtxoPages, CallSource), LatencyHistogram>> = RefCell::default();
    pub static GET_UTXOS_RESULT_SIZE: RefCell<BTreeMap<CallSource, NumUtxosHistogram>> = RefCell::default();
    pub static SIGN_WITH_ECDSA_LATENCY: RefCell<BTreeMap<MetricsResult, LatencyHistogram>> = RefCell::default();
Member left a comment

Note that this will use the same values for BUCKETS_MS as for UPDATE_CALL_LATENCY; however, the situation is different:

  1. UPDATE_CALL_LATENCY is used to measure the latency inside update_balance, which may involve several cross-net calls to the bitcoin canister, hence the expected latency could be quite high.
  2. SIGN_WITH_ECDSA_LATENCY measures the latency of sign_with_ecdsa, which is on the same subnet (since our canisters are on the fiduciary subnet), so that call does not involve cross-net calls and the latency could be much lower. I think it would be best to ask #eng-crypto what they think reasonable buckets would look like for the end-to-end latency of tECDSA on the fiduciary subnet (cc @andreacerulli).

@mducroux (Contributor Author) commented Apr 15, 2025

The signature latency distribution from this dashboard shows values roughly as follows: min 0.5s, avg 2s, max 15s. This is for signatures coming from pre-generated values. Based on this, I would suggest 8 buckets with roughly exponentially spaced bounds: [1, 2, 4, 6, 8, 12, 20, inf]. WDYT @gregorydemay?
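
For concreteness, a minimal sketch of a fixed-bucket histogram using the bounds proposed above (in seconds, with infinity as the overflow bucket). The struct layout and names are assumptions, not the minter's actual LatencyHistogram.

// Sketch only: a cumulative-style histogram with the 8 proposed buckets.
const BUCKETS_SEC: [f64; 8] = [1.0, 2.0, 4.0, 6.0, 8.0, 12.0, 20.0, f64::INFINITY];

#[derive(Default)]
struct LatencyHistogram {
    counts: [u64; 8],
    sum_sec: f64,
}

impl LatencyHistogram {
    fn observe(&mut self, latency_sec: f64) {
        // First bucket whose upper bound covers the observation; the last
        // (infinite) bucket catches everything else.
        let i = BUCKETS_SEC
            .iter()
            .position(|&upper| latency_sec <= upper)
            .unwrap_or(BUCKETS_SEC.len() - 1);
        self.counts[i] += 1;
        self.sum_sec += latency_sec;
    }
}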

@mducroux mducroux marked this pull request as ready for review April 15, 2025 14:23
@@ -22,7 +22,7 @@ use icrc_ledger_types::icrc1::transfer::{TransferArg, TransferError};
 use icrc_ledger_types::icrc2::transfer_from::{TransferFromArgs, TransferFromError};
 use num_traits::cast::ToPrimitive;

-const MAX_CONCURRENT_PENDING_REQUESTS: usize = 1000;
+const MAX_CONCURRENT_PENDING_REQUESTS: usize = 5000;
@ninegua (Member) commented Apr 16, 2025

What is the reason to bump this? Did we observe that the previous limit (1,000) was reached, or close to being reached?

If I'm not mistaken, the canister message queue has a hard limit of 500. If each pending request represents a pending inter-canister call that takes up one spot in the queue (a reservation for the returned result), then it is already impossible to reach 1,000.

@ninegua (Member) commented Apr 16, 2025

Turns out that I was mistaken... the requests are actually batched with a max batch size of 100, and batches are signed and submitted every 5 seconds. So it is unlikely to hit the message queue limit, unless each ECDSA sign call takes too long to complete (and there is another limit on the total outstanding signature requests imposed by the threshold signing protocol).
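
A rough back-of-envelope based on those numbers: draining a full queue of 5,000 pending requests at up to 100 per batch, one batch every 5 seconds, takes at least 5,000 / 100 × 5 s ≈ 250 s, so the message queue only backs up if signing takes much longer than the 5-second batch interval.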

Member left a comment

Just another thought: maybe add some metrics to track failures to sign and failures to send? It'll be interesting to see if we actually hit the limit on making signatures.

Contributor Author left a comment

Failure to sign should already be there, but I can indeed add a failure-to-send metric.
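
For illustration, such a counter could follow the style of the Cell-based metrics shown earlier; the name and helper below are assumptions, not the actual change.

use std::cell::Cell;

thread_local! {
    // Sketch only: counts failed submissions of signed transactions.
    pub static SEND_TRANSACTION_FAILURES: Cell<u64> = Cell::default();
}

// Hypothetical increment hook, to be called when submitting a signed
// transaction to the Bitcoin network fails.
pub fn observe_send_transaction_failure() {
    SEND_TRANSACTION_FAILURES.with(|c| c.set(c.get() + 1));
}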
