Skip to content

feat(ckbtc): Add get_utxos_cache to reduce latency of update_balance calls #4788

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

ninegua
Copy link
Member

@ninegua ninegua commented Apr 11, 2025

XC-314: ckBTC: Cache update_balance results

The call to the bitcoin canister's get_utxos method takes up the majority time of a update_balance call.
Sometimes a client may call update_balance repeatedly, which adds to the workload of the ckBTC minter canister.
The result of get_utxos would remain unchanged until the bitcoin canister receives a new block, which makes it a good candidate for caching.

This PR implements an internal get_utxos_cache that will reuse previous call results (if available and not expired) without making new get_utxos calls. Entries are expired based on a pre-configured get_utxos_cache_expiration setting.

Caching get_utxos results is safe for update_balance because:

  1. UTXOs are immutable data, and their values would never change.
  2. The set of UTXOs are unique to an address, and never shared between addresses.
  3. The update_balance call is looking for new UTXOs from the result, so re-using older results only delays the finding. In particular, cached results may include transaction outputs that could have already been spent, but this won't affect update_balance because only the minter could spend them, which means they are already known and not new.

@ninegua ninegua requested a review from a team as a code owner April 11, 2025 13:39
@github-actions github-actions bot added the feat label Apr 11, 2025
@ninegua ninegua marked this pull request as draft April 11, 2025 13:39
@ninegua ninegua removed the request for review from a team April 11, 2025 13:40
@ninegua ninegua added the CI_ALL_BAZEL_TARGETS Runs all bazel targets and uploads them to S3 label Apr 15, 2025
@ninegua ninegua marked this pull request as ready for review April 15, 2025 06:11
@ninegua ninegua requested a review from a team April 15, 2025 06:11
@ninegua ninegua requested review from a team as code owners April 15, 2025 06:14
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this pull request affects the behavior of any canister owned by
the Governance team, remember to update the corresponding
unreleased_changes.md file(s).

To acknowldge this reminder (and unblock the PR), dismiss this
code review by going to the bottom of the pull request page, and
supply one of the following reasons:

  1. Done.

  2. No canister behavior changes.

github-actions[bot]

This comment was marked as duplicate.

@ninegua ninegua changed the base branch from paulliu/ckbtc-cleanup-types to master April 15, 2025 06:15
@ninegua ninegua removed request for a team April 15, 2025 06:15
@ninegua ninegua removed request for a team April 15, 2025 06:15
@ninegua ninegua force-pushed the paulliu/ckbtc-utxo-cache branch from c5fde3e to f5079ba Compare April 15, 2025 07:11
@ninegua ninegua dismissed github-actions[bot]’s stale review April 15, 2025 07:17

No NNS governance canister behavior changes.

@ninegua ninegua changed the title feat(ckbtc): Add a get_utxos_cache to reduce latency of update_balance calls feat(ckbtc): Add get_utxos_cache to reduce latency of update_balance calls Apr 15, 2025
Copy link
Contributor

@mducroux mducroux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice PR @ninegua! Some minor comments on my side.

Copy link
Member Author

@ninegua ninegua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mducroux for a careful review! I should have addressed all the concerns, please let me know if the changes look good to you. Thanks!

Copy link
Contributor

@mducroux mducroux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @ninegua! Thanks for addressing my comments

@ninegua ninegua requested review from a team, lpahlavi and gregorydemay April 15, 2025 15:28
Copy link
Member

@gregorydemay gregorydemay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @ninegua for this PR! Mostly some minor comments and understanding questions, otherwise code looks very good!

@@ -208,6 +208,9 @@ type InitArgs = record {

/// The canister id of the KYT canister (deprecated, use btc_checker_principal instead).
kyt_principal: opt principal;

/// The expiration duration (in nanoseconds) for cached entries in the get_utxos cache.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit1: any reason why to have this in nanoseconds? We don't have that precision, so I would say something like seconds should be good enough no? I think this would also help when verifying a proposal to ensure that we don't have an ungodly amount of zeroes to count.

nit2: for clarity when verifying for example a proposal, we could add the unit as a suffix to the field name, e.g. get_utxos_cache_expiration_seconds

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! Changed to seconds.

@@ -1340,3 +1352,96 @@ impl From<u64> for Timestamp {
Self(timestamp)
}
}

#[derive(Clone, Eq, PartialEq, Ord, PartialOrd, Debug, CandidType, Deserialize, Serialize)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

understanding question: why do we need CandidType and Serialize+Deserialize? I would expect this to just be stored on the heap.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! They are not needed. Removed.

@@ -1340,3 +1352,96 @@ impl From<u64> for Timestamp {
Self(timestamp)
}
}

#[derive(Clone, Eq, PartialEq, Ord, PartialOrd, Debug, CandidType, Deserialize, Serialize)]
struct Timestamped<Inner> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

len
}

pub fn set_expiration(&mut self, expiration: Duration) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method potentially invalidate a lot of entries (if the expiration is much shorter than the previous one). Shouldn't we call prune inside the method?

Comment on lines +178 to +179
s.get_utxos_cache.prune(*now);
s.get_utxos_cache.insert(req, res.clone(), *now)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the Rust doc of CacheWithExpiration

/// More specifically, entries are inserted with a timestamp, and
/// then all existing entries with a timestamp less than `t - expiration` are removed before
/// the new entry is inserted.

I would have assumed that prune is actually part of insert. What about making it a single method? Right now I would need to remember to call prune before doing an insert which seems a bit error-prone.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I had the prune inside insert. But later I removed it and forgot to change the documentation.

Although I understand where you are coming from, but decoupling insert and prune is more flexible. For example, in tests, I don't have to worry about the order of insertions, when making assertions on len().

Alternatively I could provide an additional method called insert_no_prune(), and let the regular insert() do the pruning before insertion. WDYT?

});
}
}
pub fn insert<T: Into<Timestamp>>(&mut self, key: Key, value: Value, now: T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as commented below I would make prune to be called in insert

@@ -156,6 +156,7 @@ pub async fn get_utxos<R: CanisterRuntime>(
runtime: &R,
) -> Result<GetUtxosResponse, CallError> {
async fn bitcoin_get_utxos<R: CanisterRuntime>(
now: &mut u64,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why having this parameter when now could be queried with runtime.time within the method bitcoin_get_utxos?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In tests, we drive the time by mocking the runtime.time() function, and increment the time value in there. So whenever runtime.time() is called, time will change. I had to avoid unnecessary calls to runtime.time() here so as not to break existing tests. Frankly mocking time this way is quite fragile but I don't have a better idea. WDYT?

utxos.append(&mut response.utxos);
num_pages += 1;
}

observe_get_utxos_latency(utxos.len(), num_pages, source, start_time, runtime.time());
observe_get_utxos_latency(utxos.len(), num_pages, source, start_time, now);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we even register the latency in case of a cache hit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! If we do register, the histogram graph will have a bar of latency 0, maybe also a good comparison?

ninegua and others added 3 commits April 17, 2025 01:39
Co-authored-by: gregorydemay <112856886+gregorydemay@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI_ALL_BAZEL_TARGETS Runs all bazel targets and uploads them to S3 @cross-chain-team feat
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants