
RSDK-10579: Robot client update #425

Merged
15 commits merged into viamrobotics:main from robot-client-update on Apr 30, 2025

Conversation

lia-viam (Collaborator)

This is an update/minor overhaul to the robot client that fixes some of the arch issues (and hopefully crashes) that were showing up after the logging PR. TBD whether the atomic<bool> should be replaced with something else.

@lia-viam lia-viam requested a review from a team as a code owner April 28, 2025 20:06
@lia-viam lia-viam requested review from njooma and stuqdog and removed request for a team April 28, 2025 20:06
@acmorrow (Member)

Just a general question, but does this render #424 obsolete?


boost::log::core::get()->add_sink(console_sink_);
VIAM_SDK_LOG(debug) << "Initialized console logging";
void LogManager::enable_console_logging() {
Member

I didn't really follow the logging changes too closely, so apologies if this is covering old ground, but what are the safety/concurrency guarantees around enabling and disabling console logging? If I have many RobotClients coming into and out of existence in multiple threads, does it work right? It looks like the RobotClient::impl destructor puts back console logging. Does that mean there can only be one RobotClient?

Collaborator Author

It looks like the RobotClient::impl destructor puts back console logging.

That's right. It's a narrow window, and might all just happen with the } of main, but if someone destroys a RobotClient it's no longer sending logs to RDK, so we fall back to printing logs on the console. If we're running as a module, this will revert us back to the ugly \_ printing that was in effect before the logging PR.

Does that mean there can only be one RobotClient?

Paraphrasing a discussion with @stuqdog, and I'm going to add a bunch of explanatory comments about this stuff. But basically the key is that non-default destruction in RobotClient::impl::~impl is wrapped in an if(log_sink) which...

  • only happens if RDK logging is enabled,
  • which only happens if RobotClient::connect_logging is called
  • which is a private method only accessible from ModuleService (the only place it gets called)
  • which is only getting called by us when RDK spins up an executable as a module

so under all these hypotheses, we can say that uniqueness and non-concurrency are guaranteed, and that the in-and-out-of-existence-in-multiple-threads scenario can't happen in this case.
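Purely for illustration, a minimal sketch of that guard under the assumptions above; the class and member names (ClientImpl, log_sink_, console_sink_) are stand-ins, not the actual viam-cpp-sdk code:

```cpp
#include <memory>

#include <boost/log/core.hpp>
#include <boost/log/sinks/sink.hpp>

// Stand-in for RobotClient::impl; names are illustrative, not the SDK's.
class ClientImpl {
   public:
    ~ClientImpl() {
        // Non-default teardown only runs when RDK logging was enabled, which
        // only happens on the module path (connect_logging via ModuleService).
        if (log_sink_) {
            // Stop forwarding logs to RDK and fall back to console output.
            boost::log::core::get()->remove_sink(log_sink_);
            boost::log::core::get()->add_sink(console_sink_);
        }
    }

   private:
    std::shared_ptr<boost::log::sinks::sink> log_sink_;      // set only on the module path
    std::shared_ptr<boost::log::sinks::sink> console_sink_;  // restored on teardown
};
```

Since the RDK sink is only ever installed on that single module path, at most one such teardown can be in flight, which is what makes the uniqueness argument above hold.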

@@ -295,7 +275,7 @@ std::shared_ptr<RobotClient> RobotClient::with_channel(ViamChannel channel,
robot->refresh_interval_ = options.refresh_interval();
robot->should_refresh_ = (robot->refresh_interval_ > 0);
if (robot->should_refresh_) {
- robot->threads_.emplace_back(&RobotClient::refresh_every, robot);
+ robot->threads_.emplace_back(&RobotClient::refresh_every, robot.get());
Member

Apologies if this ends up being a duplicate comment: I know I left a note here, but GitHub seems to have eaten it somehow.

I think this means that the RobotClient will no longer be rendered immortal by its background refresh threads. So, doesn't that mean that this PR would need to include the changes to switch to a condvar and notification in order to interrupt and join them? I think this goes to my question of how this PR relates to #424 .

Collaborator Author

So I think the key here is "no longer rendered immortal" -- I don't think that was ever the case even beforehand. I mean, of course our old approach was UB, what with ~thread getting called manually in close, but even without that, the automatic destructors would be destructing vector<thread> before destructing the RobotClient itself, so all the shared_ptrs would be gone by then.
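As a standalone illustration of the lifetime point being discussed (not SDK code), passing the shared_ptr itself into std::thread copies it and keeps the object alive until the thread finishes, whereas passing .get() hands over a non-owning raw pointer:

```cpp
#include <memory>
#include <thread>

struct Client {
    // Stand-in for RobotClient::refresh_every; the real loop sleeps and refreshes.
    void refresh_every() {}
};

int main() {
    auto client = std::make_shared<Client>();

    // Copies the shared_ptr into the thread: the Client cannot be destroyed
    // until refresh_every returns and the thread drops its copy (the object is
    // kept alive for as long as the loop runs).
    std::thread keeps_alive(&Client::refresh_every, client);

    // Passes a raw pointer only: the Client's lifetime is not extended, so the
    // thread must be stopped and joined before the last shared_ptr goes away.
    std::thread no_ownership(&Client::refresh_every, client.get());

    keeps_alive.join();
    no_ownership.join();
}
```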

@lia-viam (Collaborator Author)

Also @acmorrow, re: #424 the answer is maybe; going to run some tests after the changes in this PR are polished/finalized.

@lia-viam lia-viam requested a review from acmorrow April 29, 2025 18:18
@acmorrow (Member) left a comment

This LGTM if the intention is that this land in a state where tearing down the client will endure a wait until the refresh thread wakes again. I think I'd want to see that work ticketed and acted on sooner rather than later though.

Alternatively, this review could go another round and switch over to using a condvar to effect the sleep, and then the thread could be awakened by signaling the condvar.
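A minimal sketch of that condvar approach, with illustrative names rather than the SDK's actual members: the refresh loop waits on a condition_variable with a timeout, and teardown sets a flag and notifies, so join no longer has to wait out the full refresh interval:

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

class Refresher {
   public:
    explicit Refresher(std::chrono::seconds interval)
        : interval_(interval), thread_([this] { run(); }) {}

    ~Refresher() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_ = true;
        }
        cv_.notify_all();  // wake the refresh thread immediately
        thread_.join();    // returns promptly instead of after a full sleep_for(interval_)
    }

   private:
    void run() {
        std::unique_lock<std::mutex> lock(mu_);
        while (!stop_) {
            // wait_for returns early if notified; the predicate re-checks stop_.
            cv_.wait_for(lock, interval_, [this] { return stop_; });
            if (stop_) {
                break;
            }
            // refresh() would run here on each interval tick.
        }
    }

    std::chrono::seconds interval_;
    std::mutex mu_;
    std::condition_variable cv_;
    bool stop_ = false;
    std::thread thread_;  // declared last so it starts after the other members are ready
};
```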

@@ -23,6 +23,8 @@ bool isStatusCancelled(int status) noexcept {
return status == ::grpc::StatusCode::CANCELLED;
}

+ void set_name(...) {} // NOLINT(cert-dcl50-cpp)
Member

The cert warning is funny. If you make it a template with a parameter pack instead, I assume that doesn't work because it breaks the overload resolution / template instantiation balance that you are leveraging?

Collaborator Author

Exactly! A C-style variadic is always considered the worst/last choice in overload resolution, so it's needed in this case.
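As a small self-contained illustration of that overload-resolution property (not the SDK's actual detection code), the named overload wins whenever it is viable and the C-style variadic is only ever chosen as the fallback:

```cpp
// set_name here is a hypothetical example, not the SDK's implementation.
template <typename T>
auto set_name(T& t, const char* name) -> decltype(t.set_name(name), void()) {
    t.set_name(name);  // chosen whenever T actually has a usable set_name
}

void set_name(...) {}  // NOLINT(cert-dcl50-cpp) worst-ranked match: no-op fallback

struct Named {
    void set_name(const char*) {}
};

int main() {
    Named a;
    int b = 42;
    set_name(a, "worker");  // template overload is the better match
    set_name(b, "worker");  // int has no set_name, so the variadic no-op runs
}
```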

- for (auto& thread : threads_) {
-     thread.join();
+ if (refresh_thread_.joinable()) {
+     refresh_thread_.join();
Member

I think this join will need to block until the refresh thread awakens from std::this_thread::sleep_for(refresh_interval_) in refresh_every.

Collaborator Author

Correct; recording for posterity that we will try to swap in a proper condvar wait, but merging now to fix the latest release.

@lia-viam lia-viam merged commit 565cf4c into viamrobotics:main Apr 30, 2025
4 checks passed
@lia-viam lia-viam deleted the robot-client-update branch April 30, 2025 17:10