RSDK-10579: Robot client update #425
Conversation
Just a general question, but does this render #424 obsolete?
void LogManager::enable_console_logging() {
    boost::log::core::get()->add_sink(console_sink_);
    VIAM_SDK_LOG(debug) << "Initialized console logging";
I didn't really follow the logging changes too closely, so apologies if this is covering old ground, but what are the safety/concurrency guarantees around enabling and disabling console logging? If I have many `RobotClient`s coming into and out of existence in multiple threads, does it work right? It looks like the `RobotClient::impl` destructor puts back console logging. Does that mean there can only be one `RobotClient`?
> It looks like the RobotClient::impl destructor puts back console logging.

That's right. It's a narrow window, and might all just happen with the `}` of `main`, but if someone destroys a `RobotClient` it's no longer sending logs to RDK, so we fall back to printing logs on the console. If we're running as a module, this will revert us back to the ugly `_` printing that was in effect before the logging PR.
> Does that mean there can only be one RobotClient?

Paraphrasing a discussion with @stuqdog, and I'm going to add a bunch of explanatory comments about this stuff. But basically the key is that the non-default destruction in `RobotClient::impl::~impl` is wrapped in an `if (log_sink)` which...

- only happens if RDK logging is enabled,
- which only happens if `RobotClient::connect_logging` is called,
- which is a `private` method only accessible from `ModuleService` (the only place it gets called),
- which is only getting called by us when RDK spins up an executable as a module,

so under all these hypotheses, we can say that uniqueness and non-concurrency are guaranteed, and that the in-and-out-of-existence-in-multiple-threads scenario can't happen in this case.
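For illustration only, a minimal sketch of the guard described above; the type and names here (`Impl`, `log_sink_`) are assumptions standing in for the SDK's actual code:

```cpp
#include <boost/log/core.hpp>
#include <boost/log/sinks/sink.hpp>
#include <boost/smart_ptr/shared_ptr.hpp>

// Hypothetical stand-in for RobotClient::impl.
struct Impl {
    // Set only by the private connect_logging(), i.e. in module mode.
    boost::shared_ptr<boost::log::sinks::sink> log_sink_;

    ~Impl() {
        if (log_sink_) {
            // Reachable only in module mode, where at most one client exists,
            // so this teardown is unique and non-concurrent by construction.
            boost::log::core::get()->remove_sink(log_sink_);
            // ...then re-enable console logging (e.g. add_sink(console_sink_)).
        }
    }
};
```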
src/viam/sdk/robot/client.cpp (outdated)
@@ -295,7 +275,7 @@ std::shared_ptr<RobotClient> RobotClient::with_channel(ViamChannel channel,
     robot->refresh_interval_ = options.refresh_interval();
     robot->should_refresh_ = (robot->refresh_interval_ > 0);
     if (robot->should_refresh_) {
-        robot->threads_.emplace_back(&RobotClient::refresh_every, robot);
+        robot->threads_.emplace_back(&RobotClient::refresh_every, robot.get());
Apologies if this ends up being a duplicate comment: I know I left a note here, but GitHub seems to have eaten it somehow.

I think this means that the RobotClient will no longer be rendered immortal by its background refresh threads. So doesn't that mean that this PR would need to include the changes to switch to a condvar and notification in order to interrupt and join them? I think this goes to my question of how this PR relates to #424.
So I think the key here is "no longer rendered immortal"; I don't think that was ever the case, even beforehand. Of course our old approach was UB, what with `~thread` getting called manually in `close`, but even without that, automatic destructors would be destructing the `vector<thread>` before destructing the `RobotClient` itself, so all the `shared_ptr`s would be gone by then.
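To make the lifetime difference concrete, a standalone illustration (not the SDK's code): `std::thread` decay-copies its arguments, so binding the `shared_ptr` keeps the object alive for the thread's lifetime, while binding the raw pointer does not.

```cpp
#include <chrono>
#include <memory>
#include <thread>

struct Client {
    void refresh_every() { std::this_thread::sleep_for(std::chrono::milliseconds(50)); }
};

int main() {
    auto robot = std::make_shared<Client>();

    // The thread constructor copies the shared_ptr, so this thread co-owns
    // *robot: the Client cannot be destroyed while the thread holds it.
    std::thread keeps_alive(&Client::refresh_every, robot);

    // The raw pointer carries no ownership: the caller must guarantee *robot
    // outlives this thread, typically by joining before destruction.
    std::thread no_ownership(&Client::refresh_every, robot.get());

    keeps_alive.join();
    no_ownership.join();
}
```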
This LGTM if the intention is that this lands in a state where tearing down the client will endure a wait until the refresh thread wakes again. I think I'd want to see that work ticketed and acted on sooner rather than later, though.

Alternatively, this review could go another round and switch over to using a condvar to effect the sleep, and then the thread could be awakened by signaling the condvar.
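A minimal sketch of that alternative; the names and interval here are assumptions modeled on the PR's description, not taken from it:

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

class RefreshLoop {
    std::mutex mu_;
    std::condition_variable cv_;
    bool shutting_down_ = false;
    std::chrono::seconds interval_{10};
    std::thread thread_;

    void refresh_every() {
        std::unique_lock<std::mutex> lk(mu_);
        // wait_for returns true when the predicate holds (shutdown requested);
        // a plain timeout means "do a refresh and go back to sleep".
        while (!cv_.wait_for(lk, interval_, [this] { return shutting_down_; })) {
            refresh();
        }
    }

    void refresh() { /* hypothetical refresh work */ }

public:
    RefreshLoop() : thread_(&RefreshLoop::refresh_every, this) {}

    ~RefreshLoop() {
        {
            std::lock_guard<std::mutex> lk(mu_);
            shutting_down_ = true;
        }
        cv_.notify_all();  // ends the sleep immediately instead of waiting out interval_
        thread_.join();
    }
};
```

A real implementation might drop the lock around the refresh work; the point is only that signaling the condvar makes `join()` prompt.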
@@ -23,6 +23,8 @@ bool isStatusCancelled(int status) noexcept {
     return status == ::grpc::StatusCode::CANCELLED;
 }

+void set_name(...) {}  // NOLINT(cert-dcl50-cpp)
The cert warning is funny. If you make it a template with a parameter pack instead, I assume that doesn't work because it breaks the overload resolution / template instantiation balance that you are leveraging?
Exactly! A C-style variadic is always considered the worst/last choice in overload resolution, so it's needed in this case.
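A standalone illustration of that ranking (not the PR's code): the ellipsis overload is chosen only when nothing better matches.

```cpp
#include <iostream>
#include <string>

void set_name(const std::string& name) { std::cout << "specific: " << name << "\n"; }
void set_name(...) { std::cout << "fallback\n"; }  // NOLINT(cert-dcl50-cpp)

int main() {
    set_name(std::string{"arm"});  // exact match outranks the ellipsis conversion
    set_name(42);                  // no viable non-variadic candidate, so the ellipsis wins
}
```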
-    for (auto& thread : threads_) {
-        thread.join();
+    if (refresh_thread_.joinable()) {
+        refresh_thread_.join();
I think this `join` will need to block until the refresh thread awakens from `std::this_thread::sleep_for(refresh_interval_)` in `refresh_every`.
Correct. Recording for posterity that we will try to swap this for a proper condvar wait, but we're merging now to fix the latest release.
This is an update/minor overhaul to the robot client that fixes some of the architectural issues (and hopefully crashes) that were showing up after the logging PR. TBD whether the `atomic<bool>` should be replaced with something else.
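For context, a reconstruction of the kind of loop under discussion (assumed names, not the PR's exact code), showing why `join()` can block for up to a full `refresh_interval_`:

```cpp
#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> should_refresh_{true};
std::chrono::seconds refresh_interval_{10};

void refresh_every() {
    while (should_refresh_.load()) {
        // Nothing can interrupt this sleep, so a joiner that clears
        // should_refresh_ still waits out the remainder of the interval.
        std::this_thread::sleep_for(refresh_interval_);
        if (should_refresh_.load()) {
            // refresh();  // hypothetical refresh work
        }
    }
}
```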