-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Check for large runners #15471
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check for large runners #15471
Conversation
Technically, we could do this: diff --git a/.github/workflows/compile-queries.yml b/.github/workflows/compile-queries.yml
index 430807c95d..ddb3aa794f 100644
--- a/.github/workflows/compile-queries.yml
+++ b/.github/workflows/compile-queries.yml
@@ -12,19 +12,11 @@ jobs:
choose-runner:
runs-on: ubuntu-latest
outputs:
- mac-12: ${{ steps.results.outputs.MAC_12 }}
- ubuntu-latest: ${{ steps.results.outputs.UBUNTU_LATEST }}
- windows-latest: ${{ steps.results.outputs.WINDOWS_LATEST }}
+ mac-12: ${{ vars.use-large-mac-runners-for-speed && 'macos-12-xl' || 'macos-12' }}
+ ubuntu-latest: ${{ vars.use-large-linux-runners-for-speed && 'ubuntu-latest-xl' || 'ubuntu-latest' }}
+ windows-latest: ${{ vars.use-large-win-runners-for-speed && 'windows-latest-xl' || 'windows-latest' }}
steps:
- - id: results
- run: |
- echo "UBUNTU_LATEST=$UBUNTU_LATEST" >> $GITHUB_OUTPUT
- echo "MAC_12=$MAC_12" >> $GITHUB_OUTPUT
- echo "WINDOWS_LATEST=$WINDOWS_LATEST" >> $GITHUB_OUTPUT
- env:
- UBUNTU_LATEST: ${{ vars.use-large-linux-runners-for-speed && 'ubuntu-latest-xl' || 'ubuntu-latest' }}
- MAC_12: ${{ vars.use-large-mac-runners-for-speed && 'macos-12-xl' || 'macos-12' }}
- WINDOWS_LATEST: ${{ vars.use-large-win-runners-for-speed && 'windows-latest-xl' || 'windows-latest' }}
+ - run: true
compile-queries:
needs: choose-runner
The downside is that then which runner is selected is absolutely opaque because outputs aren't logged anywhere and nothing in a job run explicitly shows the viewer the If a non-standard runner is selected, there actually is some information indicating it, but for people who don't use non-standard runners it might not be obvious:
(We used to use self-hosted runners a lot not many months ago and I've apparently quickly forgotten these things,, and even though I interact with this and other repositories occasionally, I clearly haven't remembered this detail.) |
ab54657
to
a666577
Compare
@angelapwen I don't suppose you could get someone to look over this one as well? With this (and the other), I have two failures, but they're things I can't see a way to address: In file included from external/binlog/include/binlog/PrettyPrinter.cpp:1:
In file included from external/binlog/include/binlog/PrettyPrinter.hpp:6:
In file included from external/binlog/include/binlog/Time.hpp:6:
/usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/chrono:2320:48: error: call to consteval function 'std::chrono::hh_mm_ss::_S_fractional_width' is not a constant expression
static constexpr unsigned fractional_width = {_S_fractional_width()}; https://github.com/check-spelling-sandbox/codeql/actions/runs/7724725863/job/21057437161
This workflow (Ruby: Run QL Tests) seems to consistently die. I don't understand why. https://github.com/check-spelling-sandbox/codeql/actions/runs/7724725861 qltest (1/2) My guess is that this one involves exceeding some resource allocations and being killed by a manager. But I can't tell (it's consistent, I've tried rerunning it). The only outstanding one is: compile-queries which has been running for 4 hours (it's possible it will not fit in the 6 hour window). If at all possible, it'd be great if this could be run using a matrix to get some parallel runners -- forks aren't likely to pay for large runners, but they can generally use a matrix of additional runners (my org does hit limits from some really rude repositories, but usually I'm only dealing w/ one or two of those repositories at a time). |
The |
I've seen some workflows that actively shoot a bunch of ancillary services in order to free memory :). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution! We understand the problem and agree that the current experience might not be optimal. We appreciate your efforts to improve the developer experience for CodeQL contributors.
We aim to avoid adding complexity and maintenance to this repo as much as possible. Therefore, we suggest changing the approach so the original repo does not need to add any variables to continue working as usual. We feel that accepting this PR as-is will add a (admittedly small, but >0) maintenance burden to this repo, which might set a precedent that could create maintenance issues on our end in the long run.
We suggest to explore the possibility of approaching this in a way that the extra maintenance, if any, falls on the side of the forkers, not the original repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, thanks for your time reviewing this and addressing our concerns. This approach removes some the maintenance burden we want to avoid, that's great 👍🏾
Still, our internal tests are showing that actually this approach will not fix the DevExp for forkers, as using non-large runners to run those workflows result in OOM kills or timeouts most of the times. We appreciate it can have the opposite results: confusing forkers with ❌ tests that actually are not broken, just run on inadequate runners.
To move forward, we suggest to consider disabling entirely those workflows for non github
owned forks, using something like if: ${{ github.repository_owner == 'github' }}
. That would mitigate the original issue of leaving workflows pending in non-github forks, and as a secondary (but much sought) benefit, the code will be simpler.
I have a fix for the oom thing in #15564, and I haven't hit timeouts... |
Variable | Runner to use if set to `1` -|- `use_large_linux_runners_for_speed` | `ubuntu-latest-xl` `use_large_mac_runners_for_speed` | `macos-12-xl` `use_large_win_runners_for_speed` | `windows-latest-xl`
Right, there are not timeouts, that's incorrect on my part. Our tests show checks taking almost 3 hours to run with small runners, and some OOMs, not timeouts. |
The resource usage is definitely a thing. The fact that personal repositories have 0 monitoring ability is also a thing (but GitHub should just fix that) -- I'm doing all the work for this PR in an org so that I can at least see things. I personally like being able to test things in my forks before I make public facing PRs to upstream repositories -- I'm human, and I make mistakes, but I prefer to not show my mistakes too publicly if I can avoid doing so. That said, it would be possible for someone like me to change the (But definitely, having the auto-tune change integrated would make things much less frustrating, as that side is quite surprising and it was not particularly obvious how to fix it.) |
Agreed! I'd suggest then we close this PR in favor of #15584 - we can always revisit this decision in the future if any of the mentioned premises changes. |
fixes #15464
1
use_large_linux_runners_for_speed
ubuntu-latest-xl
use_large_mac_runners_for_speed
macos-12-xl
use_large_win_runners_for_speed
windows-latest-xl
For simplicity, I'm copy-pasting the entire block to each workflow that needs it, this means that updating the rules can be done w/ a search+replace instead of worrying about things getting out of sync. The cost of checking each of them is minimal (especially vs the cost of running on a slow runner or waiting forever for a runner that will never happen).