-
Notifications
You must be signed in to change notification settings - Fork 310
[searchfox_api] Ignore the search results that are scattered in too many files #4854
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Hi! 🤖 The test below is automatically generated and serves as a regression test for this PR because it:
def test_search_ignores_excessive_paths():
import logging
from unittest.mock import patch, MagicMock
from bugbug.code_search.searchfox_api import search
# Set up logging to capture warnings
logger = logging.getLogger('bugbug.code_search.searchfox_api')
logger.setLevel(logging.WARNING)
log_capture = MagicMock()
logger.addHandler(logging.StreamHandler(log_capture))
# Mock the utils.get_session().get() to return a mock response
mock_response = MagicMock()
mock_response.text = """
var results = {
"normal": {
"Definitions (symbol_name)": [
{"path": "file1.js", "lines": [{"line": "function symbol_name() {}"}]},
{"path": "file2.js", "lines": [{"line": "function symbol_name() {}"}]},
{"path": "file3.js", "lines": [{"line": "function symbol_name() {}"}]},
{"path": "file4.js", "lines": [{"line": "function symbol_name() {}"}]},
{"path": "file5.js", "lines": [{"line": "function symbol_name() {}"}]},
{"path": "file6.js", "lines": [{"line": "function symbol_name() {}"}]},
{"path": "file7.js", "lines": [{"line": "function symbol_name() {}"}]},
{"path": "file8.js", "lines": [{"line": "function symbol_name() {}"}]},
{"path": "file9.js", "lines": [{"line": "function symbol_name() {}"}]},
{"path": "file10.js", "lines": [{"line": "function symbol_name() {}"}]},
{"path": "file11.js", "lines": [{"line": "function symbol_name() {}"}]}
]
}
};
"""
mock_response.raise_for_status = MagicMock()
with patch('bugbug.utils.get_session', return_value=MagicMock(get=MagicMock(return_value=mock_response))):
# Call the search function with a symbol name that would trigger too many paths
result = search('dummy_commit_hash', 'symbol_name')
# Assert that the result is empty due to too many paths
assert result == []
# Assert that a warning was logged
log_capture.write.assert_any_call("Too many paths found for symbol symbol_name: 11 paths. Skipping this symbol.\n") If you find this regression test useful, feel free to insert it to your test suite. This is part of our research at the ZEST group of University of Zurich in collaboration with Mozilla. Click to see which lines were covered.diff --git a/bugbug/code_search/searchfox_api.py b/bugbug/code_search/searchfox_api.py
--- a/bugbug/code_search/searchfox_api.py
+++ b/bugbug/code_search/searchfox_api.py
@@ -5,6 +5,7 @@
import io
import json
+import logging# ✅ Covered by above test
import re
from typing import Iterable, Literal
@@ -19,6 +20,8 @@
register_function_search,
)
from bugbug.repository import SOURCE_CODE_TYPES_TO_EXT
+# ✅ Covered by above test
+logger = logging.getLogger(__name__)# ✅ Covered by above test
def get_line_number(elements: Iterable[HtmlElement], position: Literal["start", "end"]):
@@ -201,6 +204,13 @@
definitions.append(value)
paths = list(set(definition["path"] for definition in definitions))
+ if len(paths) > 10:# ✅ Covered by above test
+ logger.warning(# ✅ Covered by above test
+ "Too many paths found for symbol %s: %d paths. Skipping this symbol.",# ✅ Covered by above test
+ symbol_name,# ✅ Covered by above test
+ len(paths),# ✅ Covered by above test
+ )# ✅ Covered by above test
+ return []# ✅ Covered by above test
return sum((get_functions(commit_hash, path, symbol_name) for path in paths), [])
Line coverage* achieved: 100.0% * Line coverage is calculated over the lines added in this PR. |
@@ -209,6 +209,13 @@ def search(commit_hash, symbol_name): | |||
definitions.append(value) | |||
|
|||
paths = list(set(definition["path"] for definition in definitions)) | |||
if len(paths) > 10: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How hard would it be to collect statistics about this? I guess we don't have an infrastructure in place for that, we should build it at some point
Will mitigate, but not fix, #4842