Description
Feature or enhancement
Proposal:
Summary
The re module should add the optional pos/endpos string-indexing parameters to the top-level convenience functions re.search(), re.match(), re.fullmatch(), re.findall(), and re.finditer(). This would let users apply pos/endpos indexing without first compiling the regex to a pattern.
>>> # Current state, requires compiling a pattern
>>> pattern = re.compile('abc')
>>> pattern.search('012abc678', pos=3, endpos=6)
<re.Match object; span=(3, 6), match='abc'>
>>> # Improved Functionality
>>> re.search('abc', '012abc678', pos=3, endpos=6)
<re.Match object; span=(3, 6), match='abc'>
Here's a sample diff that would match up with the underlying C functionality.
Rationale
There are a number of methods in the Python regex Pattern class that support the optional positional arguments pos/endpos:
Pattern.search(string[, pos[, endpos]])
Pattern.match(string[, pos[, endpos]])
Pattern.fullmatch(string[, pos[, endpos]])
Pattern.findall(string[, pos[, endpos]])
Pattern.finditer(string[, pos[, endpos]])
Additionally, Python provides access to these pattern methods as top-level convenience functions in the module itself:
re.search()
re.match()
re.fullmatch()
re.findall()
re.finditer()
However, these top-level convenience functions do not support the optional positional arguments. Anyone who wants to use them must first compile a pattern with re.compile() and then call the method with the optional arguments.
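For context, here is a small sketch of the current workaround, next to the string-slicing shortcut that is sometimes used instead. As the re documentation notes, slicing is not completely equivalent to passing pos: match offsets become relative to the slice, and '^' anchors to the real start of the string rather than to pos. The variable names are illustrative only.

```python
import re

text = '012abc678'
pattern = re.compile('abc')

# pos/endpos bound the search but keep offsets relative to the full string.
bounded = pattern.search(text, pos=3, endpos=6)

# Slicing yields a match whose offsets are relative to the slice instead.
sliced = re.search('abc', text[3:6])

# '^' anchors to the real start of the string, not to pos, so the
# anchored pattern fails with pos=3 but succeeds on a slice.
anchored = re.compile('^abc')
anchored_pos = anchored.search(text, pos=3)
anchored_slice = anchored.search(text[3:])

print(bounded.span(), sliced.span(), anchored_pos, anchored_slice is not None)
```

So the compile-then-call form is currently the only way to get true pos/endpos semantics.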
But all the convenience functions do is 1) compile the pattern and then 2) call the method. Here's an example directly from the re.py source:
def match(pattern, string, flags=0):
    """Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).match(string)
Looking at the underlying C code for these methods, each method initializes pos and endpos to 0 and PY_SSIZE_T_MAX respectively. It only changes those values if the argument parser detects that pos or endpos was passed.
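These defaults can be checked from Python as a quick sanity test. The sketch below assumes only that sys.maxsize corresponds to PY_SSIZE_T_MAX, and compares a call that omits pos/endpos with one that passes the default values explicitly.

```python
import re
import sys

pattern = re.compile('abc')
text = '012abc678'

# Omitting pos/endpos leaves the C-level defaults in place.
implicit = pattern.search(text)

# Passing the same defaults explicitly should give an identical match.
explicit = pattern.search(text, pos=0, endpos=sys.maxsize)

print(implicit.span(), explicit.span())
```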
Example C code from match() (indentation adjusted for readability):
static PyObject *
_sre_SRE_Pattern_match(PatternObject *self, PyTypeObject *cls, PyObject *const *args, Py_ssize_t nargs, PyObject *kwnames)
{
    (...)
    Py_ssize_t pos = 0;
    Py_ssize_t endpos = PY_SSIZE_T_MAX;
    (...)
    pos = ival;
    (...)
    endpos = ival;
    (...)
    return_value = _sre_SRE_Pattern_match_impl(self, cls, string, pos, endpos);
Equivalent functionality could be added to the top level module functions by simply adding two new optional arguments to each of the related functions.
Here's a sample of what it would look like for match():
import sys

def match(pattern, string, flags=0, pos=0, endpos=sys.maxsize):
    """Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).match(string, pos=pos, endpos=endpos)
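To try the idea without patching the standard library, the same pattern can be sketched as standalone wrappers for all five functions. This is a rough illustration, not the code from the gist: it assumes the public re.compile() in place of the private _compile() helper, and the wrappers live in user code rather than in the re module.

```python
import re
import sys


def search(pattern, string, flags=0, pos=0, endpos=sys.maxsize):
    """Scan string[pos:endpos] for the first match of pattern."""
    return re.compile(pattern, flags).search(string, pos, endpos)


def match(pattern, string, flags=0, pos=0, endpos=sys.maxsize):
    """Match pattern starting exactly at index pos."""
    return re.compile(pattern, flags).match(string, pos, endpos)


def fullmatch(pattern, string, flags=0, pos=0, endpos=sys.maxsize):
    """Match pattern against the whole pos..endpos region."""
    return re.compile(pattern, flags).fullmatch(string, pos, endpos)


def findall(pattern, string, flags=0, pos=0, endpos=sys.maxsize):
    """Return all non-overlapping matches within pos..endpos."""
    return re.compile(pattern, flags).findall(string, pos, endpos)


def finditer(pattern, string, flags=0, pos=0, endpos=sys.maxsize):
    """Return an iterator of match objects within pos..endpos."""
    return re.compile(pattern, flags).finditer(string, pos, endpos)


# Same call as the "Improved Functionality" example in the summary.
print(search('abc', '012abc678', pos=3, endpos=6))
```

Because pos and endpos come after flags and have defaults, existing calls such as search('abc', text) or search('abc', text, re.I) keep their current behavior.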
As linked above, here's a gist with a full implementation. It's a very simple change, overall: https://gist.github.com/adamsilkey/9a243427c9645d00505cca08e27a931b
Has this already been discussed elsewhere?
No response given
Links to previous discussion of this feature:
Here is a link to previous discussion: discuss.python.org/t/41143
While the request did not generate any discussion, the post itself was supported by three individuals, including two core devs. Additionally, no one raised a strong objection to the idea or a concern about the implementation.