Top-level functions of re.module should support the string indexing pos/endpos arguments #113304

Open
@adamsilkey

Description

Feature or enhancement

Proposal:

Summary

The re module should add the string indexing pos/endpos optional parameters to the top-level convenience functions of re.search(), re.match(), re.fullmatch(), re.findall(), and re.finditer(). This would enable users to use pos/endpos string indexing without needing to first compile the regex to a pattern.

>>> # Current state, requires compiling a pattern
>>> pattern = re.compile('abc')
>>> pattern.search('012abc678', pos=3, endpos=6)
<re.Match object; span=(3, 6), match='abc'>

>>> # Improved Functionality
>>> re.search('abc', '012abc678', pos=3, endpos=6)
<re.Match object; span=(3, 6), match='abc'>

Here's a sample diff that would match up with the underlying C functionality.

Rationale

A number of methods in Python's regex Pattern class support the optional position arguments (pos/endpos):

  • Pattern.search(string[, pos[, endpos]])
  • Pattern.match(string[, pos[, endpos]])
  • Pattern.fullmatch(string[, pos[, endpos]])
  • Pattern.findall(string[, pos[, endpos]])
  • Pattern.finditer(string[, pos[, endpos]])

Additionally, Python provides access to these pattern methods as top-level convenience functions in the module itself:

  • re.search()
  • re.match()
  • re.fullmatch()
  • re.findall()
  • re.finditer()

However, these top-level convenience functions do not support the optional positional arguments. If anyone wants to utilize the optional positional arguments, they must first compile a pattern with re.compile() and then call the method with the optional argument.
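As a minimal sketch of the workaround available today, the snippet below compiles a pattern and passes pos/endpos to the method. It also shows why slicing the string is not a drop-in substitute: match spans become relative to the slice, and '^' behaves differently. The variable names are illustrative only.

```python
import re

text = "012abc678"

# Today: compile first, then pass pos/endpos to the Pattern method.
m = re.compile("abc").search(text, pos=3, endpos=6)
print(m.span())  # (3, 6), indices refer to the original string

# Slicing works for simple cases, but spans are relative to the slice.
sliced = re.search("abc", text[3:6])
print(sliced.span())  # (0, 3)

# '^' matches at the real start of the string, not at pos,
# so pos and slicing can give different results.
anchored_pos = re.compile("^abc").search(text, 3)
anchored_slice = re.search("^abc", text[3:])
print(anchored_pos)            # None
print(anchored_slice.span())   # (0, 3)
```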

But all the convenience functions do is 1) compile the pattern and then 2) call the method. Here's an example directly from the re.py source:

def match(pattern, string, flags=0):
    """Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).match(string)

Looking at the underlying C code for these methods, each one initializes pos and endpos to 0 and PY_SSIZE_T_MAX respectively. It only changes those values if the argument parser detects that pos or endpos was passed.

Example C code from match() (indentation adjusted for readability):

static PyObject *
_sre_SRE_Pattern_match(PatternObject *self, PyTypeObject *cls, PyObject *const *args, Py_ssize_t nargs, PyObject *kwnames)
{
(...)
    Py_ssize_t pos = 0;
    Py_ssize_t endpos = PY_SSIZE_T_MAX;
(...)
    pos = ival;
(...)
    endpos = ival;
(...)
    return_value = _sre_SRE_Pattern_match_impl(self, cls, string, pos, endpos);
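The same defaults can be observed from Python. The short check below is a sketch, relying on sys.maxsize being the largest value a Py_ssize_t can hold: passing pos=0 and endpos=sys.maxsize explicitly gives the same result as omitting both.

```python
import re
import sys

p = re.compile("abc")
s = "012abc678"

default = p.search(s)                                # pos/endpos omitted
explicit = p.search(s, pos=0, endpos=sys.maxsize)    # defaults spelled out

# Both calls find the same match.
print(default.span(), explicit.span())  # (3, 6) (3, 6)

# An endpos beyond the end of the string is treated as the string length.
beyond = p.search(s, 0, len(s) + 100)
print(beyond.span())  # (3, 6)
```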

Equivalent functionality could be added to the top level module functions by simply adding two new optional arguments to each of the related functions.

Here's a sample of what it would look like for match():

import sys

def match(pattern, string, flags=0, pos=0, endpos=sys.maxsize):
    """Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).match(string, pos=pos, endpos=endpos)
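For anyone who wants to try the idea without patching re.py, here is a self-contained sketch. It uses the public re.compile() in place of the private _compile(), and the *_at function names are hypothetical, chosen only to avoid shadowing the real module functions.

```python
import re
import sys

# Hypothetical wrappers; the names are illustrative, not part of the re module.
def search_at(pattern, string, flags=0, pos=0, endpos=sys.maxsize):
    """Like re.search(), but forwards pos/endpos to Pattern.search()."""
    return re.compile(pattern, flags).search(string, pos=pos, endpos=endpos)

def match_at(pattern, string, flags=0, pos=0, endpos=sys.maxsize):
    """Like re.match(), but forwards pos/endpos to Pattern.match()."""
    return re.compile(pattern, flags).match(string, pos=pos, endpos=endpos)

def fullmatch_at(pattern, string, flags=0, pos=0, endpos=sys.maxsize):
    """Like re.fullmatch(), but forwards pos/endpos to Pattern.fullmatch()."""
    return re.compile(pattern, flags).fullmatch(string, pos=pos, endpos=endpos)

def findall_at(pattern, string, flags=0, pos=0, endpos=sys.maxsize):
    """Like re.findall(), but forwards pos/endpos to Pattern.findall()."""
    return re.compile(pattern, flags).findall(string, pos=pos, endpos=endpos)

result = search_at("abc", "012abc678", pos=3, endpos=6)
print(result.span())                              # (3, 6)
print(findall_at(r"\d", "a1b2c3", pos=2))         # ['2', '3']
```

With the defaults left in place, each wrapper behaves like the corresponding existing top-level function, so existing callers would be unaffected.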

As linked above, here's a gist with a full implementation. It's a very simple change, overall: https://gist.github.com/adamsilkey/9a243427c9645d00505cca08e27a931b

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

Here is a link to previous discussion: discuss.python.org/t/41143

While the request did not generate any discussion, the post itself was supported by three individuals, including two core devs. Additionally, no one responded with a strong objection to the idea or a concern about the implementation.
