Regex line anchors much slower than global anchors #2429

Open
@nirvdrum

While investigating regex performance, I noticed that line anchors (^ and $) are much slower than global anchors (\A and \z), even when the string consists of a single line. Most regexps I've encountered that use anchors use the line variants. I tried a variety of patterns, and the effect is visible in each case:

# frozen_string_literal: true

require 'benchmark/ips'

SUBJECT = "abc"

Benchmark.ips do |x|
  x.report('blank? check (global)') do
    SUBJECT.match?(/\A[[:space:]]*\z/)
  end

  x.report('blank? check (line)') do
    SUBJECT.match?(/^[[:space:]]*$/)
  end

  x.report('exact match (global)') do
    SUBJECT.match?(/\Aabc\z/)
  end

  x.report('exact match (line)') do
    SUBJECT.match?(/^abc$/)
  end

  x.report('wildcard check (global)') do
    SUBJECT.match?(/\A.*\z/)
  end

  x.report('wildcard check (line)') do
    SUBJECT.match?(/^.*$/)
  end

  x.report('start only (all)') do
    SUBJECT.match?(/\Aabc/)
  end

  x.report('start only (line)') do
    SUBJECT.match?(/^abc/)
  end

  x.report('end only (all)') do
    SUBJECT.match?(/abc\z/)
  end

  x.report('end only (line)') do
    SUBJECT.match?(/abc$/)
  end
end

> jt ruby -v regexp-anchor-benchmarks.rb
Using TruffleRuby with Graal: mxbuild/truffleruby-jvm-ce
$ /home/nirvdrum/dev/workspaces/truffleruby-ws/truffleruby/mxbuild/truffleruby-jvm-ce/languages/ruby/bin/ruby \
  --experimental-options \
  --core-load-path=/home/nirvdrum/dev/workspaces/truffleruby-ws/truffleruby/src/main/ruby/truffleruby \
  -v \
  regexp-anchor-benchmarks.rb
truffleruby 21.3.0-dev-ac4aa22b, like ruby 2.7.3, GraalVM CE JVM [x86_64-linux]
Warming up --------------------------------------
blank? check (global)
                         5.800M i/100ms
 blank? check (line)    13.234M i/100ms
exact match (global)    12.170M i/100ms
  exact match (line)     9.117M i/100ms
wildcard check (global)
                        13.231M i/100ms
wildcard check (line)
                        12.099M i/100ms
    start only (all)    14.506M i/100ms
   start only (line)    10.191M i/100ms
      end only (all)    13.645M i/100ms
     end only (line)    10.353M i/100ms
Calculating -------------------------------------
blank? check (global)
                        178.517M (± 4.5%) i/s -    893.147M in   5.018551s
 blank? check (line)    131.881M (± 4.3%) i/s -    661.704M in   5.027756s
exact match (global)    136.003M (± 8.2%) i/s -    681.540M in   5.052745s
  exact match (line)     99.360M (± 7.9%) i/s -    501.429M in   5.084205s
wildcard check (global)
                        126.612M (± 5.8%) i/s -    635.083M in   5.034580s
wildcard check (line)
                        122.111M (± 7.2%) i/s -    617.074M in   5.086444s
    start only (all)    140.055M (± 5.9%) i/s -    710.804M in   5.096991s
   start only (line)    100.616M (± 8.1%) i/s -    499.345M in   5.005298s
      end only (all)    134.511M (± 8.4%) i/s -    668.615M in   5.014038s
     end only (line)     97.928M (±10.4%) i/s -    486.580M in   5.031653s
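
To put a number on "much slower", the throughput figures above can be turned into ratios. This is only a rough sketch: the `RESULTS` table and the `speedup` helper are names introduced here for illustration, the values are copied by hand from the "Calculating" section, and the reported error margins (roughly ±4% to ±10%) are ignored.

```ruby
# frozen_string_literal: true

# Iterations per second (in millions), copied from the "Calculating" output above.
# RESULTS and speedup are illustrative names, not part of the original benchmark.
RESULTS = {
  'blank? check'   => { global: 178.517, line: 131.881 },
  'exact match'    => { global: 136.003, line: 99.360 },
  'wildcard check' => { global: 126.612, line: 122.111 },
  'start only'     => { global: 140.055, line: 100.616 },
  'end only'       => { global: 134.511, line: 97.928 }
}.freeze

# Ratio of global-anchor throughput to line-anchor throughput, rounded to 2 places.
def speedup(global:, line:)
  (global / line).round(2)
end

RESULTS.each do |name, rates|
  puts format('%-15s global anchors: %.2fx the line-anchor throughput', name, speedup(**rates))
end
```

A ratio close to 1.0 should be read with the error margins in mind, since the two measurements may overlap.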

NB: I have not investigated the \Z anchor.
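
For reference, here is a small sketch of how the anchor variants differ semantically in Ruby, and why they should be interchangeable for a single-line subject like the one in the benchmark. The `anchor_matches` helper is a name made up for this illustration; it only uses `Regexp.new` and `String#match?`.

```ruby
# frozen_string_literal: true

# Hypothetical helper: wraps the same pattern body in each anchor style
# and reports whether the subject matches.
def anchor_matches(subject, body)
  {
    line: subject.match?(Regexp.new("^#{body}$")),       # ^ and $ match at line boundaries
    global: subject.match?(Regexp.new("\\A#{body}\\z")), # \A and \z match only at string boundaries
    global_Z: subject.match?(Regexp.new("\\A#{body}\\Z")) # \Z also matches before a final newline
  }
end

# Single-line subject, as in the benchmark: every variant agrees.
p anchor_matches("abc", "abc")

# Multi-line subject: only the line anchors can match the second line.
p anchor_matches("abc\ndef", "def")

# Trailing newline: $ and \Z match before it, \z does not.
p anchor_matches("abc\n", "abc")
```

Because the variants only diverge when the subject contains a newline, the benchmark's "abc" subject gives the same match result with either style, so the difference being measured is purely in matching cost.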
