While investigating regex performance, I noticed that the line anchors (^ and $) are noticeably slower than the global anchors (\A and \z), even when the string consists of a single line. Most regexps I've encountered that use anchors use the line variants. I tried a variety of patterns, and the effect shows up in nearly every case (the wildcard pair is within the reported margin of error):
# frozen_string_literal: true

require 'benchmark/ips'

SUBJECT = "abc"

Benchmark.ips do |x|
  x.report('blank? check (global)') do
    SUBJECT.match?(/\A[[:space:]]*\z/)
  end

  x.report('blank? check (line)') do
    SUBJECT.match?(/^[[:space:]]*$/)
  end

  x.report('exact match (global)') do
    SUBJECT.match?(/\Aabc\z/)
  end

  x.report('exact match (line)') do
    SUBJECT.match?(/^abc$/)
  end

  x.report('wildcard check (global)') do
    SUBJECT.match?(/\A.*\z/)
  end

  x.report('wildcard check (line)') do
    SUBJECT.match?(/^.*$/)
  end

  x.report('start only (all)') do
    SUBJECT.match?(/\Aabc/)
  end

  x.report('start only (line)') do
    SUBJECT.match?(/^abc/)
  end

  x.report('end only (all)') do
    SUBJECT.match?(/abc\z/)
  end

  x.report('end only (line)') do
    SUBJECT.match?(/abc$/)
  end
end
> jt ruby -v regexp-anchor-benchmarks.rb
Using TruffleRuby with Graal: mxbuild/truffleruby-jvm-ce
$ /home/nirvdrum/dev/workspaces/truffleruby-ws/truffleruby/mxbuild/truffleruby-jvm-ce/languages/ruby/bin/ruby \
--experimental-options \
--core-load-path=/home/nirvdrum/dev/workspaces/truffleruby-ws/truffleruby/src/main/ruby/truffleruby \
-v \
regexp-anchor-benchmarks.rb
truffleruby 21.3.0-dev-ac4aa22b, like ruby 2.7.3, GraalVM CE JVM [x86_64-linux]
Warming up --------------------------------------
blank? check (global)
                         5.800M i/100ms
 blank? check (line)    13.234M i/100ms
exact match (global)    12.170M i/100ms
  exact match (line)     9.117M i/100ms
wildcard check (global)
                        13.231M i/100ms
wildcard check (line)
                        12.099M i/100ms
    start only (all)    14.506M i/100ms
   start only (line)    10.191M i/100ms
      end only (all)    13.645M i/100ms
     end only (line)    10.353M i/100ms
Calculating -------------------------------------
blank? check (global)
                        178.517M (± 4.5%) i/s - 893.147M in 5.018551s
 blank? check (line)    131.881M (± 4.3%) i/s - 661.704M in 5.027756s
exact match (global)    136.003M (± 8.2%) i/s - 681.540M in 5.052745s
  exact match (line)     99.360M (± 7.9%) i/s - 501.429M in 5.084205s
wildcard check (global)
                        126.612M (± 5.8%) i/s - 635.083M in 5.034580s
wildcard check (line)
                        122.111M (± 7.2%) i/s - 617.074M in 5.086444s
    start only (all)    140.055M (± 5.9%) i/s - 710.804M in 5.096991s
   start only (line)    100.616M (± 8.1%) i/s - 499.345M in 5.005298s
      end only (all)    134.511M (± 8.4%) i/s - 668.615M in 5.014038s
     end only (line)     97.928M (±10.4%) i/s - 486.580M in 5.031653s
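
To make the gap easier to read off, here is a minimal sketch that turns the iterations-per-second figures above into a relative slowdown per pattern. The `RESULTS` table and the `slowdown_percent` helper are illustrative names introduced here, not part of the benchmark script; the numbers are copied from the "Calculating" section and ignore the reported error margins.

```ruby
# Iterations per second, in millions, copied from the "Calculating" output above.
RESULTS = {
  'blank? check'   => { global: 178.517, line: 131.881 },
  'exact match'    => { global: 136.003, line: 99.360 },
  'wildcard check' => { global: 126.612, line: 122.111 },
  'start only'     => { global: 140.055, line: 100.616 },
  'end only'       => { global: 134.511, line: 97.928 }
}.freeze

# Hypothetical helper: percentage by which the line-anchored variant
# trails the global-anchored one (positive means line anchors are slower).
def slowdown_percent(global_ips, line_ips)
  ((1.0 - line_ips / global_ips) * 100).round(1)
end

RESULTS.each do |name, ips|
  pct = slowdown_percent(ips[:global], ips[:line])
  puts format('%-15s line anchors %5.1f%% slower', name, pct)
end
```

On these figures the line variants come out roughly 26 to 28% slower in four of the five pairs, while the wildcard pair differs by under 4%, which is inside its error margin.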
NB: I have not investigated the \Z anchor.
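
For context, the two anchor families only agree on single-line input like the benchmark's SUBJECT. The following sketch shows standard Ruby regexp semantics on a multi-line string; the `MULTI` constant is an illustrative value, not taken from the benchmark, and the snippet says nothing about where the time difference comes from.

```ruby
# Illustrative multi-line input (an assumption, not part of the benchmark).
MULTI = "abc\ndef\n"

# ^ matches at the start of any line; \A only at the start of the string.
p MULTI.match?(/^def/)   # true
p MULTI.match?(/\Adef/)  # false

# $ matches at the end of any line; \z only at the very end of the string.
p MULTI.match?(/abc$/)   # true
p MULTI.match?(/abc\z/)  # false

# \Z matches at the end of the string, or just before a final trailing newline.
p MULTI.match?(/def\Z/)  # true
p MULTI.match?(/def\z/)  # false

# On a single-line string with no trailing newline, both families agree.
p "abc".match?(/^abc$/) == "abc".match?(/\Aabc\z/)  # true
```

Because the results diverge on multi-line input, swapping ^ and $ for \A and \z in existing code is only safe where the subject is known to be a single line.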