Description
Crate version: 1.11.0
Example code: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=c4b4cfe18c2e6413444e53315de33b27 (used for snippets below and extra checks)
The behavior of the crate when trying to use the ASCII character class syntax [[:foo:]] with invalid character classes is somewhat confusing. A friend was trying to use [[:XID_Start:]] to check whether _ (underscore/low line) was included in the XID_Start character class (it's not), and was confused when it returned true.
let expr = regex::Regex::new(r"[[:XID_Start:]]").unwrap();
dbg!(expr.is_match("_")); // true
The correct syntax, \p{XID_Start}, works as expected:
let correct = regex::Regex::new(r"\p{XID_Start}").unwrap();
dbg!(correct.is_match("a")); // true
dbg!(correct.is_match("1")); // false
dbg!(correct.is_match("_")); // false
It seems that when the name is not a valid ASCII character class (see the regex docs, § ASCII character classes), the pattern falls back to matching any single character that appears between the brackets:
dbg!(expr.is_match(":")); // true
dbg!(expr.is_match("X")); // true
dbg!(expr.is_match("x")); // false
dbg!(expr.is_match("a")); // true
dbg!(expr.is_match("b")); // false
dbg!(expr.is_match("[")); // false
dbg!(expr.is_match("]")); // false
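To make that guess concrete, here is a small std-only sketch that models it. It does not use the regex crate and is not its implementation: the helper name fallback_set is hypothetical, and the assumption is simply that everything between the outer brackets, minus the inner bracket characters, is treated as a set of literal characters.

```rust
use std::collections::HashSet;

/// Hypothetical model of the observed behavior (NOT the regex crate's code):
/// collect the characters between the outer brackets into a literal set,
/// dropping the inner '[' and ']' themselves.
fn fallback_set(inner: &str) -> HashSet<char> {
    inner.chars().filter(|c| *c != '[' && *c != ']').collect()
}

fn main() {
    // The part of `[[:XID_Start:]]` that sits inside the outer brackets.
    let set = fallback_set("[:XID_Start:]");

    // Same inputs as the is_match checks above, plus the underscore.
    for c in [':', 'X', 'x', 'a', 'b', '[', ']', '_'] {
        println!("{c:?} -> {}", set.contains(&c));
    }
}
```

Under this model the set is { ':', 'X', 'I', 'D', '_', 'S', 't', 'a', 'r' }, which agrees with every result listed above, including the surprising match on _.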
I'm not entirely sure what regex is actually interpreting this sequence as, but assuming this is intentional behavior, I think it is worth documenting in the aforementioned section on ASCII character classes, as the behavior is not immediately intuitive.