Description
Elasticsearch Version
8.0.0
Installed Plugins
No response
Java Version
bundled
OS Version
n/a
Problem Description
Running query_string queries that contain wildcards against a text field whose analyzer uses a char_filter that can introduce extra tokens during analysis ('-' -> ' ', for example) can miss results. This is because the part of QueryStringQueryParser that handles wildcard queries applies only normalization to its input, rather than full analysis. Given the input foo-b*r, the wildcard query path applies the char_filter to produce foo b*r and then creates a wildcard query on that single term. At index time, however, foo-bar will have been split into two tokens (foo and bar), so no match is found.
Note that prefix queries do apply full analysis, so a query for foo-ba* would correctly match the indexed value.
Steps to Reproduce
DELETE test_index
PUT test_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "filtered"
      }
    }
  },
  "settings": {
    "index": {
      "analysis": {
        "char_filter": {
          "hyphens": {
            "type": "pattern_replace",
            "pattern": "([a-zA-Z])-([a-zA-Z])",
            "replacement": "$1 $2"
          }
        },
        "analyzer": {
          "filtered": {
            "char_filter": [ "hyphens" ],
            "filter": [
              "lowercase"
            ],
            "tokenizer": "standard"
          }
        }
      }
    }
  }
}
PUT test_index/_doc/1
{
  "title": "foo-bar"
}
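As an optional check of the index-time behaviour described above, the analyze API can be run against the same index. This is only a sketch that reuses the filtered analyzer defined in the settings; with that configuration it should return two tokens, foo and bar, rather than a single foo-bar token.

```console
GET test_index/_analyze
{
  "analyzer": "filtered",
  "text": "foo-bar"
}
```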
GET test_index/_validate/query?explain=true
{
  "query": {
    "query_string": {
      "fields": [ "title" ],
      "query": "foo-b*r"
    }
  }
}
The resulting query is a wildcard query against the term foo b*r.
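For comparison, the prefix case mentioned in the problem description can be exercised with a search. This is a sketch, not part of the original reproduction: the explicit refresh is an added assumption so that the document indexed above is visible to search. According to the description, the prefix query foo-ba* should match document 1, while running the same search with foo-b*r is expected to return no hits.

```console
POST test_index/_refresh

GET test_index/_search
{
  "query": {
    "query_string": {
      "fields": [ "title" ],
      "query": "foo-ba*"
    }
  }
}
```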
Logs (if relevant)
No response