
[ML] Inconsistent categorization_filters content for ML jobs in Kibana #917

Open
@elasticmachine


Original comment by @lcawl:

On 5.4 and 5.5 builds from S3, the categorization_filters content is shown differently in the Kibana Dev Tools tab than on the command line.

For example, when I run the following API in the Dev Tools tab in Kibana:

PUT _xpack/ml/anomaly_detectors/it_ops_logs
{
  "description": "IT Ops application logs",
  "analysis_config": {
    "categorization_field_name": "message",
    "bucket_span":"30m",
    "detectors": [
      {
      "function": "count",
      "by_field_name": "mlcategory",
      "detector_description": "Unusual message counts"
    }],
    "categorization_filters":"\\[statement:.*\\]"
  },
  "analysis_limits": {
      "categorization_examples_limit": 5
  },
  "data_description": {
    "time_field": "time",
    "time_format": "epoch_ms"
  }
}

It returns the following:

...
"analysis_config": {
    "bucket_span": "30m",
    "categorization_field_name": "message",
    "categorization_filters": [
      """\[statement:.*\]"""
    ]
...

Ditto when I run GET _xpack/ml/anomaly_detectors/it_ops_logs from the Dev Tools tab.

If I run it from the command line, however, I get the following:

curl -u elastic:changeme -XGET 'localhost:9200/_xpack/ml/anomaly_detectors/it_ops_logs'
{"count":1,"jobs":[{"job_id":"it_ops_logs","job_type":"anomaly_detector","job_version":"5.5.0","description":"IT Ops application logs","create_time":1497295377972,"analysis_config":{"bucket_span":"30m","categorization_field_name":"message","categorization_filters":["\\[statement:.*\\]"],"detectors":[{"detector_description":"Unusual message counts","function":"count","by_field_name":"mlcategory","detector_rules":[],"detector_index":0}],"influencers":[]},"analysis_limits":{"categorization_examples_limit":5},"data_description":{"time_field":"time","time_format":"epoch_ms"},"model_snapshot_retention_days":1,"results_index_name":"shared"}]}

Note that the categorization_filters content matches what I specified in the PUT command in this case.
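
To see what the command-line form actually encodes, here is a minimal Python sketch that decodes the categorization_filters fragment from the curl response above. The sample log message is hypothetical, and Python's re module only stands in for the regex engine the ML job uses, so treat the match as an illustration rather than a statement about ML behaviour.

```python
import json
import re

# Fragment of the curl response above, kept as raw text so the
# doubled backslashes are exactly what came over the wire.
curl_fragment = r'{"categorization_filters":["\\[statement:.*\\]"]}'

# Standard JSON decoding collapses each escaped backslash (\\) to one.
decoded = json.loads(curl_fragment)["categorization_filters"][0]
print(decoded)

# Hypothetical log message, only for illustration.
sample = "query failed [statement: SELECT 1] after 30s"

# Applying the decoded filter removes the bracketed statement.
stripped = re.sub(decoded, "", sample)
print(stripped)
```

The decoded value has single backslashes, which is the same text that the Dev Tools tab shows between triple quotes.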

The "categorization_filters":"""\[statement:.*\]""" syntax does not actually seem to be invalid (I can successfully create a job in the Dev Tools with it). It's just a bit confusing that Dev Tools returns a different syntax than the command line does.
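
The two syntaxes can be compared mechanically. The sketch below assumes that Dev Tools treats the text between triple quotes as a literal string with no escape processing. The helper name console_to_json is hypothetical and not part of Kibana or Elasticsearch.

```python
import json
import re

# Matches a Console-style triple-quoted string and captures its body.
TRIPLE_QUOTED = re.compile(r'"""(.*?)"""', re.DOTALL)


def console_to_json(text: str) -> str:
    """Rewrite triple-quoted strings as standard JSON strings.

    Assumption: the body between the triple quotes is literal text,
    so json.dumps supplies whatever escaping plain JSON requires.
    """
    return TRIPLE_QUOTED.sub(lambda m: json.dumps(m.group(1)), text)


# The Dev Tools rendering from the report, with single backslashes.
console_view = r'{"categorization_filters": ["""\[statement:.*\]"""]}'

as_json = console_to_json(console_view)
print(as_json)

# Once converted, the text parses as ordinary JSON.
filters = json.loads(as_json)["categorization_filters"]
print(filters[0])
```

Under that assumption, the converted text carries the same doubled backslashes as the curl output, which suggests the two views differ only in presentation.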

For interest's sake, to get this to work in the advanced job wizard, I must use the following syntax:
[image: filter1]
The categorization_filters property then has the desired format in the Edit JSON tab:
[image: filter2]
