Improvement: improve multiple series creation (especially on ingester startup) #11078

@colega

What is the problem you are trying to solve?

Ingesters sometimes have to create a lot of series. We already see some lock contention on MemPostings.Add() for tenants with a lot of churn, but the worst case is when an ingester either starts from scratch or becomes writeable again.

In that last case we've seen write requests waiting for seconds on MemPostings.Add(), since we have to create a couple of million series in a matter of seconds, and each one takes an exclusive mutex.

It's not a very expensive operation, but the synchronization process wastes a lot of CPU and clears the precious caches.

Additionally, with hundreds of goroutines waiting for the exclusive write mutex, no reads can proceed during that time, which also increases the latency of the read path.

Which solution do you envision (roughly)?

Instead of creating series one by one, create all the series for a given request at once, reducing both the number of times the mutex is taken and the time spent under that mutex.

I wrote a simple implementation of a best-effort BatchSeriesRefs method on the Appender: it takes a list of series and tries to create the ones that are valid and don't exist yet. It also copies the labels as necessary, since we make heavy use of unsafe strings in Mimir.

The implementation in Mimir is straightforward; most of the code is there to keep allocations low by reusing builders and labels.
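To make the best-effort semantics concrete, here is a minimal sketch of the idea, not the actual implementation. Every name in it (head, batchSeriesRefs, valid, key, cloneLabels) is a hypothetical stand-in, and the real Appender method may have a different signature. It assumes invalid series are reported with a zero ref, looks up existing series under a read lock, and creates the missing ones under a single write-lock acquisition, cloning label strings so they don't alias the request buffer.

```go
package main

import (
	"strings"
	"sync"
)

// label is a simplified stand-in for the real label type.
type label struct{ Name, Value string }

// head is a toy stand-in for the TSDB head (hypothetical).
type head struct {
	mtx     sync.RWMutex
	refs    map[string]uint64  // series key -> ref
	series  map[uint64][]label // ref -> owned copy of the labels
	lastRef uint64
}

func newHead() *head {
	return &head{refs: map[string]uint64{}, series: map[uint64][]label{}}
}

// key builds a freshly allocated map key from the labels.
func key(ls []label) string {
	var b strings.Builder
	for _, l := range ls {
		b.WriteString(l.Name)
		b.WriteByte(0xff)
		b.WriteString(l.Value)
		b.WriteByte(0xff)
	}
	return b.String()
}

// valid is a placeholder check: at least one label and no empty names.
func valid(ls []label) bool {
	if len(ls) == 0 {
		return false
	}
	for _, l := range ls {
		if l.Name == "" {
			return false
		}
	}
	return true
}

// cloneLabels copies the strings so they no longer alias the input memory.
func cloneLabels(ls []label) []label {
	out := make([]label, len(ls))
	for i, l := range ls {
		out[i] = label{Name: strings.Clone(l.Name), Value: strings.Clone(l.Value)}
	}
	return out
}

// batchSeriesRefs returns one ref per input series; 0 marks an invalid series.
func (h *head) batchSeriesRefs(series [][]label) []uint64 {
	out := make([]uint64, len(series))
	keys := make([]string, len(series))
	for i, ls := range series {
		if valid(ls) {
			keys[i] = key(ls) // computed before taking any lock
		}
	}

	// Pass 1: resolve existing series under the shared read lock.
	var missing []int
	h.mtx.RLock()
	for i := range series {
		if keys[i] == "" {
			continue // invalid, best effort: skip it
		}
		if ref, ok := h.refs[keys[i]]; ok {
			out[i] = ref
		} else {
			missing = append(missing, i)
		}
	}
	h.mtx.RUnlock()
	if len(missing) == 0 {
		return out
	}

	// Pass 2: create all missing series under one write-lock acquisition.
	h.mtx.Lock()
	defer h.mtx.Unlock()
	for _, i := range missing {
		// Re-check: a concurrent request or a duplicate in this batch may have created it.
		if ref, ok := h.refs[keys[i]]; ok {
			out[i] = ref
			continue
		}
		h.lastRef++
		h.refs[keys[i]] = h.lastRef
		h.series[h.lastRef] = cloneLabels(series[i])
		out[i] = h.lastRef
	}
	return out
}
```

A caller would pass every series of a write request in one call and then append samples using the returned refs, skipping entries whose ref is 0.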

In my first approach I just added a MemPostings.AddBatch that creates 512 index entries each time it takes the mutex. That number is shared with other things MemPostings does, like garbage collection: we don't want to hold the mutex for too long if the caller wants to create 1M series. I think this is a good first approach. We could do better, for example with a pool of GOMAXPROCS workers that create series in parallel, minimizing the time spent with the mutex taken, but I would leave that to future versions.

Another improvement we could make is to batch the series lookups in stripeSeries: we don't see mutex contention there, but there's likely some synchronization overhead we don't see.
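The chunking can be sketched roughly as follows. This is a toy model under stated assumptions, not the real MemPostings: the postings, seriesEntry and label types are hypothetical simplifications, and the real index also keeps postings lists sorted, which is omitted here. The only point it illustrates is that the write lock is taken once per chunk of 512 entries instead of once per series.

```go
package main

import "sync"

// batchSize is the number of entries indexed per lock acquisition;
// 512 mirrors the number mentioned above.
const batchSize = 512

// label and seriesEntry are simplified, hypothetical stand-ins for the real types.
type label struct{ Name, Value string }

type seriesEntry struct {
	Ref    uint64
	Labels []label
}

// postings is a toy inverted index: label pair -> series refs.
type postings struct {
	mtx sync.RWMutex
	m   map[label][]uint64
}

func newPostings() *postings {
	return &postings{m: map[label][]uint64{}}
}

// addLocked indexes one series; the caller must hold p.mtx for writing.
func (p *postings) addLocked(e seriesEntry) {
	for _, l := range e.Labels {
		p.m[l] = append(p.m[l], e.Ref)
	}
}

// Add indexes a single series, taking the write lock once per series.
func (p *postings) Add(e seriesEntry) {
	p.mtx.Lock()
	p.addLocked(e)
	p.mtx.Unlock()
}

// AddBatch indexes many series, taking the write lock once per chunk of
// batchSize entries and releasing it between chunks so that readers and
// other writers are not starved. It returns the number of lock acquisitions.
func (p *postings) AddBatch(entries []seriesEntry) int {
	acquisitions := 0
	for len(entries) > 0 {
		n := len(entries)
		if n > batchSize {
			n = batchSize
		}
		p.mtx.Lock()
		for _, e := range entries[:n] {
			p.addLocked(e)
		}
		p.mtx.Unlock()
		acquisitions++
		entries = entries[n:]
	}
	return acquisitions
}
```

With this shape, adding 1300 series takes the lock 3 times rather than 1300 times, while no single critical section covers more than 512 entries.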
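One possible shape for batched lookups, purely as an illustration: group the requested hashes by stripe first, then take each stripe's lock once per batch. The stripedIndex type below is a hypothetical simplification and does not mirror the real stripeSeries layout; whether this grouping pays off in practice is an assumption that would need measuring.

```go
package main

import "sync"

// stripeCount must be a power of two so that hash&(stripeCount-1) picks a stripe.
const stripeCount = 16

// stripe is one lock-protected shard of a toy hash -> ref index (hypothetical).
type stripe struct {
	mtx  sync.RWMutex
	refs map[uint64]uint64
}

type stripedIndex struct {
	stripes [stripeCount]stripe
}

func newStripedIndex() *stripedIndex {
	s := &stripedIndex{}
	for i := range s.stripes {
		s.stripes[i].refs = map[uint64]uint64{}
	}
	return s
}

// set stores a ref for a series hash in the stripe that owns the hash.
func (s *stripedIndex) set(hash, ref uint64) {
	st := &s.stripes[hash&(stripeCount-1)]
	st.mtx.Lock()
	st.refs[hash] = ref
	st.mtx.Unlock()
}

// lookupBatch resolves all hashes, locking each touched stripe once.
// Unknown hashes yield ref 0. It also returns the number of lock acquisitions.
func (s *stripedIndex) lookupBatch(hashes []uint64) (refs []uint64, acquisitions int) {
	refs = make([]uint64, len(hashes))

	// Group input positions by the stripe that owns each hash.
	var groups [stripeCount][]int
	for i, h := range hashes {
		idx := h & (stripeCount - 1)
		groups[idx] = append(groups[idx], i)
	}

	for idx := range groups {
		if len(groups[idx]) == 0 {
			continue
		}
		st := &s.stripes[idx]
		st.mtx.RLock()
		for _, i := range groups[idx] {
			refs[i] = st.refs[hashes[i]]
		}
		st.mtx.RUnlock()
		acquisitions++
	}
	return refs, acquisitions
}
```

The grouping pass allocates, so a real version would likely reuse buffers across requests to keep allocations low.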

Have you considered any alternatives?

No response

Any additional context to share?

There's one drawback I identified: we're now skipping the shortcut in headAppender.Append that prevented series from being created when the received sample can't be appended to the head.

This means that if a tenant is sending data that is too old, we'll create their series without appending any samples to them.

How long do you think this would take to be developed?

Not sure

What are the documentation dependencies?

No response

Proposer?

No response
