
Apache Druid extension for OpenTelemetry signals ingestion

druid-otlp-ingestion

This artifact implements an Apache Druid extension that you can use to ingest OpenTelemetry signals - logs, metrics, traces and profiles - into Apache Druid, and then explore them through interactive charts and dashboards.

What is OpenTelemetry?

OpenTelemetry is high-quality, ubiquitous, and portable telemetry that enables effective observability.

It's a newer generation of telemetry systems that builds on the experience gained with earlier implementations and offers a number of improvements. We at mishmash io find OpenTelemetry particularly useful - see the reasons here, and how we use it in our development process here.

Also, make sure you check the OpenTelemetry official docs.

Why Apache Druid?

Apache Druid is a high performance, real-time analytics database that delivers sub-second queries on streaming and batch data at scale and under load. It performs particularly well on timestamped data, and OpenTelemetry signals are heavily timestamped. See Apache Druid Use Cases for more.

Quick introduction

To get an idea of why and when to use this Druid extension, here is an example setup:

  1. Set up Apache Kafka.

  2. Get the OpenTelemetry Collector and configure it to export data to Kafka, using the OpenTelemetry Kafka Exporter provided by OpenTelemetry (a sample Collector configuration is sketched below).

  3. Set up Apache Druid with this artifact as an extension.

  4. Inside Druid, configure Kafka Streaming Ingestions for logs, metrics and traces.

    In the ingestion specs, set the inputFormat to this extension (more on this below).

    Note: The OpenTelemetry profiles signal is still under development and is not supported by the Kafka Exporter.

  5. Set up Apache Superset with a Druid database driver.

  6. Explore your telemetry in Superset!

(Screenshot: a Superset dashboard exploring OpenTelemetry data ingested into Druid.)
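
Here is a minimal sketch of the Collector configuration from step 2, assuming an OTLP receiver, a Kafka broker at localhost:9092 and the topic names shown (all illustrative; otlp_proto is the exporter's protobuf encoding):

receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  # one Kafka exporter per signal, each writing to its own topic
  kafka/logs:
    brokers: ["localhost:9092"]
    topic: otlp_logs
    encoding: otlp_proto
  kafka/metrics:
    brokers: ["localhost:9092"]
    topic: otlp_metrics
    encoding: otlp_proto
  kafka/traces:
    brokers: ["localhost:9092"]
    topic: otlp_spans
    encoding: otlp_proto

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [kafka/logs]
    metrics:
      receivers: [otlp]
      exporters: [kafka/metrics]
    traces:
      receivers: [otlp]
      exporters: [kafka/traces]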

Tip

We have prepared a clone of OpenTelemetry's demo app with the exact same setup as above.

It's quicker to run (no configuration needed) and will also generate telemetry to populate your Druid tables.

Get our Druid ingestion demo app fork here.

Note

There are more ways of ingesting OpenTelemetry data into Apache Druid.

Watch this repository for updates as we continue to publish our internal OpenTelemetry-related code and add more examples!

Installation

To use this extension, install it on your Druid nodes through the typical extension-loading process:

  1. Pull the extension:
cd /opt/druid && \
bin/run-java \
  -classpath "lib/*" org.apache.druid.cli.Main tools pull-deps \
  --no-default-hadoop -c "io.mishmash.opentelemetry:druid-otlp-format:1.1.3"
  2. Enable it in the Druid configuration:
druid.extensions.loadList=[<existing extensions>, "druid-otlp-format"]
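
For example, on a node that already loads the Kafka indexing service, the full line might read (any extension names other than druid-otlp-format are illustrative):

druid.extensions.loadList=["druid-kafka-indexing-service", "druid-otlp-format"]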

That's all! Now launch your Druid instances and you're ready to ingest OpenTelemetry data!

Using a Docker image

We've prepared an Apache Druid for OpenTelemetry Docker image that lets you skip the installation steps above and just launch a Druid cluster with our OpenTelemetry extension preinstalled.

It is based on the official image and is configured and launched the same way. Just replace the image tags with mishmashio/druid-for-opentelemetry in your docker-compose.yaml.

Don't forget to enable the extension when launching!
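
If you use the environment file from the official Druid docker-compose setup, where runtime properties are passed as druid_-prefixed variables, that might look like this (extension names other than druid-otlp-format are illustrative):

druid_extensions_loadList=["druid-kafka-indexing-service", "druid-otlp-format"]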

Building your own Docker image

If you're building your own custom Druid image, just include the pull-deps installation step above, as sketched below.
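
A minimal Dockerfile sketch, assuming the official apache/druid base image (the version tag is illustrative):

# base image tag is illustrative; match the Druid version you run
FROM apache/druid:30.0.0

WORKDIR /opt/druid

# pull the extension and its dependencies into the image
RUN bin/run-java \
      -classpath "lib/*" org.apache.druid.cli.Main tools pull-deps \
      --no-default-hadoop -c "io.mishmash.opentelemetry:druid-otlp-format:1.1.3"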

Note: The pull-deps command needs to open a large number of files simultaneously, and the maximum number of open files inside a container might be limited on your system. On some Linux distributions, podman, for example, ships with limits that are too low.

If this turns out to be the case on your machine, raise the open-files limit during the build:

podman build --ulimit nofile=65535:65535 -f Dockerfile ..

Configuring ingestion jobs

Once the druid-otlp-format extension is loaded, you can configure Druid to start loading OpenTelemetry data.

To create a data import task you need to provide Druid with an ingestion spec detailing, among other things, where it should look for data and what format the data is in.

To apply this extension to your data source, write an ingestion spec the way you normally would and set its inputFormat section as shown below.

Tip

We've created some example ingestion specs.

See the examples here.

In every case the inputFormat section looks the same; only the otlpInputSignal value changes with the signal you're ingesting and the module that produced it:

  {
    ...
    "spec": {
      "ioConfig": {
        ...
        "inputFormat": {
          "type": "otlp",
          "otlpInputSignal": "<value from the list below>"
        },
        ...
      },
      ...
    },
    ...
  }

Choose the otlpInputSignal value as follows:

  • logs: "logsRaw" when produced by the OpenTelemetry Kafka Exporter, or "logsFlat" when produced by other modules published by mishmash io
  • metrics: "metricsRaw" when produced by the OpenTelemetry Kafka Exporter, or "metricsFlat" when produced by other modules published by mishmash io
  • traces: "tracesRaw" when produced by the OpenTelemetry Kafka Exporter, or "tracesFlat" when produced by other modules published by mishmash io
  • profiles: "profilesFlat" when produced by other modules published by mishmash io (profiles are not yet supported by the Kafka Exporter)

An example logs ingestion spec that loads data produced by the OpenTelemetry Kafka Exporter might look like this:

{
  "type": "kafka",
  "spec": {
    "ioConfig": {
      "type": "kafka",
      "consumerProperties": {
        "bootstrap.servers": "localhost:9092"
      },
      "topic": "otlp_logs",
      "inputFormat": {
        "type": "otlp",
        "otlpInputSignal": "logsRaw"
      },
      "useEarliestOffset": true
    },
    "tuningConfig": {
      "type": "kafka"
    },
    "dataSchema": {
        ...
      },
      "granularitySpec": {
        ...
      }
    }
  }
}
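
Once you've filled in the dataSchema, you can submit the spec to Druid as a Kafka supervisor through the Supervisor API. A sketch, assuming a Router listening on localhost:8888 and the spec saved as logs-supervisor.json (both illustrative):

curl -X POST -H 'Content-Type: application/json' \
  -d @logs-supervisor.json \
  http://localhost:8888/druid/indexer/v1/supervisor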

If you're not very familiar with ingestion specs, take a look at the Apache Druid Kafka tutorial to get started.

Using Druid GUI console

Druid's GUI console provides an interactive way to configure your ingestion jobs without writing JSON specs by hand.

Unfortunately, at the moment we do not support configuring the OTLP Input Format extension through the GUI console.

If you're feeling a bit brave, you can still use the GUI with a little 'hack': when you reach the Parse data step of the wizard, switch to the Edit spec step, where you'll see the JSON being prepared. It has the same format as above, so paste in the correct inputFormat parameters (taking the right configuration from the list above), then switch back to the Parse data tab and voila!

See a step-by-step Druid GUI console guide here.

OpenTelemetry at mishmash io

OpenTelemetry's main intent is the observability of production environments, but at mishmash io it is also part of our software development process. By saving telemetry from experiments and tests of our own algorithms, we continuously track things like the performance and resource usage of our distributed database across releases.

We believe that adopting OpenTelemetry as a software development tool might be useful to you too, which is why we decided to open-source the tools we've built.

Learn more about the broader set of OpenTelemetry-related activities at mishmash io and follow our GitHub profile for updates and new releases.