OpenTelemetry, Spring Boot and Kafka

Introduction

Finding the root cause of problems in the software is a pretty complex topic. With distributed systems, another layer of the ultimate complexity, the problem becomes harder to solve. To make a distributed system more observable, one needs to implement proper tracing, apart from logging and metrics.

Distributed tracing allows to track requests and how they travel on the distributed system. It enables engineers to debug with a more precise picture of the system for the given issue.

What is OpenTelemetry?

OpenTelemetry? OpenTelemetry is the industry standard with ready-to-use implementations to enable the effective creation and collection of telemetry data (metrics, logs, and traces) and to forward them to external tools.

It has already SDKs, which are ready to use by many popular libraries and frameworks. Spring Boot is no exception. Spring Boot provided tracing with the spring-cloud-starter-sleuth but the transition to OpenTelemetry created the need for a fairly new spring project spring-cloud-starter-sleuth-otel.

The tendency to switch to OpenTelemetry is backed up by the fact: It is vendor agnostic. Data can be exported to any external entity that knows how to read format. Awesome.

An intro to the tracing is given in the next chapter.

What is tracing?

Tracing allows us to put identity on request in our system and track it on its journey. The request is not only HTTP/S but also an event messaging system message(eg. Kafka message), or some other kind of request which is supported by the instrumentation library and framework used.

Three pillars of tracing are Span, SpanContext, and Trace.

Span

The “span” is the primary building block of a distributed trace, representing an individual unit of work done in a distributed system.

SpanContext

The SpanContext carries data across process boundaries. Specifically, it has two major components:

  • An implementation-dependent state refers to the distinct span within a trace
  • A key:value pairs that cross process boundaries.

Trace

The collection of Spans that have the same root is considered to create a group called trace.

To read more about opentracing, check out https://opentracing.io/docs/overview/spans/.

Tracing example

Let's see how can we practically leverage tracing. The setup of the services is drawn in the next picture.

As you can see there are three services:

  • user-service (Providing HTTP API to the external world)
  • report-service (Providing HTTP API to the user-service and KafkaProducer to send events to topic reports)
  • email-service (Providing KafkaConsumer, listening to the topic reports).

What is the goal? Track the request from the start (user-service) to the end. From the user-service to the email-service.

Architecture is simplified for demonstration purposes. The real benefit of tracing would be seen in architecture where we have N services communicating with each other (Where N is a large number. Define large for yourself.).

Deploying tracing infrastructure

Infrastructure architecture is described in detail in the docker-compose.yaml. Let's group the deployments into the two groups:

  • application services
  • infrastructure services

Infrastructure applications:

  • jaeger-all-in-one
  • otel-collector
  • zookeeper
  • kafka

Jaeger is used for visualizing the traces. Otel-collector handles input fromspring-otel-exporter, transforms(Optionally), and sends data to Jaeger. Zookeeper and Kafka are usual suspects.

Since every one of these tools is a broad topic in itself focus will not be on introducing them. Official documentation can be found below:

Configuring maven for OpenTelemetry support

To include open-telemetry and sleuth configure the pom.xml like the one provided.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.6.5</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>

    <groupId>com.tracing</groupId>
    <artifactId>tracing</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>pom</packaging>

    <modules>
        <module>tracing-commons</module>
        <module>tracing-email</module>
        <module>tracing-user</module>
        <module>tracing-report</module>
    </modules>

    <properties>
        <java.version>11</java.version>
        <spring-cloud.version>2021.0.1</spring-cloud.version>
        <spring-cloud-sleuth-otel.version>1.1.0-M5</spring-cloud-sleuth-otel.version>
    </properties>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.cloud</groupId>
                <artifactId>spring-cloud-dependencies</artifactId>
                <version>${spring-cloud.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
            <dependency>
                <groupId>org.springframework.cloud</groupId>
                <artifactId>spring-cloud-sleuth-otel-dependencies</artifactId>
                <version>${spring-cloud-sleuth-otel.version}</version>
                <scope>import</scope>
                <type>pom</type>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <build>
    </build>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-jpa</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-starter-sleuth</artifactId>
            <exclusions>
                <exclusion>
                    <groupId>org.springframework.cloud</groupId>
                    <artifactId>spring-cloud-sleuth-brave</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-sleuth-otel-autoconfigure</artifactId>
        </dependency>

        <dependency>
            <groupId>io.opentelemetry</groupId>
            <artifactId>opentelemetry-exporter-otlp-trace</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.kafka</groupId>
            <artifactId>spring-kafka</artifactId>
            <version>2.8.4</version>
        </dependency>
    </dependencies>

    <repositories>
        <repository>
            <id>spring-snapshots</id>
            <url>https://repo.spring.io/snapshot</url>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
        </repository>
        <repository>
            <id>spring-milestones</id>
            <url>https://repo.spring.io/milestone</url>
        </repository>
    </repositories>
    <pluginRepositories>
        <pluginRepository>
            <id>spring-snapshots</id>
            <url>https://repo.spring.io/snapshot</url>
        </pluginRepository>
        <pluginRepository>
            <id>spring-milestones</id>
            <url>https://repo.spring.io/milestone</url>
        </pluginRepository>
    </pluginRepositories>

</project>

Configuring pom.xml and importing dependencies is 90% of the job.

Since the Java applications are configurable through the properties and YAML files configuration the otel-starter can be modified in the application.yaml. This is one of the things I love about Spring and Java in general.

spring:
  sleuth:
    otel:
      config:
        trace-id-ratio-based: 1.0
      exporter:
        otlp:
          endpoint: http://sleuth:4317

The endpoint and probability for trace exporting for the otel-collector are configured in the YAML. Config provides also other properties to modify for the spring-otel starter (Configuration properties).

💡
spring.sleuth.otel.config.trace-id-ratio-based property defines probability of 100% exporting traces (Mapping [0.0, 1.0] -> [0, 100]%).

Important: If the ratio is less than 1.0, then some traces will not be exported (It can make you think that it’s not working).

The otel-collector configuration is configured in the docker-compose.yaml but let's glance over it quickly.

extensions:
  memory_ballast:
    size_mib: 512
  zpages:
    endpoint: 0.0.0.0:55679

receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:

exporters:
  logging:
    logLevel: debug
  jaeger:
    endpoint: jaeger-all-in-one:14250
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [ otlp ]
      processors: [ batch ]
      exporters: [ logging, jaeger ]
  extensions: [ memory_ballast, zpages ]

Otel-collector works in a next manner:

receivers -> processing -> exporter

As can be seen from the configuration snippet, otel-collector is highly configurable. That's the thing we talked about in the intro of this article.

The picture from the official docs depicts how otel-collector can be configured.

Analyzing logs and traces

To analyze logs and traces, it's possible to check out the stdout of the application itself. Log output in te application service will display traceId and spanId. The example is shown below.

2022-04-01 18:21:45.984  INFO [user-service,f515bcf46b607671e1182d5903a5d261,779f554008223b4c] 1 --- [nio-8080-exec-1] c.tracing.service.users.UserController   : Creating new report for user: 1

When the request goes to the other service its logs will have the same trace.

2022-04-01 18:21:46.617  INFO [report-service,f515bcf46b607671e1182d5903a5d261,75dc1c69c94bf0f2] 1 --- [nio-8080-exec-1] c.t.service.reports.ReportController     : Creating new report: 1

That way when the otel-exporter provides data to the Jaeger, Jaeger can understand the data and visualize the path of a request. Let’s see the full path of requests represented in Jaeger.

Even if the request is traveling over different protocols, the request can be tracked from the entry point to the end—an invaluable resource for debuggers.

The graph is also available.


The instrumentation of requests in this demo is out-of-the-box. All requests are made via RestTemplate (Important note: RestTemplate limitation), KafkaProducer, and KafkaConsumer.

How does it work? Spring adds trace headers to the requests and receiving service (which also has sleuth support) knows how to parse those headers.

Examples are shown below.

RestTemplate example (Which is instrumented out-of-the-box):

...
public ReportClient(RestTemplate restTemplate) {
    this.restTemplate = restTemplate;
}

public Report postReportForCustomerId(Long id) {
    HttpHeaders headers = new HttpHeaders();
    headers.setContentType(MediaType.APPLICATION_JSON);

    JSONObject reportJsonObject = new JSONObject();
    reportJsonObject.put("id", id);
    reportJsonObject.put("report", "This new generated report.");

    HttpEntity<String> request = new HttpEntity<String>(reportJsonObject.toString(), headers);

    return restTemplate.postForObject(this.reportURL + "/reports", request, Report.class);
)
...

KafkaConsumer example (Created from factory):

@KafkaListener(topics = "reports", groupId = "group")
public void listenReports(Report report) {
    logger.info("Got report with: {}", report.getId());
    System.out.println("Received Message: " + report);
}

Spring has some limitations to the out-of-the-box instrumentation of tracing. It’s also possible to add trace headers manually. Check the spring docs for more information.

“How-to” Guides

Working example of Spring boot, OpenTelemetry, and Jaeger

The full source code of the working setup is available on Github.

GitHub - qdnqn/spring-tracing-opentelemetry: Spring boot tracing using Sleuth, OpenTelemetry and Jaeger.
Spring boot tracing using Sleuth, OpenTelemetry and Jaeger. - GitHub - qdnqn/spring-tracing-opentelemetry: Spring boot tracing using Sleuth, OpenTelemetry and Jaeger.