Monitoring Kafka metrics with ClickStack
This guide shows you how to monitor Apache Kafka performance metrics with ClickStack by configuring the OpenTelemetry collector's JMX receiver. You'll learn how to:
- Configure the OTel collector to collect Kafka metrics via JMX
- Deploy ClickStack with your custom configuration
- Use a pre-built dashboard to visualize Kafka performance (broker throughput, partition lag, request rates, disk usage)
A demo dataset with sample metrics is available if you want to test the integration before configuring your production Kafka cluster.
Time required: 10-15 minutes
Integration with existing Kafka
This section covers sending metrics from your existing Kafka installation to ClickStack by configuring the ClickStack OTel collector with the JMX receiver.
If you would like to try the Kafka metrics integration before touching your own setup, use the preconfigured demo dataset described in the following section.
Prerequisites
- ClickStack instance running
- Existing Kafka installation (version 2.0 or newer)
- JMX port exposed on Kafka brokers (default port 9999)
- Network access from ClickStack to Kafka JMX endpoints
- JMX authentication credentials if enabled
Enable JMX on Kafka brokers
Kafka exposes metrics via JMX (Java Management Extensions). Ensure JMX is enabled on your Kafka brokers.
Add these settings to your Kafka broker startup configuration:
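The exact settings depend on how you launch Kafka, but the standard launch scripts honor the `JMX_PORT` and `KAFKA_JMX_OPTS` environment variables, so as a minimal sketch (set in the shell, service unit, or wrapper script that starts the broker, and with authentication/SSL disabled for illustration only):

```bash
# Enable remote JMX on port 9999 - adjust the hostname and security settings
# for your environment; this example disables JMX authentication and SSL
export JMX_PORT=9999
export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.rmi.port=9999 \
  -Djava.rmi.server.hostname=<kafka-host>"

bin/kafka-server-start.sh config/server.properties
```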
For Docker deployments:
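How you expose JMX depends on the image. As an illustrative sketch for the `confluentinc/cp-kafka` image (which reads `KAFKA_JMX_PORT` and `KAFKA_JMX_HOSTNAME`; other images use different variable names):

```yaml
# Illustrative Docker Compose fragment - variable names differ between Kafka images
services:
  kafka:
    image: confluentinc/cp-kafka:latest
    environment:
      KAFKA_JMX_PORT: 9999
      KAFKA_JMX_HOSTNAME: kafka   # must be resolvable from the collector's network
    ports:
      - "9999:9999"               # only needed if the collector connects from outside this network
```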
Verify JMX is accessible:
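A basic reachability check (assumes `nc` is installed; a JMX browser such as `jconsole` gives a fuller verification):

```bash
# Confirm the JMX port is open
nc -zv localhost 9999
```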
Common Kafka JMX endpoints:
- Local installation: `localhost:9999`
- Docker: use the container or service name (e.g., `kafka:9999`)
- Remote: `<kafka-host>:9999`
Create custom OTel collector configuration
ClickStack allows you to extend the base OpenTelemetry collector configuration by mounting a custom configuration file and setting an environment variable. The custom configuration is merged with the base configuration managed by HyperDX via OpAMP.
Create a file named kafka-metrics.yaml with the following configuration:
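The full file is not reproduced here, but the following sketch matches the behavior described below. It assumes your broker's JMX endpoint is `kafka:9999`, that `kafka` is an acceptable `service.name`, and that the `memory_limiter`/`batch` processors and `clickhouse` exporter already exist in the base ClickStack configuration; the `resource/kafka` and `metrics/kafka` names are illustrative:

```yaml
receivers:
  jmx:
    endpoint: kafka:9999          # adjust for your setup (see common endpoints above)
    target_system: kafka          # use the Kafka-specific metric mappings
    collection_interval: 10s
    # jar_path: ...               # only needed if the bundled JMX metrics JAR is not at the receiver's default location
    # username: ${env:JMX_USERNAME}   # uncomment if JMX authentication is enabled
    # password: ${env:JMX_PASSWORD}

processors:
  resource/kafka:
    attributes:
      - key: service.name
        value: kafka
        action: upsert

service:
  pipelines:
    metrics/kafka:
      receivers: [jmx]
      # memory_limiter, batch, and clickhouse are defined in the base config and referenced by name
      processors: [memory_limiter, resource/kafka, batch]
      exporters: [clickhouse]
```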
This configuration:
- Connects to Kafka's JMX endpoint at `kafka:9999` (adjust the endpoint for your setup)
- Uses the JMX receiver with Kafka-specific metric mappings via `target_system: kafka`
- Collects metrics every 10 seconds
- Sets the required `service.name` resource attribute per OpenTelemetry semantic conventions
- Routes metrics to the ClickHouse exporter via a dedicated pipeline
Key metrics collected:
Broker metrics:
- `kafka.broker.message_in_rate` - Messages received per second
- `kafka.broker.byte_in_rate` - Bytes received per second
- `kafka.broker.byte_out_rate` - Bytes sent per second
- `kafka.broker.request_rate` - Requests handled per second
- `kafka.broker.log_flush_rate` - Log flush operations per second
Partition metrics:
- `kafka.partition.count` - Total number of partitions
- `kafka.partition.leader_count` - Number of partitions this broker leads
- `kafka.partition.under_replicated` - Under-replicated partitions (data safety concern)
- `kafka.partition.offline` - Offline partitions (availability concern)
Request metrics:
- `kafka.request.produce.time.avg` - Average produce request latency
- `kafka.request.fetch_consumer.time.avg` - Average consumer fetch latency
- `kafka.request.fetch_follower.time.avg` - Average follower fetch latency
- `kafka.request.queue.size` - Request queue depth
Consumer lag:
- `kafka.consumer.lag` - Consumer group lag by topic and partition
- `kafka.consumer.lag_max` - Maximum lag across all partitions
Disk and storage:
- `kafka.log.size` - Total log size in bytes
- `kafka.log.segment.count` - Number of log segments
Configuration notes:
- You only define new receivers, processors, and pipelines in the custom config
- The `memory_limiter` and `batch` processors and the `clickhouse` exporter are already defined in the base ClickStack configuration - you just reference them by name
- The `resource` processor sets the required `service.name` attribute per OpenTelemetry semantic conventions
- For production with JMX authentication, store credentials in environment variables: `${env:JMX_USERNAME}` and `${env:JMX_PASSWORD}`
- Adjust `collection_interval` based on your needs (10s default; lower values increase data volume)
- For multiple brokers, create separate receiver configurations with unique broker IDs in the resource attributes
- The JMX receiver JAR (`opentelemetry-jmx-metrics.jar`) is included in the ClickStack OTel collector image
Configure ClickStack to load custom configuration
To enable custom collector configuration in your existing ClickStack deployment, you must:
- Mount the custom config file at `/etc/otelcol-contrib/custom.config.yaml`
- Set the environment variable `CUSTOM_OTELCOL_CONFIG_FILE=/etc/otelcol-contrib/custom.config.yaml`
- Ensure network connectivity between ClickStack and Kafka JMX endpoints
Option 1: Docker Compose
Update your ClickStack deployment configuration:
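As a sketch (the service name `clickstack` and the network name `kafka-net` are placeholders for your own values), the additions amount to a volume mount, an environment variable, and a shared network:

```yaml
services:
  clickstack:
    # ...existing ClickStack service definition...
    environment:
      CUSTOM_OTELCOL_CONFIG_FILE: /etc/otelcol-contrib/custom.config.yaml
    volumes:
      - ./kafka-metrics.yaml:/etc/otelcol-contrib/custom.config.yaml:ro
    networks:
      - kafka-net      # any network the Kafka broker is also attached to

networks:
  kafka-net:
    external: true     # assumes the Kafka broker already runs on this network
```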
Option 2: Docker run (all-in-one image)
If using the all-in-one image with docker run:
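A sketch, assuming the standard all-in-one image (`docker.hyperdx.io/hyperdx/hyperdx-all-in-one`) and default ports; adjust the image reference and ports to your deployment:

```bash
docker run --name clickstack \
  -p 8080:8080 -p 4317:4317 -p 4318:4318 \
  -e CUSTOM_OTELCOL_CONFIG_FILE=/etc/otelcol-contrib/custom.config.yaml \
  -v "$(pwd)/kafka-metrics.yaml:/etc/otelcol-contrib/custom.config.yaml:ro" \
  docker.hyperdx.io/hyperdx/hyperdx-all-in-one
```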
Important: If Kafka is running in another container, use Docker networking:
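For example, attach both containers to a shared user-defined network so the collector can resolve `kafka:9999` (the container names `kafka` and `clickstack` are assumptions):

```bash
docker network create kafka-net            # skip if the network already exists
docker network connect kafka-net kafka
docker network connect kafka-net clickstack
```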
Verify metrics in HyperDX
Once configured, log into HyperDX and verify metrics are flowing:
- Navigate to the Metrics explorer
- Search for metrics starting with `kafka.` (e.g., `kafka.broker.message_in_rate`, `kafka.partition.count`)
- You should see metric data points appearing at your configured collection interval
Demo dataset
For users who want to test the Kafka Metrics integration before configuring their production systems, we provide a pre-generated dataset with realistic Kafka metrics patterns.
Download the sample metrics dataset
Download the pre-generated metrics files (24 hours of Kafka metrics with realistic patterns):
The dataset includes realistic patterns:
- Morning traffic ramp (07:00-09:00) - Gradual increase in message throughput
- Production deployment (11:30) - Brief spike in consumer lag, then recovery
- Peak load (14:00-16:00) - Maximum throughput with occasional under-replicated partitions
- Rebalance event (18:45) - Consumer group rebalance causing temporary lag spike
- Daily patterns - Business hours peaks, off-hours baseline, weekend traffic drops
Start ClickStack
Start a ClickStack instance:
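A sketch using the standard all-in-one image (adjust the image reference and ports to your deployment):

```bash
docker run --name clickstack \
  -p 8080:8080 -p 4317:4317 -p 4318:4318 \
  docker.hyperdx.io/hyperdx/hyperdx-all-in-one
```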
Wait approximately 30 seconds for ClickStack to fully start.
Verify metrics in HyperDX
Once loaded, the quickest way to see your metrics is through the pre-built dashboard.
Proceed to the Dashboards and visualization section to import the dashboard and view all Kafka metrics at once.
The demo dataset time range is 2025-10-20 05:00:00 to 2025-10-21 05:00:00. Make sure your time range in HyperDX matches this window.
Look for these interesting patterns:
- 07:00-09:00 - Morning traffic ramp-up
- 11:30 - Production deployment with lag spike
- 14:00-16:00 - Peak throughput period
- 18:45 - Consumer rebalance event
Dashboards and visualization
To help you get started monitoring Kafka with ClickStack, we provide essential visualizations for Kafka metrics.
Import the pre-built dashboard
- Open HyperDX and navigate to the Dashboards section
- Click Import Dashboard from the ellipsis menu in the upper right corner
- Upload the `kafka-metrics-dashboard.json` file and click Finish Import
View the dashboard
The dashboard will be created with all visualizations pre-configured.
Dashboard panels include:
- Broker Throughput - Messages/sec and bytes/sec in/out
- Request Performance - Average latency for produce, consumer fetch, and follower fetch requests
- Partition Health - Total partitions, leaders, under-replicated, and offline counts
- Consumer Lag - Current lag and maximum lag across consumer groups
- Request Queue - Pending requests indicating broker saturation
- Disk Usage - Total log size and segment counts
For the demo dataset, ensure the time range is set to 2025-10-20 05:00:00 - 2025-10-21 05:00:00.
Troubleshooting
Custom config not loading
Verify the environment variable CUSTOM_OTELCOL_CONFIG_FILE is set correctly:
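For example (the container name `clickstack` is an assumption; substitute your own):

```bash
docker exec clickstack env | grep CUSTOM_OTELCOL_CONFIG_FILE
```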
Check that the custom config file is mounted at /etc/otelcol-contrib/custom.config.yaml:
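Again assuming the container is named `clickstack`:

```bash
docker exec clickstack ls -l /etc/otelcol-contrib/custom.config.yaml
```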
View the custom config content to verify it's readable:
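For example:

```bash
docker exec clickstack cat /etc/otelcol-contrib/custom.config.yaml
```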
No metrics appearing in HyperDX
Verify JMX is accessible from the collector:
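A basic reachability check from inside the container (assumes the container is named `clickstack`, the broker resolves as `kafka`, and `nc` is available in the image; otherwise run the same check from another container on the shared network):

```bash
docker exec clickstack nc -zv kafka 9999
```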
Check if the JMX metrics JAR is present:
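The exact path can vary between image versions, so a search is the safest check:

```bash
docker exec clickstack find / -name '*jmx*metrics*.jar' 2>/dev/null
```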
Check the collector logs to confirm your JMX receiver was picked up into the effective configuration, and look for errors:
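For example:

```bash
docker logs clickstack 2>&1 | grep -iE 'jmx|error'
```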
JMX authentication errors
If you see authentication errors in the logs:
Update your configuration to use credentials:
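For example, add the JMX receiver's `username`/`password` fields and pass the credentials to the ClickStack container as environment variables (a sketch; keep real credentials out of the config file):

```yaml
receivers:
  jmx:
    endpoint: kafka:9999
    target_system: kafka
    collection_interval: 10s
    username: ${env:JMX_USERNAME}
    password: ${env:JMX_PASSWORD}
```

Supply `JMX_USERNAME` and `JMX_PASSWORD` via `docker run -e ...` or the `environment:` section of your Compose file.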
Network connectivity issues
If ClickStack can't reach Kafka JMX:
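A few quick checks (the container and network names are assumptions carried over from the examples above; if these tools are not in the image, run them from another container on the same network):

```bash
# Can the collector resolve and reach the broker's JMX port?
docker exec clickstack getent hosts kafka
docker exec clickstack nc -zv kafka 9999

# Are both containers attached to the same network?
docker network inspect kafka-net --format '{{range .Containers}}{{.Name}} {{end}}'
```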
Ensure your Docker Compose file or docker run commands place both containers on the same network.
JMX receiver performance issues
If the JMX receiver is consuming too many resources:
- Increase the `collection_interval` to reduce scraping frequency (e.g., `30s` instead of `10s`)
- Configure the JMX receiver to collect only specific metrics using the `additional_jvm_metrics` option
- Monitor the collector's memory usage and adjust the `memory_limiter` processor if needed
Missing specific metrics
If certain Kafka metrics are not appearing:
- Verify the metric exists in Kafka's JMX endpoint using a JMX browser tool
- Check that `target_system: kafka` is set correctly in the receiver configuration
- Some metrics may only appear under specific conditions (e.g., consumer lag only appears when consumers are active)
- Review the OpenTelemetry JMX receiver documentation for the complete list of Kafka metrics
Next steps
If you want to explore further, here are some next steps to experiment with your monitoring:
- Set up alerts for critical metrics (under-replicated partitions, consumer lag thresholds, disk usage)
- Create additional dashboards for specific use cases (producer performance, topic-level metrics, consumer group monitoring)
- Monitor multiple Kafka brokers by duplicating the receiver configuration with different endpoints and broker IDs
- Integrate Kafka topic metadata using the Kafka receiver for deeper visibility into message flow
- Correlate Kafka metrics with application traces to understand end-to-end request performance