Streaming data from SQL Server to Kafka to Snowflake ❄️ with Kafka Connect
Running Dockerised Kafka Connect worker on GCP
I talk and write about Kafka and Confluent Platform a lot, and more and more of the demos that I’m building are around Confluent Cloud. This means that I don’t have to run or manage my own Kafka brokers, Zookeeper, Schema Registry, KSQL servers, etc., which makes things a ton easier. Whilst there are managed connectors on Confluent Cloud (S3 etc.), I need to run my own Kafka Connect worker for those connectors not yet provided. An example is the MQTT source connector that I use in this demo. Up until now I’d either run this worker locally, or manually build a cloud VM. Locally is fine, as it’s all Docker, easily spun up with a single docker-compose up -d command. I wanted something that would keep running whilst my laptop was off, but that was as close to my local build as possible. Enter GCP and its functionality to run a container on a VM automagically.
You can see the full script here. The rest of this article just walks through the how and why.
Debezium & MySQL v8 : Public Key Retrieval Is Not Allowed
I started hitting problems when trying Debezium against MySQL v8. When creating the connector:
Using Kafka Connect and Debezium with Confluent Cloud
This is based on using Confluent Cloud to provide your managed Kafka and Schema Registry. All that you run yourself is the Kafka Connect worker.
Optionally, you can use this Docker Compose to run the worker and a sample MySQL database.
Skipping bad records with the Kafka Connect JDBC sink connector
The Kafka Connect framework provides generic error handling and dead-letter queue capabilities which are available for problems with [de]serialisation and Single Message Transforms. When it comes to errors that a connector may encounter doing the actual pull or put of data from the source/target system, it’s down to the connector itself to implement logic around that. For example, the Elasticsearch sink connector provides configuration (behavior.on.malformed.documents) that can be set so that a single bad record won’t halt the pipeline. Others, such as the JDBC Sink connector, don’t provide this yet. That means that if you hit this problem, you need to unblock it yourself. One way is to manually move the offset of the consumer on past the bad message.
TL;DR : You can use kafka-consumer-groups --reset-offsets --to-offset <x> to manually move the connector past a bad message
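As a rough sketch of what the full command might look like, the Python below simply assembles the argument list. It assumes the connector is a sink, whose consumer group Kafka Connect names connect-<connector name>. The broker address, connector name, topic, partition and offset are made-up placeholders, and the connector needs to be stopped first, because offsets can only be reset for an inactive consumer group.

```python
def reset_offset_command(bootstrap, connector, topic, partition, offset, execute=False):
    """Build the kafka-consumer-groups arguments to move a sink connector's offset."""
    cmd = [
        "kafka-consumer-groups",
        "--bootstrap-server", bootstrap,
        # Sink connectors consume using a group named connect-<connector name>
        "--group", f"connect-{connector}",
        # topic:partition limits the reset to the partition holding the bad message
        "--topic", f"{topic}:{partition}",
        "--reset-offsets",
        "--to-offset", str(offset),
    ]
    # --dry-run only reports the planned change; --execute actually applies it
    cmd.append("--execute" if execute else "--dry-run")
    return cmd


# Hypothetical values: the bad message is at offset 42 of partition 0, so resume at 43
print(" ".join(reset_offset_command("localhost:9092", "sink-jdbc-01", "orders", 0, 43)))
```

Defaulting to a dry run is a deliberate choice here: it lets you check the target offset before committing to it.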
Kafka Connect and Elasticsearch
I use the Elastic stack for a lot of my talks and demos because it complements Kafka brilliantly. A few things have changed in recent releases and this blog is a quick note on some of the errors that you might hit and how to resolve them. It was inspired by a lot of the comments and discussion here and here.
Copying data between Kafka clusters with Kafkacat
kafkacat gives you Kafka super powers 😎
I’ve written before about kafkacat and what a great tool it is for doing lots of useful things as a developer with Kafka. I used it too in a recent demo that I built in which data needed manipulating in a way that I couldn’t easily do elsewhere. Today I want to share a very simple but powerful use for kafkacat as both a consumer and producer: copying data from one Kafka cluster to another. In this instance it’s getting data from Confluent Cloud down to a local cluster.
Kafka Summit GoldenGate bridge run/walk
Coming to Kafka Summit in San Francisco next week? Inspired by similar events at Oracle OpenWorld in past years, I’m proposing an unofficial run (or walk) across the GoldenGate bridge on the morning of Tuesday 1st October. We should be up and out and back in plenty of time to still attend the morning keynotes. Some people will run, some may prefer to walk, it’s open to everyone :)
Staying sane on the road as a Developer Advocate
Where I’ll be on the road for the remainder of 2019
Reset Kafka Connect Source Connector Offsets
Starting a Kafka Connect sink connector at the end of a topic
When you create a sink connector in Kafka Connect, by default it will start reading from the beginning of the topic and stream all of the existing (and new) data to the target. The setting that controls this behaviour is auto.offset.reset, and you can see its value in the worker log when the connector runs:
[2019-08-05 23:31:35,405] INFO ConsumerConfig values:
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
…
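One possible way to change this for a single connector, sketched below in Python, is the per-connector consumer override that arrived in Apache Kafka 2.3. This is an assumption about the approach rather than a quote from the post. The helper name and the sample connector config are hypothetical, and the override is only honoured when the worker sets connector.client.config.override.policy=All. It also only takes effect for a connector whose consumer group has no committed offsets yet.

```python
import json


def start_at_end(connector_config):
    """Return a copy of a sink connector config that begins at the end of its topics."""
    updated = dict(connector_config)
    # Per-connector consumer override; requires the worker property
    # connector.client.config.override.policy=All
    updated["consumer.override.auto.offset.reset"] = "latest"
    return updated


# Hypothetical sink connector config; connector-specific settings are omitted
base = {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "topics": "orders",
}
print(json.dumps(start_at_end(base), indent=2))
```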
Resetting a Consumer Group in Kafka
Migrating Alfred Clipboard to New Laptop
So how DO you make those cool diagrams? July 2019 update
Taking the Vienna-Munich sleeper train
Manually delete a connector from Kafka Connect
Kafka Connect has a REST API through which all config should be done, including removing connectors that have been created. Sometimes though, you might have reason to want to do this manually, and since Kafka Connect running in distributed mode uses Kafka as its persistent data store, you can achieve this by writing to the topic yourself.
Automatically restarting failed Kafka Connect tasks
Here’s a hacky way to automatically restart Kafka Connect connectors if they fail. Restarting automatically only makes sense if it’s a transient failure; if there’s a problem with your pipeline (e.g. bad records or a mis-configured server) then you don’t gain anything from this. You might want to check out Kafka Connect’s error handling and dead letter queues too.
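To illustrate the idea (not necessarily how the post itself does it), here is a minimal Python sketch that takes the JSON returned by the Kafka Connect REST API's GET /connectors/<name>/status endpoint and works out which task restart endpoints would need a POST. The function name and the sample status document are hypothetical; actually sending the requests is left out.

```python
def failed_task_restart_paths(status):
    """List the REST paths to POST to in order to restart a connector's FAILED tasks."""
    name = status["name"]
    return [
        f"/connectors/{name}/tasks/{task['id']}/restart"
        for task in status.get("tasks", [])
        if task.get("state") == "FAILED"
    ]


# Hypothetical response from GET /connectors/sink-jdbc-01/status
sample_status = {
    "name": "sink-jdbc-01",
    "connector": {"state": "RUNNING", "worker_id": "connect:8083"},
    "tasks": [
        {"id": 0, "state": "RUNNING", "worker_id": "connect:8083"},
        {"id": 1, "state": "FAILED", "worker_id": "connect:8083"},
    ],
}
print(failed_task_restart_paths(sample_status))
```

Run on a schedule, something like this would keep retrying a task that fails for a non-transient reason, which is why it only makes sense for transient failures.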
Putting Kafka Connect passwords in a separate file / externalising secrets
Kafka Connect configuration is easy: you just write some JSON! But what if you’ve got credentials that you need to pass? Embedding those in a config file is not always such a smart idea. Fortunately, with KIP-297, which was released in Apache Kafka 2.0, there is support for external secrets. It’s extendable to use your own ConfigProvider, and ships with its own for just putting credentials in a file, which I’ll show here. You can read more here.
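As a small sketch of how the file-based provider fits together: the worker registers a provider under an alias, and the connector config then refers to a key in a properties file with a ${alias:path:key} placeholder that is resolved when the connector starts. The Python below only builds those strings. The file path, key name and connector settings are hypothetical.

```python
import json

# Worker properties that register the file-based provider under the alias "file"
WORKER_PROPERTIES = {
    "config.providers": "file",
    "config.providers.file.class": "org.apache.kafka.common.config.provider.FileConfigProvider",
}


def file_secret(path, key):
    """Build the placeholder that the file provider swaps for the value of `key` in `path`."""
    return "${file:" + path + ":" + key + "}"


# Hypothetical connector config: the password stays in the properties file, not in the JSON
connector_config = {
    "connection.user": "connect_user",
    "connection.password": file_secret("/data/credentials.properties", "DB_PASSWORD"),
}
print(json.dumps(connector_config, indent=2))
```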