Apache Flink provides a variety of built-in connectors for integrating Flink with different data sources and sinks (also called destinations). These connectors let you read data from and write data to various systems in a scalable and fault-tolerant manner. This section introduces some of the most commonly used connectors.
Apache Kafka is a distributed streaming platform designed for high-throughput, fault-tolerant, and scalable data streaming. Flink’s Kafka connector allows you to consume data from and produce data to Kafka topics.
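As a rough illustration of how a Kafka topic is typically exposed to Flink SQL, the sketch below holds a `CREATE TABLE` statement that uses the Kafka connector. The table name, columns, topic, broker address, and consumer group are all hypothetical placeholders, and the helper `register_orders_source` is an assumption of this example, not part of Flink.

```python
# Hypothetical table, topic, broker address, and group id: adjust to your setup.
KAFKA_SOURCE_DDL = """
CREATE TABLE orders (
  order_id STRING,
  amount   DOUBLE
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'flink-demo',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
)
"""


def register_orders_source(t_env):
    """Run the DDL on any object that exposes execute_sql(str),
    such as a PyFlink TableEnvironment."""
    return t_env.execute_sql(KAFKA_SOURCE_DDL)
```

With PyFlink installed and the Kafka SQL connector JAR available to the job, `t_env` could be created with `TableEnvironment.create(EnvironmentSettings.in_streaming_mode())` from `pyflink.table`.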
Upsert Kafka SQL Connector
The Upsert Kafka SQL Connector allows Apache Flink to read from and write to Apache Kafka topics using upsert semantics. This is particularly useful when working with changelog streams, where each record is interpreted relative to a primary key: a record with a value inserts or updates the row for that key, and a record with a null value (a tombstone) deletes it.
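The upsert interpretation described above can be sketched in a few lines of plain Python. This is a simplified model of the semantics only, not the connector's implementation: `materialize` is a hypothetical helper that folds a keyed changelog into the table it represents. A matching table definition is included for reference; its table name, topic, and broker address are placeholders.

```python
# Hypothetical table definition; a primary key is what drives the upsert behavior.
UPSERT_KAFKA_DDL = """
CREATE TABLE user_totals (
  user_id STRING,
  total   DOUBLE,
  PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'user-totals',
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.format' = 'json',
  'value.format' = 'json'
)
"""


def materialize(changelog):
    """Fold (key, value) records into the current table state.

    A value of None models a Kafka tombstone and deletes the key;
    any other value inserts or overwrites the row for that key.
    """
    state = {}
    for key, value in changelog:
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


records = [("alice", 10.0), ("bob", 5.0), ("alice", 12.5), ("bob", None)]
print(materialize(records))  # {'alice': 12.5}
```

Only the latest value per key survives, and the tombstone for `bob` removes that row entirely.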
Amazon Kinesis Data Streams
Amazon Kinesis Data Streams is a managed, real-time data streaming service provided by Amazon Web Services (AWS). Flink’s Kinesis connector enables you to consume data from and produce data to Kinesis data streams.
The DataGen connector in Apache Flink allows you to create tables whose rows are generated in memory, which is particularly useful for developing and testing queries locally without access to external systems such as Kafka. DataGen table definitions can also use computed columns for more flexible record generation.
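A minimal sketch of a DataGen table follows, with one sequence field, one bounded random field, and one computed column. The table and column names are hypothetical, and `register_generated_orders` is a helper invented for this example.

```python
# Hypothetical table; 'ts' is a computed column evaluated for each generated row.
DATAGEN_DDL = """
CREATE TABLE orders_gen (
  order_id BIGINT,
  price    DOUBLE,
  ts AS LOCALTIMESTAMP
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '5',
  'fields.order_id.kind' = 'sequence',
  'fields.order_id.start' = '1',
  'fields.order_id.end' = '1000',
  'fields.price.min' = '1',
  'fields.price.max' = '100'
)
"""


def register_generated_orders(t_env):
    """Run the DDL on any object that exposes execute_sql(str),
    such as a PyFlink TableEnvironment."""
    return t_env.execute_sql(DATAGEN_DDL)
```

Because `order_id` is a bounded sequence, the source stops after the sequence is exhausted, which keeps local test runs finite.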
The Faker connector, a community-maintained connector rather than part of the core Apache Flink distribution, uses the popular Java Faker library to generate random data based on predefined patterns. This allows you to create tables with data that closely resembles real-world data, which makes it easier to develop and test your Flink applications.
Elasticsearch is a distributed, RESTful search and analytics engine built on top of Apache Lucene. Flink’s Elasticsearch connector enables you to write data to Elasticsearch indices, where it becomes available for near-real-time search and analytics.
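The sketch below shows what an Elasticsearch sink table might look like. It assumes an Elasticsearch 7 cluster on a placeholder local address; the table, index, and the helper `register_users_index` are hypothetical names for this example.

```python
# Hypothetical sink table; the primary key is used to derive the document id,
# so later rows with the same key overwrite the earlier document.
ELASTICSEARCH_SINK_DDL = """
CREATE TABLE users_index (
  user_id   STRING,
  user_name STRING,
  PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
  'connector' = 'elasticsearch-7',
  'hosts' = 'http://localhost:9200',
  'index' = 'users'
)
"""


def register_users_index(t_env):
    """Run the DDL on any object that exposes execute_sql(str),
    such as a PyFlink TableEnvironment."""
    return t_env.execute_sql(ELASTICSEARCH_SINK_DDL)
```

Once the table is registered, an `INSERT INTO users_index SELECT ...` statement would stream query results into the index.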
MySQL & MySQL CDC
Flink can work with MySQL in two ways: through the JDBC connector, for reading from and writing to MySQL tables, and through the MySQL CDC connector from the Flink CDC project, for real-time change data capture (CDC). Together, these let you read data from and write data to MySQL databases, and capture changes as they occur.
The MySQL connector allows you to read data from and write data to MySQL databases using Flink’s JDBC connector.
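As an illustration, a MySQL table might be mapped through the JDBC connector roughly as follows. The JDBC URL, database, table name, and credentials are placeholders, and `register_mysql_users` is a helper invented for this example.

```python
# Hypothetical connection details: replace the URL and credentials with your own.
MYSQL_JDBC_DDL = """
CREATE TABLE users (
  id   BIGINT,
  name STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://localhost:3306/mydb',
  'table-name' = 'users',
  'username' = 'flink_user',
  'password' = 'change-me'
)
"""


def register_mysql_users(t_env):
    """Run the DDL on any object that exposes execute_sql(str),
    such as a PyFlink TableEnvironment."""
    return t_env.execute_sql(MYSQL_JDBC_DDL)
```

Running this against a real database also requires the JDBC connector and a MySQL driver JAR to be available to the Flink job.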
While Apache Flink does not provide a dedicated built-in connector for PostgreSQL, you can still integrate Flink with PostgreSQL using the JDBC connector, which supports PostgreSQL as a database dialect, or using a Change Data Capture (CDC) approach.
Redis is an open-source, in-memory data structure store that can be used as a database, cache, and message broker. Flink’s Redis connector integrates Flink with Redis, enabling you to read data from and write data to Redis data structures.
Most of the documentation about the built-in connectors comes from the official Apache Flink® documentation.
Refer to the Credits page for more information.