
Commit af5b282

kafka: added kafka source, updated kafka dest
Signed-off-by: Hofi <[email protected]>
1 parent 5514976 commit af5b282

File tree

11 files changed: +331, -79 lines changed


_data/link_aliases.yml

Lines changed: 6 additions & 0 deletions
@@ -117,6 +117,12 @@ adm-src-program:
 adm-dest-program:
   aliases: [ "program() destination" ]
 
+adm-src-kafka:
+  aliases: [ "kafka() source" ]
+
+adm-dest-kafkac:
+  aliases: [ "kafka() destination" ]
+
 adm-src-mqtt:
   aliases: [ "mqtt() source" ]

_data/navigation.yml

Lines changed: 6 additions & 1 deletion
@@ -165,6 +165,11 @@ admin-guide-nav:
       - title: "Jellyfin"
         url: /admin-guide/060_Sources/035_Jellyfin/README
         subnav:
+      - title: "kafka"
+        url: /admin-guide/060_Sources/038_Kafka/README
+        subnav:
+          - title: "Options of the kafka() source"
+            url: /admin-guide/060_Sources/038_Kafka/001_Kafka_options
       - title: "kubernetes"
         url: /admin-guide/060_Sources/040_Kubernetes/README
         subnav:
@@ -1159,7 +1164,7 @@ dev-guide-nav:
         subnav:
           - title: "file() Destination Driver"
             url: /dev-guide/chapter_4/section_2/macos-testing-status/affile/file-destination-driver
-          - title: "file() Source Driver (DEPRECATED)"
+          - title: "file() Source Driver"
             url: /dev-guide/chapter_4/section_2/macos-testing-status/affile/file-source-driver
           - title: "pipe() Destination Driver"
             url: /dev-guide/chapter_4/section_2/macos-testing-status/affile/pipe-destination-driver
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
## bootstrap-servers()

| Type:      | string |
| Default:   | N/A    |
| Mandatory: | yes    |

*Description:* Specifies the hostname or IP address of the Kafka server. When specifying an IP address, IPv4 (for example, 192.168.0.1) or IPv6 (for example, \[::1\]) can be used as well. Use a colon (**:**) after the address to specify the port number of the server. When specifying multiple addresses, use a comma to separate the addresses, for example:

``` config
bootstrap-servers(
    "127.0.0.1:2525,remote-server-hostname:6464"
)
```
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
## config()

| Type:    | key-value pairs |
| Default: | N/A             |

*Description:* You can use this option to set the properties of the kafka {{ include.kafka_type }}.

The {{ site.product.short_name }} kafka {{ include.type }} supports all properties of the official Kafka {{ include.kafka_type }}. For details, see the librdkafka documentation.

The syntax of the config() option is the following:

``` config
config(
    "key1" => "value1"
    "key2" => "value2"
)
```

**NOTE:** The following kafka {{ include.kafka_type }} config options are protected and cannot be overridden in the `config()` list: {{ include.protected_options }}
{: .notice--info}
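
To make this more concrete, here is a hedged sketch of a `config()` block passing authentication-related librdkafka properties. The property names (`security.protocol`, `sasl.mechanisms`, `sasl.username`, `sasl.password`) are standard librdkafka settings; the values are placeholders:

``` config
config(
    # placeholder credentials for a SASL/SSL-protected broker
    "security.protocol" => "sasl_ssl"
    "sasl.mechanisms" => "PLAIN"
    "sasl.username" => "example-user"
    "sasl.password" => "example-password"
)
```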
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
## disable-bookmarks()

| Type:    | boolean |
| Default: | no      |

*Description:* This option prevents {{ site.product.short_name }} from storing a bookmark (such as the position or offset) of the last processed message in its persist file.

**NOTE:** This does not prevent the use of an already present bookmark entry. To ignore existing bookmark entries, specify `ignore-saved-bookmarks(yes)` as well.
{: .notice--info}
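
For illustration, a minimal sketch of a source that neither stores new bookmarks nor honors previously saved ones (the broker address and topic name are placeholders):

``` config
source s_kafka {
    kafka(
        bootstrap-servers("127.0.0.1:9092")
        topic("example-topic" => "-1")
        disable-bookmarks(yes)       # do not store new bookmarks
        ignore-saved-bookmarks(yes)  # skip bookmarks saved earlier
    );
};
```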
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
## kafka-logging()

| Accepted values: | disabled \| trace \| kafka |
| Default:         | disabled                   |

*Description:* This option allows you to control how internal Kafka logs appear in the {{ site.product.short_name }} logs.

- disabled: Disables internal Kafka log messages in the {{ site.product.short_name }} logs.
- trace: Logs all internal Kafka messages at the `trace` level of {{ site.product.short_name }}.
- kafka: Logs internal Kafka messages using log levels mapped to those of {{ site.product.short_name }}.

**NOTE:** The internal Kafka logging level itself can be configured using the config() Kafka options. For details, refer to the librdkafka documentation.
{: .notice--info}
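
As a hedged sketch, `kafka-logging(kafka)` can be combined with librdkafka's standard `debug` property to surface detailed client logs at mapped log levels. The broker address, topic name, and chosen debug contexts below are placeholders:

``` config
source s_kafka {
    kafka(
        bootstrap-servers("127.0.0.1:9092")
        topic("example-topic" => "-1")
        kafka-logging(kafka)                  # map Kafka log levels to syslog-ng levels
        config("debug" => "broker,topic,msg") # librdkafka debug contexts to log
    );
};
```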
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
- One **main worker** that fetches messages from the Kafka broker and stores them in an internal queue.
- A second worker that processes the queued messages and forwards them to the configured destination.

Although the source can operate using a single worker, this configuration typically results in a significant performance penalty compared to the default multi-worker setup.

Increasing the number of workers beyond two may further improve throughput, especially when the main worker can fetch messages at high speed. In such cases, you may also need to fine-tune related options such as separated-worker-queues(), log-fetch-limit(), log-fetch-delay(), log-fetch-retry-delay(), and log-fetch-queue-full-delay(), as in the sketch below.
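
A minimal sketch of such a tuned multi-worker setup, assuming placeholder broker address, topic name, and values:

``` config
source s_kafka {
    kafka(
        bootstrap-servers("127.0.0.1:9092")
        topic("example-topic" => "-1")
        workers(4)                    # 1 main worker + 3 processor workers
        separated-worker-queues(yes)  # give each processor worker its own queue
        log-fetch-limit(30000)        # total queued messages shared by the queues
    );
};
```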
Lines changed: 148 additions & 0 deletions
@@ -0,0 +1,148 @@
---
title: "Options of the kafka() source"
id: adm-src-kafka-opt
description: >-
  This section describes the options of the kafka() source in {{ site.product.short_name }}.
---

The kafka() source of {{ site.product.short_name }} can directly consume log messages from the Apache Kafka message bus. The source has the following options.

## Required options

To use the kafka() source, the following two options are required: bootstrap-servers() and topic(). Both must appear at the beginning of your {{ site.product.short_name }} configuration.
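
For illustration, a minimal kafka() source using only the two required options (the broker address and topic name are placeholders):

``` config
source s_kafka {
    kafka(
        bootstrap-servers("127.0.0.1:9092")
        topic("example-topic" => "-1")  # all partitions of example-topic
    );
};
```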
{% include doc/admin-guide/options/bootstrap-servers.md %}

{% include doc/admin-guide/options/config-kafka.md kafka_type='consumer' type='source' protected_options='`bootstrap.servers` `metadata.broker.list` `enable.auto.offset.store` `auto.offset.reset` `enable.auto.commit` `auto.commit.enable`' %}

{% include doc/admin-guide/options/disable-bookmarks.md %}
See Bookmarking in the kafka() source for more details.

{% include doc/admin-guide/options/hook.md %}

{% include doc/admin-guide/options/ignore-saved-bookmarks.md %} (depending on the setting of the read-old-records() option).\
See Bookmarking in the kafka() source for more details.

{% include doc/admin-guide/options/kafka-logging.md %}
## log-fetch-limit()

| Type:    | integer |
| Default: | 10000   |

*Description:* Specifies the maximum number of messages the main worker will consume and queue from the Kafka broker. This effectively determines the size of the internally used Kafka message queue. If the limit is reached, the kafka() source stops fetching messages from the broker, logs the situation, and waits the amount of time specified by log-fetch-queue-full-delay() before attempting to fetch new data again.

**NOTE:** If more than 2 workers are configured and separated-worker-queues() is set to `yes`, all processor workers share this total queue size.
For example, with `workers(3)` and `log-fetch-limit(100000)`, the 2 processor workers (remember, the first of the configured 3 is always the main worker) each receive their own queue, and neither queue will grow beyond 50,000 messages.
{: .notice--info}

**NOTE:** It is worth aligning this option with the Kafka config options `queued.min.messages` and `queued.max.messages.kbytes`. For details, refer to the librdkafka documentation.
{: .notice--info}
## log-fetch-delay()

| Type:    | integer in microseconds |
| Default: | 1000 (1 millisecond)    |

*Description:* Specifies the time the main worker will wait between attempts to fetch new data.

## log-fetch-retry-delay()

| Type:    | integer in microseconds |
| Default: | 10000 (10 milliseconds) |

*Description:* Specifies the time the main worker will wait before attempting to fetch new data again when the broker signals that no more data is available.
## log-fetch-queue-full-delay()

| Type:    | integer in milliseconds |
| Default: | 1000                    |

*Description:* When the main worker reaches the queued message limit defined by log-fetch-limit(), the kafka() source temporarily stops retrieving messages from the broker. It then waits for the duration specified by `log-fetch-queue-full-delay()` before attempting to fetch additional messages.
{% include doc/admin-guide/options/persist-name.md %}

## poll-timeout()

| Type:    | integer in milliseconds |
| Default: | 10000                   |

*Description:* Specifies the maximum amount of time {{ site.product.short_name }} waits during a Kafka broker poll request for new messages to become available.

{% include doc/admin-guide/options/read-old-records.md %}\
See Bookmarking in the kafka() source for more details.
## separated-worker-queues()

| Type:    | yes \| no |
| Default: | no        |

*Description:* When the value of workers() is greater than 2 (meaning multiple processor threads are used to handle queued messages), and `separated-worker-queues()` is set to `yes`, the main worker of the kafka() source distributes the consumed messages into separate queues, one for each processor worker.

**NOTE:** This approach can improve performance, especially in high-throughput scenarios, but may also lead to significantly increased memory usage.
{: .notice--info}
## strategy-hint()

| Accepted values: | assign, subscribe |
| Default:         | assign            |

*Description:* This option provides a hint about which Kafka consumer strategy the kafka() source should use when the topic() list contains topic/partition definitions that could be handled either way.

The section Why is it worth using dual consumer strategies? describes the differences between the two.

For details about how the resulting topic names, partitions, and Kafka assign/subscribe strategies are determined in different scenarios, see Basic strategy usage cross-reference of the different topic configuration cases.
## time-reopen()

| Type:    | integer in seconds |
| Default: | 60                 |

*Description:* The time {{ site.product.short_name }} waits between attempts to recover from errors that require re-initialization of the full Kafka connection and its internally used data structures.
## topic()

| Type:      | key-value pairs |
| Default:   | N/A             |
| Mandatory: | yes             |

*Description:* A list of pairs consisting of Kafka topic name(s) and partition number(s) from which messages are consumed, for example:

``` config
topic(
    "^topic-name-[13]$" => "-1"
    "topic-name-2" => "1"
    "topic-name-4" => "-1"
    "topic-name-5" => "0,1,4"
)
```
Valid topic names have the following limitations:

- The topic name must either contain only characters matching the pattern `[-._a-zA-Z0-9]`, or it can be a regular expression.
  For example: `^topic-name-[13]$` (which expands to `topic-name-1` and `topic-name-3`).
- The length of the topic name must be between 1 and 249 characters.

The partition number must be:

- either a single partition number or a comma-separated list of partition numbers
- a positive integer, or `-1`, which means all partitions of the topic

For details about how the resulting topic names, partitions, and Kafka assign/subscribe strategies are determined in different scenarios, see Basic strategy usage cross-reference of the different topic configuration cases and Why is it worth using dual consumer strategies?
## workers()

| Type:    | integer |
| Default: | 2       |

*Description:* The number of workers the `kafka()` source uses to consume and process messages from the Kafka broker. By default, it uses two:

{% include doc/admin-guide/options/kafka-source-workers.md %}

![]({{ site.baseurl }}/assets/images/caution.png) **CAUTION:**
Only kafka() sources with `workers()` set to less than 3 can guarantee ordered message forwarding.
{: .notice--warning}

**NOTE:** Kafka clients have their own threadpool, entirely independent from any {{ site.product.short_name }} settings. The `workers()` option has no effect on this threadpool.
{: .notice--info}
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
---
title: 'kafka(): Consuming messages from Apache Kafka using the librdkafka client'
short_title: kafka
id: adm-src-kafka
description: >-
  Starting with version 4.11, {{ site.product.name }} can directly fetch log messages from the Apache Kafka message bus.
---

The kafka() source can fetch messages from explicitly named or wildcard-matching Kafka topics, and from a single partition, explicitly listed partitions, or all partitions of the selected topic(s). It can use two different strategies, `assign` or `subscribe`, to start consuming messages from the selected partition(s). The strategy is determined automatically based on the topic() option definitions and the strategy-hint() option.\
The basic rule is the following:

`subscribe` is used if the topic name contains characters that are not allowed in standard Kafka topic names (in which case the topic name is treated as a regular expression), if the partition number is `-1`, or if the value of strategy-hint() is `subscribe` (except when multiple partition numbers are provided for the same topic name, which raises an error).

`assign` (the default) is used if the topic name contains only valid Kafka topic characters (for example, no regexp-related characters) and only positive partition numbers are specified.
## Basic strategy usage cross-reference of the different topic configuration cases

| topic(...) in config                                   | topic name(s)              | part. number(s) | strategy-hint() | resulting strategy |
|--------------------------------------------------------|----------------------------|-----------------|-----------------|--------------------|
| topic( "topic-name-1" => "1" )                         | topic-name-1               | 1               | assign          | assign             |
| topic( "topic-name-1" => "1" )                         | topic-name-1               | 1               | subscribe       | subscribe          |
| topic( "topic-name-1" => "1,2" )                       | topic-name-1               | 1-2             | assign          | assign             |
| topic( "topic-name-1" => "1,2" )                       | topic-name-1               | 1-2             | subscribe       | N/A (error)        |
| topic( "topic-name-1" => "1" "topic-name-1" => "2" )   | topic-name-1               | 1-2             | assign          | assign             |
| topic( "topic-name-1" => "1" "topic-name-1" => "2" )   | topic-name-1               | 1-2             | subscribe       | N/A (error)        |
| topic( "topic-name-1" => "1" "topic-name-3" => "2" )   | topic-name-1, topic-name-3 | 1, 2            | assign          | assign             |
| topic( "topic-name-1" => "1" "topic-name-3" => "2" )   | topic-name-1, topic-name-3 | 1, 2            | subscribe       | subscribe          |
| topic( "topic-name-1" => "-1" )                        | topic-name-1               | all             | assign          | subscribe          |
| topic( "topic-name-1" => "-1" )                        | topic-name-1               | all             | subscribe       | subscribe          |
| topic( "topic-name-1" => "1" "topic-name-3" => "-1" )  | topic-name-1, topic-name-3 | 1, all          | assign          | subscribe          |
| topic( "topic-name-1" => "1" "topic-name-3" => "-1" )  | topic-name-1, topic-name-3 | 1, all          | subscribe       | subscribe          |
| topic( "topic-name-3" => "1" "topic-name-3" => "-1" )  | topic-name-3               | 1, all          | assign          | subscribe          |
| topic( "topic-name-3" => "1" "topic-name-3" => "-1" )  | topic-name-3               | 1, all          | subscribe       | subscribe          |
| topic( "^topic-name-[13]$" => "2" )                    | topic-name-1, topic-name-3 | 2, 2            | assign          | subscribe          |
| topic( "^topic-name-[13]$" => "2" )                    | topic-name-1, topic-name-3 | 2, 2            | subscribe       | subscribe          |
| topic( "^topic-name-[13]$" => "-1" )                   | topic-name-1, topic-name-3 | all, all        | assign          | subscribe          |
| topic( "^topic-name-[13]$" => "-1" )                   | topic-name-1, topic-name-3 | all, all        | subscribe       | subscribe          |
## Why is it worth using dual consumer strategies?

Using both consumer strategies, `assign` and `subscribe`, provides the flexibility to adapt to a wide range of Kafka setups and practical use cases, instead of forcing a single approach that may not fit all scenarios.

- `assign` is ideal when full control and predictability are required.
  - You can explicitly target a known set of topics and partitions.
  - It guarantees ordering semantics more reliably in single-partition or controlled multi-partition scenarios.
  - It works well in environments where the topic layout is static and predefined.

- `subscribe` is valuable when flexibility matters more than strict control.
  - It supports regular expressions, making it suitable when topic names follow patterns or when topics may appear dynamically.
  - It automatically handles partition assignments inside a consumer group, reducing configuration overhead.
  - It integrates better with scaling scenarios or when consumers should share workload automatically.
  - The possible drawbacks of unordered and/or repeated messages are acceptable.

By supporting both approaches, {{ site.product.short_name }} can be used effectively in a variety of Kafka consumption models, from tightly controlled, partition-specific pipelines to dynamic and scalable consumer setups that evolve with the broker configuration.
## Bookmarking in the kafka() source

By default, {{ site.product.short_name }} stores the offset of the last read message of each topic it consumes in its own persist file. This can be disabled using the disable-bookmarks() option. Automatic offset restoration takes effect at startup or reload, based on the saved offset value and the ignore-saved-bookmarks() and read-old-records() settings. If ignore-saved-bookmarks() is set to `yes`, the saved offset is not used. Instead, if read-old-records() is set to `yes`, fetching starts from the oldest available message, otherwise it starts from the newest one.
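
For illustration, a hedged sketch of a source that discards the saved offset on each start and begins with the newest messages (the broker address and topic name are placeholders):

``` config
source s_kafka {
    kafka(
        bootstrap-servers("127.0.0.1:9092")
        topic("example-topic" => "-1")
        ignore-saved-bookmarks(yes)  # do not resume from the saved offset
        read-old-records(no)         # start from the newest message instead
    );
};
```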
## Multiple workers in the kafka() source

The kafka() source can fetch and process messages from the Kafka broker using multiple workers(); by default, it uses two:

{% include doc/admin-guide/options/kafka-source-workers.md %}
