Kafka Stream Processing Platform

2018-09-15

Open-sourced by LinkedIn

Apache

A streaming platform has three key capabilities:

Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system.
Store streams of records in a fault-tolerant, durable way.
Process streams of records as they occur.
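The first two capabilities both rest on one idea: a topic is an append-only log in which each record gets an offset, and reading does not delete anything. The sketch below is a hypothetical in-memory stand-in (the `TopicLog` class and its methods are illustrative names, not Kafka's API, and nothing here is written to disk), showing two subscribers reading the same records independently from their own offsets.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TopicLog:
    """Hypothetical in-memory stand-in for a single-partition topic (not durable)."""
    records: List[str] = field(default_factory=list)

    def publish(self, value: str) -> int:
        # Append-only: records are never modified in place; the offset is the list index.
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int = 10) -> Tuple[List[str], int]:
        # Reading removes nothing, so any number of subscribers can read independently.
        # Returns the batch and the offset the caller should read from next.
        batch = self.records[offset:offset + max_records]
        return batch, offset + len(batch)


log = TopicLog()
for event in ["signup", "click", "purchase"]:
    log.publish(event)

# Two subscribers, each tracking its own position (offset) in the same log.
a_batch, a_next = log.read(0)
b_batch, b_next = log.read(2)
print(a_batch, a_next)  # ['signup', 'click', 'purchase'] 3
print(b_batch, b_next)  # ['purchase'] 3
```

Unlike a classic queue, where a delivered message is gone, each reader here only advances its own offset; this is what lets Kafka serve many subscribers from one stored stream.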

Kafka is generally used for two broad classes of applications:

Building real-time streaming data pipelines that reliably get data between systems or applications
Building real-time streaming applications that transform or react to the streams of data

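producer: the producer of messages and data; a process/code/service that publishes messages to a Kafka topic
consumer: the consumer of messages and data; a process/code/service that subscribes to data (a topic) and processes the messages published to it
consumer group: a logical concept. A topic's messages are broadcast to every consumer group subscribed to it, but within one group each message is consumed by only one consumer
broker: a physical concept; each Kafka node in a Kafka cluster
topic: a logical concept; the category a Kafka message belongs to, used to separate and isolate data
partition: a physical concept; the basic unit of data storage in Kafka. A topic's data is spread across multiple partitions, and each partition is ordered
replication: a partition may have multiple replicas, and all replicas hold the same data
replication leader: among a partition's replicas, one leader is responsible for that partition's interaction with producers and consumers
replicaManager: manages the information on all partitions and replicas on the current broker and handles requests initiated by the KafkaController, such as replica state transitions and adding/reading information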
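How partition, consumer, and consumer group fit together can be sketched in a few lines. This is a minimal model under stated assumptions: `partition_for` uses CRC32 as a stand-in hash (Kafka's default partitioner uses murmur2 for keyed records), `assign` is a hypothetical round-robin assignment, and the group and consumer names are made up for illustration.

```python
import zlib
from typing import Dict, List

NUM_PARTITIONS = 3  # hypothetical partition count for one topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash: Kafka's default partitioner uses murmur2, not CRC32.
    # Same key -> same partition, so per-key ordering is preserved.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    # Round-robin: within one group, every partition goes to exactly one consumer.
    result: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        result[consumers[i % len(consumers)]].append(p)
    return result


partitions = list(range(NUM_PARTITIONS))
groups = {
    "billing": ["billing-1", "billing-2"],
    "analytics": ["analytics-1"],
}
# Every group receives all partitions (broadcast across groups),
# but each partition has only one owner inside a group.
assignments = {group: assign(partitions, members) for group, members in groups.items()}
for group, mapping in assignments.items():
    print(group, mapping)
# billing {'billing-1': [0, 2], 'billing-2': [1]}
# analytics {'analytics-1': [0, 1, 2]}
```

Because ownership is per partition, adding consumers to a group beyond the partition count leaves the extra consumers idle; the partition count caps a group's parallelism.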

Replication characteristics:
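The replication and replication leader entries can be illustrated with a toy model: producers and consumers talk only to the leader, followers keep identical copies, and a follower takes over if the leader fails. This is a simplified sketch, not Kafka's protocol: here followers are updated synchronously and the first surviving replica is promoted, whereas in Kafka followers fetch from the leader and the controller elects the new leader from the in-sync replicas (ISR). `ReplicatedPartition` and the broker names are hypothetical.

```python
from typing import Dict, List


class ReplicatedPartition:
    """Hypothetical toy model: one leader plus followers holding identical copies."""

    def __init__(self, brokers: List[str]) -> None:
        self.replicas: Dict[str, List[str]] = {b: [] for b in brokers}
        self.leader = brokers[0]  # in this sketch the first broker starts as leader

    def produce(self, value: str) -> None:
        # Producers write only to the leader ...
        self.replicas[self.leader].append(value)
        # ... and followers copy the record (synchronously here, for simplicity).
        for broker, log in self.replicas.items():
            if broker != self.leader:
                log.append(value)

    def consume(self) -> List[str]:
        # Consumers read from the leader as well.
        return list(self.replicas[self.leader])

    def fail_leader(self) -> None:
        # Drop the failed leader and promote the first surviving replica;
        # its log is complete because replication above was synchronous.
        del self.replicas[self.leader]
        self.leader = next(iter(self.replicas))


p = ReplicatedPartition(["broker-0", "broker-1", "broker-2"])
p.produce("a")
p.produce("b")
p.fail_leader()
print(p.leader, p.consume())  # broker-1 ['a', 'b']
```

The point of keeping identical replicas on different brokers is that losing one broker loses no acknowledged data; only the leadership moves.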


Distributed

Message queue
Activity tracking
Metrics monitoring
Log aggregation
Stream processing
Event sourcing
Durable log (commit log)
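The last two items share one pattern: the log is the source of truth, and current state is derived by replaying it from the beginning. A minimal sketch, assuming a hypothetical event format of `(account, delta)` pairs and a made-up `replay` helper:

```python
from typing import Dict, List, Tuple

# Hypothetical event format: (account, delta), appended in order.
Event = Tuple[str, int]


def replay(events: List[Event]) -> Dict[str, int]:
    # State is never stored directly; it is rebuilt by folding over the log from offset 0.
    balances: Dict[str, int] = {}
    for account, delta in events:
        balances[account] = balances.get(account, 0) + delta
    return balances


log: List[Event] = [("alice", 100), ("bob", 50), ("alice", -30)]
print(replay(log))  # {'alice': 70, 'bob': 50}

# Changes are only ever appended; replaying again yields the updated state.
log.append(("bob", 25))
print(replay(log))  # {'alice': 70, 'bob': 75}
```

Because the history is kept rather than overwritten, a new application can be added later and rebuild its own view from the same retained log.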

肖申克赫本