Data ingestion is an art and a science in itself. Ingesting data effectively into a Hadoop cluster, or any other data store, requires a good understanding of both the source and the sink, along with the ability to configure data pipelines between them. Ingestion becomes a complex task when events are sourced from multiple systems in parallel and must be delivered to various destinations in real time. High-speed data ingestion is especially critical when implementing real-time analytics.
The objective is to provide a thorough understanding of Flume configuration and Kafka. Participants will be able to implement practical data flows in their own projects.

Prerequisite: some programming background, preferably in Java.
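As an illustration of the kind of data flow covered, the sketch below shows a minimal Flume agent configuration that reads lines from a netcat source, buffers them in a memory channel, and delivers them to Kafka. The agent name, port, and topic name are illustrative assumptions; the property keys follow the standard Flume source/channel/sink configuration scheme.

```
# Hypothetical agent named "agent"; components are named arbitrarily
agent.sources = netcatSrc
agent.channels = memCh
agent.sinks = kafkaSink

# Netcat source: listens on a local TCP port (port number is an example)
agent.sources.netcatSrc.type = netcat
agent.sources.netcatSrc.bind = localhost
agent.sources.netcatSrc.port = 44444
agent.sources.netcatSrc.channels = memCh

# Memory channel: in-memory buffer between source and sink
agent.channels.memCh.type = memory
agent.channels.memCh.capacity = 10000

# Kafka sink: publishes events to a Kafka topic (broker address and topic are examples)
agent.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.kafkaSink.kafka.bootstrap.servers = localhost:9092
agent.sinks.kafkaSink.kafka.topic = events
agent.sinks.kafkaSink.channel = memCh
```

An agent like this would typically be launched with `flume-ng agent --name agent --conf-file <file>`, assuming a Kafka broker is reachable at the configured address.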
The program is designed to provide an overview of Cassandra. Key concepts in each area will be explained and working code provided. Participants will be able to run the examples and are expected to understand the code on their own with some pointers; a detailed code walk-through is not provided. All code is written in Java.