Data Ingestion

Data ingestion is the process of collecting, importing, and processing data from various sources into a centralized data repository or system, where it is ready for analysis and use. This step is critical in any data pipeline because it ensures that data is available and accessible in the required format for downstream processing, analysis, and decision-making.
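The three stages named above (collect, process, load into a central store) can be illustrated with a minimal sketch. It is not how VDA implements ingestion: it assumes a small hypothetical CSV export as the source and uses an in-memory SQLite database as a stand-in for the centralized repository. The function and table names are illustrative only.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (a flat CSV file).
RAW_CSV = """order_id,amount,currency
1001, 25.50 ,usd
1002,13.00,EUR
"""

def collect(text: str) -> list[dict]:
    """Collect: read the raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def process(rows: list[dict]) -> list[tuple]:
    """Process: cast types and normalize values into the target format."""
    return [
        (int(r["order_id"]), float(r["amount"].strip()), r["currency"].strip().upper())
        for r in rows
    ]

def load(conn: sqlite3.Connection, records: list[tuple]) -> int:
    """Load: write the processed records into the central store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    # INSERT OR REPLACE keeps the load idempotent if the same file is ingested twice.
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", records)
    conn.commit()
    return len(records)

conn = sqlite3.connect(":memory:")  # stand-in for the centralized repository
loaded = load(conn, process(collect(RAW_CSV)))
rows = conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
```

After running, `rows` holds the cleaned records, for example `(1001, 25.5, 'USD')`, ready for querying.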

Key Components of Data Ingestion

  1. Source Systems

    • Definition: The origins of the data being ingested. These can include databases, APIs, flat files, cloud storage, sensors, and more.

    • Variety: Data can come from structured sources like SQL databases, semi-structured sources like JSON files, or unstructured sources like text documents.
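To make the difference between source types concrete, the following sketch normalizes one structured, one semi-structured, and one unstructured source into a common list of records. It is illustrative only: an in-memory SQLite table stands in for a SQL database, the JSON and text contents are hypothetical, and the helper names (`from_sql`, `from_json`, `from_text`) are not part of any VDA connector.

```python
import json
import sqlite3

# Structured source: a SQL table with a fixed schema (in-memory SQLite as a stand-in).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.execute("INSERT INTO customers VALUES (1, 'Ada'), (2, 'Lin')")

# Semi-structured source: JSON documents whose fields may vary between records.
json_text = '[{"id": 3, "name": "Sam", "tags": ["vip"]}, {"id": 4, "name": "Kai"}]'

# Unstructured source: free text with no predefined schema.
note_text = "Call back Ada about the renewal.\nLin asked for an invoice copy."

def from_sql(conn):
    # Columns already match the target fields, so rows map over directly.
    return [{"source": "sql", "id": i, "name": n}
            for i, n in conn.execute("SELECT id, name FROM customers ORDER BY id")]

def from_json(text):
    # Keep only the fields the target expects; optional keys such as "tags" are dropped.
    return [{"source": "json", "id": d["id"], "name": d["name"]}
            for d in json.loads(text)]

def from_text(text):
    # No schema to map, so each non-empty line is kept as a raw document body.
    return [{"source": "text", "body": line}
            for line in text.splitlines() if line.strip()]

records = from_sql(db) + from_json(json_text) + from_text(note_text)
```

The structured source needs almost no mapping, the semi-structured one needs field selection, and the unstructured one can only be stored as raw text until a later step extracts meaning from it.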

VDA connectors:

(Figure: List of available connectors)

Create the data source from which data is to be ingested.

Navigate to the Datasource tab and click the desired data source to view its list of associated datasets.

To create a new ingestion workbook, navigate to Workbook and click the Ingestion Book create option.

  2. Ingestion Methods

    • Batch Processing: Data is collected and processed in large chunks at scheduled intervals.

      • Use Cases: Suitable when real-time data is not required, such as end-of-day reports or periodic data archiving.

      • Batch loads can be:

        • Full refresh

        • Incremental

        • Historical
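The batch modes listed above can be sketched as follows. This is a minimal illustration, assuming the common meanings of the terms (full refresh reloads everything, incremental loads only rows changed since a stored watermark, historical backfills a past date range); VDA's exact semantics for each mode may differ. The source rows, the dictionary used as a target, and the function names are all hypothetical.

```python
from datetime import date

# Hypothetical source table: (id, value, updated_on).
SOURCE = [
    (1, "a", date(2024, 1, 5)),
    (2, "b", date(2024, 2, 10)),
    (3, "c", date(2024, 3, 15)),
    (4, "d", date(2024, 3, 20)),
]

def chunks(rows, size):
    """Yield rows in fixed-size batches, as a batch job would process them."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

def full_refresh(target: dict, batch_size: int = 2) -> dict:
    """Full refresh: discard the target contents and reload every source row."""
    target.clear()
    for batch in chunks(SOURCE, batch_size):
        target.update({row[0]: row for row in batch})
    return target

def incremental(target: dict, watermark: date, batch_size: int = 2) -> date:
    """Incremental: load only rows changed after the watermark; return the new watermark."""
    changed = [row for row in SOURCE if row[2] > watermark]
    for batch in chunks(changed, batch_size):
        target.update({row[0]: row for row in batch})
    return max((row[2] for row in changed), default=watermark)

def historical(target: dict, start: date, end: date, batch_size: int = 2) -> dict:
    """Historical: backfill rows whose date falls within [start, end]."""
    past = [row for row in SOURCE if start <= row[2] <= end]
    for batch in chunks(past, batch_size):
        target.update({row[0]: row for row in batch})
    return target

warehouse = {}
historical(warehouse, date(2024, 1, 1), date(2024, 2, 28))
after_backfill = sorted(warehouse)            # ids 1 and 2
new_watermark = incremental(warehouse, date(2024, 2, 28))
after_incremental = sorted(warehouse)         # ids 1 to 4
full_refresh(warehouse)                       # reloads all four rows from scratch
```

Storing the returned watermark between runs is what lets the next incremental run skip rows that were already loaded.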

To create a schedule, navigate to Schedule, click the plus (+) icon, enter the following details for the ingestion workbook, and click Submit:

  • Name of schedule

  • Name of ingestion workbook

  • Frequency

  • Start date
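The schedule fields above can be modeled as a small record, with the start date and frequency determining when the workbook runs next. This is a hypothetical sketch, not VDA's internal schedule format: the frequency options (`hourly`, `daily`, `weekly`), the class name `IngestionSchedule`, and the example names are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed frequency options; the choices VDA actually offers may differ.
FREQUENCIES = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}

@dataclass
class IngestionSchedule:
    name: str          # Name of schedule
    workbook: str      # Name of the ingestion workbook to run
    frequency: str     # One of the keys in FREQUENCIES
    start: datetime    # Start date (time of the first run)

    def __post_init__(self):
        if self.frequency not in FREQUENCIES:
            raise ValueError(f"unknown frequency: {self.frequency!r}")

    def next_runs(self, after: datetime, count: int = 3) -> list[datetime]:
        """Return the next `count` run times strictly after `after`."""
        step = FREQUENCIES[self.frequency]
        if after < self.start:
            first = self.start
        else:
            # Whole intervals elapsed since the start, plus one to land after `after`.
            n = (after - self.start) // step + 1
            first = self.start + n * step
        return [first + i * step for i in range(count)]

# Hypothetical names for illustration only.
sched = IngestionSchedule("nightly-orders", "orders_ingestion_wb", "daily",
                          datetime(2024, 6, 1, 2, 0))
upcoming = sched.next_runs(datetime(2024, 6, 3, 12, 0))
```

Here `upcoming` contains the runs at 02:00 on 4, 5, and 6 June 2024, because every run is anchored to the start date and advanced by whole intervals.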
