# Data Ingestion

**Data Ingestion** is the process of collecting, importing, and processing data from various sources into a centralized data repository or system, making it ready for analysis and utilization. This step is critical in any data pipeline as it ensures that data is available and accessible in the desired format for subsequent processing, analysis, and decision-making.

#### Key Components of Data Ingestion

1. **Source Systems**
   * **Definition**: The origins of the data being ingested. These can include databases, APIs, flat files, cloud storage, sensors, and more.
   * **Variety**: Data can come from structured sources like SQL databases, semi-structured sources like JSON files, or unstructured sources like text documents.
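
To make the three kinds of source concrete, here is a minimal Python sketch that turns each one into a common list-of-records shape. It is only an illustration: the function names and the record layout are hypothetical, it uses Python's standard library (with SQLite standing in for a SQL database), and it does not represent how VDA connectors are implemented.

```python
import json
import sqlite3


def read_structured(conn: sqlite3.Connection, table: str) -> list[dict]:
    """Structured source: rows of a SQL table, keyed by column name."""
    # The table name is interpolated directly, so it must come from trusted input.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def read_semi_structured(text: str) -> list[dict]:
    """Semi-structured source: a JSON array of objects."""
    return list(json.loads(text))


def read_unstructured(text: str) -> list[dict]:
    """Unstructured source: free text, one record per non-empty line."""
    return [
        {"line": number, "text": line.strip()}
        for number, line in enumerate(text.splitlines(), start=1)
        if line.strip()
    ]
```

Whatever the source type, the output has the same shape, which is what lets later pipeline steps treat the data uniformly.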

VDA provides connectors for the following source systems.

**List of Available Connectors**

* [Amazon Athena](https://aws.amazon.com/athena/)
* [Amazon EventBridge](https://aws.amazon.com/eventbridge/)
* [AWS Glue](https://aws.amazon.com/glue/) and anything built on it
* [Amazon Redshift](https://aws.amazon.com/redshift/)
* [Apache Cassandra](https://cassandra.apache.org/)
* [Apache Druid](https://druid.apache.org/)
* [Apache Hive](https://hive.apache.org/)
* CSV
* [dbt](https://www.getdbt.com/)
* [Delta Lake](https://delta.io/)
* [Elasticsearch](https://www.elastic.co/)
* [Google BigQuery](https://cloud.google.com/bigquery)
* [IBM DB2](https://www.ibm.com/analytics/db2)
* [Kafka Schema Registry](https://docs.confluent.io/platform/current/schema-registry/index.html)
* [Microsoft SQL Server](https://www.microsoft.com/en-us/sql-server/default.aspx)
* [MySQL](https://www.mysql.com/)
* [Oracle](https://www.oracle.com/index.html) (through dbapi or sql\_alchemy)
* [PostgreSQL](https://www.postgresql.org/)
* [PrestoDB](http://prestodb.io/)
* [Trino (formerly Presto SQL)](https://trino.io/)
* [Vertica](https://www.vertica.com/)
* [Snowflake](https://www.snowflake.com/)

Create the data source from which data will be ingested.

<figure><img src="/files/QJjSxN4Yh8F5yXYvAFtt" alt=""><figcaption><p>Data Source Creation</p></figcaption></figure>

Navigate to the Datasource tab and click the desired data source to see the list of its associated datasets.

<figure><img src="/files/tBzpZHz7gb2x0R4ULEf1" alt=""><figcaption><p>Datasets</p></figcaption></figure>

To create a new ingestion workbook, navigate to Workbook and create a new Ingestion Book.

<figure><img src="/files/jx7BaEZezO2S8PsGXBM1" alt=""><figcaption><p>Create New Ingestion Book</p></figcaption></figure>

2. **Ingestion Methods**
   * **Batch Processing**: Data is collected and processed in large chunks at scheduled intervals.

     * **Use Cases**: Suitable when real-time data is not required, such as end-of-day reports or periodic data archiving.
     * **Modes**: A batch ingestion can be one of the following:
       * Full refresh
       * Incremental
       * Historical

     <figure><img src="/files/sd5qiWyBUmOeBAf8aZOe" alt=""><figcaption><p>Batch Ingestion</p></figcaption></figure>
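
The difference between a full refresh and an incremental batch can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the platform's implementation: the target is a plain in-memory list, and the `updated_at` watermark field is hypothetical. The historical mode is left out because its exact semantics are not defined on this page.

```python
def full_refresh(source_rows: list[dict], target: list[dict]) -> int:
    """Replace everything in the target with the current source contents."""
    target.clear()
    target.extend(dict(row) for row in source_rows)
    return len(target)


def incremental(source_rows: list[dict], target: list[dict],
                watermark_field: str = "updated_at") -> int:
    """Append only the rows newer than the latest watermark already loaded."""
    # `updated_at` is an assumed column; any monotonically increasing field works.
    watermark = max((row[watermark_field] for row in target), default=None)
    new_rows = [
        dict(row) for row in source_rows
        if watermark is None or row[watermark_field] > watermark
    ]
    target.extend(new_rows)
    return len(new_rows)
```

A full refresh is simpler and self-correcting but rereads the whole source on every run. An incremental load moves less data but depends on a reliable watermark column in the source.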

To create a schedule, navigate to Schedule, click the Plus icon, enter the following details for the ingestion workbook, and click Submit:

* Name of Schedule
* Name of Ingestion Workbook
* Frequency
* Start Date

<figure><img src="/files/vnR0IL51ljxTUbGIatL0" alt=""><figcaption><p>Create a schedule</p></figcaption></figure>
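
As a rough mental model of how the four schedule fields combine, the following Python sketch computes when a schedule would next fire. The `Schedule` class, the `FREQUENCIES` values (`hourly`, `daily`, `weekly`), and the `next_run` method are all hypothetical names chosen for illustration. The frequencies the platform actually offers, and how it evaluates them, may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed frequency options; the real form may offer different choices.
FREQUENCIES = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


@dataclass(frozen=True)
class Schedule:
    name: str             # Name of Schedule
    workbook: str         # Name of Ingestion Workbook
    frequency: str        # key into FREQUENCIES
    start_date: datetime  # Start Date

    def next_run(self, now: datetime) -> datetime:
        """Return the first run time at or after `now`."""
        if now <= self.start_date:
            return self.start_date
        step = FREQUENCIES[self.frequency]
        elapsed = now - self.start_date
        intervals = -(-elapsed // step)  # ceiling division on timedeltas
        return self.start_date + intervals * step
```

For example, a daily schedule starting at midnight on 1 January, evaluated at noon on 3 January, would next run at midnight on 4 January.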


