Data Catalog

What is a Data Catalog

A data catalog is a comprehensive inventory of data assets within an organization, designed to help users find, understand, and use data effectively.

Metadata in Data Catalog

Metadata is descriptive information about data that provides context and meaning. In a data catalog, metadata falls into four broad types:

  1. Technical Metadata:

    • Schema Information: Details about the structure of the data, such as tables, columns, data types, indexes, and constraints.

    • Data Source Information: Information about where the data originates, such as database names, server locations, and connection strings.

    • Relationships: Descriptions of how a data set relates to other data sets, such as foreign-key links or upstream and downstream dependencies.

  2. Business Metadata:

    • Business Glossary: Definitions and descriptions of business terms and concepts to ensure a common understanding across the organization.

    • Data Ownership: Information about who is responsible for the data, including data stewards and data owners.

    • Usage Context: Information about how and why the data is used in business processes and decision-making.

  3. Operational Metadata:

    • Data Quality Metrics: Information about the accuracy, completeness, consistency, and timeliness of the data.

    • Access and Usage Statistics: Details about who accessed the data, when it was accessed, and how frequently it is used.

    • Processing Metadata: Information about data processing jobs, such as ETL (Extract, Transform, Load) processes, including job schedules, statuses, and logs.

  4. Governance Metadata:

    • Policies and Compliance: Information about data governance policies, regulatory requirements, and compliance statuses.

    • Security Information: Details about data security measures, including encryption, masking, and access controls.
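To make the four categories concrete, the following is a minimal sketch of how a single catalog entry could group them. Every class and field name here (`CatalogEntry`, `TechnicalMetadata`, `completeness`, and so on) is a hypothetical choice for illustration, not the schema of VDA or of any particular catalog product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TechnicalMetadata:
    columns: Dict[str, str]            # column name -> data type (schema information)
    source_database: str               # where the data originates
    related_datasets: List[str] = field(default_factory=list)


@dataclass
class BusinessMetadata:
    glossary_terms: List[str]          # business terms this data maps to
    owner: str                         # data owner or steward
    usage_context: str                 # how and why the data is used


@dataclass
class OperationalMetadata:
    completeness: float                # assumed metric: fraction of non-null values, 0.0 to 1.0
    access_count: int                  # how often the data has been accessed
    last_job_status: str               # status of the most recent processing job


@dataclass
class GovernanceMetadata:
    policies: List[str]                # governance or regulatory policies that apply
    masked_columns: List[str] = field(default_factory=list)


@dataclass
class CatalogEntry:
    name: str
    technical: TechnicalMetadata
    business: BusinessMetadata
    operational: OperationalMetadata
    governance: GovernanceMetadata


# Illustrative values only; none of these come from a real system.
entry = CatalogEntry(
    name="orders",
    technical=TechnicalMetadata(
        columns={"order_id": "INTEGER", "customer_email": "VARCHAR"},
        source_database="sales_db",
        related_datasets=["customers"],
    ),
    business=BusinessMetadata(
        glossary_terms=["Order"],
        owner="sales-data-team",
        usage_context="Monthly revenue reporting",
    ),
    operational=OperationalMetadata(
        completeness=0.98, access_count=120, last_job_status="SUCCEEDED"
    ),
    governance=GovernanceMetadata(
        policies=["GDPR"], masked_columns=["customer_email"]
    ),
)
```

Keeping each category in its own type lets different teams maintain their part of the entry independently; for example, a steward can update `business` without touching `technical`.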

VDA Data Catalog

The Data Catalog in VDA consists of the following components:

Data Source

A data source is a logical or physical grouping of related data assets, which can include tables, files, objects, or even dashboards.

Datasets

A dataset is a collection of related data, typically organized into tables, files, or objects, that is used for analysis, reporting, or other data-driven tasks.
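The relationship between the two components can be sketched as a simple containment model. This is an illustration only: it assumes that a data source groups datasets, and the names `DataSource`, `Dataset`, `kind`, and `find` are hypothetical and do not reflect the actual VDA API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    name: str
    kind: str  # assumed values: "table", "file", or "object"


@dataclass
class DataSource:
    name: str
    datasets: List[Dataset] = field(default_factory=list)

    def find(self, keyword: str) -> List[Dataset]:
        """Return datasets whose name contains the keyword, ignoring case."""
        needle = keyword.lower()
        return [d for d in self.datasets if needle in d.name.lower()]


# Illustrative values only.
sales = DataSource(
    name="sales_db",
    datasets=[
        Dataset(name="Orders", kind="table"),
        Dataset(name="order_exports.csv", kind="file"),
        Dataset(name="Customers", kind="table"),
    ],
)

matches = sales.find("order")
```

Here `matches` holds the two datasets whose names contain "order", which mirrors the catalog's purpose of helping users find data within a source.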
