Datasets
Last updated
Datasets are the lifeblood of data analysis, and the Virtual Data Assistant (VDA) gives users a feature-rich environment for working with them. Within the VDA, datasets present data as a product, providing insights and supporting data-driven decision-making. The sections below cover the types of datasets and their key features.
Types of Datasets
Live
Live datasets are directly connected to real data sources and reflect the most current information available from them. VDA establishes and maintains live connections to these sources, so any changes to the metadata of the source data are immediately reflected in the corresponding dataset.
Custom
Custom datasets, on the other hand, are created and defined by users to meet specific analytical requirements. Unlike live datasets, they are not directly linked to real-time data sources. Instead, users define the data to include and may apply transformations to refine it for their analytical needs.
1. Metadata, Overview, and Tags
Datasets are accompanied by comprehensive metadata, offering essential information about the data they contain. This metadata includes a list of columns, descriptions for each column, and data types, providing users with a clear understanding of the dataset's structure and content. Additionally, an Overview section offers a concise summary of the dataset's key characteristics and significance.
To further enhance organization and accessibility, users can add tags to datasets. Tags serve as labels or keywords, making it easier to search and categorize datasets effectively.
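As a rough illustration of the metadata described above, the following minimal Python sketch models a dataset's columns, overview, and tags, and shows how tags might support search. All class and function names here (`Column`, `DatasetMetadata`, `find_by_tag`) are hypothetical and are not part of the VDA; the sketch only mirrors the concepts.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    # One column entry: name, data type, and a human-readable description.
    name: str
    data_type: str
    description: str = ""


@dataclass
class DatasetMetadata:
    # Hypothetical container for the metadata, overview, and tags of a dataset.
    name: str
    overview: str
    columns: list[Column] = field(default_factory=list)
    tags: set[str] = field(default_factory=set)


def find_by_tag(datasets: list[DatasetMetadata], tag: str) -> list[str]:
    """Return the names of datasets carrying the tag (case-insensitive)."""
    wanted = tag.lower()
    return [d.name for d in datasets if wanted in {t.lower() for t in d.tags}]


orders = DatasetMetadata(
    name="orders",
    overview="One row per customer order.",
    columns=[
        Column("order_id", "INTEGER", "Unique order identifier"),
        Column("amount", "DECIMAL", "Order total"),
    ],
    tags={"sales", "finance"},
)
customers = DatasetMetadata(
    name="customers",
    overview="One row per customer.",
    tags={"sales"},
)

print(find_by_tag([orders, customers], "Finance"))  # prints ['orders']
```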
2. Preview - See Live Data
Datasets come to life with the "Preview" feature, offering users a glimpse of the actual data contained within. This live data preview allows analysts to quickly assess the dataset's contents before diving into in-depth analyses. It also aids in identifying any potential data issues or anomalies that may require attention.
3. Stories - Fostering Collaborative Discussions
The "Stories" section provides a collaborative platform for users to engage in discussions about datasets. It serves as a dynamic forum where data analysts, stakeholders, and team members can share insights, ask questions, and exchange valuable perspectives related to the dataset. Additionally, users can upload important documents, transforming the Stories section into a central hub for discussions and knowledge-sharing.
4. Transformations - Empowering Custom Datasets
For custom datasets, the VDA offers the "Transformations" feature. This powerful tool enables users to define data transformations using templates, facilitating data modeling and customization. Analysts can upload transformation logic and document the process, streamlining the preparation and refinement of custom datasets.
5. Publish and Generate SQL - Effortless Code Generation
Users can publish versions of their transformations and use the "Generate SQL" option to generate code for them.
The "Generate SQL" feature uses OpenAI to automate code generation based on the logic defined in transformations. With just a few clicks, users can convert their transformation logic into SQL code. The generated code consists of common table expressions (CTEs), which are supported by most SQL-based data warehouses. This feature reduces manual coding effort and accelerates data processing.
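To give a sense of what SQL built from common table expressions (CTEs) can look like, here is a small Python sketch that chains a list of transformation steps into a single `WITH` query and runs it against an in-memory SQLite database. This is not the VDA's actual output or generation logic: the step names, the `orders` table, and the `build_cte_query` helper are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical transformation steps: (step name, SELECT that may read earlier steps).
steps = [
    ("paid_orders", "SELECT customer_id, amount FROM orders WHERE status = 'paid'"),
    (
        "customer_totals",
        "SELECT customer_id, SUM(amount) AS total FROM paid_orders GROUP BY customer_id",
    ),
]


def build_cte_query(steps, final_select):
    """Chain each step as a named CTE, then append the final SELECT."""
    ctes = ",\n".join(f"{name} AS (\n    {sql}\n)" for name, sql in steps)
    return f"WITH {ctes}\n{final_select}"


query = build_cte_query(
    steps, "SELECT customer_id, total FROM customer_totals ORDER BY customer_id"
)

# Run the generated query against sample data to check that it is valid SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "paid"), (1, 5.0, "paid"), (2, 7.0, "cancelled"), (2, 3.0, "paid")],
)
rows = conn.execute(query).fetchall()

print(query)
print(rows)  # prints [(1, 15.0), (2, 3.0)]
```

Because each step is a named CTE that later steps can read from, the query keeps the same shape as the transformation logic, which makes it easier to review step by step.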
6. ER Diagram - Visualizing the Data Model
The ER Diagram tab visualizes the entity-relationship model of the dataset, showing tables, columns, and relationships. This graphical representation helps users understand the data schema and how different entities are interconnected. It highlights primary and foreign key relationships, making complex data structures easier to comprehend. By revealing the dataset's architecture, it also supports database design and optimization. Both technical and non-technical users can use it to gain a high-level overview of how the data is organized.
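The information an ER diagram draws on (tables, primary keys, and foreign key relationships) can be read from a database schema. The sketch below does this for a small SQLite schema using SQLite's `table_info` and `foreign_key_list` pragmas. It is only an illustration of the underlying idea: the `customers` and `orders` tables and the helper functions are assumptions, and the VDA's own mechanism for building the diagram is not described in this document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    );
    """
)


def primary_keys(conn, table):
    # table_info rows: (cid, name, type, notnull, dflt_value, pk); pk > 0 marks key columns.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})") if row[5] > 0]


def relationships(conn, table):
    # foreign_key_list rows: (id, seq, referenced_table, from_column, to_column, ...).
    return [
        (table, row[3], row[2], row[4])
        for row in conn.execute(f"PRAGMA foreign_key_list({table})")
    ]


tables = [
    r[0]
    for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
for t in tables:
    print(t, "PK:", primary_keys(conn, t), "FK:", relationships(conn, t))
```

Each relationship tuple reads as (table, column, referenced table, referenced column), which corresponds to one edge between two entities in the diagram.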
In conclusion, datasets within the Virtual Data Assistant serve as invaluable assets, providing a holistic view of data and fostering collaborative discussions. With features like live data previews, data transformation templates, and automated SQL code generation, analysts can efficiently explore, analyze, and model data, unlocking valuable insights and making data-driven decisions with ease.