Welcome
YData Fabric is a Data-Centric AI development platform that accelerates AI development by helping data practitioners achieve production-quality data.
Just as code quality is essential to successful software development, Fabric accounts for the data quality requirements of data-driven applications. It introduces standards, processes, and acceleration to empower data science, analytics, and data engineering teams.
YData SDK Version 1.0 Released!
We are excited to announce the release of YData Fabric SDK v1.0! This major release marks the beginning of long-term support for the package, ensuring stability, continuous improvements, and ongoing support for all users. YData SDK empowers developers with easy access to state-of-the-art data quality tools and generative AI capabilities. Stay tuned for more updates and new features!
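As a sketch of how you might get started, assuming the SDK is distributed on PyPI as `ydata-sdk` and authenticates with a Fabric API token read from the `YDATA_TOKEN` environment variable:

```python
# Assumed install command: pip install ydata-sdk
import os

# The SDK is assumed to pick up a Fabric API token (created in the
# platform UI) from this environment variable before any call is made.
os.environ["YDATA_TOKEN"] = "<your-fabric-api-token>"
```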
Try Fabric
Why adopt YData Fabric?
With Fabric, you can standardize the understanding of your data, quickly identify data quality issues, streamline and version your data preparation workflows, and finally leverage synthetic data for privacy compliance or as a tool to boost ML performance. Fabric is a development environment that supports a faster and easier process of preparing data for AI development. Data practitioners are using Fabric to:
- Establish a centralized and collaborative repository for data projects.
- Create and share comprehensive documentation of data, encompassing data schema, structure, and personally identifiable information (PII).
- Prevent data quality issues with standardized data quality profiling, providing visual understanding and warnings on potential issues.
- Accelerate data preparation with customizable recipes.
- Improve machine learning performance with optimal data preparation through solutions such as synthetic data.
- Shorten access to data with privacy-compliant synthetic data generation.
- Build and streamline data preparation workflows effortlessly through a user-friendly drag-and-drop interface.
- Efficiently manage business rules, conduct comparisons, and implement version control for data workflows using pipelines.
Key features
Data Catalog
Fabric Data Catalog provides a centralized perspective on datasets on a per-project basis, optimizing data management through seamless integration with the organization's existing data architectures via scalable connectors (e.g., MySQL, Google Cloud Storage, AWS S3). It standardizes data quality profiling, streamlining the processes of efficient data cleaning and preparation, while also automating the identification of Personally Identifiable Information (PII) to facilitate compliance with privacy regulations.
Explore how the Data Catalog gives you a centralized repository of your datasets, schema validation, and automated data profiling.
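As an illustration only, here is a sketch of registering an AWS S3 connector from code so its datasets appear in the Data Catalog; the `Connector.create` call, the `connector_type` value, and the credentials dictionary are assumptions about the SDK's connector API and should be checked against the SDK reference:

```python
import os

from ydata.sdk.connectors import Connector  # assumed module path

os.environ["YDATA_TOKEN"] = "<your-fabric-api-token>"

# Hypothetical sketch: register an S3 bucket as a Fabric connector.
# Parameter names and the credentials keys are assumptions.
connector = Connector.create(
    connector_type="aws-s3",
    credentials={
        "access_key_id": "<aws-access-key-id>",
        "secret_access_key": "<aws-secret-access-key>",
        "region": "eu-west-1",
    },
    name="landing-bucket",
)
print(connector.uid)
```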
Labs
Fabric's Labs environments provide collaborative, scalable, and secure workspaces layered on a flexible infrastructure, enabling users to seamlessly switch between CPUs and GPUs based on their computational needs. Labs are familiar environments that empower data developers with powerful IDEs (Jupyter Notebooks, Visual Studio Code, or H2O Flow) and a seamless experience with the tools they already love, combined with YData's cutting-edge SDK for data preparation.
Learn how to use the Labs to generate synthetic data in a familiar Python interface.
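A minimal sketch of that flow inside a Lab, based on the SDK's quickstart pattern; the `census` sample dataset, the `get_dataset` helper, and the exact parameter names are assumptions to confirm against the SDK reference:

```python
import os

from ydata.sdk.dataset import get_dataset          # assumed sample-data helper
from ydata.sdk.synthesizers import RegularSynthesizer

os.environ["YDATA_TOKEN"] = "<your-fabric-api-token>"

# Fetch a small sample dataset, train a tabular synthesizer on it,
# and draw 1,000 synthetic rows.
X = get_dataset("census")
synth = RegularSynthesizer()
synth.fit(X)
sample = synth.sample(n_samples=1000)
print(sample.head())
```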
Synthetic data
Synthetic data, enabled by YData Fabric, provides data developers with user-friendly interfaces (UI and code) for generating artificial datasets, offering a versatile solution across formats such as tabular, time-series, and multi-table datasets. The generated synthetic data holds the same value as the original, aligns closely with specific business rules, and contributes to enhancing machine learning models, mitigating privacy concerns, and making data development more robust. Fabric's synthetic data is easy to adapt and configure, allowing customization of the privacy-utility trade-off.
Learn how to create high-quality synthetic data within a user-friendly UI using Fabric's data synthesis flow.
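As a further sketch, the same code pattern is assumed to extend to time-series data; the `TimeSeriesSynthesizer` class and its `sortbykey` and `n_entities` parameters are assumptions to verify against the SDK reference, and the CSV path and column names are purely illustrative:

```python
import os

import pandas as pd

from ydata.sdk.synthesizers import TimeSeriesSynthesizer  # assumed class

os.environ["YDATA_TOKEN"] = "<your-fabric-api-token>"

# Illustrative input: one or more entities measured over time.
df = pd.read_csv("sensor_readings.csv")

# 'sortbykey' is assumed to name the column that orders each series;
# 'n_entities' is assumed to control how many synthetic entities are drawn.
synth = TimeSeriesSynthesizer()
synth.fit(df, sortbykey="timestamp")
sample = synth.sample(n_entities=10)
print(sample.head())
```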
Pipelines
Fabric Pipelines streamlines data preparation workflows by automating, orchestrating, and optimizing data pipelines, providing benefits such as flexibility, scalability, monitoring, and reproducibility for efficient and reliable data processing. The intuitive drag-and-drop interface, leveraging Jupyter notebooks or Python scripts, expedites the pipeline setup process, providing data developers with a quick and user-friendly experience.
Explore how you can leverage Fabric Pipelines to build versionable and reproducible data preparation workflows for ML development.
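For intuition, here is a hypothetical Python script that could serve as a single pipeline step; the file names, column names, and step contract (read an input file, write an output file for the next stage) are illustrative assumptions rather than Fabric's prescribed interface:

```python
"""Hypothetical data-preparation step for a Fabric pipeline."""
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicates and rows missing the label column."""
    df = df.drop_duplicates()
    return df.dropna(subset=["label"])


if __name__ == "__main__":
    raw = pd.read_csv("raw_transactions.csv")                 # step input (illustrative path)
    clean(raw).to_csv("clean_transactions.csv", index=False)  # output consumed by the next step
```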
Tutorials
To understand how to best apply Fabric to your use cases, start by exploring the following tutorials:
- Handling Imbalanced Data for Improved Fraud Detection: Learn how to implement high-performing fraud detection models by incorporating synthetic data to balance your datasets.
- Prediction with Quality Inspection: Learn how to develop data preparation workflows with automated data quality checks and Pipelines.
- Generating Synthetic Data for Financial Transactions: Learn how to use synthetic data generation to replicate your existing relational databases while ensuring referential integrity.
You can find additional examples and use cases at YData Academy GitHub Repository.
Support
Facing an issue? We're committed to providing all the support you need to ensure a smooth experience using Fabric:
- Create a support ticket: our team will help you move forward!
- Contact a Fabric specialist: get personalized guidance or full access to the platform.