Data Preparation: Data Governance Check
Choose appropriate preparation methods based on data structure and business requirements. We recommend starting with data profiling to understand data quality and characteristics before deciding on multi-table integration or constraint definitions.
flowchart
Start[Data Preparation] --> Describer[Data<br/>Profiling]
Describer --> MultiTable{Multi-table<br/>Data?}
MultiTable -->|Yes| Denormalize[Multi-table<br/>Denormalization]
MultiTable -->|No| ConstraintCheck{Need<br/>Constraints?}
Denormalize --> ConstraintCheck
ConstraintCheck -->|Yes| Constraints[Business Logic<br/>Constraints]
ConstraintCheck -->|No| Complete[Preparation<br/>Complete]
Constraints --> Complete
%% Macaron color scheme
style Start fill:#B0E0E6,stroke:#87CEEB,stroke-width:2px,color:#333
style Describer fill:#B4E7CE,stroke:#98D8C8,stroke-width:2px,color:#333
style MultiTable fill:#E6E6FA,stroke:#DDA0DD,stroke-width:2px,color:#333
style Denormalize fill:#B4E7CE,stroke:#98D8C8,stroke-width:2px,color:#333
style ConstraintCheck fill:#E6E6FA,stroke:#DDA0DD,stroke-width:2px,color:#333
style Constraints fill:#B4E7CE,stroke:#98D8C8,stroke-width:2px,color:#333
style Complete fill:#D3D3D3,stroke:#A9A9A9,stroke-width:2px,color:#333
Legend:
- Light blue box: Starting point
- Light purple box: Decision node
- Light green box: Action node
- Light gray box: End point
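The branching in the flowchart can be written as a small function that returns the steps to run, in order. This is a minimal Python sketch for illustration only: `plan_preparation` and its parameters are hypothetical names, not part of PETsARD.

```python
from typing import List


def plan_preparation(is_multi_table: bool, needs_constraints: bool) -> List[str]:
    """Return the ordered preparation steps implied by the flowchart.

    Step labels mirror the flowchart nodes; they are not PETsARD identifiers.
    """
    steps = ["Data Profiling"]  # profiling is always the first step
    if is_multi_table:
        steps.append("Multi-table Denormalization")
    if needs_constraints:
        steps.append("Business Logic Constraints")
    steps.append("Preparation Complete")
    return steps


# Related tables plus business rules: every branch is taken.
print(plan_preparation(is_multi_table=True, needs_constraints=True))
# A single table with no rules goes straight from profiling to completion.
print(plan_preparation(is_multi_table=False, needs_constraints=False))
```

Denormalization always happens before constraint definition, so any rules you write can reference columns that originally came from different tables.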
Data Preparation Workflow
Follow these preparation steps based on your data characteristics:
Step 1: Data Profiling
- Data Profiling - Starting point for all data preparation (Required)
- Generate statistical reports using the Describer module
- Review basic statistical information
- Identify data quality issues
- Understand data distribution characteristics
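As a rough illustration of what a profiling pass examines, the following plain-pandas sketch summarizes column types, missing-value ratios, and distinct counts. It does not use the Describer module and is not its API; the `profile` helper and the toy columns are hypothetical.

```python
import pandas as pd


def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column summary: dtype, share of missing values, and distinct count."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing_ratio": df.isna().mean(),
            "n_unique": df.nunique(dropna=True),
        }
    )


# Hypothetical toy data standing in for a real dataset.
df = pd.DataFrame(
    {
        "age": [23, 35, None, 51],
        "income": [32000.0, 54000.0, 61000.0, None],
        "gender": ["F", "M", "F", None],
    }
)

print(profile(df))                              # types, missing ratios, cardinality
print(df.describe())                            # numeric summary statistics
print(df["gender"].value_counts(dropna=False))  # category frequencies, NaN included
```

Note that `age` is stored as `float64` because pandas represents the missing value as NaN; type shifts like this are one of the quality issues profiling is meant to surface.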
Step 2: Multi-table Data Processing
- Multi-table Relationships - When data is scattered across related tables
- Use database denormalization to integrate multiple tables
- Choose appropriate granularity based on downstream tasks
- Includes Python (pandas) and SQL integration examples
- Produces a single table, avoiding reliance on multi-table synthesis techniques that are still immature
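A minimal pandas sketch of denormalization at two granularities, assuming two hypothetical tables (`customers` and `orders`) linked by `customer_id`. Which result fits depends on the downstream task: order-level rows keep every transaction, while customer-level rows summarize them into one row per customer.

```python
import pandas as pd

# Hypothetical normalized tables: one row per customer, one row per order.
customers = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4],
        "region": ["North", "South", "North", "East"],
        "age": [34, 52, 27, 45],
    }
)
orders = pd.DataFrame(
    {
        "order_id": [101, 102, 103, 104],
        "customer_id": [1, 1, 2, 3],
        "amount": [120.0, 80.0, 200.0, 50.0],
    }
)

# Order-level granularity: one row per order, customer attributes repeated.
# validate="many_to_one" raises if customer_id is not unique in `customers`.
order_level = orders.merge(
    customers, on="customer_id", how="left", validate="many_to_one"
)

# Customer-level granularity: aggregate orders first, then attach to customers.
order_stats = (
    orders.groupby("customer_id")
    .agg(n_orders=("order_id", "count"), total_amount=("amount", "sum"))
    .reset_index()
)
customer_level = customers.merge(order_stats, on="customer_id", how="left")
# Customers with no orders (customer 4 here) get 0 instead of NaN.
customer_level["n_orders"] = customer_level["n_orders"].fillna(0).astype(int)
customer_level["total_amount"] = customer_level["total_amount"].fillna(0.0)

print(order_level)
print(customer_level)
```

The left joins keep every row of the table that defines the target granularity, so the output row count matches that table: four orders in the first case, four customers in the second.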
Step 3: Constraint Definition
- Business Logic Constraints - When business rules need to be enforced
- Define logical relationships between fields
- Maintain category distributions and missing value ratios
- Use the Constrainer module for validation and filtering
- Includes complete YAML configuration examples
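The kind of rule described in Step 3 can be sketched in plain pandas, assuming hypothetical columns (`age`, `has_license`, `vehicle_count`) and hypothetical rules. This is not the Constrainer API or its YAML syntax. It only shows what filtering by field logic does and why category distributions and missing-value ratios are worth re-checking afterwards.

```python
import pandas as pd

# Hypothetical synthetic output to be checked against business rules.
synthetic = pd.DataFrame(
    {
        "age": [25, 17, 40, 63, 30, None],
        "has_license": ["Y", "Y", "N", "Y", "N", "Y"],
        "vehicle_count": [1, 0, 2, 1, 0, 1],
    }
)


def apply_rules(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows that satisfy every illustrative business rule."""
    # Rule 1 (field range): age within 18..100 inclusive; missing ages are allowed.
    age_ok = df["age"].isna() | df["age"].between(18, 100)
    # Rule 2 (field logic): a person without a license owns no vehicles.
    license_ok = (df["has_license"] == "Y") | (df["vehicle_count"] == 0)
    return df[age_ok & license_ok]


def compare_categories(before: pd.DataFrame, after: pd.DataFrame, column: str) -> pd.DataFrame:
    """Category proportions (NaN counted as a category) before and after filtering."""
    return pd.DataFrame(
        {
            "before": before[column].value_counts(normalize=True, dropna=False),
            "after": after[column].value_counts(normalize=True, dropna=False),
        }
    ).fillna(0.0)


kept = apply_rules(synthetic)
print(kept)
print(compare_categories(synthetic, kept, "has_license"))
print("missing age ratio:", synthetic["age"].isna().mean(), "->", kept["age"].isna().mean())
```

On this toy data, filtering drops two of six rows, the share of `has_license == "Y"` rises from about 67% to 75%, and the share of missing `age` values rises from about 17% to 25%. Shifts like these are why distributions and missing-value ratios need attention when constraints are applied.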
Next Steps
After completing data preparation, you can:
- Refer to Getting Started to begin data synthesis
- Check Best Practices for handling special data types
- Learn more about PETsARD YAML configuration