DescriberAdapter
DescriberAdapter handles data description and comparison, supporting single dataset description and multi-dataset comparative analysis.
Class Architecture
classDiagram
class DescriberAdapter {
+config: dict
+source: dict
+describer: Describer
+__init__(config)
+run() dict~str, DataFrame~
}
class Describer {
+config: dict
+mode: str
+method: str
+create()
+eval(data) dict~str, DataFrame~
}
class BaseEvaluator {
<<abstract>>
+evaluate() dict
}
class DescriberDescribe {
+evaluate() dict
}
class DescriberCompare {
+evaluate() dict
}
DescriberAdapter ..> Describer : uses for description
Describer --> BaseEvaluator : creates
BaseEvaluator <|-- DescriberDescribe
BaseEvaluator <|-- DescriberCompare
%% Style definitions
class DescriberAdapter {
<<Main Class>>
}
style DescriberAdapter fill:#E6E6FA
class Describer {
<<Core Module>>
}
style Describer fill:#4169E1,color:#fff
style BaseEvaluator fill:#9370DB,color:#fff
style DescriberDescribe fill:#FFE4E1
style DescriberCompare fill:#FFE4E1
note for DescriberAdapter "1. Describe mode: Single dataset description\n2. Compare mode: Two datasets comparison\n3. Flexible source specification\n4. Auto-aligns data types before description"
Legend:
- Light purple box: DescriberAdapter main class
- Blue box: Core description modules
- Purple box: Abstract evaluator base class (BaseEvaluator)
- Light pink box: Evaluator implementations (DescriberDescribe, DescriberCompare)
- ..>: Dependency relationship
- -->: Ownership relationship
Main Features
- Unified data description interface
- Flexible data source selection (via the source parameter)
- Two supported modes: describe (single dataset) and compare (dataset comparison)
- Automatic data type alignment (using Schema)
- Support for various statistical methods and JS Divergence calculation
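As a rough illustration of the JS Divergence statistic mentioned above, the sketch below computes the Jensen-Shannon divergence between two discrete distributions using only the standard library. It is not PETsARD's implementation: the function name `js_divergence`, the base-2 logarithm, and the assumption that both inputs are already binned into the same categories are all choices made for this example.

```python
import math


def js_divergence(p, q, base=2.0):
    """Jensen-Shannon divergence between two discrete distributions.

    p and q are sequences of non-negative weights over the same bins.
    They are normalized here, so raw counts are accepted as well.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    p_total, q_total = sum(p), sum(q)
    if p_total <= 0 or q_total <= 0:
        raise ValueError("each distribution needs a positive total weight")

    p = [x / p_total for x in p]
    q = [x / q_total for x in q]
    # Mixture distribution M = (P + Q) / 2
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 whenever a > 0
        # because b is the mixture that includes a.
        return sum(x * math.log(x / y, base) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

With a base-2 logarithm the value lies between 0 (identical distributions) and 1 (no overlap), which makes it convenient for comparing a base dataset against a target dataset column by column.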
Method Reference
__init__(config: dict)
Initialize a DescriberAdapter instance.
Parameters:
config: dict, required
- Configuration parameters dictionary
- Must include source key (data source)
- Optional method key: default, describe, compare
- Optional mode key: automatically determined by source count
run(input: dict)
Execute data description or comparison, including automatic data type alignment.
Parameters:
input: dict, required
- Input parameters dictionary
- Contains data dictionary (datasets)
- Optional metadata for data type alignment
Returns:
No direct return value. Use get_result() to retrieve results.
set_input(status)
Set input data for the describer.
Parameters:
status: Status, required
- System status object
- Extracts data based on source configuration
Returns:
dict: Dictionary containing data required for description
get_result()
Retrieve description results.
Returns:
dict[str, pd.DataFrame]: Dictionary of description results
Usage Examples
Single Dataset Description
from petsard.adapter import DescriberAdapter
# Describe single dataset
adapter = DescriberAdapter({
    "source": "Loader",  # or ["Loader"]
    "method": "describe",
    "describe_method": ["mean", "median", "std", "corr"]
})
# Execute description
adapter.run({})
# Get results
results = adapter.get_result()
Dataset Comparison
from petsard.adapter import DescriberAdapter
# Compare two datasets
adapter = DescriberAdapter({
    "source": {
        "base": "Splitter.train",
        "target": "Synthesizer"
    },
    "method": "compare",
    "stats_method": ["mean", "std", "jsdivergence"],
    "compare_method": "pct_change"
})
# Execute comparison
adapter.run({})
# Get results
comparison_results = adapter.get_result()
Workflow
- Source Parsing: Parse source parameter to determine data sources
- Mode Determination:
  - 1 source: describe mode
  - 2 sources: compare mode
- Data Collection: Collect specified data from Status
- Schema Retrieval: Attempt to get metadata for data alignment
- Data Type Alignment (when Schema is available)
- Execute Description or Comparison
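The first two workflow steps (source parsing and mode determination) can be sketched as follows. This is a minimal illustration under stated assumptions, not the adapter's actual code: the helper names `list_sources` and `determine_mode` are hypothetical, and the error handling for unsupported source counts is a guess at reasonable behavior.

```python
def list_sources(source):
    """Return the configured source entries as a list.

    Accepts the three formats described on this page: a single string,
    a list of strings, or a dictionary mapping roles to sources.
    """
    if isinstance(source, str):
        return [source]
    if isinstance(source, dict):
        return list(source.values())
    if isinstance(source, (list, tuple)):
        return list(source)
    raise TypeError(f"unsupported source type: {type(source).__name__}")


def determine_mode(source):
    """Map the number of sources to a mode name (hypothetical helper)."""
    entries = list_sources(source)
    if len(entries) == 1:
        return "describe"
    if len(entries) == 2:
        return "compare"
    # Assumption: any other count is treated as a configuration error.
    raise ValueError(f"expected 1 or 2 sources, got {len(entries)}")
```

For example, `determine_mode("Loader")` gives `"describe"`, while a dictionary with `base` and `target` entries gives `"compare"`.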
Source Parameter Format
Describe Mode (Single Data Source)
# String format
source: "Loader"
# List format
source: ["Synthesizer"]
Compare Mode (Two Data Sources)
# Dictionary format (recommended)
source:
  base: "Splitter.train"
  target: "Synthesizer"
# Backward compatibility format
source:
  ori: "Splitter.train"
  syn: "Synthesizer"
Data Source Syntax
- Simple format: "ModuleName", which takes the first available data from the module
- Precise format: "ModuleName.key", which takes the data stored under a specific key in the module
  - Examples: "Splitter.train", "Splitter.validation"
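The two source formats above can be resolved with a small parser. The sketch below is illustrative only: `SourceRef`, `parse_source`, and `select_data` are hypothetical names, and a plain dictionary stands in for whatever a module actually stores in Status.

```python
from typing import NamedTuple, Optional


class SourceRef(NamedTuple):
    module: str
    key: Optional[str]  # None means "first available data"


def parse_source(spec: str) -> SourceRef:
    """Split "ModuleName" or "ModuleName.key" into its parts."""
    module, sep, key = spec.partition(".")
    if not module or (sep and not key):
        raise ValueError(f"malformed source specification: {spec!r}")
    return SourceRef(module, key or None)


def select_data(module_outputs: dict, ref: SourceRef):
    """Pick one entry from a module's outputs according to the reference."""
    if ref.key is not None:
        return module_outputs[ref.key]
    if not module_outputs:
        raise LookupError(f"module {ref.module!r} has no data available")
    # Simple format: take the first entry in insertion order.
    return next(iter(module_outputs.values()))
```

Here `parse_source("Splitter.train")` yields the module `"Splitter"` with key `"train"`, while `parse_source("Loader")` leaves the key as `None` so that the first available entry is used.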
Notes
- This is an internal API; direct usage is not recommended
- Use YAML configuration files and Executor instead
- Compare mode reuses DescriberDescribe’s statistical functionality
- Prefer the base/target parameter names over the legacy ori/syn names
- Results are cached until the next run() call
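For configurations that still use the legacy names, a migration step could look like the sketch below. The mapping of ori to base and syn to target is inferred from the parallel examples in the Source Parameter Format section, and `normalize_source_keys` is a hypothetical helper, not part of the documented API.

```python
# Assumed mapping, inferred from the parallel examples on this page.
LEGACY_KEYS = {"ori": "base", "syn": "target"}


def normalize_source_keys(source: dict) -> dict:
    """Return a copy of a compare-mode source dict using base/target names."""
    normalized = {}
    for key, value in source.items():
        new_key = LEGACY_KEYS.get(key, key)
        if new_key in normalized:
            # For example, both "ori" and "base" were supplied.
            raise ValueError(f"role {new_key!r} is specified more than once")
        normalized[new_key] = value
    return normalized
```

A dictionary that already uses base/target passes through unchanged, so the helper can be applied to any compare-mode configuration before it is handed to the adapter.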