Executor API

Executor is PETsARD’s core orchestration module, responsible for parsing configuration, executing workflow modules in sequence, and providing result access.

Class Architecture

Basic Usage

from petsard import Executor

# Load configuration and execute
executor = Executor(config='config.yaml')
executor.run()

# Get results
results = executor.get_result()
timing = executor.get_timing()

Constructor

Syntax

Executor(config: str)

Parameters

Parameter | Type | Required | Description
----------|------|----------|------------------------------------------------------
config    | str  | Yes      | Configuration input: YAML file path or YAML string

Return Value

Returns an Executor instance with initialized Config and Status.

Usage Examples

Example 1: Load from YAML file

from petsard import Executor

# Create Executor from file
executor = Executor(config='workflow_config.yaml')
print("Configuration loaded successfully")

Example 2: Use YAML string

from petsard import Executor

# Define configuration as YAML string
config_yaml = """
Loader:
  load_csv:
    filepath: data/input.csv

Synthesizer:
  generate:
    method: sdv
    model: GaussianCopula
    num_samples: 1000
"""

# Create Executor from YAML string
executor = Executor(config=config_yaml)
executor.run()

Example 3: Dynamic YAML string generation

from petsard import Executor

# Generate YAML string dynamically
def create_config_yaml(filepath, model_name):
    return f"""
Loader:
  load_data:
    filepath: {filepath}

Synthesizer:
  generate:
    method: sdv
    model: {model_name}
"""

# Use dynamic configuration
config = create_config_yaml('data/input.csv', 'CTGAN')
executor = Executor(config=config)
executor.run()

Configuration Options

Executor supports execution-related configuration options in the YAML file:

Executor:
  log_output_type: "both"    # Log output location: "stdout", "file", "both"
  log_level: "INFO"          # Log level
  log_dir: "./logs"          # Log file directory
  log_filename: "PETsARD_{timestamp}.log"  # Log file name template

# Other module configurations
Loader:
  load_data:
    filepath: data.csv

Configuration Parameters

Parameter       | Type | Default                   | Description
----------------|------|---------------------------|-----------------------------------------------------------
log_output_type | str  | "file"                    | Log output location: "stdout", "file", "both"
log_level       | str  | "INFO"                    | Log level: "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"
log_dir         | str  | "."                       | Log file storage directory
log_filename    | str  | "PETsARD_{timestamp}.log" | Log file name template (supports {timestamp} placeholder)
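The {timestamp} placeholder in log_filename can be pictured as a simple string substitution. A minimal sketch; the helper name and the exact timestamp format are assumptions, not PETsARD's actual implementation:

```python
from datetime import datetime

def render_log_filename(template: str) -> str:
    # Assumed timestamp format; PETsARD's real format may differ.
    ts = datetime.now().strftime("%Y%m%d_%H%M%S")
    return template.replace("{timestamp}", ts)

print(render_log_filename("PETsARD_{timestamp}.log"))
# e.g. PETsARD_20250101_120000.log
```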

Methods

run()

Execute the workflow based on configuration.

executor = Executor(config='config.yaml')
executor.run()

Note: In v2.0.0, this method will return execution status (success/failed) instead of None.

get_result()

Get execution results containing DataFrames and Schemas for all experiments.

results = executor.get_result()

# Result structure
# {
#   'Loader[experiment_1]_Synthesizer[method_a]': {
#     'data': DataFrame,
#     'schema': Schema
#   },
#   ...
# }
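Since the result is a plain dict keyed by full experiment names, it can be iterated directly. A sketch using a hypothetical stand-in dict that mirrors the structure above (the data values are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the dict returned by get_result().
results = {
    "Loader[experiment_1]_Synthesizer[method_a]": {
        "data": pd.DataFrame({"age": [30, 41], "income": [52000, 61000]}),
        "schema": None,  # placeholder for a Schema object
    },
}

for expt_name, artifacts in results.items():
    df = artifacts["data"]
    print(f"{expt_name}: {df.shape[0]} rows x {df.shape[1]} columns")
```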

get_timing()

Get execution timing report showing time spent by each module and step.

timing_df = executor.get_timing()
print(timing_df)

Returns a pandas DataFrame with timing information.
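Because the report is a regular DataFrame, standard pandas operations apply, for instance summing durations per module. The column names below are illustrative assumptions, not PETsARD's actual timing schema:

```python
import pandas as pd

# Illustrative timing records; column names are assumptions.
timing_df = pd.DataFrame({
    "module": ["Loader", "Synthesizer", "Synthesizer"],
    "step": ["load", "fit", "sample"],
    "duration_seconds": [0.4, 12.5, 3.1],
})

# Total time spent per module.
per_module = timing_df.groupby("module")["duration_seconds"].sum()
print(per_module)
```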

is_execution_completed()

Check if workflow execution has completed.

if executor.is_execution_completed():
    print("Execution completed")
    results = executor.get_result()

Note: This method will be deprecated in v2.0.0. Use the return value of run() instead.

get_inferred_schema(module)

Get the inferred Schema for the specified module.

# Get inferred Preprocessor Schema
inferred_schema = executor.get_inferred_schema('Preprocessor')
if inferred_schema:
    print(f"Inferred Schema: {inferred_schema.id}")

Parameters:

  • module (str): Module name (e.g., ‘Preprocessor’)

Returns: the inferred Schema, or None if it does not exist

Workflow Execution

Executor executes modules in the following order:

  1. Loader - Data loading
  2. Preprocessor - Data preprocessing (optional)
  3. Splitter - Data splitting (optional)
  4. Synthesizer - Data synthesis
  5. Postprocessor - Data postprocessing (optional)
  6. Constrainer - Constraint validation (optional)
  7. Evaluator - Data evaluation (optional)
  8. Reporter - Result reporting (optional)

Internal Components

ExecutorConfig

Configuration dataclass for Executor settings:

@dataclass
class ExecutorConfig:
    log_output_type: str = "file"
    log_level: str = "INFO"
    log_dir: str = "."
    log_filename: str = "PETsARD_{timestamp}.log"
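As a dataclass, any field can be overridden at construction while the rest keep their defaults. A self-contained copy of the shape shown above, for illustration only (the real class lives inside petsard):

```python
from dataclasses import dataclass

@dataclass
class ExecutorConfig:
    log_output_type: str = "file"
    log_level: str = "INFO"
    log_dir: str = "."
    log_filename: str = "PETsARD_{timestamp}.log"

# Override one field; the others keep their defaults.
cfg = ExecutorConfig(log_level="DEBUG", log_dir="./logs")
print(cfg.log_level, cfg.log_dir)        # DEBUG ./logs
print(ExecutorConfig().log_output_type)  # file
```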

Config

Configuration management class responsible for parsing YAML configuration and building execution sequence. See Config API.

Status

Status tracking class responsible for recording execution history and metadata changes. See Status API.

Multiple Experiments

Executor automatically handles combinations of multiple experiments:

Loader:
  experiment_1:
    filepath: data1.csv
  experiment_2:
    filepath: data2.csv

Synthesizer:
  method_a:
    method: sdv
    model: GaussianCopula
  method_b:
    method: sdv
    model: CTGAN

This generates 4 experiment combinations:

  • Loader[experiment_1]_Synthesizer[method_a]
  • Loader[experiment_1]_Synthesizer[method_b]
  • Loader[experiment_2]_Synthesizer[method_a]
  • Loader[experiment_2]_Synthesizer[method_b]
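The combination naming follows a Cartesian product over each module's experiments, which can be sketched with itertools.product:

```python
from itertools import product

# Experiment names per module, mirroring the YAML above.
config = {
    "Loader": ["experiment_1", "experiment_2"],
    "Synthesizer": ["method_a", "method_b"],
}

# Build Module[experiment]_Module[experiment] names for every combination.
combos = [
    "_".join(f"{module}[{expt}]" for module, expt in zip(config, combo))
    for combo in product(*config.values())
]
print(combos)
```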

Notes

  • Input Type Detection: Executor automatically detects whether config is a file path or YAML string
  • Configuration Validation: Config automatically validates configuration content during initialization
  • Path Handling: File paths support absolute and relative paths
  • Error Reporting: Provides detailed error messages for configuration and YAML parsing errors
  • Logging: Execution process generates detailed log records
  • Module Order: Executor automatically executes modules in correct order
  • Single Instance: Each Executor instance manages one workflow
  • Execution Order: Must call run() before get_result() and get_timing()
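The input type detection mentioned above could work roughly like the following heuristic. This is a hypothetical sketch; PETsARD's actual detection logic may differ:

```python
import os

def looks_like_file_path(config: str) -> bool:
    # Assumed heuristic: a single-line string with a YAML extension that
    # exists on disk is a file path; anything else is inline YAML.
    return (
        "\n" not in config
        and config.endswith((".yaml", ".yml"))
        and os.path.isfile(config)
    )

inline = "Loader:\n  load:\n    filepath: data.csv"
print(looks_like_file_path(inline))  # False
```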