The Cleanvoice Python SDK throws a Pydantic validation error when `export_timestamps=True` is set in the processing configuration

# Bug Report: Python SDK Validation Error with export_timestamps

**Date**: October 21, 2025  
**Reporter**: Chuck Conway  
**SDK Version**: cleanvoice-sdk 1.0.1  
**Python Version**: 3.12.1  
**Platform**: macOS (Darwin 25.0.0)

---

## Summary

The Cleanvoice Python SDK throws a Pydantic validation error when `export_timestamps=True` is set in the processing configuration. The API successfully processes the audio and generates timestamp markers, but the SDK cannot retrieve the results due to a schema mismatch between the expected and actual API response format.

**Severity**: High - Feature is unusable via SDK  
**Impact**: Users cannot access timestamp markers programmatically  
**Workaround**: Manual download from dashboard

---

## Bug Description

### What Happens

When processing audio with `export_timestamps: True`:
1. ✅ Audio uploads successfully
2. ✅ Processing completes successfully (confirmed on server)
3. ✅ Timestamp marker files are generated (URLs visible in error)
4. ❌ SDK throws `ValidationError` when attempting to retrieve results
5. ❌ Processed audio and markers cannot be downloaded via SDK

### Expected Behavior

The SDK should successfully retrieve the processed audio and timestamp marker files when `export_timestamps=True` is configured.

### Actual Behavior

SDK raises `pydantic_core._pydantic_core.ValidationError` with 8 validation errors, preventing access to the successfully processed results.

---

## Steps to Reproduce

### Minimal Reproduction

```python
from cleanvoice import Cleanvoice

# Initialize SDK
cv = Cleanvoice({"api_key": "your-api-key"})

# Configure with export_timestamps enabled
config = {
    "long_silences": True,
    "normalize": True,
    "export_format": "mp3",
    "transcription": True,
    "export_timestamps": True,  # THIS TRIGGERS THE BUG
}

# Process audio
try:
    result = cv.process("audio.wav", config)
    # This line is never reached due to validation error
    print(result.audio.url)
except Exception as e:
    print(f"Error: {e}")
    # ValidationError is raised here
```

### Full Test Script

See attached: `test_cleanvoice_silence_removal.py` (lines 281-365)

---

## Error Details

### Complete Error Message

```
pydantic_core._pydantic_core.ValidationError: 8 validation errors for RetrieveEditResponse
result.ProcessingProgress.done
  Field required [type=missing, input_value={'video': False, 'filenam...096eddeea3e1a5800c5750'}}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
result.ProcessingProgress.total
  Field required [type=missing, input_value={'video': False, 'filenam...096eddeea3e1a5800c5750'}}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
result.ProcessingProgress.state
  Field required [type=missing, input_value={'video': False, 'filenam...096eddeea3e1a5800c5750'}}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
result.ProcessingProgress.phase
  Field required [type=missing, input_value={'video': False, 'filenam...096eddeea3e1a5800c5750'}}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
result.ProcessingProgress.step
  Field required [type=missing, input_value={'video': False, 'filenam...096eddeea3e1a5800c5750'}}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
result.ProcessingProgress.substep
  Field required [type=missing, input_value={'video': False, 'filenam...096eddeea3e1a5800c5750'}}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
result.ProcessingProgress.job_name
  Field required [type=missing, input_value={'video': False, 'filenam...096eddeea3e1a5800c5750'}}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
result.EditResult.timestamps_markers_urls
  Input should be a valid list [type=list_type, input_value={'markers_reaper': 'https...096eddeea3e1a5800c5750'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/list_type
```

### Stack Trace

```
File ".../cleanvoice/cleanvoice.py", line 91, in process
    result = self._poll_for_completion(edit_id)
File ".../cleanvoice/cleanvoice.py", line 321, in _poll_for_completion
    response = self.api_client.retrieve_edit(edit_id)
File ".../cleanvoice/client.py", line 85, in retrieve_edit
    return RetrieveEditResponse(**response_data)
File ".../pydantic/main.py", line 214, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
```

---

## Root Cause Analysis

### Schema Mismatch

The SDK's Pydantic models don't match the actual API response format.

#### Issue 1: `timestamps_markers_urls` Type Mismatch

**SDK Expected Type** (in `models.py` or similar):
```python
class EditResult(BaseModel):
    timestamps_markers_urls: List[str]  # ❌ Wrong type
```

**API Actual Response**:
```json
{
  "result": {
    "timestamps_markers_urls": {
      "markers_reaper": "https://r2.cloudflarestorage.com/.../markers.txt"
    }
  }
}
```

The API returns a **dictionary** mapping marker format names to URLs, not a list of URLs.

**Correct Type Should Be**:
```python
timestamps_markers_urls: Optional[Dict[str, str]] = None
```

Where:
- Key = marker format name (only "markers_reaper" was observed; other names such as "markers_premiere" are hypothetical)
- Value = CDN URL to download the marker file

#### Issue 2: `ProcessingProgress` Required Fields

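When processing is complete, the API returns an `EditResult` object. The error paths (`result.ProcessingProgress.*` and `result.EditResult.*`) suggest that `result` is validated as a union of both models. When no member of a union validates, Pydantic v2 reports the failures from every member, so the seven `ProcessingProgress` errors appear because its required fields don't exist in the success response. They may therefore be a side effect of the `EditResult` failure in Issue 1 rather than an independent bug.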

**SDK Definition**:
```python
class ProcessingProgress(BaseModel):
    done: int  # ❌ Required, but missing in success response
    total: int  # ❌ Required
    state: str  # ❌ Required
    phase: str  # ❌ Required
    step: str  # ❌ Required
    substep: str  # ❌ Required
    job_name: str  # ❌ Required
```

**Should Be**:
```python
class ProcessingProgress(BaseModel):
    done: Optional[int] = None  # ✅ Optional
    total: Optional[int] = None  # ✅ Optional
    state: Optional[str] = None  # ✅ Optional
    phase: Optional[str] = None  # ✅ Optional
    step: Optional[str] = None  # ✅ Optional
    substep: Optional[str] = None  # ✅ Optional
    job_name: Optional[str] = None  # ✅ Optional
```
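One way to see how a single type mismatch can produce errors for both models is the self-contained sketch below. The models are trimmed stand-ins written for this report (`EditResultCurrent`, `EditResultFixed`, `ResponseCurrent`, and `ResponseFixed` are hypothetical names, not the SDK's actual classes), and the URL is a placeholder. It assumes Pydantic v2 is installed.

```python
from typing import Dict, List, Optional, Union

from pydantic import BaseModel, ValidationError


class ProcessingProgress(BaseModel):
    # Trimmed to two of the seven required fields for brevity.
    done: int
    total: int


class EditResultCurrent(BaseModel):
    # Shape the SDK appears to expect today (assumption based on the error).
    timestamps_markers_urls: List[str]


class EditResultFixed(BaseModel):
    # Shape proposed in this report.
    timestamps_markers_urls: Optional[Dict[str, str]] = None


class ResponseCurrent(BaseModel):
    status: str
    result: Union[ProcessingProgress, EditResultCurrent]


class ResponseFixed(BaseModel):
    status: str
    result: Union[ProcessingProgress, EditResultFixed]


payload = {
    "status": "SUCCESS",
    "result": {
        "timestamps_markers_urls": {
            "markers_reaper": "https://example.invalid/markers.txt"
        }
    },
}

# With the list-typed field, no union member validates, so Pydantic
# reports the failures of every member in one ValidationError.
try:
    ResponseCurrent(**payload)
    errors = []
except ValidationError as exc:
    errors = exc.errors()

for err in errors:
    print(".".join(str(part) for part in err["loc"]), err["type"])

# With the dict-typed field, the EditResult member validates and the
# ProcessingProgress errors no longer surface.
fixed = ResponseFixed(**payload)
print(type(fixed.result).__name__)
```

If this behaves the same way against the real models, correcting the `timestamps_markers_urls` type alone may be enough to clear all eight errors.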

---

## Suggested Fix

### 1. Update EditResult Model

**File**: `cleanvoice/models.py` (or wherever models are defined)

**Change**:
```python
# Before:
timestamps_markers_urls: List[str]

# After:
timestamps_markers_urls: Optional[Dict[str, str]] = None
```

### 2. Make ProcessingProgress Fields Optional

**File**: `cleanvoice/models.py`

**Change**:
```python
class ProcessingProgress(BaseModel):
    done: Optional[int] = None
    total: Optional[int] = None
    state: Optional[str] = None
    phase: Optional[str] = None
    step: Optional[str] = None
    substep: Optional[str] = None
    job_name: Optional[str] = None
```

### 3. Support Union Type for Result

**File**: `cleanvoice/models.py`

```python
from typing import Optional, Union

from pydantic import BaseModel


class RetrieveEditResponse(BaseModel):
    status: str
    # When status == "SUCCESS", result is EditResult
    # When status == "PROCESSING", result is ProcessingProgress
    result: Optional[Union[EditResult, ProcessingProgress]] = None
    error: Optional[str] = None
```
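A caution about combining fixes 2 and 3: a model whose fields are all optional accepts any dict under Pydantic's default settings, so a union containing such a `ProcessingProgress` can become ambiguous. A possible alternative is to pick the model from `status` explicitly. The sketch below assumes Pydantic v2; `parse_result` is a hypothetical helper, the models are trimmed stand-ins rather than the SDK's real definitions, and the status values are the ones observed in this report.

```python
from typing import Any, Dict, Optional, Union

from pydantic import BaseModel


class ProcessingProgress(BaseModel):
    # Trimmed stand-in; the real model has more fields.
    done: Optional[int] = None
    total: Optional[int] = None


class EditResult(BaseModel):
    # Trimmed stand-in; the real model has more fields.
    filename: Optional[str] = None
    timestamps_markers_urls: Optional[Dict[str, str]] = None


def parse_result(
    payload: Dict[str, Any],
) -> Union[EditResult, ProcessingProgress, None]:
    """Choose the result model from the status field instead of a union."""
    raw = payload.get("result")
    if raw is None:
        return None
    model = EditResult if payload.get("status") == "SUCCESS" else ProcessingProgress
    return model.model_validate(raw)


done = parse_result(
    {
        "status": "SUCCESS",
        "result": {
            "video": False,
            "filename": "original_audio.wav",
            "timestamps_markers_urls": {
                "markers_reaper": "https://example.invalid/markers.txt"
            },
        },
    }
)
running = parse_result({"status": "PROCESSING", "result": {"done": 3, "total": 10}})
print(type(done).__name__, type(running).__name__)
```

Dispatching on `status` keeps the progress fields required if the maintainers prefer that, since the progress model is never tried against a success payload.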

---

## Additional Context

### API Response Format (Observed)

When processing completes successfully with `export_timestamps=True`:

```json
{
  "status": "SUCCESS",
  "id": "edit-uuid-here",
  "created_at": "2025-10-21T13:26:16Z",
  "updated_at": "2025-10-21T13:32:22Z",
  "result": {
    "video": false,
    "filename": "original_audio.wav",
    "download": {
      "url": "https://r2.cloudflarestorage.com/.../original_audio_clean.mp3"
    },
    "timestamps_markers_urls": {
      "markers_reaper": "https://r2.cloudflarestorage.com/.../markers.txt"
    },
    "transcript": {
      "text": "...",
      "summary": "..."
    },
    "statistics": {
      "BREATH": null,
      "DEADAIR": 5.0,
      "STUTTERING": null,
      "MOUTH_SOUND": null,
      "FILLER_SOUND": null
    }
  }
}
```

### Marker File Format

The `markers_reaper` file contains cut timestamps in REAPER DAW format. This is exactly what users need to synchronize video cuts with audio silence removal.

**Example Content** (expected):
```
# Name         Start       End
REGION 1      10.5        15.3
REGION 2      20.1        25.9
```
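To illustrate how such a file might be consumed, here is a minimal parsing sketch. It assumes the whitespace-separated layout shown in the expected example above, which has not been verified against a real `markers_reaper` file, and it assumes each region is a removed span. `Region` and `parse_markers` are hypothetical names.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    name: str
    start: float  # seconds
    end: float  # seconds


def parse_markers(text: str) -> List[Region]:
    """Parse lines of the form '<name> <start> <end>', skipping comments."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 3:
            continue  # not enough columns to hold a name, start, and end
        *name_parts, start, end = parts
        regions.append(Region(" ".join(name_parts), float(start), float(end)))
    return regions


sample = """# Name         Start       End
REGION 1      10.5        15.3
REGION 2      20.1        25.9
"""

regions = parse_markers(sample)
total_cut = sum(r.end - r.start for r in regions)
for r in regions:
    print(f"{r.name}: {r.start:.1f}s to {r.end:.1f}s")
print(f"total: {total_cut:.1f}s")
```

The resulting start and end pairs could then be applied as cut points to the matching video track.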

---

## Impact Assessment

### User Impact

**Severity: High** - Core feature completely unusable via SDK

**Users Affected**: 
- Anyone using `export_timestamps=True`
- Video editors needing cut synchronization
- Automated pipelines requiring timestamp data

**Current Workarounds**:
1. Manual download from dashboard (time-consuming, not automatable)
2. Direct API calls bypassing SDK (no type safety, more complexity)
3. Disable `export_timestamps` (loses critical feature)
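
For workaround 2, the sketch below shows one way to pull the download and marker URLs out of a raw response dict without going through the SDK models. How the JSON is fetched is deliberately left out, because the endpoint and authentication details are not part of this report. `extract_urls` is a hypothetical helper, and the sample payload mirrors the observed response shape with placeholder URLs.

```python
import json
from typing import Any, Dict, Optional, Tuple


def extract_urls(response: Dict[str, Any]) -> Tuple[Optional[str], Dict[str, str]]:
    """Return (audio_url, marker_urls) from a raw edit response."""
    result = response.get("result") or {}
    download = result.get("download") or {}
    markers = result.get("timestamps_markers_urls") or {}
    if isinstance(markers, list):
        # Tolerate the list shape the SDK model expects, in case it ever appears.
        markers = {f"marker_{i}": url for i, url in enumerate(markers)}
    return download.get("url"), dict(markers)


raw = json.loads(
    """
    {
      "status": "SUCCESS",
      "result": {
        "video": false,
        "filename": "original_audio.wav",
        "download": {"url": "https://example.invalid/original_audio_clean.mp3"},
        "timestamps_markers_urls": {
          "markers_reaper": "https://example.invalid/markers.txt"
        }
      }
    }
    """
)

audio_url, marker_urls = extract_urls(raw)
print(audio_url)
for name, url in marker_urls.items():
    print(name, url)
```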

### Business Impact

- Feature advertised in SDK documentation but doesn't work
- Users cannot integrate Cleanvoice into video editing pipelines
- Forces manual workflow, reducing automation value proposition
- May drive users to competitors (Auphonic, Descript) with working SDKs

---

## Testing Evidence

### Test Runs Completed

We successfully processed 3 audio files with `export_timestamps=True`:

| Run Time | Duration | Status | Marker Files Generated |
|----------|----------|--------|------------------------|
| 13:05:23 | ~6 min | ✅ Success | ✅ Yes (visible in error) |
| 13:26:16 | ~6 min | ✅ Success | ✅ Yes (visible in error) |
| 13:42:19 | ~12 min | ✅ Success | ✅ Yes (visible in error) |

**Confirmation**: All three edits show the same error pattern with `markers_reaper` URLs visible in the truncated error output, proving the API is working correctly and the SDK validation is the problem.

### Verification

Files exist on Cleanvoice CDN:
- Processed audio files are accessible via dashboard
- Marker files are generated and hosted
- Only SDK validation prevents programmatic access

---

## Environment Details

```
Python: 3.12.1
SDK: cleanvoice-sdk 1.0.1
OS: macOS Darwin 25.0.0
Pydantic: 2.10.6 (via SDK dependencies)
Requests: 2.32.3

Installation Method:
pip install cleanvoice-sdk
```

### Dependencies (from SDK)

```
cleanvoice-sdk==1.0.1
├── pydantic>=2.0.0
├── requests>=2.25.0
├── typing-extensions>=4.0.0
├── soundfile>=0.12.0
├── librosa>=0.10.0
├── mutagen>=1.45.0
└── av>=10.0.0
```

---

## Proposed Test Case

After fixing the bug, add this test to your test suite:

```python
import os

import requests
from cleanvoice import Cleanvoice


def test_export_timestamps():
    """Test that export_timestamps works correctly."""
    cv = Cleanvoice({"api_key": os.environ["CLEANVOICE_API_KEY"]})

    config = {
        "long_silences": True,
        "export_timestamps": True,
        "transcription": True,
    }

    result = cv.process("test_audio.wav", config)

    # Should not raise ValidationError
    assert result.status == "SUCCESS"

    # Should have marker URLs
    assert hasattr(result.audio, 'timestamps_markers_urls')
    assert result.audio.timestamps_markers_urls is not None

    # Should be a dict, not a list
    assert isinstance(result.audio.timestamps_markers_urls, dict)

    # Should contain at least one marker format
    assert len(result.audio.timestamps_markers_urls) > 0

    # Should be downloadable (timeout keeps the test from hanging)
    for marker_name, marker_url in result.audio.timestamps_markers_urls.items():
        response = requests.get(marker_url, timeout=30)
        assert response.status_code == 200
        assert len(response.text) > 0
```

---

## Documentation Gaps

The SDK documentation should clarify:

1. **Return Format**: When `export_timestamps=True`, explain that `timestamps_markers_urls` is a dictionary mapping marker format names to URLs

2. **Available Formats**: Document which marker formats are available:
   - `markers_reaper` - REAPER DAW format
   - Others? (Premiere, DaVinci, etc.?)

3. **Marker File Format**: Describe the structure of marker files

4. **Use Case**: Explain how to use markers to sync video cuts with audio silence removal


---

## Request

- **Priority**: Please prioritize this fix as it blocks a core advertised feature.
- **Timeline**: When can we expect a patched SDK release?
- **Workaround**: Is there an official workaround while we wait for the fix?

---



**Thank you for your attention to this issue. The Cleanvoice service itself works beautifully - we just need the SDK to match the API!**

