Bug Report: Python SDK Validation Error with export_timestamps
Date: October 21, 2025
Reporter: Chuck Conway
SDK Version: cleanvoice-sdk 1.0.1
Python Version: 3.12.1
Platform: macOS (Darwin 25.0.0)
Summary
The Cleanvoice Python SDK throws a Pydantic validation error when export_timestamps=True is set in the processing configuration. The API successfully processes the audio and generates timestamp markers, but the SDK cannot retrieve the results due to a schema mismatch between the expected and actual API response format.
Severity: High - Feature is unusable via SDK
Impact: Users cannot access timestamp markers programmatically
Workaround: Manual download from dashboard
Bug Description
What Happens
When processing audio with export_timestamps: True:
- ✅ Audio uploads successfully
- ✅ Processing completes successfully (confirmed on server)
- ✅ Timestamp marker files are generated (URLs visible in error)
- ❌ SDK throws ValidationError when attempting to retrieve results
- ❌ Processed audio and markers cannot be downloaded via SDK
Expected Behavior
The SDK should successfully retrieve the processed audio and timestamp marker files when export_timestamps=True is configured.
Actual Behavior
SDK raises pydantic_core._pydantic_core.ValidationError with 8 validation errors, preventing access to the successfully processed results.
Steps to Reproduce
Minimal Reproduction
from cleanvoice import Cleanvoice
# Initialize SDK
cv = Cleanvoice({"api_key": "your-api-key"})
# Configure with export_timestamps enabled
config = {
    "long_silences": True,
    "normalize": True,
    "export_format": "mp3",
    "transcription": True,
    "export_timestamps": True,  # THIS TRIGGERS THE BUG
}
# Process audio
try:
    result = cv.process("audio.wav", config)
    # This line is never reached due to validation error
    print(result.audio.url)
except Exception as e:
    print(f"Error: {e}")
    # ValidationError is raised here
Full Test Script
See attached: test_cleanvoice_silence_removal.py (lines 281-365)
Error Details
Complete Error Message
pydantic_core._pydantic_core.ValidationError: 8 validation errors for RetrieveEditResponse
result.ProcessingProgress.done
Field required [type=missing, input_value={'video': False, 'filenam...096eddeea3e1a5800c5750'}}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.10/v/missing
result.ProcessingProgress.total
Field required [type=missing, input_value={'video': False, 'filenam...096eddeea3e1a5800c5750'}}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.10/v/missing
result.ProcessingProgress.state
Field required [type=missing, input_value={'video': False, 'filenam...096eddeea3e1a5800c5750'}}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.10/v/missing
result.ProcessingProgress.phase
Field required [type=missing, input_value={'video': False, 'filenam...096eddeea3e1a5800c5750'}}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.10/v/missing
result.ProcessingProgress.step
Field required [type=missing, input_value={'video': False, 'filenam...096eddeea3e1a5800c5750'}}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.10/v/missing
result.ProcessingProgress.substep
Field required [type=missing, input_value={'video': False, 'filenam...096eddeea3e1a5800c5750'}}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.10/v/missing
result.ProcessingProgress.job_name
Field required [type=missing, input_value={'video': False, 'filenam...096eddeea3e1a5800c5750'}}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.10/v/missing
result.EditResult.timestamps_markers_urls
Input should be a valid list [type=list_type, input_value={'markers_reaper': 'https...096eddeea3e1a5800c5750'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.10/v/list_type
Stack Trace
File ".../cleanvoice/cleanvoice.py", line 91, in process
result = self._poll_for_completion(edit_id)
File ".../cleanvoice/cleanvoice.py", line 321, in _poll_for_completion
response = self.api_client.retrieve_edit(edit_id)
File ".../cleanvoice/client.py", line 85, in retrieve_edit
return RetrieveEditResponse(**response_data)
File ".../pydantic/main.py", line 214, in __init__
validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
Root Cause Analysis
Schema Mismatch
The SDK's Pydantic models don't match the actual API response format.
Issue 1: timestamps_markers_urls Type Mismatch
SDK Expected Type (in models.py or similar):
class EditResult(BaseModel):
    timestamps_markers_urls: List[str]  # ❌ Wrong type
API Actual Response:
{
  "result": {
    "timestamps_markers_urls": {
      "markers_reaper": "https://r2.cloudflarestorage.com/.../markers.txt"
    }
  }
}
The API returns a dictionary mapping marker format names to URLs, not a list of URLs.
Correct Type Should Be:
timestamps_markers_urls: Optional[Dict[str, str]] = None
Where:
- Key = marker format name (e.g., "markers_reaper", the only key observed so far; a name such as "markers_premiere" is a guess)
- Value = CDN URL to download the marker file
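The mismatch can be reproduced without calling the API. The following is a minimal sketch, assuming Pydantic v2. The models are cut-down stand-ins rather than the SDK's actual definitions: the ProcessingProgress field names come from the error message above, while `OldEditResult`, `FixedEditResult`, `OldResponse`, `FixedResponse`, and the `example.invalid` URL are hypothetical. It validates a success-shaped payload against a union of the two result models, once with the `List[str]` annotation and once with `Dict[str, str]`.

```python
from typing import Dict, List, Optional, Union

from pydantic import BaseModel, ValidationError


class ProcessingProgress(BaseModel):
    # Required fields, named as in the error message in this report.
    done: int
    total: int
    state: str
    phase: str
    step: str
    substep: str
    job_name: str


class OldEditResult(BaseModel):
    # Stand-in for the current SDK annotation: a list is expected.
    timestamps_markers_urls: List[str]


class FixedEditResult(BaseModel):
    # Proposed annotation: marker format name -> URL.
    timestamps_markers_urls: Optional[Dict[str, str]] = None


class OldResponse(BaseModel):
    status: str
    result: Optional[Union[ProcessingProgress, OldEditResult]] = None


class FixedResponse(BaseModel):
    status: str
    result: Optional[Union[ProcessingProgress, FixedEditResult]] = None


# Success-shaped payload modeled on the observed response (URL is a placeholder).
payload = {
    "status": "SUCCESS",
    "result": {
        "video": False,
        "filename": "original_audio.wav",
        "timestamps_markers_urls": {
            "markers_reaper": "https://example.invalid/markers.txt"
        },
    },
}

failed_locations = []
try:
    OldResponse.model_validate(payload)
    old_model_failed = False
except ValidationError as exc:
    # No union member matches, so the errors of every member are reported.
    old_model_failed = True
    failed_locations = [
        ".".join(str(part) for part in err["loc"]) for err in exc.errors()
    ]

# With the dictionary annotation the same payload validates.
fixed = FixedResponse.model_validate(payload)
marker_urls = fixed.result.timestamps_markers_urls
```

In this sketch the success payload validates once the annotation is a dictionary, even though `ProcessingProgress` keeps its required fields. That suggests the type of `timestamps_markers_urls` is the change that matters most, though the real SDK models may differ.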
Issue 2: ProcessingProgress Required Fields
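When processing is complete, the API returns an EditResult object. The error locations (result.ProcessingProgress.* and result.EditResult.*) indicate that the SDK validates result against both models as a union. The success response fails ProcessingProgress because that model's required fields are absent from it, and it fails EditResult because of Issue 1, so Pydantic reports the errors from both branches. The seven ProcessingProgress errors may therefore be a side effect of Issue 1 rather than an independent defect; relaxing the fields as shown next is a defensive measure.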
SDK Definition:
class ProcessingProgress(BaseModel):
    done: int  # ❌ Required, but missing in success response
    total: int  # ❌ Required
    state: str  # ❌ Required
    phase: str  # ❌ Required
    step: str  # ❌ Required
    substep: str  # ❌ Required
    job_name: str  # ❌ Required
Should Be:
class ProcessingProgress(BaseModel):
    done: Optional[int] = None  # ✅ Optional
    total: Optional[int] = None  # ✅ Optional
    state: Optional[str] = None  # ✅ Optional
    phase: Optional[str] = None  # ✅ Optional
    step: Optional[str] = None  # ✅ Optional
    substep: Optional[str] = None  # ✅ Optional
    job_name: Optional[str] = None  # ✅ Optional
Suggested Fix
1. Update EditResult Model
File: cleanvoice/models.py (or wherever models are defined)
Change:
# Before:
timestamps_markers_urls: List[str]
# After:
timestamps_markers_urls: Optional[Dict[str, str]] = None
2. Make ProcessingProgress Fields Optional
File: cleanvoice/models.py
Change:
class ProcessingProgress(BaseModel):
    done: Optional[int] = None
    total: Optional[int] = None
    state: Optional[str] = None
    phase: Optional[str] = None
    step: Optional[str] = None
    substep: Optional[str] = None
    job_name: Optional[str] = None
3. Support Union Type for Result
File: cleanvoice/models.py
from typing import Optional, Union
class RetrieveEditResponse(BaseModel):
    status: str
    result: Optional[Union[EditResult, ProcessingProgress]] = None
    error: Optional[str] = None
    # When status == "SUCCESS", result is EditResult
    # When status == "PROCESSING", result is ProcessingProgress
Additional Context
API Response Format (Observed)
When processing completes successfully with export_timestamps=True:
{
  "status": "SUCCESS",
  "id": "edit-uuid-here",
  "created_at": "2025-10-21T13:26:16Z",
  "updated_at": "2025-10-21T13:32:22Z",
  "result": {
    "video": false,
    "filename": "original_audio.wav",
    "download": {
      "url": "https://r2.cloudflarestorage.com/.../original_audio_clean.mp3"
    },
    "timestamps_markers_urls": {
      "markers_reaper": "https://r2.cloudflarestorage.com/.../markers.txt"
    },
    "transcript": {
      "text": "...",
      "summary": "..."
    },
    "statistics": {
      "BREATH": null,
      "DEADAIR": 5.0,
      "STUTTERING": null,
      "MOUTH_SOUND": null,
      "FILLER_SOUND": null
    }
  }
}
Marker File Format
The markers_reaper file contains cut timestamps in REAPER DAW format. This is exactly what users need to synchronize video cuts with audio silence removal.
Example Content (expected):
# Name Start End
REGION 1 10.5 15.3
REGION 2 20.1 25.9
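If the file does follow the layout above, it can be read with a few lines of standard-library Python. This is a sketch under that assumption only: the real markers_reaper layout has not been verified in this report, and `Region` and `parse_marker_text` are hypothetical names.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    name: str
    start: float  # seconds
    end: float    # seconds


def parse_marker_text(text: str) -> List[Region]:
    """Parse rows of the form '<name> <start> <end>', skipping '#' comment lines."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 3:
            continue  # Not a row in the assumed layout.
        # The last two columns are the times; everything before them is the name.
        name = " ".join(parts[:-2])
        regions.append(Region(name, float(parts[-2]), float(parts[-1])))
    return regions


sample = """# Name Start End
REGION 1 10.5 15.3
REGION 2 20.1 25.9
"""
regions = parse_marker_text(sample)
```

Taking the times from the last two columns keeps the parser tolerant of names that contain spaces, such as "REGION 1" in the example.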
Impact Assessment
User Impact
Severity: High - Core feature completely unusable via SDK
Users Affected:
- Anyone using export_timestamps=True
- Video editors needing cut synchronization
- Automated pipelines requiring timestamp data
Current Workarounds:
- Manual download from dashboard (time-consuming, not automatable)
- Direct API calls bypassing SDK (no type safety, more complexity)
- Disable export_timestamps (loses critical feature)
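For the direct-API workaround, some of the lost safety can be recovered with light manual checks on the raw JSON. The sketch below assumes the response has already been fetched and decoded (the endpoint and authentication are outside what this report documents) and that it has the shape shown under "API Response Format (Observed)". `extract_outputs` is a hypothetical helper, not part of the SDK, and the `example.invalid` URLs are placeholders.

```python
import json
from typing import Dict, Optional, Tuple


def extract_outputs(response: dict) -> Tuple[Optional[str], Dict[str, str]]:
    """Return (audio_url, marker_urls) from a decoded edit response."""
    if response.get("status") != "SUCCESS":
        raise ValueError(f"edit not finished: status={response.get('status')!r}")
    result = response.get("result") or {}
    audio_url = (result.get("download") or {}).get("url")
    markers = result.get("timestamps_markers_urls") or {}
    if not isinstance(markers, dict):
        raise TypeError("timestamps_markers_urls is not a mapping")
    return audio_url, {str(name): str(url) for name, url in markers.items()}


raw = json.loads("""
{
  "status": "SUCCESS",
  "result": {
    "download": {"url": "https://example.invalid/clean.mp3"},
    "timestamps_markers_urls": {
      "markers_reaper": "https://example.invalid/markers.txt"
    }
  }
}
""")
audio_url, marker_urls = extract_outputs(raw)
```

The helper raises on any status other than "SUCCESS", so a polling loop around it can treat `ValueError` as "not ready yet" and anything else as a real failure.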
Business Impact
- Feature advertised in SDK documentation but doesn't work
- Users cannot integrate Cleanvoice into video editing pipelines
- Forces manual workflow, reducing automation value proposition
- May drive users to competitors (Auphonic, Descript) with working SDKs
Testing Evidence
Test Runs Completed
We successfully processed 3 audio files with export_timestamps=True:
| Run Time | Duration | Status | Marker Files Generated |
|---|---|---|---|
| 13:05:23 | ~6 min | ✅ Success | ✅ Yes (visible in error) |
| 13:26:16 | ~6 min | ✅ Success | ✅ Yes (visible in error) |
| 13:42:19 | ~12 min | ✅ Success | ✅ Yes (visible in error) |
Confirmation: All three edits show the same error pattern with markers_reaper URLs visible in the truncated error output, proving the API is working correctly and the SDK validation is the problem.
Verification
Files exist on Cleanvoice CDN:
- Processed audio files are accessible via dashboard
- Marker files are generated and hosted
- Only SDK validation prevents programmatic access
Environment Details
Python: 3.12.1
SDK: cleanvoice-sdk 1.0.1
OS: macOS Darwin 25.0.0
Pydantic: 2.10.6 (via SDK dependencies)
Requests: 2.32.3
Installation Method:
pip install cleanvoice-sdk
Dependencies (from SDK)
cleanvoice-sdk==1.0.1
├── pydantic>=2.0.0
├── requests>=2.25.0
├── typing-extensions>=4.0.0
├── soundfile>=0.12.0
├── librosa>=0.10.0
├── mutagen>=1.45.0
└── av>=10.0.0
Proposed Test Case
After fixing the bug, add this test to your test suite:
import os

import requests
from cleanvoice import Cleanvoice


def test_export_timestamps():
    """Test that export_timestamps works correctly."""
    cv = Cleanvoice({"api_key": os.environ["CLEANVOICE_API_KEY"]})
    config = {
        "long_silences": True,
        "export_timestamps": True,
        "transcription": True,
    }
    result = cv.process("test_audio.wav", config)
    # Should not raise ValidationError
    assert result.status == "SUCCESS"
    # Should have marker URLs
    assert hasattr(result.audio, 'timestamps_markers_urls')
    assert result.audio.timestamps_markers_urls is not None
    # Should be a dict, not a list
    assert isinstance(result.audio.timestamps_markers_urls, dict)
    # Should contain at least one marker format
    assert len(result.audio.timestamps_markers_urls) > 0
    # Should be downloadable
    for marker_name, marker_url in result.audio.timestamps_markers_urls.items():
        response = requests.get(marker_url, timeout=30)
        assert response.status_code == 200
        assert len(response.text) > 0
Documentation Gaps
The SDK documentation should clarify:
- Return Format: When export_timestamps=True, explain that timestamps_markers_urls is a dictionary mapping marker format names to URLs
- Available Formats: Document which marker formats are available:
  - markers_reaper - REAPER DAW format
  - Others? (Premiere, DaVinci, etc.?)
- Marker File Format: Describe the structure of marker files
- Use Case: Explain how to use markers to sync video cuts with audio silence removal
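One way the "Use Case" item could be illustrated: if the marker regions are the spans removed from the audio, as described under "Marker File Format", then the spans to keep in the matching video are their complement. The following is a minimal sketch with a hypothetical helper, `kept_segments`; the region semantics and the 30-second duration are assumptions for illustration, not documented behavior.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def kept_segments(cuts: List[Span], duration: float) -> List[Span]:
    """Return the spans of [0, duration] that are not covered by any cut."""
    kept = []
    cursor = 0.0
    for start, end in sorted(cuts):
        # Clamp each cut to the timeline so out-of-range values are harmless.
        start = min(max(start, 0.0), duration)
        end = min(max(end, 0.0), duration)
        if start > cursor:
            kept.append((cursor, start))
        # max() handles overlapping or nested cuts.
        cursor = max(cursor, end)
    if cursor < duration:
        kept.append((cursor, duration))
    return kept


cuts = [(10.5, 15.3), (20.1, 25.9)]
segments = kept_segments(cuts, duration=30.0)
```

The resulting spans could then be handed to whatever video tool performs the trimming; that step is left out here because it depends on the editor in use.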
Request
Priority: Please prioritize this fix as it blocks a core advertised feature.
Timeline: When can we expect a patched SDK release?
Workaround: Is there an official workaround while we wait for the fix?
Thank you for your attention to this issue. The Cleanvoice service itself works beautifully - we just need the SDK to match the API!