
The Cleanvoice Python SDK throws a Pydantic validation error when export_timestamps=True is set in the processing configuration #1

@chuckconway

Description


Bug Report: Python SDK Validation Error with export_timestamps

Date: October 21, 2025
Reporter: Chuck Conway
SDK Version: cleanvoice-sdk 1.0.1
Python Version: 3.12.1
Platform: macOS (Darwin 25.0.0)


Summary

The Cleanvoice Python SDK throws a Pydantic validation error when export_timestamps=True is set in the processing configuration. The API successfully processes the audio and generates timestamp markers, but the SDK cannot retrieve the results due to a schema mismatch between the expected and actual API response format.

Severity: High - Feature is unusable via SDK
Impact: Users cannot access timestamp markers programmatically
Workaround: Manual download from dashboard


Bug Description

What Happens

When processing audio with export_timestamps: True:

  1. ✅ Audio uploads successfully
  2. ✅ Processing completes successfully (confirmed on server)
  3. ✅ Timestamp marker files are generated (URLs visible in error)
  4. ❌ SDK throws ValidationError when attempting to retrieve results
  5. ❌ Processed audio and markers cannot be downloaded via SDK

Expected Behavior

The SDK should successfully retrieve the processed audio and timestamp marker files when export_timestamps=True is configured.

Actual Behavior

SDK raises pydantic_core._pydantic_core.ValidationError with 8 validation errors, preventing access to the successfully processed results.


Steps to Reproduce

Minimal Reproduction

from cleanvoice import Cleanvoice
from pydantic import ValidationError

# Initialize SDK
cv = Cleanvoice({"api_key": "your-api-key"})

# Configure with export_timestamps enabled
config = {
    "long_silences": True,
    "normalize": True,
    "export_format": "mp3",
    "transcription": True,
    "export_timestamps": True,  # THIS TRIGGERS THE BUG
}

# Process audio
try:
    result = cv.process("audio.wav", config)
    # Never reached: cv.process raises before returning
    print(result.audio.url)
except ValidationError as e:
    # Raised while the SDK parses the completed edit into RetrieveEditResponse
    print(f"Error: {e}")

Full Test Script

See attached: test_cleanvoice_silence_removal.py (lines 281-365)


Error Details

Complete Error Message

pydantic_core._pydantic_core.ValidationError: 8 validation errors for RetrieveEditResponse
result.ProcessingProgress.done
  Field required [type=missing, input_value={'video': False, 'filenam...096eddeea3e1a5800c5750'}}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
result.ProcessingProgress.total
  Field required [type=missing, input_value={'video': False, 'filenam...096eddeea3e1a5800c5750'}}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
result.ProcessingProgress.state
  Field required [type=missing, input_value={'video': False, 'filenam...096eddeea3e1a5800c5750'}}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
result.ProcessingProgress.phase
  Field required [type=missing, input_value={'video': False, 'filenam...096eddeea3e1a5800c5750'}}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
result.ProcessingProgress.step
  Field required [type=missing, input_value={'video': False, 'filenam...096eddeea3e1a5800c5750'}}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
result.ProcessingProgress.substep
  Field required [type=missing, input_value={'video': False, 'filenam...096eddeea3e1a5800c5750'}}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
result.ProcessingProgress.job_name
  Field required [type=missing, input_value={'video': False, 'filenam...096eddeea3e1a5800c5750'}}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
result.EditResult.timestamps_markers_urls
  Input should be a valid list [type=list_type, input_value={'markers_reaper': 'https...096eddeea3e1a5800c5750'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/list_type

Stack Trace

File ".../cleanvoice/cleanvoice.py", line 91, in process
    result = self._poll_for_completion(edit_id)
File ".../cleanvoice/cleanvoice.py", line 321, in _poll_for_completion
    response = self.api_client.retrieve_edit(edit_id)
File ".../cleanvoice/client.py", line 85, in retrieve_edit
    return RetrieveEditResponse(**response_data)
File ".../pydantic/main.py", line 214, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)

Root Cause Analysis

Schema Mismatch

The SDK's Pydantic models don't match the actual API response format.

Issue 1: timestamps_markers_urls Type Mismatch

SDK Expected Type (in models.py or similar):

class EditResult(BaseModel):
    timestamps_markers_urls: List[str]  # ❌ Wrong type

API Actual Response:

{
  "result": {
    "timestamps_markers_urls": {
      "markers_reaper": "https://r2.cloudflarestorage.com/.../markers.txt"
    }
  }
}

The API returns a dictionary mapping marker format names to URLs, not a list of URLs.

Correct Type Should Be:

timestamps_markers_urls: Optional[Dict[str, str]] = None

Where:

  • Key = marker format name (e.g., "markers_reaper", the only key observed so far; other format names are not yet documented)
  • Value = CDN URL to download the marker file
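Until a fixed model ships, client code that consumes this field can tolerate both shapes. The helper below is a hypothetical sketch (normalize_marker_urls is not part of cleanvoice-sdk): it passes the observed dict shape through unchanged and, purely for illustration, invents positional keys when it receives a plain list.

```python
from typing import Dict, List, Optional, Union


def normalize_marker_urls(
    value: Optional[Union[Dict[str, str], List[str]]],
) -> Dict[str, str]:
    """Return timestamps_markers_urls as a {format_name: url} dict.

    Hypothetical helper: accepts the dict shape observed from the API,
    a plain list of URLs (the shape the SDK model currently expects),
    or None when no markers were returned.
    """
    if value is None:
        return {}
    if isinstance(value, dict):
        return dict(value)
    if isinstance(value, list):
        # Assumption: a list carries no format names, so synthesize keys.
        return {f"marker_{i}": url for i, url in enumerate(value)}
    raise TypeError(f"unexpected timestamps_markers_urls type: {type(value).__name__}")


observed = {"markers_reaper": "https://r2.cloudflarestorage.com/.../markers.txt"}
for fmt, url in normalize_marker_urls(observed).items():
    print(fmt, url)
```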

Issue 2: ProcessingProgress Required Fields

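The error paths (result.ProcessingProgress.* and result.EditResult.*) indicate that result is validated against a union of ProcessingProgress and EditResult. When processing is complete, the API returns an EditResult-shaped object. It fails the ProcessingProgress branch because that model has required fields that are absent from the success response, and it fails the EditResult branch only because of Issue 1. Since neither branch matches, Pydantic reports the errors from both, which is why there are 8 errors in total.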
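The error can be reproduced offline, without an API key or network access. The sketch below uses simplified stand-in models, not the SDK's actual source; the Union[ProcessingProgress, EditResult] ordering and the List[str] field type are assumptions inferred from the error paths. It requires Pydantic 2.

```python
from typing import List, Optional, Union

from pydantic import BaseModel, ValidationError


class ProcessingProgress(BaseModel):
    # Mirrors the required fields listed in the error output.
    done: int
    total: int
    state: str
    phase: str
    step: str
    substep: str
    job_name: str


class EditResult(BaseModel):
    # The type the SDK appears to expect (inferred from the list_type error).
    timestamps_markers_urls: List[str]


class RetrieveEditResponse(BaseModel):
    status: str
    result: Optional[Union[ProcessingProgress, EditResult]] = None


# Trimmed copy of the observed success payload; extra keys are ignored by default.
payload = {
    "status": "SUCCESS",
    "result": {
        "video": False,
        "filename": "original_audio.wav",
        "timestamps_markers_urls": {
            "markers_reaper": "https://r2.cloudflarestorage.com/.../markers.txt"
        },
    },
}

try:
    RetrieveEditResponse(**payload)
    errors = []
except ValidationError as exc:
    errors = exc.errors()

# One "missing" error per progress field, plus one "list_type" error.
print(len(errors))
for err in errors:
    print(".".join(str(part) for part in err["loc"]), err["type"])
```

In this sketch, changing EditResult.timestamps_markers_urls to Optional[Dict[str, str]] is enough for the payload to validate, with result parsed as EditResult.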

SDK Definition:

class ProcessingProgress(BaseModel):
    done: int  # ❌ Required, but missing in success response
    total: int  # ❌ Required
    state: str  # ❌ Required
    phase: str  # ❌ Required
    step: str  # ❌ Required
    substep: str  # ❌ Required
    job_name: str  # ❌ Required

Should Be:

class ProcessingProgress(BaseModel):
    done: Optional[int] = None  # ✅ Optional
    total: Optional[int] = None  # ✅ Optional
    state: Optional[str] = None  # ✅ Optional
    phase: Optional[str] = None  # ✅ Optional
    step: Optional[str] = None  # ✅ Optional
    substep: Optional[str] = None  # ✅ Optional
    job_name: Optional[str] = None  # ✅ Optional

Suggested Fix

1. Update EditResult Model

File: cleanvoice/models.py (or wherever models are defined)

Change:

# Before:
timestamps_markers_urls: List[str]

# After:
timestamps_markers_urls: Optional[Dict[str, str]] = None

2. Make ProcessingProgress Fields Optional

File: cleanvoice/models.py

Change:

from typing import Optional

from pydantic import BaseModel


class ProcessingProgress(BaseModel):
    done: Optional[int] = None
    total: Optional[int] = None
    state: Optional[str] = None
    phase: Optional[str] = None
    step: Optional[str] = None
    substep: Optional[str] = None
    job_name: Optional[str] = None

3. Support Union Type for Result

File: cleanvoice/models.py

from typing import Optional, Union

from pydantic import BaseModel

# EditResult and ProcessingProgress as defined in steps 1 and 2
class RetrieveEditResponse(BaseModel):
    status: str
    result: Optional[Union[EditResult, ProcessingProgress]] = None
    error: Optional[str] = None

    # When status == "SUCCESS", result is EditResult
    # When status == "PROCESSING", result is ProcessingProgress
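Once every ProcessingProgress field is optional, almost any dict satisfies that branch, so relying on union matching alone can be fragile. One alternative is to pick the model explicitly from status, as the comments above describe. This is a sketch with simplified stand-in models; parse_result is a hypothetical helper, and treating every non-SUCCESS status as a progress payload is an assumption.

```python
from typing import Dict, Optional, Union

from pydantic import BaseModel


class ProcessingProgress(BaseModel):
    done: Optional[int] = None
    total: Optional[int] = None
    state: Optional[str] = None
    phase: Optional[str] = None
    step: Optional[str] = None
    substep: Optional[str] = None
    job_name: Optional[str] = None


class EditResult(BaseModel):
    # Simplified: only the fields this sketch needs.
    filename: Optional[str] = None
    timestamps_markers_urls: Optional[Dict[str, str]] = None


def parse_result(
    status: str, raw: Optional[dict]
) -> Optional[Union[EditResult, ProcessingProgress]]:
    """Choose the result model from status instead of letting the union guess."""
    if raw is None:
        return None
    if status == "SUCCESS":
        return EditResult.model_validate(raw)
    # Assumption: every other status carries a progress payload.
    return ProcessingProgress.model_validate(raw)


done = parse_result(
    "SUCCESS",
    {
        "filename": "original_audio.wav",
        "timestamps_markers_urls": {
            "markers_reaper": "https://r2.cloudflarestorage.com/.../markers.txt"
        },
    },
)
print(type(done).__name__, done.timestamps_markers_urls)
```

The same effect could also be achieved inside the model with a validator keyed on status; the standalone function just keeps the dispatch rule easy to read and test.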

Additional Context

API Response Format (Observed)

When processing completes successfully with export_timestamps=True:

{
  "status": "SUCCESS",
  "id": "edit-uuid-here",
  "created_at": "2025-10-21T13:26:16Z",
  "updated_at": "2025-10-21T13:32:22Z",
  "result": {
    "video": false,
    "filename": "original_audio.wav",
    "download": {
      "url": "https://r2.cloudflarestorage.com/.../original_audio_clean.mp3"
    },
    "timestamps_markers_urls": {
      "markers_reaper": "https://r2.cloudflarestorage.com/.../markers.txt"
    },
    "transcript": {
      "text": "...",
      "summary": "..."
    },
    "statistics": {
      "BREATH": null,
      "DEADAIR": 5.0,
      "STUTTERING": null,
      "MOUTH_SOUND": null,
      "FILLER_SOUND": null
    }
  }
}
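For the "direct API calls" workaround, a client can read these fields straight from the JSON body. The sketch below parses a trimmed copy of the observed response with the standard json module; how the body is fetched is deliberately left out, and extract_outputs is a hypothetical name.

```python
import json
from typing import Dict, Optional, Tuple

# Trimmed copy of the observed SUCCESS response shown above.
raw_body = """
{
  "status": "SUCCESS",
  "result": {
    "filename": "original_audio.wav",
    "download": {"url": "https://r2.cloudflarestorage.com/.../original_audio_clean.mp3"},
    "timestamps_markers_urls": {"markers_reaper": "https://r2.cloudflarestorage.com/.../markers.txt"},
    "statistics": {"BREATH": null, "DEADAIR": 5.0}
  }
}
"""


def extract_outputs(body: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Return (audio_url, marker_urls) from a SUCCESS response body."""
    data = json.loads(body)
    if data.get("status") != "SUCCESS":
        raise RuntimeError(f"edit not finished: status={data.get('status')!r}")
    result = data.get("result") or {}
    audio_url = (result.get("download") or {}).get("url")
    # Observed shape: {format_name: url}. Assumed absent when markers were not requested.
    marker_urls = result.get("timestamps_markers_urls") or {}
    return audio_url, marker_urls


audio_url, marker_urls = extract_outputs(raw_body)
print(audio_url)
print(marker_urls)
```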

Marker File Format

The markers_reaper file contains cut timestamps in REAPER DAW format. This is exactly what users need to synchronize video cuts with audio silence removal.

Example Content (expected):

# Name         Start       End
REGION 1      10.5        15.3
REGION 2      20.1        25.9
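The content above is what the reporter expects, not a captured file, so the parser below is only a sketch for that expected layout: a header line starting with #, then a region name followed by start and end times in seconds. parse_regions is a hypothetical helper and would need adjusting once the real marker format is documented.

```python
from typing import List, Tuple


def parse_regions(text: str) -> List[Tuple[str, float, float]]:
    """Parse 'NAME START END' lines; names may contain spaces."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header
        # Split from the right so multi-word names like "REGION 1" stay intact.
        name, start, end = line.rsplit(maxsplit=2)
        regions.append((name, float(start), float(end)))
    return regions


sample = """# Name         Start       End
REGION 1      10.5        15.3
REGION 2      20.1        25.9"""

for name, start, end in parse_regions(sample):
    print(f"{name}: {start:.1f}s -> {end:.1f}s ({end - start:.1f}s)")
```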

Impact Assessment

User Impact

Severity: High - Core feature completely unusable via SDK

Users Affected:

  • Anyone using export_timestamps=True
  • Video editors needing cut synchronization
  • Automated pipelines requiring timestamp data

Current Workarounds:

  1. Manual download from dashboard (time-consuming, not automatable)
  2. Direct API calls bypassing SDK (no type safety, more complexity)
  3. Disable export_timestamps (loses critical feature)

Business Impact

  • Feature advertised in SDK documentation but doesn't work
  • Users cannot integrate Cleanvoice into video editing pipelines
  • Forces manual workflow, reducing automation value proposition
  • May drive users to competitors (Auphonic, Descript) with working SDKs

Testing Evidence

Test Runs Completed

We successfully processed 3 audio files with export_timestamps=True:

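| Run Time | Duration | Status     | Marker Files Generated   |
|----------|----------|------------|--------------------------|
| 13:05:23 | ~6 min   | ✅ Success | ✅ Yes (visible in error) |
| 13:26:16 | ~6 min   | ✅ Success | ✅ Yes (visible in error) |
| 13:42:19 | ~12 min  | ✅ Success | ✅ Yes (visible in error) |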

Confirmation: All three edits show the same error pattern, with markers_reaper URLs visible in the truncated error output. This indicates that the API is working correctly and that the SDK's validation is the problem.

Verification

Files exist on Cleanvoice CDN:

  • Processed audio files are accessible via dashboard
  • Marker files are generated and hosted
  • Only SDK validation prevents programmatic access

Environment Details

Python: 3.12.1
SDK: cleanvoice-sdk 1.0.1
OS: macOS Darwin 25.0.0
Pydantic: 2.10.6 (via SDK dependencies)
Requests: 2.32.3

Installation Method:
pip install cleanvoice-sdk

Dependencies (from SDK)

cleanvoice-sdk==1.0.1
├── pydantic>=2.0.0
├── requests>=2.25.0
├── typing-extensions>=4.0.0
├── soundfile>=0.12.0
├── librosa>=0.10.0
├── mutagen>=1.45.0
└── av>=10.0.0

Proposed Test Case

After fixing the bug, add this test to your test suite:

import os

import requests
from cleanvoice import Cleanvoice


def test_export_timestamps():
    """Test that export_timestamps works correctly."""
    cv = Cleanvoice({"api_key": os.environ["CLEANVOICE_API_KEY"]})

    config = {
        "long_silences": True,
        "export_timestamps": True,
        "transcription": True,
    }

    result = cv.process("test_audio.wav", config)

    # Should not raise ValidationError
    assert result.status == "SUCCESS"

    # Should have marker URLs
    assert hasattr(result.audio, "timestamps_markers_urls")
    assert result.audio.timestamps_markers_urls is not None

    # Should be a dict, not a list
    assert isinstance(result.audio.timestamps_markers_urls, dict)

    # Should contain at least one marker format
    assert len(result.audio.timestamps_markers_urls) > 0

    # Should be downloadable
    for marker_name, marker_url in result.audio.timestamps_markers_urls.items():
        response = requests.get(marker_url, timeout=30)
        assert response.status_code == 200, marker_name
        assert len(response.text) > 0

Documentation Gaps

The SDK documentation should clarify:

  1. Return Format: When export_timestamps=True, explain that timestamps_markers_urls is a dictionary mapping marker format names to URLs

  2. Available Formats: Document which marker formats are available:

    • markers_reaper - REAPER DAW format
    • Others? (Premiere, DaVinci, etc.?)
  3. Marker File Format: Describe the structure of marker files

  4. Use Case: Explain how to use markers to sync video cuts with audio silence removal
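For the download part of that use case, documentation could include a short loop like the one below. It is a standard-library sketch, not SDK code: download_markers and the <format_name>.txt file naming are hypothetical, and it assumes the marker URLs are directly downloadable, as the proposed test case above also assumes.

```python
import urllib.request
from pathlib import Path
from typing import Dict, List


def download_markers(
    marker_urls: Dict[str, str], dest_dir: str, timeout: float = 30.0
) -> List[Path]:
    """Save each marker file as <dest_dir>/<format_name>.txt and return the paths."""
    out_dir = Path(dest_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for format_name, url in marker_urls.items():
        target = out_dir / f"{format_name}.txt"
        with urllib.request.urlopen(url, timeout=timeout) as response:
            target.write_bytes(response.read())
        saved.append(target)
    return saved
```

Usage would be along the lines of download_markers(marker_urls, "markers/"), where marker_urls is the {format_name: url} dict returned for the edit.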


Request

Priority: Please prioritize this fix as it blocks a core advertised feature.
Timeline: When can we expect a patched SDK release?
Workaround: Is there an official workaround while we wait for the fix?



Thank you for your attention to this issue. The Cleanvoice service itself works beautifully - we just need the SDK to match the API!
