
Add onProgress callback to copy API for progress tracking #64

@gregpriday

Summary

Add an onProgress callback option to copy() and scan() APIs that normalizes pipeline events into a simple { percent, message } format, enabling easy integration with UI progress bars in applications like Canopy.

Problem Statement

CopyTree's Pipeline class emits detailed events (stage:start, stage:complete, file:batch, etc.), but the public copy() API doesn't expose a clean interface for consumers to listen to these events. Applications like Canopy (Electron) need to show progress bars, but currently have two problematic options:

  1. Manually construct Pipeline: Bypasses copy() API, loses its conveniences
  2. Poll for completion: Can't show real-time progress

Current State - No Easy Progress Tracking:

// src/api/copy.js - No progress hooks exposed
export async function copy(basePath, options = {}) {
  const files = [];

  // Files discovered and processed internally
  for await (const file of scan(basePath, options)) {
    files.push(file);  // ❌ No way to know progress
  }

  const output = await format(files, { /* ... */ }); // ❌ No way to know formatting progress

  return { output, files, stats };
}

Desired State - Clean Progress Interface:

// Canopy usage - simple progress bar integration
const result = await copy('./large-repo', {
  onProgress: ({ percent, message }) => {
    mainWindow.webContents.send('copy-progress', { percent, message });
  }
});

Context

This is part of CopyTree's SDKification for embedded library use. While the pipeline events are powerful for debugging and advanced use cases, most consumers just want simple percentage-based progress for UI elements.

Existing Events (too detailed for most consumers):

pipeline.on('stage:start', ({ stage, index, input }) => { /* ... */ });
pipeline.on('stage:complete', ({ stage, duration, outputSize }) => { /* ... */ });
pipeline.on('file:batch', ({ stage, count, lastFile }) => { /* ... */ });
pipeline.on('stage:progress', ({ stage, progress, message }) => { /* ... */ });

Desired Simple Interface:

options.onProgress({ percent: 45, message: "Processing src/index.js" });
options.onProgress({ percent: 90, message: "Formatting output" });

Deliverables

Code Changes

Files to Modify:

  1. src/api/copy.js - Add onProgress support

    /**
     * @typedef {Object} CopyOptions
     * @property {(progress: {percent: number, message: string}) => void} [onProgress] - Progress callback, invoked with { percent, message }
     * // ... existing options
     */
    
    export async function copy(basePath, options = {}) {
      const { onProgress } = options;
      
      // Create pipeline with progress tracking
      const progressTracker = new ProgressTracker({
        totalStages: expectedStages,
        onProgress,
      });
      
      // Attach pipeline listeners
      pipeline.on('stage:complete', (data) => {
        progressTracker.stageComplete(data);
      });
      
      pipeline.on('file:batch', (data) => {
        progressTracker.fileBatch(data);
      });
      
      // ... rest of copy logic
    }

  2. src/api/scan.js - Add onProgress support

    • Similar implementation to copy.js
    • Progress based on: discovery → filtering → transformation stages
    • Emit progress as files are scanned: onProgress({ percent: 30, message: "Scanned 1000/3500 files" })

  3. src/utils/ProgressTracker.js - New progress normalization utility

    /**
     * Normalizes pipeline events into simple progress updates
     */
    export class ProgressTracker {
      constructor({ totalStages, onProgress, throttleMs = 100 }) {
        this.totalStages = totalStages;
        this.onProgress = onProgress || (() => {});
        this.completedStages = 0;
        this.currentStageProgress = 0;
        this.lastPercent = 0; // Highest percent emitted so far
        this.lastEmitTime = 0;
        this.throttleMs = throttleMs; // Emit at most once per interval (default 100ms)
      }

      stageComplete(data) {
        this.completedStages++;
        this.currentStageProgress = 0;
        const percent = Math.min((this.completedStages / this.totalStages) * 100, 100);
        // The final update bypasses the throttle so 100% is never dropped
        this._emit({ percent, message: `Completed ${data.stage}` }, percent === 100);
      }

      fileBatch(data) {
        // Estimate progress within current stage; assume the midpoint (50)
        // when the event carries no progress value
        this.currentStageProgress = data.progress ?? 50;

        const stagePercent = this.completedStages / this.totalStages;
        const withinStagePercent = (this.currentStageProgress / 100) / this.totalStages;
        const totalPercent = (stagePercent + withinStagePercent) * 100;

        this._emit({
          percent: Math.min(totalPercent, 99), // Never show 100% until complete
          message: `Processing ${data.lastFile}`,
        });
      }

      _emit(progress, force = false) {
        // Throttle to avoid overwhelming UI
        const now = Date.now();
        if (!force && now - this.lastEmitTime < this.throttleMs) {
          return;
        }

        // Enforce monotonic progress: never report a lower percent than before
        this.lastPercent = Math.max(this.lastPercent, progress.percent);
        this.lastEmitTime = now;
        this.onProgress({ ...progress, percent: this.lastPercent });
      }
    }

  4. src/pipeline/Pipeline.js - Add onProgress to options

    • Accept onProgress callback in constructor options
    • Store in this.options.onProgress
    • Pass to stages via context for direct emission

  5. src/pipeline/Stage.js - Add progress helper

    class Stage {
      // Existing emitProgress method (line ~100)
      emitProgress(percent, message) {
        if (this.options?.onProgress) {
          this.options.onProgress({ percent, message });
        }
        
        this.pipeline?.emit('stage:progress', {
          stage: this.name,
          progress: percent,
          message,
        });
      }
    }

Implementation Details

Progress Calculation Strategy:

// Total progress = weighted sum of stage progress
const stageWeights = {
  FileDiscoveryStage: 20,      // 20% of total
  ProfileFilterStage: 5,       // 5%
  GitFilterStage: 5,           // 5%
  TransformStage: 50,          // 50% (heavy transformers)
  OutputFormattingStage: 20,   // 20%
};

class ProgressTracker {
  calculateProgress(completedStages, currentStage, stageProgress) {
    let totalPercent = 0;
    
    // Add completed stages
    for (const stage of completedStages) {
      totalPercent += stageWeights[stage] || 10;
    }
    
    // Add current stage partial progress
    if (currentStage) {
      const stageWeight = stageWeights[currentStage] || 10;
      totalPercent += (stageProgress / 100) * stageWeight;
    }
    
    return Math.min(totalPercent, 99); // Cap at 99% until complete
  }
}
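
As a worked example of the weighted sum above, here is a minimal standalone sketch. The function name weightedPercent is hypothetical (it is not part of CopyTree); the weights and the fallback of 10 for unknown stages are taken from the snippet above.

```javascript
// Stage weights copied from the strategy above; they sum to 100.
const stageWeights = {
  FileDiscoveryStage: 20,
  ProfileFilterStage: 5,
  GitFilterStage: 5,
  TransformStage: 50,
  OutputFormattingStage: 20,
};

// Hypothetical standalone version of calculateProgress().
function weightedPercent(completedStages, currentStage, stageProgress = 0) {
  let total = 0;

  // Completed stages contribute their full weight
  for (const stage of completedStages) {
    total += stageWeights[stage] ?? 10; // unknown stages fall back to 10
  }

  // The running stage contributes a fraction of its weight
  if (currentStage) {
    total += (stageProgress / 100) * (stageWeights[currentStage] ?? 10);
  }

  return Math.min(total, 99); // cap at 99 until the caller reports completion
}

// Discovery and both filters are done (20 + 5 + 5 = 30),
// and TransformStage is half done (50% of 50 = 25).
const midRun = weightedPercent(
  ['FileDiscoveryStage', 'ProfileFilterStage', 'GitFilterStage'],
  'TransformStage',
  50,
);
console.log(midRun); // 55
```

Because of the 99 cap, the final 100% has to be emitted explicitly once the pipeline resolves.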

Throttling to Avoid UI Overload:

  • Emit progress at most every 100ms (configurable via progressThrottleMs)
  • Coalesce rapid file events into single update
  • Always emit 0% (start) and 100% (complete) regardless of throttle
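
The throttling rules above could be sketched roughly as follows. This is an illustration only: createThrottledProgress and its now option are hypothetical names, and the sketch assumes a simple leading-edge throttle that drops intermediate updates rather than queuing them.

```javascript
// Hypothetical helper: wraps onProgress in a leading-edge throttle.
// `now` is injectable so the behaviour can be checked without real timers.
function createThrottledProgress(onProgress, { throttleMs = 100, now = Date.now } = {}) {
  let lastEmit = -Infinity;

  return function emit({ percent, message }) {
    const t = now();
    // 0% (start) and 100% (complete) always bypass the throttle
    const isBoundary = percent <= 0 || percent >= 100;
    if (!isBoundary && t - lastEmit < throttleMs) {
      return false; // dropped; a later update supersedes it
    }
    lastEmit = t;
    onProgress({ percent, message });
    return true;
  };
}

// Usage with a fake clock
let clock = 0;
const seen = [];
const emit = createThrottledProgress((p) => seen.push(p.percent), { now: () => clock });

emit({ percent: 0, message: 'Discovering files...' }); // delivered (start)
clock = 40;
emit({ percent: 10, message: 'Processing a.js' });     // dropped, only 40ms elapsed
clock = 120;
emit({ percent: 20, message: 'Processing b.js' });     // delivered, 120ms elapsed
clock = 130;
emit({ percent: 100, message: 'Completed' });          // delivered (complete)

console.log(seen); // [ 0, 20, 100 ]
```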

Message Formatting:

  • Stage start: "Discovering files..."
  • File processing: "Processing src/utils/errors.js (1523/5000)"
  • Stage complete: "Completed transformation"
  • Final: "Completed processing 5000 files"
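
The message formats listed above could be centralized in a small set of helpers. The messages object and its method names below are hypothetical; only the output strings come from the list above.

```javascript
// Hypothetical message helpers matching the formats listed above.
const messages = {
  stageStart: (label) => `${label}...`,
  file: (path, index, total) => `Processing ${path} (${index}/${total})`,
  stageComplete: (label) => `Completed ${label}`,
  done: (total) => `Completed processing ${total} files`,
};

console.log(messages.stageStart('Discovering files'));
console.log(messages.file('src/utils/errors.js', 1523, 5000));
console.log(messages.stageComplete('transformation'));
console.log(messages.done(5000));
```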

Tests

Test Coverage Required:

  1. Unit Tests (tests/unit/utils/)

    • Test ProgressTracker calculates percentages correctly
    • Test throttling prevents excessive emissions
    • Test weighted stage progress calculation

  2. Integration Tests (tests/integration/)

    • Test copy() with onProgress receives correct updates
    • Test progress goes from 0% to 100% without gaps
    • Test onProgress called with expected message formats

  3. Example Tests:

describe('onProgress callback', () => {
  it('reports progress from 0 to 100', async () => {
    const progressUpdates = [];
    
    await copy(testFixturePath, {
      onProgress: (progress) => {
        progressUpdates.push(progress);
      },
    });
    
    expect(progressUpdates.length).toBeGreaterThan(5);
    expect(progressUpdates[0].percent).toBe(0);
    expect(progressUpdates[progressUpdates.length - 1].percent).toBe(100);
    
    // Progress should be monotonically increasing
    for (let i = 1; i < progressUpdates.length; i++) {
      expect(progressUpdates[i].percent).toBeGreaterThanOrEqual(
        progressUpdates[i - 1].percent
      );
    }
  });
  
  it('throttles progress updates to avoid UI overload', async () => {
    const timestamps = [];
    
    await copy(largeRepoPath, {
      onProgress: () => {
        timestamps.push(Date.now());
      },
    });
    
    // Check time between updates; the last update is skipped because
    // the final 100% is emitted regardless of the throttle
    for (let i = 1; i < timestamps.length - 1; i++) {
      const delta = timestamps[i] - timestamps[i - 1];
      expect(delta).toBeGreaterThanOrEqual(90); // ~100ms throttle
    }
  });
});

Documentation

Docs to Update:

  1. README.md - Add progress example

    ## Progress Tracking
    
    Track progress with a simple callback:
    
    ```javascript
    import { copy } from 'copytree';
    
    await copy('./large-repo', {
      onProgress: ({ percent, message }) => {
        console.log(`${percent.toFixed(0)}% - ${message}`);
      }
    });
    ```
    
    Perfect for UI progress bars in Electron or web apps.

  2. docs/usage/basic-usage.md - Document progress patterns

  3. types/index.d.ts - Add TypeScript definitions

    export interface ProgressEvent {
      /** Progress percentage (0-100) */
      percent: number;
      /** Human-readable progress message */
      message: string;
    }
    
    export interface CopyOptions extends ScanOptions, FormatOptions {
      /** Progress callback function */
      onProgress?: (progress: ProgressEvent) => void;
      /** Progress throttle interval in ms (default: 100) */
      progressThrottleMs?: number;
      // ... existing options
    }

Technical Specifications

Footprint:

  • New: src/utils/ProgressTracker.js
  • Modified: src/api/copy.js, src/api/scan.js, src/pipeline/Pipeline.js, src/pipeline/Stage.js, types/index.d.ts

Performance Considerations:

  • Progress calculation overhead: <1ms per update
  • Throttling prevents excessive callback invocations
  • Memory: Minimal (just tracking counters)

Progress Guarantees:

  • Always starts at 0%
  • Always ends at 100% on success
  • Monotonically increasing (never goes backward)
  • Updates at least every 1 second for long-running operations

Dependencies

Informational Dependencies:

Tasks

  • Create src/utils/ProgressTracker.js with progress normalization logic
  • Add onProgress option to copy.js
  • Add onProgress option to scan.js
  • Update Pipeline constructor to accept onProgress in Pipeline.js
  • Update Stage.emitProgress() to call onProgress callback in Stage.js
  • Add TypeScript definitions to types/index.d.ts
  • Add unit tests for ProgressTracker in tests/unit/utils/
  • Add integration tests for progress callbacks in tests/integration/
  • Document progress tracking in README.md
  • Document progress patterns in docs/usage/basic-usage.md

Acceptance Criteria

  • copy() accepts onProgress callback and invokes it during processing
  • Progress reported from 0% to 100% in monotonically increasing order
  • Progress messages are descriptive and include current file/stage
  • Throttling limits callbacks to at most 10 per second at the default 100ms interval
  • Works with both copy() and scan() APIs
  • TypeScript definitions accurate and complete
  • All tests pass including throttling and accuracy tests

Edge Cases & Risks

Edge Cases:

  • Empty directories (should emit 0% → 100% with minimal updates)
  • Single file (should still show progress stages)
  • Very fast operations (<100ms total) - may only emit start/end
  • Errors mid-operation (should emit progress up to error point)

Risks:

  • Inaccurate progress estimates (transformers take variable time)
  • UI performance degradation if callbacks are slow
  • Progress jumping backward if stage estimates are wrong

Mitigation:

  • Use empirical stage weights from benchmarks
  • Document that callbacks should be non-blocking
  • Enforce monotonic progress (never decrease percent)
  • Add configurable throttling to handle slow callbacks
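
The monotonic-progress mitigation could be enforced with a thin wrapper around the consumer's callback. This is a sketch under assumptions: guardProgress is a hypothetical name, and clamping to the 0-100 range is an added safeguard not stated in the issue.

```javascript
// Hypothetical wrapper: clamps percent to [0, 100] and never lets it decrease.
function guardProgress(onProgress) {
  let highest = 0;

  return ({ percent, message }) => {
    const clamped = Math.min(Math.max(percent, 0), 100);
    highest = Math.max(highest, clamped); // hold the high-water mark
    onProgress({ percent: highest, message });
  };
}

// Usage: a bad stage estimate (25 after 30) and an overshoot (140) are both corrected.
const received = [];
const report = guardProgress((p) => received.push(p.percent));

report({ percent: 30, message: 'Processing a.js' });
report({ percent: 25, message: 'Processing b.js' });
report({ percent: 140, message: 'Completed' });

console.log(received); // [ 30, 30, 100 ]
```

The message still reflects the latest event even when the percent is held at its previous value, so the UI keeps showing current activity.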
