Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
179 changes: 179 additions & 0 deletions examples/EVALUATION_RESULTS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,179 @@
# Efficiency Evaluation: Random vs. FewLab Methods

This document summarizes the evaluation of efficiency gains when using FewLab's optimal sampling methods compared to random sampling.

## Executive Summary

**Key Finding**: FewLab's optimal sampling methods provide **1.14x efficiency gains** (14% variance reduction) compared to random sampling in realistic scenarios with heterogeneous count distributions.

**Budget Sensitivity**: The efficiency gains are **most pronounced at moderate budgets** (10% of items), reaching up to **9.2% improvement**, and diminish as budgets increase.

## Evaluation Setup

### Simulation Configuration

- **Units (e.g., users)**: 500-1000
- **Items (e.g., products)**: 200-300
- **Features**: 5-6 covariates
- **Simulations**: 100-150 runs per configuration
- **Key innovation**: Heterogeneous count structure with variance across items (mimicking real-world scenarios where some items are much more popular than others)

### Methods Compared

1. **Random Sampling** (baseline)
2. **Deterministic A-optimal** (`items_to_label`)
3. **Balanced Sampling** (`balanced_fixed_size`)
4. **Hybrid Core+Tail** (`core_plus_tail`)
5. **Adaptive Hybrid** (`adaptive_core_tail`)

### Metrics

- **Bias**: Mean error in coefficient estimates
- **Variance**: Variance of estimation error across simulations
- **RMSE**: Root mean squared error
- **Relative Efficiency**: Variance_random / Variance_method
- **Computational Time**: Algorithm runtime

## Results

### Main Evaluation (Fixed Budget)

**Configuration**: K=40 items from 200 (20% budget), heterogeneous counts

| Method | Variance | Rel. Efficiency | Gain | Time (ms) |
|--------|----------|-----------------|------|-----------|
| Random (baseline) | 2.5273 | 1.00x | 0% | 0.1 |
| Deterministic A-opt | 2.1928 | **1.14x** | **+13.9%** | 1.8 |
| Balanced | 2.1933 | 1.14x | +13.9% | 74.6 |
| Adaptive Hybrid | 2.1930 | 1.14x | +13.9% | 15.9 |
| Hybrid Core+Tail | 2.1927 | 1.14x | +13.9% | 28.6 |

**Interpretation**:
- All FewLab methods achieve approximately **14% variance reduction** compared to random sampling
- Deterministic A-optimal is **fastest** (1.8ms) with same efficiency
- All methods perform similarly in terms of variance reduction
- The efficiency gain is **consistent across all coefficients**

### Budget Sensitivity Analysis

**Configuration**: Varying budgets from 5% to 40% of items (K=15 to 120 from 300 items)

| Budget | % of Items | Random Var | Optimal Var | Rel. Efficiency | Gain |
|--------|-----------|------------|-------------|-----------------|------|
| 15 | 5% | 2.203 | 2.155 | 1.02x | **+1.8%** |
| 30 | 10% | 2.600 | 2.348 | 1.09x | **+9.2%** |
| 45 | 15% | 2.238 | 2.117 | 1.07x | **+6.6%** |
| 60 | 20% | 2.260 | 2.203 | 1.03x | **+2.5%** |
| 90 | 30% | 2.138 | 2.103 | 1.02x | **+1.7%** |
| 120 | 40% | 2.499 | 2.482 | 1.01x | **+0.6%** |

**Key Insights**:

1. **Peak efficiency at moderate budgets**: Maximum gains (~9%) occur at **10% budget**
2. **Diminishing returns at high budgets**: Gains drop to <1% when sampling >30% of items
3. **Non-monotonic pattern**: Efficiency gains don't follow a simple monotonic pattern
4. **Practical implication**: FewLab methods are most valuable when **resources are constrained**

## Why Heterogeneity Matters

The efficiency gains depend critically on the **heterogeneity of the count distribution**:

- **Homogeneous counts** (all items equally popular): ~0.4% gain
- **Heterogeneous counts** (realistic variance in popularity): ~14% gain

This is because optimal methods can **leverage the information structure** by:
1. Selecting items with high variance across units
2. Weighting items by their regression influence
3. Balancing statistical leverage across features

## Recommendations

### When to Use FewLab Methods

✅ **Use FewLab when**:
- Budget is **10-20% of available items**
- Items have **heterogeneous distribution** across units
- Accurate coefficient estimates are **critical**
- Moderate computational overhead is acceptable (15-75ms)

⚠️ **Random sampling is fine when**:
- Budget is **>30% of items**
- Items are relatively **homogeneous**
- Speed is paramount (<1ms requirement)
- Simplicity is valued over small efficiency gains

### Method Selection Guide

| Method | Best For | Speed | Complexity |
|--------|----------|-------|------------|
| **Deterministic A-opt** | Speed + efficiency | ★★★★★ (1.8ms) | Low |
| **Adaptive Hybrid** | Automated tuning | ★★★★☆ (16ms) | Low |
| **Hybrid Core+Tail** | Custom tail fraction | ★★★☆☆ (29ms) | Medium |
| **Balanced** | Maximum balance | ★★☆☆☆ (75ms) | Medium |

**Recommendation**: Start with **Deterministic A-optimal** for its speed and simplicity. All methods perform similarly in terms of efficiency.

## Computational Cost vs. Benefit

At typical problem sizes (500 units × 300 items, K=30):
- **Random**: 0.1ms → No gain
- **Deterministic A-opt**: 2ms → **9% efficiency gain**
- **Adaptive Hybrid**: 16ms → **9% efficiency gain**

**ROI**: Even the slowest method (Balanced at 75ms) provides excellent value:
- 75ms overhead → 9% variance reduction
- In a typical survey with 100 follow-up analyses, this saves ~9 equivalent samples

## Reproducing Results

### Run Main Evaluation
```bash
python examples/evaluate_random_vs_methods.py
```

Outputs:
- `examples/simulation_results.csv`: Detailed results
- `examples/efficiency_comparison.png`: Variance and efficiency plots
- `examples/average_metrics.png`: Average metrics bar charts

### Run Budget Sensitivity Analysis
```bash
python examples/evaluate_budget_sensitivity.py
```

Outputs:
- `examples/budget_sensitivity_results.csv`: Detailed results by budget
- `examples/budget_sensitivity.png`: Efficiency gains vs. budget plots

## Technical Details

### Estimation Procedure

For each method:
1. Sample K items using the method
2. Compute Horvitz-Thompson weights (1/π_j)
3. Estimate per-unit shares: ŷ_i = (1/T_i) Σ_j w_j a_j C_ij
4. Regress X on ŷ to estimate coefficients β̂
5. Compare β̂ to true β across simulations

### Metrics Computation

- **Bias**: E[β̂ - β]
- **Variance**: Var[β̂ - β]
- **RMSE**: √E[(β̂ - β)²]
- **Relative Efficiency**: Var_random / Var_method

## References

- FewLab documentation: https://fewlab.readthedocs.io
- Deville & Särndal (1992): Calibration estimators in survey sampling
- Fuller (2009): Sampling Statistics, Ch. 6 on optimal design

## Files

- `evaluate_random_vs_methods.py`: Main evaluation script
- `evaluate_budget_sensitivity.py`: Budget sensitivity analysis
- `EVALUATION_RESULTS.md`: This document
- `simulation_results.csv`: Detailed results
- `budget_sensitivity_results.csv`: Budget sensitivity results
- `*.png`: Visualization plots
Binary file added examples/average_metrics.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added examples/budget_sensitivity.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Loading