Open
Labels
- monitoring: Monitoring and observability
- priority: high: Should be fixed soon
- quadrant: q2: Important, Not Urgent (Schedule)
- type: feature: New feature implementation
Description
Problem
We only learn about SDK issues when customers report them. We're reactive instead of proactive.
Solution
Add opt-in telemetry to detect issues before customers report them:
# oilpriceapi/telemetry.py
import os
from datetime import datetime, timezone

from oilpriceapi import __version__  # assumed location of the SDK version string


class Telemetry:
    """Opt-in telemetry for SDK health monitoring."""

    def __init__(self, enabled=False):
        # Enabled by the explicit flag or by the environment variable.
        self.enabled = enabled or os.getenv('OILPRICEAPI_TELEMETRY') == 'true'

    def log_timeout_event(self, endpoint, timeout, duration):
        """Log timeout events."""
        if not self.enabled:
            return
        # _send (the transport) is not defined in this proposal.
        self._send({
            'event': 'timeout',
            'endpoint': endpoint,
            'timeout': timeout,
            'duration': duration,
            'sdk_version': __version__,
            # Timezone-aware UTC timestamp; datetime.utcnow() is deprecated.
            'timestamp': datetime.now(timezone.utc).isoformat()
        })

    def log_error(self, error_type, endpoint, message):
        """Log error events."""
        if not self.enabled:
            return
        self._send({
            'event': 'error',
            'error_type': error_type,
            'endpoint': endpoint,
            'message': message,
            'sdk_version': __version__
        })

Usage
# Enable telemetry (opt-in)
client = OilPriceAPI(api_key='...', enable_telemetry=True)

# Or via environment variable (shell)
export OILPRICEAPI_TELEMETRY=true

Data Collected (Privacy-Preserving)
YES (collected):
- SDK version
- Endpoint called
- Error types
- Timeout events
- Response times (buckets)
- Python version
NO (NOT collected):
- API keys
- Request parameters
- Response data
- User identifying information
- IP addresses
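One way to enforce the two lists above in code is an allow-list applied when the payload is built, with exact durations replaced by coarse buckets. The following is a minimal sketch under stated assumptions: `ALLOWED_FIELDS`, `BUCKET_BOUNDS`, `bucket_duration`, and `build_payload` are hypothetical names that do not appear in the proposal, and the bucket boundaries are placeholders. The free-text `message` field is deliberately left off the allow-list here, since exception messages can carry request details.

```python
import platform

# Hypothetical allow-list: only these keys may ever leave the process.
ALLOWED_FIELDS = frozenset({
    "event", "endpoint", "error_type", "duration_bucket",
    "sdk_version", "python_version",
})

# Hypothetical bucket upper bounds, in seconds.
BUCKET_BOUNDS = (0.1, 0.5, 1.0, 5.0, 30.0)


def bucket_duration(seconds):
    """Map an exact duration to a coarse label such as '<=0.5s'."""
    for bound in BUCKET_BOUNDS:
        if seconds <= bound:
            return f"<={bound}s"
    return f">{BUCKET_BOUNDS[-1]}s"


def build_payload(event, sdk_version, **fields):
    """Build a telemetry payload, dropping any key that is not allow-listed."""
    payload = {
        "event": event,
        "sdk_version": sdk_version,
        "python_version": platform.python_version(),
    }
    # Report only the bucket, never the exact duration.
    if "duration" in fields:
        payload["duration_bucket"] = bucket_duration(fields.pop("duration"))
    payload.update({k: v for k, v in fields.items() if k in ALLOWED_FIELDS})
    return payload
```

With this shape, a caller that accidentally passes `api_key=...` or request parameters gets them dropped silently rather than transmitted, which keeps the NO list true even if call sites drift.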
Benefits
- Early detection: See issues before customers report them
- Version adoption: Track which versions are in use
- Error patterns: Identify common failure modes
- Performance trends: Track response times across users
Implementation
# In client.py
import time


class OilPriceAPI:
    def __init__(self, ..., enable_telemetry=False):
        self.telemetry = Telemetry(enabled=enable_telemetry)

    def request(self, method, path, ...):
        # timeout: the request timeout in effect; its source is elided here.
        start = time.monotonic()  # monotonic clock, unaffected by system clock changes
        try:
            response = self._client.request(...)
            duration = time.monotonic() - start
            if duration > timeout:
                self.telemetry.log_timeout_event(path, timeout, duration)
            return response
        except Exception as e:
            # Note: str(e) may include request details; see the NO list above.
            self.telemetry.log_error(type(e).__name__, path, str(e))
            raise

Acceptance Criteria
- Telemetry module created
- Disabled by default (opt-in: explicit enable required)
- Privacy policy documented
- Data retention policy defined
- Dashboard created for telemetry data
- Documentation updated with telemetry info
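The first two criteria can be made concrete with a small sketch of the opt-in check and a transport that never raises. This is an illustration under assumptions, not the SDK's implementation: `telemetry_enabled`, `send_event`, and `TELEMETRY_URL` are hypothetical names, and the collector URL is a placeholder because the issue does not name one. Only standard-library calls are used.

```python
import json
import os
import urllib.request

# Hypothetical collector URL; the issue does not specify one.
TELEMETRY_URL = "https://telemetry.example.invalid/events"


def telemetry_enabled(explicit=False, environ=None):
    """Return True only on explicit opt-in: the flag or OILPRICEAPI_TELEMETRY=true."""
    environ = os.environ if environ is None else environ
    return bool(explicit) or environ.get("OILPRICEAPI_TELEMETRY") == "true"


def send_event(payload, url=TELEMETRY_URL, timeout=2.0):
    """POST the payload as JSON; return False instead of raising on any failure."""
    try:
        request = urllib.request.Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except Exception:
        # Telemetry must never break the caller's request path.
        return False
```

Passing `environ` explicitly keeps the default-off behavior easy to test, for example `telemetry_enabled(environ={})` returns `False`. The send here is synchronous; a real transport would probably hand events to a background thread or queue so that a slow collector cannot delay API calls.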
Estimated Effort
Time: 6-8 hours
Success Metrics
- Detect issues within 1 hour of first occurrence
- 20%+ of users opt into telemetry
- Identify patterns before customer reports