-
Notifications
You must be signed in to change notification settings - Fork 12
feat: Otel traces setup #188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
|
|
Claude finished @nitink23's task —— View job OpenTelemetry Implementation Review ✅Note: Writing Summary: This PR implements OpenTelemetry tracing and metrics infrastructure, despite the description claiming to only add dev dependencies. 📋 Review Issues
🔒 Security & Performance AnalysisSecurity Issues:
Performance Concerns:
Missing Error Handling:
🧪 Test Coverage GapsCritical Missing Tests:
Suggested Test Structure: describe('Telemetry', () => {
describe('initTelemetry', () => {
it('should initialize successfully with valid config')
it('should handle missing OTLP endpoint gracefully')
it('should parse resource attributes correctly')
it('should be idempotent')
})
})💡 Code Quality ImprovementsRecommended Changes:
// In libs/zephyr-agent/package.json
- "devDependencies": {
+ "dependencies": {
"@opentelemetry/api": "^1.9.0",
"@opentelemetry/auto-instrumentations-node": "^0.62.0",
"@opentelemetry/exporter-trace-otlp-http": "^0.203.0",
// ... other OTel packages
// In libs/zephyr-agent/src/index.ts
import { initTelemetry } from './lib/telemetry';
// Initialize telemetry with proper error handling
(async () => {
try {
await initTelemetry();
} catch (error) {
console.error('Telemetry initialization failed:', error);
// Continue without telemetry or fail fast based on config
}
})();
// Required environment variables:
// - OTEL_EXPORTER_OTLP_ENDPOINT: OTLP endpoint URL
// - OTEL_RESOURCE_ATTRIBUTES: service.name=zephyr-agent,deployment.environment=production
// - GRAFANA_CLOUD_OTLP_ENDPOINT: Grafana Cloud OTLP endpoint
// - GRAFANA_CLOUD_INSTANCE_ID: Grafana Cloud instance ID
// - GRAFANA_CLOUD_API_KEY: Grafana Cloud API keyOverall Quality Score: 4/10Top 3 Next Steps:
Recommendation: This PR needs significant work before merging. The telemetry implementation itself is reasonable, but the packaging, testing, and error handling need substantial improvement. |
Nsttt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some things that are missing that we need to do at some point;
- Move the boilerplate creation of spans / setup of otel to an internal lib we use on Zephyr, so it's easier to maintain.
- Add a
ZEPHYR_DISABLE_TELEMETRYenv variable for people that do not want to give telemetry to us. - Check the ammount of info we currently track and decide the pieces that we want to send as traces based on events that happen on the Engine.
- Try to keep the tracing logic inside of zephyr-engine and zephyr-agent to not pollute the plugins.
| "zephyr-agent": "workspace:*" | ||
| }, | ||
| "devDependencies": { | ||
| "@types/is-ci": "catalog:typescript", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can revert this I think.
| const OTLP_ENDPOINT = | ||
| process.env['OTEL_EXPORTER_OTLP_ENDPOINT'] || 'http://localhost:4318'; | ||
| // OTEL_EXPORTER_OTLP_HEADERS: Authentication headers for the telemetry endpoint | ||
| const OTEL_EXPORTER_OTLP_HEADERS = process.env['OTEL_EXPORTER_OTLP_HEADERS']; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably hardcode the ones we actually want to track.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah if we decide to host the docker image we would need to replace the endpoint anyways
| const resource = resourceFromAttributes({ | ||
| [ATTR_SERVICE_NAME]: 'zephyr-packages', | ||
| [ATTR_SERVICE_VERSION]: '0.0.1', | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to decide on which attributes we want to track from the environment.
What's added in this PR?
Core Telemetry Implementation
Infrastructure Implementation
Instrumented Components
Screenshots
N/A – No UI changes, only code improvements and cleanup.
What's the issues or discussion related to this PR?
This PR addresses multiple aspects of the OpenTelemetry implementation:
Core Telemetry setup
Infrastructure:
What are the steps to test this PR?
pnpm installto ensure all OTel dependencies are correctpnpm lintto verify all ESLint issues are resolvedpnpm buildto ensure all components build successfullydocker compose -f scripts/telemetry/compose.yml uphttp://localhost:4318Documentation update for this PR (if applicable)?
Yes - The telemetry infrastructure setup should be documented:
(Optional) What's left to be done for this PR?
(Optional) What's the potential risk and how to mitigate it?
Who do you wish to review this PR other than required reviewers?
@claude @Nsttt @zackarychapple
(Required) Pre-PR/Merge checklist
Note: Documentation PR should be created to cover the telemetry infrastructure setup, especially the Docker setup requirements.