feat(log): add execution logging functionality to facilitate code testing by developers #6
Conversation
Adds execution logging functionality to facilitate code testing by developers.
Related to jerry609#4
Signed-off-by: LIU BOYU <oor2020@163.com>
Summary of Changes

Hello @CJBshuosi, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a comprehensive and flexible logging system designed to enhance debugging and operational monitoring for developers. It centralizes log configuration, enables logging to distinct files based on functionality, and includes features like log rotation and trace ID propagation. This significantly improves the ability to track application flow and diagnose issues efficiently.
Activity
Code Review
This pull request introduces a new centralized logging module, offering features like configuration from environment variables, YAML file-based log path definitions, log rotation, and context-aware trace IDs. However, it presents significant issues:

- A high-severity path traversal vulnerability in how log file paths are resolved.
- A medium-severity resource exhaustion risk due to unbounded caching of file handlers.
- Thread safety issues when writing to log files.
- Silent failures during YAML configuration loading.
- Minor inconsistencies in log file key naming.

The security issues must be resolved to prevent potential abuse, such as unauthorized file access/overwrites or application crashes. Beyond security, the module largely re-implements functionality available in Python's standard logging module; refactoring to leverage the built-in library would improve robustness and simplicity.
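The refactor the review recommends can be sketched with the standard library alone. This is an illustrative sketch, not code from the PR: the `get_logger` helper and its parameter defaults are hypothetical, but rotation, thread safety, and directory handling all come from `logging` and `pathlib` as shown.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def get_logger(name: str, log_path: str, max_bytes: int = 10 * 1024 * 1024,
               backup_count: int = 5) -> logging.Logger:
    """Create a named logger backed by a rotating file handler.

    Locking, rotation, and caller information (%(filename)s/%(lineno)d)
    are all handled by the standard logging machinery.
    """
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        Path(log_path).parent.mkdir(parents=True, exist_ok=True)
        handler = RotatingFileHandler(log_path, maxBytes=max_bytes,
                                      backupCount=backup_count, encoding="utf-8")
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(levelname)s %(filename)s:%(lineno)d %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

With this approach the module shrinks to configuration glue, since rotation and thread safety no longer need to be re-implemented.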
handler.stream.write(formatted + "\n")
handler.stream.flush()
Writing directly to handler.stream bypasses the thread-safe mechanisms (i.e., locks) provided by Python's logging.Handler. In a multi-threaded application, this can lead to race conditions and result in garbled or interleaved log messages. To ensure thread safety, you should acquire the handler's lock before writing to the stream.
handler.acquire()
try:
    handler.stream.write(formatted + "\n")
    handler.stream.flush()
finally:
    handler.release()
def _resolve_file_path(file: Optional[str]) -> str:
    """Resolve the full file path for logging."""
    base_dir = _config.get("base_dir", DEFAULT_LOG_DIR)

    if file is None:
        return str(Path(base_dir) / DEFAULT_LOG_FILE)

    # If file contains directory separator, use as relative path under base_dir
    if "/" in file or "\\" in file:
        return str(Path(base_dir) / file)

    # Otherwise, just a filename in base_dir
    return str(Path(base_dir) / file)
The _resolve_file_path function is vulnerable to path traversal. It joins the base_dir with the file argument without validating that the resulting path remains within the base_dir. If the file argument is an absolute path or contains traversal sequences like .., it can point to arbitrary files on the system. This could allow an attacker to overwrite sensitive files if they can influence the log file name.
Suggested change:

def _resolve_file_path(file: Optional[str]) -> str:
    """Resolve the full file path for logging and prevent path traversal."""
    base_dir = Path(_config.get("base_dir", DEFAULT_LOG_DIR)).resolve()
    if file is None:
        return str(base_dir / DEFAULT_LOG_FILE)
    # Resolve the path and ensure it's under base_dir
    target_path = (base_dir / file).resolve()
    try:
        target_path.relative_to(base_dir)
    except ValueError:
        # Fallback or raise error if path is outside base_dir
        return str(base_dir / DEFAULT_LOG_FILE)
    return str(target_path)
except Exception:
    pass
This broad except Exception: pass will silently ignore all errors during the loading of log_config.yaml, including a missing pyyaml dependency or a malformed YAML file. This can make debugging configuration issues very difficult. It's better to at least log a warning to standard error when the configuration can't be loaded.
Suggested change:

except Exception as e:
    import sys
    print(f"WARNING: Failed to load log config from {LOG_CONFIG_FILE}, using defaults. Error: {e}", file=sys.stderr)
def _get_file_handler(file_path: str) -> RotatingFileHandler:
    """Get or create a file handler for the given path."""
    if file_path not in _file_handlers:
        path = Path(file_path)
        _ensure_dir(path.parent)

        handler = RotatingFileHandler(
            filename=str(path),
            maxBytes=_config.get("max_bytes", DEFAULT_MAX_BYTES),
            backupCount=_config.get("backup_count", DEFAULT_BACKUP_COUNT),
            encoding="utf-8",
        )
        _file_handlers[file_path] = handler

    return _file_handlers[file_path]
The _file_handlers dictionary acts as an unbounded cache for file handlers. Since handlers are never removed or closed (except on global shutdown), providing dynamic file names to the logger can lead to a resource exhaustion attack (Denial of Service). An attacker could trigger the creation of thousands of file handlers, exhausting file descriptors and memory.
files:
  HARVEST: "harvest/harvest.log"
  api: "api/api.log"
  error: "errors/error.log"
  database: "db/database.log"
The keys under files use mixed casing (HARVEST vs api, error, database). The corresponding Python code in logging_config.py suggests that these are meant to be accessed like constants (e.g., LogFiles.HARVEST), which by convention (PEP 8) should be uppercase. For consistency and improved readability, it's recommended to use uppercase for all keys.
Suggested change:

files:
  HARVEST: "harvest/harvest.log"
  API: "api/api.log"
  ERROR: "errors/error.log"
  DATABASE: "db/database.log"

cls._files = {
    "harvest": "harvest/harvest.log",
    "api": "api/api.log",
    "error": "errors/error.log",
}
The hardcoded default values for log files are incomplete. The log_config.yaml file includes a database log, but it's missing from these defaults. If the YAML file fails to load, any attempt to use LogFiles.DATABASE will result in an AttributeError. To make the fallback behavior more robust and consistent, the default values should include all expected log files.
Suggested change:

cls._files = {
    "harvest": "harvest/harvest.log",
    "api": "api/api.log",
    "error": "errors/error.log",
    "database": "db/database.log",
}
frame = inspect.currentframe()
caller_frame = frame.f_back.f_back if frame and frame.f_back else None
Using inspect.currentframe() to get caller information can have a significant performance impact, especially if logging is done frequently in tight loops. The standard logging module is highly optimized for this (often in C). While this approach works, it's important to be aware of the potential overhead. If performance becomes a concern, consider refactoring to leverage the standard logging module's more efficient mechanisms for capturing caller info (e.g., using %(filename)s and %(lineno)d in the format string).
This log module is used by developers to record execution logs during code development, making it easier to debug the code.
Related to #4
Signed-off-by: LIU BOYU oor2020@163.com