@wangrong1069
Contributor

No description provided.

- Use std::move() to avoid unnecessary copies.
- Use scan_directory() instead of disk_scanner::scan() so that directory traversal and processing proceed simultaneously, avoiding the large amount of memory otherwise needed to temporarily store paths.
- Introduce a new `init_scan` job that handles the initial population of the index for each configured path sequentially.
- Enable event processing for a given path only after its corresponding initial scan has started.

Log: Rework the initial index scanning mechanism
As title.

Log: Update version to 7.0.29

@sourcery-ai sourcery-ai bot left a comment

Sorry @wangrong1069, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

@github-actions

TAG Bot

TAG: 7.0.29
EXISTED: no
DISTRIBUTION: unstable

@deepin-ci-robot

deepin pr auto review

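I reviewed this code change carefully. My analysis and suggestions follow.

  1. Overall assessment:
    This change reworks the initial index scanning mechanism, mainly to improve the performance and reliability of file indexing. The overall structure is clear, but a few points could still be improved.

  2. Specific suggestions:

a) Performance:

  • In scan_directory, consider parallel processing to speed up directory scanning, especially on large file systems.
  • In file_index_manager::make_file_record, std::move is applied to the strings appropriately, which helps reduce memory copies.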
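
As a small illustration of the std::move point, the sketch below moves string arguments into a record instead of copying them. The `record_sketch` type and `make_record_sketch` function are hypothetical stand-ins, not the project's make_file_record.

```cpp
#include <string>
#include <utility>

// Hypothetical record type used only for this illustration.
struct record_sketch {
    std::string name;
    std::string path;
};

// Takes the strings by value and moves them into the record, so a caller
// that passes an rvalue pays for no extra string copy.
record_sketch make_record_sketch(std::string name, std::string path) {
    return record_sketch{std::move(name), std::move(path)};
}
```

A caller that no longer needs its own string can write `make_record_sketch("file.txt", std::move(full_path))`, which transfers the buffer rather than duplicating it.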

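b) Logic:

  • Introducing the stop_scan_directory_ atomic variable is a good improvement; consider also adding more progress feedback during a scan.
  • In default_event_handler, the new indexing_item.enable flag helps control the scan process; consider adding a timeout so that no single path can take too long to scan.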
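
The role of an atomic stop flag such as stop_scan_directory_ can be sketched as cooperative cancellation: the worker checks the flag before each item, and any thread may set it. The class below is a hypothetical illustration under that assumption, not the project's handler.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical worker showing cooperative cancellation with an atomic flag.
class cancellable_worker_sketch {
public:
    // May be called from any thread to request that run() stop early.
    void request_stop() { stop_.store(true); }

    // Processes items in order and returns how many were handled before
    // a stop request was observed.
    std::size_t run(const std::vector<std::string>& items,
                    const std::function<void(const std::string&)>& handle) {
        std::size_t done = 0;
        for (const auto& item : items) {
            if (stop_.load()) {
                break;  // stop was requested; leave the rest unprocessed
            }
            handle(item);
            ++done;
        }
        return done;
    }

private:
    std::atomic<bool> stop_{false};
};
```

The flag is only polled between items, so a stop request takes effect after the item currently being handled.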

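c) Safety:

  • In scan_directory, std::filesystem::recursive_directory_iterator is constructed with the skip_permission_denied option, which is a good safety practice.
  • Consider adding more exception handling when working with file paths, especially around file system operations.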

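d) Code quality:

  • In base_event_handler.cpp, set_index_invalid_and_restart is defined twice; the duplicate definition needs to be removed.
  • Consider adding more detailed logging to the new init_scan feature to make debugging and monitoring easier.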
  3. Suggested changes:
// Add progress feedback to scan_directory
bool base_event_handler::scan_directory(const std::string& dir_path,
                                      std::function<bool(const std::string&)> handler) {
    spdlog::info("Scanning directory {}", dir_path);

    std::error_code ec;
    std::size_t file_count = 0; // only touched by the scanning thread, so no atomic is needed
    std::size_t last_reported_count = 0;
    auto report_progress = [&]() {
        if (file_count - last_reported_count >= 1000) { // report progress every 1000 entries
            spdlog::info("Scanned {} files in directory {}", file_count, dir_path);
            last_reported_count = file_count;
        }
    };

    std::filesystem::recursive_directory_iterator dirpos{dir_path,
        std::filesystem::directory_options::skip_permission_denied};
    for (auto it = begin(dirpos); it != end(dirpos); ++it) {
        if (stop_scan_directory_) {
            spdlog::info("Scanning interrupted");
            return true;
        }

        // string() already returns a temporary, so no std::move is needed here
        std::string path = it->path().string();
        if (is_path_in_blacklist(path, config_->blacklist_paths) ||
            !std::filesystem::exists(it->path(), ec)) {
            it.disable_recursion_pending();
            continue;
        }

        if (!handler(path)) {
            return false;
        }

        file_count++;
        report_progress();
    }

    spdlog::info("Scanning directory {} completed, total files: {}", dir_path, file_count);
    return true;
}
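
The progress-reporting idea in the snippet above can also be isolated from the scanner and from spdlog. The following is a minimal sketch under that assumption; `progress_reporter_sketch` is a hypothetical name, and the callback stands in for whatever logging call the project would use.

```cpp
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical helper: counts processed items and invokes `report` once
// every `interval` items with the running total.
class progress_reporter_sketch {
public:
    progress_reporter_sketch(std::size_t interval,
                             std::function<void(std::size_t)> report)
        : interval_(interval == 0 ? 1 : interval),  // guard against a zero interval
          report_(std::move(report)) {}

    // Call once per processed item.
    void tick() {
        ++count_;
        if (count_ - last_reported_ >= interval_) {
            report_(count_);
            last_reported_ = count_;
        }
    }

    std::size_t count() const { return count_; }

private:
    std::size_t interval_;
    std::function<void(std::size_t)> report_;
    std::size_t count_ = 0;
    std::size_t last_reported_ = 0;
};
```

Keeping the counter in a small class like this would make the reporting interval easy to test and to configure separately from the traversal.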
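// Add a timeout mechanism to default_event_handler (requires <chrono> and <thread>)
void default_event_handler::start_handle_init_scan(const std::string &path) {
    static constexpr std::chrono::seconds timeout{30}; // 30-second timeout

    for (auto& item : indexing_items_) {
        if (item.enable)
            continue;

        std::string origin_path_without_slash = item.origin_path;
        if (origin_path_without_slash != "/")
            origin_path_without_slash.pop_back();

        if (origin_path_without_slash == path) {
            // Set up the timeout. Caveat: the detached thread keeps a reference to
            // `item`, so this is only safe if indexing_items_ is never resized or
            // destroyed while the thread runs, and `enable` would need to be a
            // std::atomic<bool> (or be guarded by a mutex) to avoid a data race.
            std::thread([&item]() {
                std::this_thread::sleep_for(timeout);
                if (!item.enable) {
                    spdlog::warn("Init scan timeout for path: {}", item.origin_path);
                    item.enable = true; // force-enable to avoid blocking
                }
            }).detach();

            break;
        }
    }
}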
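
The timeout suggestion above relies on a detached thread that touches handler state. One alternative, sketched here purely as an assumption and not as the project's design, is a small gate object that lets a waiter block until the scan is marked as started or a timeout expires, with all shared state protected by a mutex. The name `scan_gate_sketch` is hypothetical.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical gate: one side marks the scan as started, the other side
// waits for that event with an upper bound on the waiting time.
class scan_gate_sketch {
public:
    // Called once the initial scan for the path has started.
    void mark_started() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            started_ = true;
        }
        cv_.notify_all();
    }

    // Returns true if the scan started within `timeout`, false otherwise.
    bool wait_started(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout, [this] { return started_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool started_ = false;
};
```

With this shape the caller decides what to do on a timeout, and no background thread outlives the object it refers to.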
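  4. Other suggestions:
  • Consider adding unit tests to verify the correctness and performance of the new scanning mechanism.
  • Consider adding configuration options that let users customize scan behavior, such as the degree of parallelism and the timeout.
  • Document in detail how the new scanning mechanism works and how to configure it.

These improvements should help make the code more reliable, faster, and easier to maintain.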

@deepin-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: lzwind, wangrong1069

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wangrong1069
Contributor Author

/merge

@deepin-bot deepin-bot bot merged commit 9869d5f into linuxdeepin:develop/snipe Sep 16, 2025
18 checks passed
@wangrong1069 wangrong1069 deleted the pr0916 branch September 16, 2025 07:15
3 participants