Skip to content

Conversation

@bewithgaurav
Copy link
Collaborator

@bewithgaurav bewithgaurav commented Dec 16, 2025

Work Item / Issue Reference

AB#41042

GitHub Issue: #370


Summary

This pull request refactors the way file system paths are handled in the mssql_python/pybind/ddbc_bindings.cpp file to use C++17's std::filesystem for improved cross-platform compatibility and proper handling of UTF-8 paths. The changes simplify and unify path manipulation logic, especially for dynamic library loading, and ensure correct encoding is used on all platforms.

Cross-platform and encoding improvements:

  • Replaced platform-specific logic in GetModuleDirectory() with std::filesystem::path to extract the module directory in a cross-platform and UTF-8 safe manner. This removes the need for separate Windows and Unix/macOS code paths.
  • Updated dynamic library loading on Windows to use fs::path::c_str(), which provides a correctly encoded wchar_t* for LoadLibraryW, ensuring proper handling of UTF-8 paths. This change is applied both in LoadDriverLibrary and when loading mssql-auth.dll in LoadDriverOrThrowException(). [1] [2]

…characters (#370)

Root cause: On Windows, paths containing non-ASCII characters (e.g., usernames
like 'Thalén' with 'é') were being corrupted due to:
1. GetModuleDirectory() using ANSI APIs (char[], PathRemoveFileSpecA)
2. LoadDriverLibrary() using broken UTF-8→UTF-16 conversion via
   std::wstring(path.begin(), path.end())
3. LoadDriverOrThrowException() using same broken pattern for mssql-auth.dll

Fix: Use std::filesystem::path which properly handles encoding on all platforms.
On Windows, fs::path::c_str() returns wchar_t* with correct UTF-16 encoding.

This fix enables users with non-ASCII characters in their Windows username or
installation path to use Entra ID authentication successfully.
@github-actions github-actions bot added the pr-size: small Minimal code update label Dec 16, 2025
Add comprehensive tests for the non-ASCII path encoding fix:

1. Default tests (cross-platform):
   - Verify module import exercises path handling code
   - Test UTF-8 string operations with international characters
   - Test pathlib with non-ASCII directory names

2. Windows-specific tests:
   - Verify DLL loading succeeds
   - Verify libs directory structure

3. Integration tests (Windows only, ~2-4 min total):
   - Create venv in paths with Swedish (Thalén), German (Müller),
     Japanese (日本語), and Chinese (中文) characters
   - Install mssql-python and verify import succeeds

These tests ensure the fs::path fix for LoadLibraryW works correctly
for users with non-ASCII characters in their Windows username.
@github-actions github-actions bot added pr-size: medium Moderate update size and removed pr-size: small Minimal code update labels Dec 16, 2025
Mark 4 tests as @pytest.mark.stress (skipped by default per pytest.ini):
- test_aggressive_dbc_segfault_reproduction: 10 real DB connections
- test_force_gc_finalization_order_issue: 10 connections + 5 GC cycles
- test_rapid_connection_churn_with_shutdown: 10 connections with churn
- test_active_connections_thread_safety: 200 mock connections + 10 threads

These tests are resource-intensive and slow down CI. They will still run
when explicitly requested with 'pytest -m stress' or 'pytest -m ""'.
@github-actions
Copy link

github-actions bot commented Dec 16, 2025

📊 Code Coverage Report

🔥 Diff Coverage

100%


🎯 Overall Coverage

76%


📈 Total Lines Covered: 5358 out of 7034
📁 Project: mssql-python


Diff Coverage

Diff: main...HEAD, staged and unstaged changes

  • mssql_python/pybind/ddbc_bindings.cpp (100%)

Summary

  • Total: 7 lines
  • Missing: 0 lines
  • Coverage: 100%

📋 Files Needing Attention

📉 Files with overall lowest coverage (click to expand)
mssql_python.pybind.logger_bridge.hpp: 58.8%
mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.row.py: 66.2%
mssql_python.helpers.py: 67.5%
mssql_python.pybind.ddbc_bindings.cpp: 69.4%
mssql_python.pybind.ddbc_bindings.h: 71.7%
mssql_python.pybind.connection.connection.cpp: 73.6%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.pybind.connection.connection_pool.cpp: 79.6%
mssql_python.connection.py: 83.9%

🔗 Quick Links

⚙️ Build Summary 📋 Coverage Details

View Azure DevOps Build

Browse Full Coverage Report

@github-actions github-actions bot added pr-size: large Substantial code update and removed pr-size: medium Moderate update size labels Dec 24, 2025
@github-actions github-actions bot added pr-size: medium Moderate update size and removed pr-size: large Substantial code update labels Dec 24, 2025
@bewithgaurav bewithgaurav marked this pull request as ready for review December 24, 2025 10:17
Copilot AI review requested due to automatic review settings December 24, 2025 10:17
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request fixes a critical bug (Issue #370) where the mssql-python driver failed to load when installed in paths containing non-ASCII characters on Windows, such as usernames like "Thalén" or directories with accented characters. The fix refactors path handling to use C++17's std::filesystem for proper cross-platform UTF-8 path support.

Key changes:

  • Replaced platform-specific path manipulation code with std::filesystem::path for unified, encoding-aware path handling
  • Fixed UTF-8 to UTF-16 conversion on Windows by using fs::path::c_str() instead of incorrect std::wstring(path.begin(), path.end()) conversion
  • Added comprehensive test suite covering UTF-8 path handling with real-world non-ASCII characters (Swedish, German, Japanese, Chinese, etc.)

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 13 comments.

File Description
mssql_python/pybind/ddbc_bindings.cpp Refactored GetModuleDirectory(), LoadDriverLibrary(), and LoadDriverOrThrowException() to use std::filesystem::path for proper UTF-8 encoding on all platforms, with correct UTF-16 conversion on Windows
tests/test_015_utf8_path_handling.py Added comprehensive test coverage including code path verification tests, non-ASCII string handling tests, Windows-specific tests, and full integration tests with virtual environments in non-ASCII paths

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-size: medium Moderate update size

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants