FIX: Resolve paths with non-ASCII characters in Windows #376
…characters (#370)

Root cause: On Windows, paths containing non-ASCII characters (e.g., usernames like 'Thalén' with 'é') were being corrupted due to:
1. GetModuleDirectory() using ANSI APIs (char[], PathRemoveFileSpecA)
2. LoadDriverLibrary() using broken UTF-8→UTF-16 conversion via std::wstring(path.begin(), path.end())
3. LoadDriverOrThrowException() using the same broken pattern for mssql-auth.dll

Fix: Use std::filesystem::path which properly handles encoding on all platforms. On Windows, fs::path::c_str() returns wchar_t* with correct UTF-16 encoding.

This fix enables users with non-ASCII characters in their Windows username or installation path to use Entra ID authentication successfully.
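The byte-by-byte widening described in the root cause can be emulated in a few lines. This is a rough Python sketch, not the driver's code: the path is hypothetical, and it assumes each UTF-8 byte is widened as an unsigned value (with a signed `char` the C++ result differs in detail but is equally wrong).

```python
# Hypothetical install path, used only for illustration.
path_text = "C:\\Users\\Thalén\\libs"

# A std::string holding this path contains its UTF-8 bytes.
utf8_bytes = path_text.encode("utf-8")

# Emulates std::wstring(path.begin(), path.end()): every byte becomes its own
# wide character, so the two bytes of 'é' (0xC3 0xA9) turn into two characters.
# Assumption: bytes are treated as unsigned when widened.
widened = "".join(chr(byte) for byte in utf8_bytes)

# A real UTF-8 conversion decodes multi-byte sequences before widening.
decoded = utf8_bytes.decode("utf-8")

print(widened)  # C:\Users\ThalÃ©n\libs
print(decoded)  # C:\Users\Thalén\libs
```

The widened string is one character longer than the real path and names a directory that does not exist, which is consistent with the library load failing for such users.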
Add comprehensive tests for the non-ASCII path encoding fix:
1. Default tests (cross-platform):
- Verify module import exercises path handling code
- Test UTF-8 string operations with international characters
- Test pathlib with non-ASCII directory names
2. Windows-specific tests:
- Verify DLL loading succeeds
- Verify libs directory structure
3. Integration tests (Windows only, ~2-4 min total):
- Create venv in paths with Swedish (Thalén), German (Müller),
Japanese (日本語), and Chinese (中文) characters
- Install mssql-python and verify import succeeds
These tests ensure the fs::path fix for LoadLibraryW works correctly
for users with non-ASCII characters in their Windows username.
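A cross-platform check of the "pathlib with non-ASCII directory names" kind could look roughly like the sketch below. The function name `check_non_ascii_dirs` and the marker file name are hypothetical and are not taken from tests/test_015_utf8_path_handling.py; only the directory names come from the description above.

```python
import tempfile
from pathlib import Path

# Directory names taken from the integration test description above.
NON_ASCII_NAMES = ["Thalén", "Müller", "日本語", "中文"]


def check_non_ascii_dirs(names):
    """Create each directory under a temp root and confirm a file round-trips."""
    results = {}
    with tempfile.TemporaryDirectory() as root:
        for name in names:
            directory = Path(root) / name
            directory.mkdir()
            marker = directory / "marker.txt"  # hypothetical file name
            marker.write_text(name, encoding="utf-8")
            # True only if the directory exists and the content reads back intact.
            results[name] = (
                directory.is_dir()
                and marker.read_text(encoding="utf-8") == name
            )
    return results


results = check_non_ascii_dirs(NON_ASCII_NAMES)
print(results)
```

This only exercises Python's own path handling. It does not cover the native `LoadLibraryW` call, which is why the PR also adds Windows-only integration tests.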
Mark 4 tests as @pytest.mark.stress (skipped by default per pytest.ini):
- test_aggressive_dbc_segfault_reproduction: 10 real DB connections
- test_force_gc_finalization_order_issue: 10 connections + 5 GC cycles
- test_rapid_connection_churn_with_shutdown: 10 connections with churn
- test_active_connections_thread_safety: 200 mock connections + 10 threads

These tests are resource-intensive and slow down CI. They will still run when explicitly requested with 'pytest -m stress' or 'pytest -m ""'.
📊 Code Coverage Report
Diff Coverage
Diff: main...HEAD, staged and unstaged changes
📋 Files Needing Attention
📉 Files with overall lowest coverage
mssql_python.pybind.logger_bridge.hpp: 58.8%
mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.row.py: 66.2%
mssql_python.helpers.py: 67.5%
mssql_python.pybind.ddbc_bindings.cpp: 69.4%
mssql_python.pybind.ddbc_bindings.h: 71.7%
mssql_python.pybind.connection.connection.cpp: 73.6%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.pybind.connection.connection_pool.cpp: 79.6%
mssql_python.connection.py: 83.9%
…com/microsoft/mssql-python into bewithgaurav/fix-utf8-path-encoding
Pull request overview
This pull request fixes a critical bug (Issue #370) where the mssql-python driver failed to load when installed in paths containing non-ASCII characters on Windows, such as usernames like "Thalén" or directories with accented characters. The fix refactors path handling to use C++17's std::filesystem for proper cross-platform UTF-8 path support.
Key changes:
- Replaced platform-specific path manipulation code with `std::filesystem::path` for unified, encoding-aware path handling
- Fixed UTF-8 to UTF-16 conversion on Windows by using `fs::path::c_str()` instead of the incorrect `std::wstring(path.begin(), path.end())` conversion
- Added comprehensive test suite covering UTF-8 path handling with real-world non-ASCII characters (Swedish, German, Japanese, Chinese, etc.)
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| mssql_python/pybind/ddbc_bindings.cpp | Refactored GetModuleDirectory(), LoadDriverLibrary(), and LoadDriverOrThrowException() to use std::filesystem::path for proper UTF-8 encoding on all platforms, with correct UTF-16 conversion on Windows |
| tests/test_015_utf8_path_handling.py | Added comprehensive test coverage including code path verification tests, non-ASCII string handling tests, Windows-specific tests, and full integration tests with virtual environments in non-ASCII paths |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Work Item / Issue Reference
Summary
This pull request refactors the way file system paths are handled in the `mssql_python/pybind/ddbc_bindings.cpp` file to use C++17's `std::filesystem` for improved cross-platform compatibility and proper handling of UTF-8 paths. The changes simplify and unify path manipulation logic, especially for dynamic library loading, and ensure correct encoding is used on all platforms.

Cross-platform and encoding improvements:
- `GetModuleDirectory()` now uses `std::filesystem::path` to extract the module directory in a cross-platform and UTF-8 safe manner. This removes the need for separate Windows and Unix/macOS code paths.
- Windows library loading now uses `fs::path::c_str()`, which provides a correctly encoded `wchar_t*` for `LoadLibraryW`, ensuring proper handling of UTF-8 paths. This change is applied both in `LoadDriverLibrary` and when loading `mssql-auth.dll` in `LoadDriverOrThrowException()`.
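The idea of deriving the module directory through a path type, rather than manual string or byte manipulation, can be illustrated from the Python side. This is an analogy using `pathlib`, not the C++ implementation. The module location is hypothetical, and placing `mssql-auth.dll` next to the module is an assumption made purely for the example.

```python
from pathlib import PureWindowsPath

# Hypothetical module location; real install layouts may differ.
module_file = PureWindowsPath(
    r"C:\Users\Thalén\venv\Lib\site-packages\mssql_python\ddbc_bindings.pyd"
)

# Path-aware parent extraction, similar in spirit to fs::path::parent_path().
module_dir = module_file.parent

# Assumption for illustration only: the DLL sits beside the module.
auth_dll = module_dir / "mssql-auth.dll"

print(module_dir)
print(auth_dll)
```

`PureWindowsPath` applies Windows path rules on any host OS, so the sketch behaves the same on Linux and macOS, and the non-ASCII component survives every step because the path is never handled as raw bytes.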