generated from sensein/python-package-template
Add flexible GROBID deployment options - make Docker optional #40
Draft
Copilot wants to merge 9 commits into main from copilot/move-to-grobid-python-dependency
- Created detailed GROBID setup guide with 4 deployment options
- Updated README with quick start and configuration info
- Enhanced error handling in PDF extraction with helpful messages
- Added .env.example for easy environment configuration
- Created test script for verifying GROBID connection
- Updated Docker documentation to clarify GROBID is optional
- Added examples README with setup instructions
- Improved code documentation and error messages
Co-authored-by: Sulstice <11812946+Sulstice@users.noreply.github.com>

Remove duplicate imports of GrobidArticleExtractor, pandas, requests, weaviate, and dotenv to improve code maintainability
Co-authored-by: Sulstice <11812946+Sulstice@users.noreply.github.com>

- Use explicit check for None/empty in grobid_server
- Replace bare except with specific Exception catch
- Improve code quality and error handling
Co-authored-by: Sulstice <11812946+Sulstice@users.noreply.github.com>

- Make grobid_server check more explicit for None and empty/whitespace strings
- Improve JSON parsing error handling in test script
- Add specific exception types for better error handling
Co-authored-by: Sulstice <11812946+Sulstice@users.noreply.github.com>

- Use short-circuit evaluation to safely handle None in grobid_server check
- Replace ValueError with specific JSONDecodeError for JSON parsing
- Handle 500 status code separately with warning instead of treating as success
- Improve error messaging for server errors
Co-authored-by: Sulstice <11812946+Sulstice@users.noreply.github.com>

…rror
- Use explicit None check before string operations to avoid AttributeError
- Import json module and use json.JSONDecodeError instead of requests version
- Ensure robust error handling for all edge cases
Co-authored-by: Sulstice <11812946+Sulstice@users.noreply.github.com>

- Catch ValueError when external service returns invalid JSON
- Provide helpful error message with response preview
- Guide users to compatible services documentation
Co-authored-by: Sulstice <11812946+Sulstice@users.noreply.github.com>

Complete documentation of the GROBID flexible setup implementation including problem analysis, solution approach, and all changes made
Co-authored-by: Sulstice <11812946+Sulstice@users.noreply.github.com>
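
The commits above describe handling HTTP 500 separately from success and catching json.JSONDecodeError when a response is not valid JSON. The PR's actual code is not shown on this page, so the following is only a minimal sketch of that idea. The function name `classify_grobid_response`, the return levels, and the message wording are all hypothetical.

```python
import json
from typing import Tuple


def classify_grobid_response(status_code: int, body: str) -> Tuple[str, str]:
    """Classify a server response as ("ok" | "warning" | "error", message).

    Hypothetical helper for illustration; not taken from the PR's code.
    """
    # A 500 means the server is reachable but failing, so report it as a
    # warning rather than counting it as a successful connection.
    if status_code == 500:
        return ("warning", "Server reachable but returned HTTP 500.")
    if status_code != 200:
        return ("error", f"Unexpected HTTP status {status_code}.")
    try:
        json.loads(body)
    except json.JSONDecodeError:
        # Include a short preview so the user can see what came back.
        preview = body[:80]
        return ("error", f"Response was not valid JSON. Preview: {preview!r}")
    return ("ok", "Server returned valid JSON.")
```

Because json.JSONDecodeError is a subclass of ValueError, a caller that catches ValueError (as one of the commits does) would also catch this case.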
Copilot AI changed the title from "[WIP] Migrate code to use grobid python module" to "Add flexible GROBID deployment options - make Docker optional" on Dec 17, 2025
Description
GROBID is a Java application requiring a server. Python packages (grobidarticleextractor) are HTTP clients. This PR makes Docker optional by documenting 4 deployment alternatives and improving error handling.
Changes
Documentation (7 files)
Code Improvements
src/utils/utils.py:
Testing Utility
scripts/test_grobid_connection.py: Connection diagnostic with troubleshooting
Deployment Options
Example
Error handling now provides guidance:
Related Issue(s)
#[issue number will be linked automatically]
Motivation and Context
Previously, users needed Docker to run GROBID locally. The code already supported external services via environment variables, but that support was undocumented. This PR makes the alternatives visible and accessible.
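
As a rough illustration of environment-based configuration with an explicit None/whitespace check (the pattern the commit messages describe for grobid_server), a minimal sketch might look like the following. The variable name `GROBID_SERVER_URL` and the helper `resolve_grobid_server` are assumptions made for this example, not names confirmed by the PR. Port 8070 is GROBID's usual default.

```python
import os
from typing import Mapping, Optional

# GROBID's service listens on port 8070 by default.
DEFAULT_GROBID_URL = "http://localhost:8070"


def resolve_grobid_server(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the GROBID server URL, falling back to the local default.

    Hypothetical helper; the environment variable name is an assumption.
    """
    env = os.environ if env is None else env
    value = env.get("GROBID_SERVER_URL")
    # Check for None first so that .strip() is never called on None.
    if value is None or not value.strip():
        return DEFAULT_GROBID_URL
    # Normalise surrounding whitespace and any trailing slash.
    return value.strip().rstrip("/")
```

Passing the environment as a mapping keeps the function easy to test without modifying the real process environment.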
How Has This Been Tested?
Screenshots (if appropriate):
N/A - CLI tool and documentation changes
Types of changes
Checklist: