The GBIF agent system prompt combines its instructional content with few-shot examples specific to each entrypoint. These examples demonstrate parameter extraction for that entrypoint's use case, so the LLM picks up the nuances of user intent when mapping natural language to GBIF API parameters.
We use Instructor for structured outputs that conform to response models. Each response has multiple fields:
- params: The API parameters extracted from the request
- unresolved_params: List of fields that need clarification to continue
- clarification_needed: Boolean flag indicating if clarification is required
- clarification_reason: Explanation of why clarification is needed
- artifact_description: Concise characterization of the retrieved data
This allows the agent to "abort" the request and seek clarification when necessary rather than forcing the model to parse the request with incomplete information.
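As a rough illustration of that response shape, the sketch below uses a standard-library dataclass as a stand-in for the real response model. The class name `SearchResponse` and the helper `next_step` are hypothetical; only the five field names come from the description above.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SearchResponse:
    """Hypothetical stand-in for an entrypoint's response model."""

    params: dict[str, Any] = field(default_factory=dict)
    unresolved_params: list[str] = field(default_factory=list)
    clarification_needed: bool = False
    clarification_reason: Optional[str] = None
    artifact_description: Optional[str] = None


def next_step(response: SearchResponse) -> str:
    """Return 'search' when the request is usable, else a clarification prompt."""
    if response.clarification_needed:
        return f"clarify: {response.clarification_reason}"
    return "search"
```

A response with `clarification_needed=True` short-circuits the search, which is the "abort" path described above.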
If model validation raises a ValueError, the Instructor library retries the request and passes the error back in the LLM context, so the LLM can take the problem with its previous response into account (maximum of 3 retries).
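The retry behaviour can be pictured as a loop that feeds each validation error back to the model. The sketch below is a hand-rolled stand-in, not Instructor's implementation: `parse_with_retries`, `llm`, and `validate` are hypothetical names, and whether the limit of 3 counts the first attempt is an assumption.

```python
from typing import Callable


def parse_with_retries(
    llm: Callable[[list[str]], dict],
    validate: Callable[[dict], dict],
    request: str,
    max_attempts: int = 3,
) -> dict:
    """Call the model, validate its output, and retry with error feedback."""
    messages = [request]
    last_error = None
    for _ in range(max_attempts):
        candidate = llm(messages)
        try:
            return validate(candidate)
        except ValueError as exc:
            last_error = exc
            # Add the error to the context so the next attempt can correct it.
            messages.append(f"Previous response was invalid: {exc}")
    raise ValueError(f"validation failed after {max_attempts} attempts: {last_error}")
```

If every attempt fails validation, the last error is surfaced to the caller instead of returning a response that did not validate.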
Before seeking user clarification, the system attempts automatic parameter resolution via resolve_pending_search_parameters(). This function performs three steps:
- Taxonomic Name Extraction: Uses LLM to extract taxonomic names from user requests across different ranks (kingdom, phylum, class, order, family, genus, species)
- GBIF Key Resolution: Converts extracted taxonomic names to GBIF keys using the species match API
- Field Mapping: Maps resolved keys to appropriate parameter fields (e.g., familyKey, genusKey, speciesKey)
Only parameters that cannot be automatically resolved are presented to the user for clarification.
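The resolution steps above can be sketched as follows. This is a simplified illustration rather than the body of resolve_pending_search_parameters(): the `match` callable stands in for a call to the GBIF species match API and is assumed to return a dict with `usageKey`, `rank`, and `matchType` entries, and `RANK_TO_FIELD` and `resolve_taxa` are hypothetical names.

```python
from typing import Callable

# Assumed mapping from the rank in a match result to a search parameter field.
RANK_TO_FIELD = {
    "KINGDOM": "kingdomKey",
    "PHYLUM": "phylumKey",
    "CLASS": "classKey",
    "ORDER": "orderKey",
    "FAMILY": "familyKey",
    "GENUS": "genusKey",
    "SPECIES": "speciesKey",
}


def resolve_taxa(
    names: list[str], match: Callable[[str], dict]
) -> tuple[dict[str, int], list[str]]:
    """Map taxonomic names to key fields; return (resolved, unresolved)."""
    resolved: dict[str, int] = {}
    unresolved: list[str] = []
    for name in names:
        result = match(name)
        field_name = RANK_TO_FIELD.get(result.get("rank", ""))
        key = result.get("usageKey")
        if result.get("matchType") == "NONE" or field_name is None or key is None:
            unresolved.append(name)  # left for user clarification
        else:
            resolved[field_name] = key
    return resolved, unresolved
```

Passing `match` in as a parameter keeps the sketch testable without network access; names that end up in the second return value are the ones that would be presented to the user.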
The validation layer ensures that the parameters in the LLM response are accurate:
- It ensures that all parameter values appear in the original user request
- Parameter Validation Fields: Each search parameter validator schema can define a set of fields that require explicit mention (keys, IDs, coordinates, etc.)
- Process:
  - Validates all parameters are valid model fields
  - Ensures at least one parameter has a value
  - Checks that critical values (defined in VALIDATION_FIELDS) exist in the original request
  - Prevents hallucinated or inferred values from passing through
- It validates facet parameters for aggregation queries. Control fields (facet, facetMincount, facetMultiselect, limit, offset) are excluded, and a ValueError is raised if the LLM puts any of them in the generated response
- Provides clear error messages for invalid facet selections
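The checks above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's RequestValidationMixin: the function names are hypothetical, the allowed and critical field sets are passed in by the caller, and "appears in the original request" is approximated with a simple substring test.

```python
from typing import Any, Iterable

# Control fields named in the text; they must not be used as facets.
FACET_CONTROL_FIELDS = {"facet", "facetMincount", "facetMultiselect", "limit", "offset"}


def validate_params(
    params: dict[str, Any],
    user_request: str,
    allowed_fields: Iterable[str],
    validation_fields: Iterable[str],
) -> dict[str, Any]:
    """Return the populated parameters, or raise ValueError on a failed check."""
    unknown = sorted(set(params) - set(allowed_fields))
    if unknown:
        raise ValueError(f"unknown parameters: {unknown}")
    populated = {k: v for k, v in params.items() if v not in (None, "", [])}
    if not populated:
        raise ValueError("at least one parameter must have a value")
    for name in sorted(validation_fields):
        # Critical values must be stated by the user, not inferred by the LLM.
        if name in populated and str(populated[name]) not in user_request:
            raise ValueError(f"{name}={populated[name]!r} is not in the request")
    return populated


def validate_facets(facets: Iterable[str]) -> list[str]:
    """Reject control fields in a facet selection."""
    facets = list(facets)
    rejected = sorted(FACET_CONTROL_FIELDS.intersection(facets))
    if rejected:
        raise ValueError(f"control fields cannot be used as facets: {rejected}")
    return facets
```

Because every failure is a ValueError with a specific message, the same error text can be handed back to the LLM on retry.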
Putting these pieces together, each request goes through the following steps:
- Parsing: Each entrypoint uses a parse function with Instructor to extract intent and parameters from the user request into structured response models.
- Validation: RequestValidationMixin and specialized validators prevent hallucination and ensure parameter accuracy.
- Parameter Resolution: If clarification is needed, the system attempts automatic resolution via:
  - LLM-based taxonomic name extraction from user requests
  - GBIF species match API calls to resolve names to keys
  - Field mapping to appropriate parameter types (familyKey, genusKey, etc.)
- Scientific Name Resolution: If scientificName parameters are provided, they're resolved to taxonKey for better search performance.
- API Construction: The API URL is constructed using the validated and resolved parameters.
- Response Handling:
  - API responses are checked for errors and status codes with detailed logging
  - Results are summarized using LLM-generated response summaries
  - Artifacts are created with metadata including portal URLs for downstream use
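The API construction step can be illustrated with the standard library. This is a sketch, not the agent's URL builder: `build_url` is a hypothetical helper, and it assumes the public GBIF v1 base URL with unset parameters simply dropped.

```python
from typing import Any
from urllib.parse import urlencode

GBIF_API_BASE = "https://api.gbif.org/v1"


def build_url(path: str, params: dict[str, Any]) -> str:
    """Build a GBIF API URL from validated parameters, skipping unset ones."""
    clean = {k: v for k, v in params.items() if v is not None}
    # doseq=True expands list values into repeated query parameters.
    query = urlencode(clean, doseq=True)
    url = f"{GBIF_API_BASE}/{path.strip('/')}"
    return f"{url}?{query}" if query else url
```

For example, `build_url("occurrence/search", {"country": "US", "year": [2020, 2021]})` yields a URL whose query string repeats `year` once per value.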
