
Add LangChain integration to main package with auto_instrument() support#1320

Open
clutchski wants to merge 3 commits into main from matt/lc-auto

Conversation

@clutchski (Collaborator)

Summary

  • Move LangChain wrapper from integrations/langchain-py into the main braintrust package
  • Enable auto-instrumentation via braintrust.auto_instrument()
  • Add setup_langchain() for manual setup with a global callback handler (see the usage sketch below)
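
A minimal usage sketch of the two entry points (the top-level import of setup_langchain and its no-argument call are assumptions; exact signatures may differ from what ships):

import braintrust

# One call patches supported integrations, including LangChain.
braintrust.auto_instrument()

# Or opt in manually: setup_langchain() registers the global LangChain
# callback handler. Importing it from the top-level package is an assumption.
from braintrust import setup_langchain

setup_langchain()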

Test plan

  • nox -s "test_langchain(0.3.27)" passes (335 tests)
  • make fixup passes
  • Verify examples work: python py/examples/langchain/auto.py

🤖 Generated with Claude Code

clutchski and others added 2 commits January 29, 2026 17:01
Move LangChain wrapper from integrations/langchain-py into the main
braintrust package, enabling auto-instrumentation via braintrust.auto_instrument().

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The deprecation wrapper can be added after the new braintrust package is released.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
"""
span = current_span()
if span == NOOP_SPAN:
init_logger(project=project_name, api_key=api_key, project_id=project_id)
Collaborator

Do we know what happens if init_logger is initialized up front without a project name etc.? I have a vague recollection that it could make traces show up in the project logs instead of in an ongoing eval.

Collaborator

If you want to add this to the repo:

import asyncio

from braintrust import EvalAsync, Score, init_dataset, init_logger
from braintrust_langchain import BraintrustCallbackHandler, set_global_handler
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI

project_name = "test-braintrust-converted"

logger = init_logger(project=project_name)
set_global_handler(BraintrustCallbackHandler(logger=logger))
chat_model = ChatOpenAI(model="gpt-4o-mini", temperature=0)


async def toxicity_classifier(inputs: dict) -> dict:
    instructions = (
        "Please review the user query below and determine if it contains any form of toxic behavior, "
        "such as insults, threats, or highly negative comments. Respond with 'Toxic' if it does "
        "and 'Not toxic' if it doesn't."
    )
    messages = [
        SystemMessage(content=instructions),
        HumanMessage(content=inputs["text"]),
    ]
    result = await chat_model.ainvoke(messages)
    return {"class": result.content}


examples = [
    {
        "input": {"text": "Shut up, idiot"},
        "expected": "Toxic",
    },
    {
        "input": {"text": "You're a wonderful person"},
        "expected": "Not toxic",
    },
    {
        "input": {"text": "This is the worst thing ever"},
        "expected": "Toxic",
    },
    {
        "input": {"text": "I had a great day today"},
        "expected": "Not toxic",
    },
    {
        "input": {"text": "Nobody likes you"},
        "expected": "Toxic",
    },
    {
        "input": {"text": "This is unacceptable. I want to speak to the manager."},
        "expected": "Not toxic",
    },
]

dataset = init_dataset(project=project_name, name="Toxic Queries")

if len(list(dataset.fetch())) == 0:
    for example in examples:
        dataset.insert(**example)
    dataset.summarize()


def correct(input, output, expected):
    return Score(
        name="Correct",
        score=1 if output["class"] == expected else 0,
    )

async def run_evaluation():
    await EvalAsync(
        project_name,
        data=dataset,
        task=toxicity_classifier,
        scores=[correct],
        experiment_name="gpt-4o-mini, baseline",
        metadata={"description": "Testing the baseline system."},
        max_concurrency=4,
    )


if __name__ == "__main__":
    asyncio.run(run_evaluation())

py/noxfile.py Outdated
# langchain requires Python >= 3.10
# Note: langchain ecosystem packages have tight version coupling, so we pin
# entire sets of compatible versions rather than testing "latest"
LANGCHAIN_VERSIONS = ("0.3.27",)
Collaborator

There's a 1.x now too.
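
For example (the 1.x pin below is a placeholder; check PyPI for the current release):

LANGCHAIN_VERSIONS = ("0.3.27", "1.0.0")  # "1.0.0" is an assumed 1.x pin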

def test_langchain(session, version):
    """Test LangChain integration."""
    # langchain requires Python >= 3.10
    if sys.version_info < (3, 10):
Collaborator

We don't support 3.9 anymore.
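
A possible simplification once the guard is dropped (a sketch; the nox decorators are assumed from the surrounding noxfile):

@nox.session
@nox.parametrize("version", LANGCHAIN_VERSIONS)
def test_langchain(session, version):
    """Test LangChain integration."""
    # No sys.version_info guard needed: the SDK itself now requires >= 3.10.
    session.install(f"langchain=={version}")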

py/noxfile.py Outdated
# langsmith is needed for the wrapper module but not in VENDOR_PACKAGES
session.install("langsmith")
# langchain dependencies for the langchain wrapper (pinned compatible versions)
session.install(
    "langchain==0.3.27",
    "langchain-openai==0.3.35",
    "langchain-anthropic==0.3.22",
    "langgraph>=0.2.1,<0.4.0",
    "tenacity",
)
@ibolmo (Collaborator), Jan 30, 2026

We should probably test the 1.x line as well. There shouldn't be any breaking changes between 0.x and 1.x, but it's good to have the coverage now.
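
One way to get that coverage while keeping the compatible sets pinned (a sketch; the 1.x companion-package pins are assumptions and should be checked against PyPI):

# One pinned, mutually compatible set of companion packages per langchain release line.
LANGCHAIN_INSTALL_SETS = {
    "0.3.27": ("langchain-openai==0.3.35", "langchain-anthropic==0.3.22"),
    "1.0.0": ("langchain-openai==1.0.0", "langchain-anthropic==1.0.0"),  # assumed pins
}

def install_langchain(session, version):
    session.install(f"langchain=={version}", *LANGCHAIN_INSTALL_SETS[version])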

 from .context import clear_global_handler, set_global_handler

-__all__ = ["BraintrustCallbackHandler", "set_global_handler"]
+__all__ = ["BraintrustCallbackHandler", "set_global_handler", "clear_global_handler"]
Collaborator

Not sure we needed this change. Should we just kill the source in this repo? The published PyPI package may be enough; perhaps we can keep a tag or branch in case we need to ship patch fixes.

@ibolmo (Collaborator) left a comment

I would update and run the langchain.py golden tests https://github.com/braintrustdata/braintrust-sdk/blob/main/internal/golden/langchain.py

I also have a few examples (in a separate local repo) that I'll try to add here.

- Add LATEST to LANGCHAIN_VERSIONS for testing against newest releases
- Remove redundant version pinning and explicit transitive deps (tenacity, pydantic)
- Remove conditional skip for langgraph - it's now a required test dependency

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
