Add LangChain integration to main package with auto_instrument() support #1320
Move LangChain wrapper from integrations/langchain-py into the main braintrust package, enabling auto-instrumentation via braintrust.auto_instrument().

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The deprecation wrapper can be added after the new braintrust package is released.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
| """ | ||
| span = current_span() | ||
| if span == NOOP_SPAN: | ||
| init_logger(project=project_name, api_key=api_key, project_id=project_id) |
do we know what happens if init_logger is initialized up front without a project name etc.? I have a vague recollection that it could make traces show up in the project logs instead of in an ongoing eval.
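The fallback in the diff above can be modeled without the braintrust dependency to make the concern concrete. This is a stand-in sketch, not the SDK's implementation: `NOOP_SPAN`, `resolve_target`, and the tuple return values are all illustrative stand-ins.

```python
# Self-contained model of the "fall back to init_logger when there is no
# active span" logic from the diff. NOOP_SPAN is a sentinel stand-in.
NOOP_SPAN = object()

def resolve_target(current_span, project_name=None):
    """Return where new trace spans would attach."""
    if current_span is not NOOP_SPAN:
        # An eval (or other span) is active: attach to it.
        return ("span", current_span)
    # No active span: fall back to a project logger. Without a project
    # name this could route traces to project logs instead of the
    # ongoing eval, which is the concern raised above.
    return ("logger", project_name)
```

Under this model, the question is whether the eval runner always sets a current span before the handler fires; if not, the `init_logger` branch wins and traces land in project logs.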
if you want to add this to the repo:
```python
import asyncio

from braintrust import EvalAsync, Score, init_dataset, init_logger
from braintrust_langchain import BraintrustCallbackHandler, set_global_handler
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI

project_name = "test-braintrust-converted"

logger = init_logger(project=project_name)
set_global_handler(BraintrustCallbackHandler(logger=logger))

chat_model = ChatOpenAI(model="gpt-4o-mini", temperature=0)


async def toxicity_classifier(inputs: dict) -> dict:
    instructions = (
        "Please review the user query below and determine if it contains any form of toxic behavior, "
        "such as insults, threats, or highly negative comments. Respond with 'Toxic' if it does "
        "and 'Not toxic' if it doesn't."
    )
    messages = [
        SystemMessage(content=instructions),
        HumanMessage(content=inputs["text"]),
    ]
    result = await chat_model.ainvoke(messages)
    return {"class": result.content}


examples = [
    {
        "input": {"text": "Shut up, idiot"},
        "expected": "Toxic",
    },
    {
        "input": {"text": "You're a wonderful person"},
        "expected": "Not toxic",
    },
    {
        "input": {"text": "This is the worst thing ever"},
        "expected": "Toxic",
    },
    {
        "input": {"text": "I had a great day today"},
        "expected": "Not toxic",
    },
    {
        "input": {"text": "Nobody likes you"},
        "expected": "Toxic",
    },
    {
        "input": {"text": "This is unacceptable. I want to speak to the manager."},
        "expected": "Not toxic",
    },
]

dataset = init_dataset(project=project_name, name="Toxic Queries")
if len(list(dataset.fetch())) == 0:
    for example in examples:
        dataset.insert(**example)
    dataset.summarize()


def correct(input, output, expected):
    return Score(
        name="Correct",
        score=1 if output["class"] == expected else 0,
    )


async def run_evaluation():
    await EvalAsync(
        project_name,
        data=dataset,
        task=toxicity_classifier,
        scores=[correct],
        experiment_name="gpt-4o-mini, baseline",
        metadata={"description": "Testing the baseline system."},
        max_concurrency=4,
    )


if __name__ == "__main__":
    asyncio.run(run_evaluation())
```
py/noxfile.py (outdated)
```python
# langchain requires Python >= 3.10
# Note: langchain ecosystem packages have tight version coupling, so we pin
# entire sets of compatible versions rather than testing "latest"
LANGCHAIN_VERSIONS = ("0.3.27",)


def test_langchain(session, version):
    """Test LangChain integration."""
    # langchain requires Python >= 3.10
    if sys.version_info < (3, 10):
```
we don't support 3.9 anymore
py/noxfile.py (outdated)
```python
# langsmith is needed for the wrapper module but not in VENDOR_PACKAGES
session.install("langsmith")
# langchain dependencies for the langchain wrapper (pinned compatible versions)
session.install("langchain==0.3.27", "langchain-openai==0.3.35", "langchain-anthropic==0.3.22", "langgraph>=0.2.1,<0.4.0", "tenacity")
```
we should probably test 1.x stuff as well. there shouldn't be any breaking changes between 0.x and 1.x but good to have the coverage now
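One way to sketch that extra coverage is the "LATEST" convention this PR later adopts in LANGCHAIN_VERSIONS: treat "LATEST" as a no-pin sentinel so pip resolves the newest release, which would pick up the 1.x line once it is current. The helper name `install_spec` here is hypothetical, not part of the noxfile.

```python
# Version matrix with an unpinned entry; "LATEST" means pip resolves the
# newest release (covering 1.x once released). Pinned entries stay exact.
LANGCHAIN_VERSIONS = ("0.3.27", "LATEST")


def install_spec(package: str, version: str) -> str:
    """Turn a matrix entry into a pip requirement string (hypothetical helper)."""
    return package if version == "LATEST" else f"{package}=={version}"
```

A parametrized nox session could then call `session.install(install_spec("langchain", version))` for each matrix entry.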
```diff
 from .context import clear_global_handler, set_global_handler

-__all__ = ["BraintrustCallbackHandler", "set_global_handler"]
+__all__ = ["BraintrustCallbackHandler", "set_global_handler", "clear_global_handler"]
```
not sure we needed this change. should we just kill the source in the repo? the published PyPI package may be enough. perhaps we can save a tag or branch if we need to provide patch fixes.
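For context on why exporting `clear_global_handler` can be useful, the pair can be modeled as a module-level registry; the clear function mainly matters for teardown and test isolation. The internals below (and the `get_global_handler` accessor) are stand-ins for illustration, not the package's actual implementation.

```python
# Stand-in model of a global callback-handler registry, mirroring the
# set_global_handler / clear_global_handler pair exported in the diff above.
_GLOBAL_HANDLER = None


def set_global_handler(handler):
    global _GLOBAL_HANDLER
    _GLOBAL_HANDLER = handler


def clear_global_handler():
    # Resetting the module-level global is what makes test isolation
    # possible: without it, one test's handler leaks into the next.
    global _GLOBAL_HANDLER
    _GLOBAL_HANDLER = None


def get_global_handler():
    return _GLOBAL_HANDLER
```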
ibolmo left a comment
I would update and run the langchain.py golden tests: https://github.com/braintrustdata/braintrust-sdk/blob/main/internal/golden/langchain.py
I also have a few more examples (in a separate local repo) that I'll try to add here.
- Add LATEST to LANGCHAIN_VERSIONS for testing against newest releases
- Remove redundant version pinning and explicit transitive deps (tenacity, pydantic)
- Remove conditional skip for langgraph - it's now a required test dependency

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary

- Move the LangChain wrapper from `integrations/langchain-py` into the main braintrust package
- Enable auto-instrumentation via `braintrust.auto_instrument()`
- Add `setup_langchain()` for manual setup with a global callback handler

Test plan

- `nox -s "test_langchain(0.3.27)"` passes (335 tests)
- `make fixup` passes
- `python py/examples/langchain/auto.py`

🤖 Generated with Claude Code