In Day 2 we built a tool-using agent with Anthropic's Claude SDK. In Day 3 we used the OpenAI Agents SDK. Both SDKs support function/tool calling, but their APIs differ in ways that matter when you're choosing a platform or porting code between them.
This post builds the exact same task — a weather lookup agent — with both SDKs, then compares them side by side.
The agent receives a user question like "What's the weather in Tokyo?", calls a get_weather function, and incorporates the result into its reply. Simple enough to fit on one screen; representative enough to expose real differences.
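Throughout the post, assume `get_weather` is stubbed; a real agent would call a weather API here. A minimal stand-in:

```python
def get_weather(city: str) -> str:
    # Stub: a real implementation would call a weather API.
    return f"Sunny, 22°C in {city}."
```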
## Defining Tools

How you define a tool is the biggest surface-level difference between the two SDKs.
### Anthropic Claude SDK
Tools are defined as plain Python dicts (or TypedDicts) with a name, description, and input_schema (JSON Schema):
```python
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The name of the city, e.g. 'Tokyo'",
                },
            },
            "required": ["city"],
        },
    }
]
```

### OpenAI Python SDK
OpenAI uses a function wrapper with a parameters key (also JSON Schema), nested inside a type: "function" object:
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The name of the city, e.g. 'Tokyo'",
                    },
                },
                "required": ["city"],
            },
        },
    }
]
```

Key difference: Anthropic uses a flat `input_schema` key; OpenAI wraps everything in `{ "type": "function", "function": { ... } }` with a `parameters` key. Both accept standard JSON Schema, so once you know the wrapper shape, converting between them is mechanical.
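Because the conversion is mechanical, it can be automated. Here is a minimal sketch; the helper names `anthropic_tool_to_openai` and `openai_tool_to_anthropic` are my own, not part of either SDK:

```python
def anthropic_tool_to_openai(tool: dict) -> dict:
    # Wrap an Anthropic-style tool definition in OpenAI's function envelope.
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool["input_schema"],
        },
    }


def openai_tool_to_anthropic(tool: dict) -> dict:
    # Unwrap an OpenAI-style tool definition into Anthropic's flat shape.
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        "input_schema": fn["parameters"],
    }
```

The two functions are inverses, so you can keep a single canonical tool list and convert at the call site.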
## Detecting a Tool Call

### Anthropic Claude SDK
When Claude wants to call a tool it returns a response with stop_reason: "tool_use" and a content array that contains one or more tool_use blocks:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
)

# Check if Claude wants to call a tool
if response.stop_reason == "tool_use":
    tool_use = next(b for b in response.content if b.type == "tool_use")
    tool_name = tool_use.name    # "get_weather"
    tool_input = tool_use.input  # {"city": "Tokyo"}
    tool_use_id = tool_use.id
```

### OpenAI Python SDK
OpenAI uses finish_reason: "tool_calls" and a tool_calls array on the message object:
```python
import json

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4.1",
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
)

message = response.choices[0].message

# Check if the model wants to call a tool
if message.tool_calls:
    tool_call = message.tool_calls[0]
    tool_name = tool_call.function.name                    # "get_weather"
    tool_input = json.loads(tool_call.function.arguments)  # {"city": "Tokyo"}
    tool_call_id = tool_call.id
```

Key difference: Anthropic puts tool calls inside the `content` array as typed `tool_use` blocks; OpenAI puts them in a separate `tool_calls` array on the message. OpenAI also serialises arguments as a JSON string rather than a dict, so don't forget `json.loads()`.
## Returning Tool Results

### Anthropic Claude SDK
Tool results are fed back as a new user message containing a tool_result block, referencing the original tool_use id:
```python
def run_tool(name: str, inputs: dict) -> str:
    if name == "get_weather":
        return f"Sunny, 22°C in {inputs['city']}."
    raise ValueError(f"Unknown tool: {name}")


result = run_tool(tool_name, tool_input)

# Continue the conversation with the tool result
follow_up = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"},
        {"role": "assistant", "content": response.content},
        {
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": tool_use_id,
                    "content": result,
                }
            ],
        },
    ],
)

print(follow_up.content[0].text)
```

### OpenAI Python SDK
Tool results are fed back as a tool role message, referencing the tool_call_id:
```python
result = run_tool(tool_name, tool_input)

follow_up = client.chat.completions.create(
    model="gpt-4.1",
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"},
        message,  # the assistant's message containing tool_calls
        {
            "role": "tool",
            "tool_call_id": tool_call_id,
            "content": result,
        },
    ],
)

print(follow_up.choices[0].message.content)
```

Key difference: Anthropic uses the `user` role with a `tool_result` content block; OpenAI uses a dedicated `tool` role message. The id-pairing mechanism is conceptually the same: you reference the id from the request in the result.
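If you target both providers, this difference is easy to hide behind one helper. A sketch; `make_tool_result_message` is a hypothetical name, not a function from either SDK:

```python
def make_tool_result_message(provider: str, call_id: str, content: str) -> dict:
    # Build the provider-specific message that carries a tool result back.
    if provider == "anthropic":
        # Anthropic: user message containing a tool_result block.
        return {
            "role": "user",
            "content": [
                {"type": "tool_result", "tool_use_id": call_id, "content": content}
            ],
        }
    if provider == "openai":
        # OpenAI: dedicated tool-role message.
        return {"role": "tool", "tool_call_id": call_id, "content": content}
    raise ValueError(f"Unknown provider: {provider}")
```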
## Error Handling

Both SDKs let you signal tool errors back to the model, which can then decide to retry or tell the user about the problem.
Anthropic: set `"is_error": true` in the `tool_result` block:

```python
{
    "type": "tool_result",
    "tool_use_id": tool_use_id,
    "content": "Error: city not found",
    "is_error": True,
}
```

OpenAI: just return an error string as the content of the `tool` message. There's no explicit error flag; the model infers the failure from the content:

```python
{"role": "tool", "tool_call_id": tool_call_id, "content": "Error: city not found"}
```

## Side-by-Side Summary

| | Anthropic Claude SDK | OpenAI SDK / Agents SDK |
|---|---|---|
| Tool schema key | input_schema | parameters (inside function) |
| Tool call detection | stop_reason == "tool_use" | finish_reason == "tool_calls" |
| Tool call location | content[] (tool_use blocks) | message.tool_calls[] |
| Arguments format | Dict | JSON string (needs json.loads) |
| Result message role | user (with tool_result block) | tool |
| Explicit error flag | "is_error": true | None (content-based) |
| High-level SDK | @anthropic-ai/sdk / anthropic | openai-agents |
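The error-handling difference in the table can be folded into a single wrapper around tool execution. A sketch; `safe_run_tool` is a hypothetical helper of mine, and it assumes a `run(name, inputs)` callable like the `run_tool` above:

```python
def safe_run_tool(run, name: str, inputs: dict, call_id: str, provider: str) -> dict:
    # Execute a tool and convert any exception into the provider's
    # error-result shape instead of crashing the agent loop.
    try:
        content, is_error = run(name, inputs), False
    except Exception as exc:
        content, is_error = f"Error: {exc}", True
    if provider == "anthropic":
        block = {"type": "tool_result", "tool_use_id": call_id, "content": content}
        if is_error:
            block["is_error"] = True  # explicit flag, Anthropic only
        return {"role": "user", "content": [block]}
    # OpenAI has no error flag; the model infers failure from the content string.
    return {"role": "tool", "tool_call_id": call_id, "content": content}
```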
When to pick Anthropic: you want extended thinking, prefer the content-block model, or need Claude's strong instruction-following on complex multi-step tasks.
When to pick OpenAI: you're already using GPT models in production, want the high-level Agents SDK with built-in tracing and handoffs, or need Code Interpreter / file search via the Assistants API.
Tomorrow we step back from the request-response loop and tackle the memory problem: how do agents remember things across sessions, and what's the right storage tier for each kind of memory?