Every agent we've built so far has had one thing in common: we managed the agentic loop ourselves — appending messages, calling tools, feeding results back. The OpenAI Assistants API takes a different approach: OpenAI manages the loop, the thread history, the context window, and the built-in tools on their servers. You describe what you want; the API handles the plumbing.
This is a genuine trade-off — you gain a lot of convenience and lose some control. Understanding when that trade-off makes sense is what this post is about.
| | Chat Completions | Assistants API |
|---|---|---|
| Conversation history | You manage the messages array | Stored in a Thread on OpenAI's servers |
| Context window overflow | You truncate manually | Managed automatically |
| Tool loop | You write it | Managed automatically |
| Built-in tools | None | Code Interpreter, File Search |
| File storage | None | OpenAI Files API |
| State persistence | None | Threads and Runs persist indefinitely |
Use Chat Completions when you need fine-grained control or want to avoid vendor lock-in on state. Use the Assistants API when you want to ship a capable agent quickly without managing infrastructure.
```bash
pip install openai
export OPENAI_API_KEY="sk-..."
```

```python
from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    name="Data Analyst",
    instructions=(
        "You are an expert data analyst. When given a CSV file, you use the "
        "Code Interpreter tool to explore and analyse it. Always show your "
        "reasoning and include the Python code you ran."
    ),
    model="gpt-4.1",
    tools=[{"type": "code_interpreter"}],
)
print(f"Assistant created: {assistant.id}")
# Save this ID — you can reuse the same assistant for all users.
```

You only need to create the assistant once. Store `assistant.id` in your config and reuse it.
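On later startups you can then fetch the stored assistant instead of creating a duplicate. A minimal sketch, where the `ASSISTANT_ID` environment variable is my stand-in for wherever you keep it:

```python
import os

# ASSISTANT_ID is a hypothetical env var holding the ID you saved above
assistant = client.beta.assistants.retrieve(os.environ["ASSISTANT_ID"])
print(assistant.name)  # "Data Analyst"
```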
```python
# Create a sample CSV to analyse
import io

csv_content = "month,revenue,users\nJan,12000,340\nFeb,15000,410\nMar,11000,290\nApr,18000,520\nMay,22000,610\n"

file = client.files.create(
    file=("sales.csv", io.BytesIO(csv_content.encode()), "text/csv"),
    purpose="assistants",
)
print(f"File uploaded: {file.id}")
```

```python
# Create a new thread for this user session
thread = client.beta.threads.create()
# Add a user message that references the uploaded file
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Please analyse the attached sales data and identify the best and worst performing months.",
    attachments=[
        {
            "file_id": file.id,
            "tools": [{"type": "code_interpreter"}],
        }
    ],
)
```

```python
import time
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

# Poll until the run completes
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

if run.status == "completed":
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    # Messages are returned newest-first, so data[0] is the assistant's reply
    answer = messages.data[0].content[0].text.value
    print(answer)
else:
    print(f"Run ended with status: {run.status}")
```

The assistant will run Python code via Code Interpreter, produce analysis, and return both the explanation and the code it used, all managed server-side.
Because the thread persists, follow-up questions are trivial:
```python
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Can you plot revenue over time and describe the trend?",
)
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
# ... poll and retrieve as above
```

The assistant remembers the file and the prior analysis. No re-uploading, no re-sending history.
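Since every follow-up repeats the same add-message, run, poll, read cycle, it is worth wrapping in a small helper. A sketch under the same assumptions as above (the name `ask` and the one-second poll interval are mine):

```python
import time


def ask(client, thread_id: str, assistant_id: str, question: str) -> str:
    """Hypothetical helper: send a question on an existing thread and
    return the assistant's text reply."""
    client.beta.threads.messages.create(
        thread_id=thread_id, role="user", content=question
    )
    run = client.beta.threads.runs.create(
        thread_id=thread_id, assistant_id=assistant_id
    )
    while run.status in ("queued", "in_progress"):
        time.sleep(1)
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run.id)
    if run.status != "completed":
        raise RuntimeError(f"Run ended with status: {run.status}")
    messages = client.beta.threads.messages.list(thread_id=thread_id)
    # Assumes the first content block is text; Code Interpreter replies
    # can also contain image blocks
    return messages.data[0].content[0].text.value


# answer = ask(client, thread.id, assistant.id, "Which month grew fastest?")
```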
Polling adds latency. For a real application, use the streaming API:
```python
with client.beta.threads.runs.stream(
    thread_id=thread.id,
    assistant_id=assistant.id,
) as stream:
    for text in stream.text_deltas:
        print(text, end="", flush=True)
```

`text_deltas` yields tokens as they're generated, giving you a live typing effect with no polling loop.
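If you want more than raw text, for instance to react to tool calls or image outputs as they stream, the SDK also supports subclassing an event handler. A sketch, assuming a recent openai SDK (the class name `PrintHandler` is mine):

```python
from openai import AssistantEventHandler


class PrintHandler(AssistantEventHandler):
    # Called for each chunk of assistant text as it is generated
    def on_text_delta(self, delta, snapshot):
        print(delta.value, end="", flush=True)


with client.beta.threads.runs.stream(
    thread_id=thread.id,
    assistant_id=assistant.id,
    event_handler=PrintHandler(),
) as stream:
    stream.until_done()  # block until the run finishes
```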
Code Interpreter runs code. File Search is the other built-in tool — it chunks, embeds, and indexes uploaded documents for semantic retrieval. Enable it the same way:
```python
assistant = client.beta.assistants.create(
    name="Knowledge Assistant",
    instructions="Answer questions using the provided documents.",
    model="gpt-4.1",
    tools=[{"type": "file_search"}],
)
```

Upload PDFs, docs, or text files to a Vector Store and attach it to the assistant. The model retrieves relevant chunks automatically: no ChromaDB, no embedding pipeline, no infrastructure to manage. (We'll build the DIY version of this in Day 9 so you understand what's happening under the hood.)
Good fit:

- Shipping a capable agent quickly without building the loop, storage, or retrieval infrastructure yourself
- Data-analysis features, where Code Interpreter gives you sandboxed Python execution for free
- Document Q&A, where File Search replaces a DIY embedding pipeline
- Long-lived conversations, since Threads and Runs persist indefinitely on OpenAI's servers

Poor fit:

- Applications that need fine-grained control over the messages array, context-window truncation, or the tool loop
- Teams that want to avoid vendor lock-in on conversation state
Tomorrow we build our own retrieval pipeline from scratch using ChromaDB — the DIY alternative to File Search that you can run locally and customise fully.