Every agent we've built so far has had one thing in common: we managed the agentic loop ourselves — appending messages, calling tools, feeding results back. The OpenAI Assistants API takes a different approach: OpenAI manages the loop, the thread history, the context window, and the built-in tools on their servers. You describe what you want; the API handles the plumbing.
This is a genuine trade-off — you gain a lot of convenience and lose some control. Understanding when that trade-off makes sense is what this post is about.
| | Chat Completions | Assistants API |
|---|---|---|
| Conversation history | You manage the messages array | Stored in a Thread on OpenAI's servers |
| Context window overflow | You truncate manually | Managed automatically |
| Tool loop | You write it | Managed automatically |
| Built-in tools | None | Code Interpreter, File Search |
| File storage | None | OpenAI Files API |
| State persistence | None | Threads and Runs persist indefinitely |
Use Chat Completions when you need fine-grained control or want to avoid vendor lock-in on state. Use the Assistants API when you want to ship a capable agent quickly without managing infrastructure.
```bash
pip install openai
export OPENAI_API_KEY="sk-..."
```

```python
from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    name="Data Analyst",
    instructions=(
        "You are an expert data analyst. When given a CSV file, you use the "
        "Code Interpreter tool to explore and analyse it. Always show your "
        "reasoning and include the Python code you ran."
    ),
    model="gpt-4.1",
    tools=[{"type": "code_interpreter"}],
)
print(f"Assistant created: {assistant.id}")
# Save this ID — you can reuse the same assistant for all users.
```

You only need to create the assistant once. Store `assistant.id` in your config and reuse it.
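On later startups you can then fetch the stored assistant instead of creating a duplicate. A minimal sketch, where the `ASSISTANT_ID` environment variable is my stand-in for wherever you keep it:

```python
import os

# ASSISTANT_ID is a hypothetical env var holding the ID you saved above
assistant = client.beta.assistants.retrieve(os.environ["ASSISTANT_ID"])
print(assistant.name)  # "Data Analyst"
```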
```python
# Create a sample CSV to analyse
import io

csv_content = "month,revenue,users\nJan,12000,340\nFeb,15000,410\nMar,11000,290\nApr,18000,520\nMay,22000,610\n"

file = client.files.create(
    file=("sales.csv", io.BytesIO(csv_content.encode()), "text/csv"),
    purpose="assistants",
)
print(f"File uploaded: {file.id}")
```

```python
# Create a new thread for this user session
thread = client.beta.threads.create()
# Add a user message that references the uploaded file
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Please analyse the attached sales data and identify the best and worst performing months.",
    attachments=[
        {
            "file_id": file.id,
            "tools": [{"type": "code_interpreter"}],
        }
    ],
)
```

```python
import time
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

# Poll until the run completes
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

if run.status == "completed":
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    # Messages are returned newest-first, so data[0] is the assistant's reply
    answer = messages.data[0].content[0].text.value
    print(answer)
else:
    print(f"Run ended with status: {run.status}")
```

The assistant will run Python code via Code Interpreter, produce analysis, and return both the explanation and the code it used, all managed server-side.
Because the thread persists, follow-up questions are trivial:
```python
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Can you plot revenue over time and describe the trend?",
)
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
# ... poll and retrieve as above
```

The assistant remembers the file and the prior analysis. No re-uploading, no re-sending history.
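Since every follow-up repeats the same add-message, run, poll, read cycle, it is worth wrapping in a small helper. A sketch under the same assumptions as above (the name `ask` and the one-second poll interval are mine):

```python
import time


def ask(client, thread_id: str, assistant_id: str, question: str) -> str:
    """Hypothetical helper: send a question on an existing thread and
    return the assistant's text reply."""
    client.beta.threads.messages.create(
        thread_id=thread_id, role="user", content=question
    )
    run = client.beta.threads.runs.create(
        thread_id=thread_id, assistant_id=assistant_id
    )
    while run.status in ("queued", "in_progress"):
        time.sleep(1)
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run.id)
    if run.status != "completed":
        raise RuntimeError(f"Run ended with status: {run.status}")
    messages = client.beta.threads.messages.list(thread_id=thread_id)
    # Assumes the first content block is text; Code Interpreter replies
    # can also contain image blocks
    return messages.data[0].content[0].text.value


# answer = ask(client, thread.id, assistant.id, "Which month grew fastest?")
```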
Polling adds latency. For a real application, use the streaming API:
```python
with client.beta.threads.runs.stream(
    thread_id=thread.id,
    assistant_id=assistant.id,
) as stream:
    for text in stream.text_deltas:
        print(text, end="", flush=True)
```

`text_deltas` yields tokens as they're generated, giving you a live typing effect with no polling loop.
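If you want more than raw text, for instance to react to tool calls or image outputs as they stream, the SDK also supports subclassing an event handler. A sketch, assuming a recent openai SDK (the class name `PrintHandler` is mine):

```python
from openai import AssistantEventHandler


class PrintHandler(AssistantEventHandler):
    # Called for each chunk of assistant text as it is generated
    def on_text_delta(self, delta, snapshot):
        print(delta.value, end="", flush=True)


with client.beta.threads.runs.stream(
    thread_id=thread.id,
    assistant_id=assistant.id,
    event_handler=PrintHandler(),
) as stream:
    stream.until_done()  # block until the run finishes
```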
Code Interpreter runs code. File Search is the other built-in tool — it chunks, embeds, and indexes uploaded documents for semantic retrieval. Enable it the same way:
```python
assistant = client.beta.assistants.create(
    name="Knowledge Assistant",
    instructions="Answer questions using the provided documents.",
    model="gpt-4.1",
    tools=[{"type": "file_search"}],
)
```

Upload PDFs, docs, or text files to a Vector Store and attach it to the assistant. The model retrieves relevant chunks automatically: no ChromaDB, no embedding pipeline, no infrastructure to manage. (We'll build the DIY version of this in Day 9 so you understand what's happening under the hood.)
Good fit:

- Shipping a capable agent quickly without building the loop, storage, or retrieval infrastructure yourself
- Data-analysis features, where Code Interpreter gives you sandboxed Python execution for free
- Document Q&A, where File Search replaces a DIY embedding pipeline
- Long-lived conversations, since Threads and Runs persist indefinitely on OpenAI's servers

Poor fit:

- Applications that need fine-grained control over the messages array, context-window truncation, or the tool loop
- Teams that want to avoid vendor lock-in on conversation state
Tomorrow we build our own retrieval pipeline from scratch using ChromaDB — the DIY alternative to File Search that you can run locally and customise fully.