homeresume
 
   
🔍

AI image generation with OpenAI API

June 10, 2026

OpenAI exposes image generation through the Image API (POST /images/generations). The official openai npm package wraps it as client.images.generate. This post walks through the main request parameters and how to save generated images from Node.js.

The examples use gpt-image-2, OpenAI's latest GPT Image model. GPT Image models always return base64-encoded image data in data[].b64_json. Use output_format for the on-disk file type and put artistic direction in the prompt.

For text generation with the same package, see the Chat Completions API and Responses API posts. Image generation is also available through Responses API tools, but this post focuses on the dedicated Image API endpoint.

The running scenario: generate marketing hero images for a fictional todo app.

Prerequisites

  • OpenAI account
  • Generated API key
  • Enabled billing
  • Node.js version 26
  • openai package installed (npm i openai)

Client setup

Create a client with your API key (read from the environment in production).

import OpenAI from 'openai';
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

The same SDK can target other hosts that implement a compatible API by setting baseURL and apiKey:

const client = new OpenAI({
apiKey: process.env.LLM_API_KEY,
baseURL: 'https://your-gateway.example/v1',
});

Azure OpenAI uses AzureOpenAI instead. Confirm your provider supports the Image API and the model you pass to model.

Basic integration

Call client.images.generate with model and prompt. The examples use gpt-image-2, older snapshots include gpt-image-1.5, gpt-image-1, and gpt-image-1-mini. Pin a snapshot (for example gpt-image-2-2026-04-21) when you need stable behavior across deploys.

The prompt describes what to generate. GPT Image models accept up to about 32,000 characters. Be specific about subject, layout, colors, and style.

GPT Image models always return base64 in data[].b64_json. Decode it and write the file yourself.

import { writeFile } from 'node:fs/promises';
const prompt = `
Minimal flat illustration for a productivity app landing page.
Show a todo dashboard with a checklist, calendar widget, and soft pastel palette.
No text labels on screen elements.
`.trim();
const result = await client.images.generate({
model: 'gpt-image-2',
prompt,
});
await writeFile('hero.png', Buffer.from(result.data[0].b64_json, 'base64'));

n

Use n to generate multiple images in one request (default 1, maximum 10). Loop over result.data to save each image.

import { writeFile } from 'node:fs/promises';
const result = await client.images.generate({
model: 'gpt-image-2',
prompt:
'Minimal flat illustration of a todo app dashboard, variant layout, soft pastel colors',
n: 2,
});
for (const [index, item] of result.data.entries()) {
await writeFile(
`hero-${index}.png`,
Buffer.from(item.b64_json, 'base64'),
);
}

Size

Control dimensions with size. Common presets are 1024x1024 (square), 1536x1024 (landscape), and 1024x1536 (portrait). auto lets the model pick based on the prompt.

gpt-image-2 also accepts custom WIDTHxHEIGHT strings when width and height are multiples of 16, the aspect ratio is between 1:3 and 3:1, and total pixels stay within the documented limits.

const result = await client.images.generate({
model: 'gpt-image-2',
prompt:
'Minimal flat illustration of a todo app dashboard, portrait orientation, soft pastel colors',
size: '1024x1536',
});

Quality

Set rendering quality with quality. Use low for fast drafts and iterations, then medium or high for final assets. Default is auto.

const draft = await client.images.generate({
model: 'gpt-image-2',
prompt:
'Minimal flat illustration of a todo app dashboard, soft pastel colors',
quality: 'low',
});
const final = await client.images.generate({
model: 'gpt-image-2',
prompt:
'Minimal flat illustration of a todo app dashboard, soft pastel colors, polished details',
quality: 'high',
size: '1024x1536',
});

Output format

GPT Image models return base64 in the JSON response. Use output_format to control the encoded file type: png (default), jpeg, or webp.

import { writeFile } from 'node:fs/promises';
const result = await client.images.generate({
model: 'gpt-image-2',
prompt:
'Minimal flat illustration of a todo app dashboard, soft pastel colors',
output_format: 'jpeg',
});
await writeFile('hero.jpg', Buffer.from(result.data[0].b64_json, 'base64'));

Compression

When output_format is jpeg or webp, set output_compression from 0 to 100 to trade file size for quality. JPEG is often faster than PNG when latency matters.

const result = await client.images.generate({
model: 'gpt-image-2',
prompt:
'Minimal flat illustration of a todo app dashboard, soft pastel colors',
output_format: 'webp',
output_compression: 50,
});

Background

Use background: 'transparent' with png or webp on models that support it when you need a cutout asset. gpt-image-2 does not support transparent backgrounds; use gpt-image-1.5 or an earlier GPT Image model for that workflow, or bake the background into the prompt.

const result = await client.images.generate({
model: 'gpt-image-1.5',
prompt: 'Flat icon of a checkmark, no background, centered',
output_format: 'png',
background: 'transparent',
});

Production notes

  • Cost scales with quality and size. See OpenAI pricing before generating at scale.
  • Moderation - use moderation: 'auto' (default) or 'low' on GPT Image models when you need less restrictive filtering.
  • Errors - handle image_generation_user_error (for example moderation_blocked) by changing the prompt or inputs; do not blindly retry.
  • Latency - complex prompts can take up to about two minutes.
  • Storage - decode and persist files yourself. GPT Image responses are base64 in JSON.

Demo

Runnable scripts for each section live in the openai-image-generation-demo folder. Get access via code demos.

Building AI agents with Vercel AI SDK

June 9, 2026

The Vercel AI SDK treats agents as tool-calling loops: the model generates text or invokes tools, the SDK runs those tools, and the loop continues until the model answers or a stop condition fires.

This post builds a support triage agent that looks up customers and invoices, searches an internal knowledge base, and either opens a ticket or escalates to a human. It builds on the LLM integration with Vercel AI SDK post and focuses on multiple tools, stopWhen, and stepCountIs.

For external tools exposed over MCP instead of SDK-native tool() handlers, see the MCP server with Node.js post.

Prerequisites

  • OpenAI account
  • Generated API key
  • Enabled billing
  • Node.js version 26
  • ai, @ai-sdk/openai, and zod installed (npm i ai @ai-sdk/openai zod)
  • Client setup from the Vercel AI SDK integration post

Mental model - steps and the tool loop

A step is one model generation. In that step the model either:

  • returns text (the loop ends), or
  • returns tool calls (the SDK executes them and starts another step with the results)

Typical flow for the support triage agent: user question → model calls lookup tools (getCustomer, getInvoice, searchKnowledgeBase) → model creates a ticket or escalates → final answer. stopWhen can end the loop before or after the write tools run.

stepCountIs(5) means "stop after 5 steps" (five model generations), not five individual tool calls. A single step can include multiple parallel tool calls.

When you pass tools without stopWhen, the SDK defaults to stepCountIs(20) as a safety cap.

Support triage scenario

Example prompt:

Customer cus_1042 says they were charged twice for invoice inv_8891. What should we do?

A realistic chain:

  1. getCustomer - plan tier, open ticket count
  2. getInvoice - amount, status, payment IDs
  3. searchKnowledgeBase - duplicate-charge and refund policy
  4. createSupportTicket or escalateToHuman - write action or sentinel stop

The demo uses in-memory fixtures (customers, invoices, knowledge-base articles) so scripts run without a database.

Defining multiple tools

Register tools with tool() and Zod inputSchema. Clear description values help the model pick the right tool.

import { tool } from 'ai';
import { z } from 'zod';
const getCustomer = tool({
description: 'Look up a customer account by ID',
inputSchema: z.object({
customerId: z.string().describe('Customer ID, e.g. cus_1042'),
}),
execute: async ({ customerId }) => {
const customer = customers.find((item) => item.id === customerId);
if (!customer) {
return { found: false, customerId, error: 'Customer not found' };
}
return { found: true, customer };
},
});
const getInvoice = tool({
description: 'Look up an invoice by ID, including payment IDs and status',
inputSchema: z.object({
invoiceId: z.string().describe('Invoice ID, e.g. inv_8891'),
}),
execute: async ({ invoiceId }) => {
const invoice = invoices.find((item) => item.id === invoiceId);
if (!invoice) {
return { found: false, invoiceId, error: 'Invoice not found' };
}
return { found: true, invoice };
},
});
const searchKnowledgeBase = tool({
description: 'Search internal support articles by keyword',
inputSchema: z.object({
query: z.string().describe('Search terms, e.g. duplicate charge refund'),
}),
execute: async ({ query }) => {
// keyword match against mocked articles
return { query, articles: matches };
},
});

Add write tools for outcomes:

const createSupportTicket = tool({
description: 'Create a support ticket after gathering customer and policy context',
inputSchema: z.object({
customerId: z.string(),
subject: z.string().min(3),
priority: z.enum(['low', 'medium', 'high']),
summary: z.string().min(10),
}),
execute: async (input) => {
const ticket = createTicket(input);
return { created: true, ticket };
},
});
const escalateToHuman = tool({
description: 'Escalate when policy requires manual review',
inputSchema: z.object({
customerId: z.string(),
reason: z.string().min(10),
urgency: z.enum(['normal', 'high']),
}),
execute: async (input) => ({
escalated: true,
queue: input.urgency === 'high' ? 'billing-urgent' : 'billing-standard',
...input,
}),
});

Return structured objects from execute. The SDK serializes them as tool results for the next step. Return explicit errors (for example { found: false, error: '...' }) so the model can recover instead of throwing.

Multi-step triage with generateText

Pass all tools and a system prompt with triage rules:

import { generateText, stepCountIs } from 'ai';
const { text, steps } = await generateText({
model: openai('gpt-5.5'),
system: `You are a billing support triage agent.
- Look up customer and invoice before recommending refunds.
- Search the knowledge base for policy guidance.
- Create a ticket when you can resolve within policy.
- Call escalateToHuman when manual review is required.`,
tools: {
getCustomer,
getInvoice,
searchKnowledgeBase,
createSupportTicket,
escalateToHuman,
},
stopWhen: stepCountIs(8),
prompt:
'Customer cus_1042 says they were charged twice for invoice inv_8891. What should we do?',
});
console.log('steps:', steps.length);
console.log(text);

Use a model that supports tool calling (same requirement as web search in the Vercel AI SDK post).

stopWhen - when the loop stops

stopWhen defines stopping conditions for the tool loop. Conditions are evaluated only when the last step contains tool results.

  • A single condition stops when that condition returns true
  • An array stops when any condition returns true (OR logic)
  • Without stopWhen, the SDK applies stepCountIs(20)

The loop also ends naturally when the model returns text without further tool calls.

stepCountIs - cap the number of steps

stepCountIs(n) stops once steps.length reaches n. Use it on every production agent to prevent runaway loops and unbounded API cost.

Use caseSuggested cap
Single tool, then answer2 (tool step + text step)
Chat with occasional tool use3-5
Task agents (triage, research)8-15
Long autonomous workflows15-20 (with monitoring)

Tight vs relaxed cap on the same prompt:

import { generateText, stepCountIs } from 'ai';
// Stops after 3 steps even if the model still wants more context
const capped = await generateText({
model: openai('gpt-5.5'),
tools: supportTools,
stopWhen: stepCountIs(3),
prompt: '...',
});
// Allows a fuller investigation chain
const relaxed = await generateText({
model: openai('gpt-5.5'),
tools: supportTools,
stopWhen: stepCountIs(8),
prompt: '...',
});

Combining hasToolCall with stepCountIs

hasToolCall('toolName') stops when the model invokes a specific tool in the latest step. Pair it with stepCountIs for a hard cap plus a sentinel tool:

import { generateText, stepCountIs, hasToolCall } from 'ai';
const { text, steps } = await generateText({
model: openai('gpt-5.5'),
system: TRIAGE_INSTRUCTIONS,
tools: supportTools,
stopWhen: [stepCountIs(10), hasToolCall('escalateToHuman')],
prompt:
'Customer cus_2201 on the starter plan reports a duplicate $190 charge on invoice inv_9104.',
});

escalateToHuman works well as a sentinel: the loop stops as soon as the model decides the case needs a human, without waiting for a final text-only step.

Inspecting steps and usage

The steps array on the result contains per-step tool calls, tool results, finish reason, and usage. Use it for debugging and cost tracking:

const { text, steps, totalUsage } = await generateText({
model: openai('gpt-5.5'),
tools: supportTools,
stopWhen: stepCountIs(8),
prompt: '...',
});
for (const [index, step] of steps.entries()) {
console.log(`step ${index + 1}`);
console.log(' toolCalls:', step.toolCalls?.map((c) => c.toolName));
console.log(' usage:', step.usage);
}
console.log('totalUsage:', totalUsage);

With streamText, pass onStepFinish to log each step as it completes.

ToolLoopAgent - reusable agent definition

ToolLoopAgent wraps the same loop for reuse across scripts and API routes. It accepts the same settings as generateText (tools, stopWhen, instructions).

import { ToolLoopAgent, stepCountIs } from 'ai';
const supportTriageAgent = new ToolLoopAgent({
model: openai('gpt-5.5'),
instructions: TRIAGE_INSTRUCTIONS,
tools: supportTools,
stopWhen: stepCountIs(8),
});
const result = await supportTriageAgent.generate({
prompt:
'Customer cus_1042 says they were charged twice for invoice inv_8891. What should we do?',
onStepFinish: async ({ stepNumber, usage, toolCalls }) => {
console.log(`step ${stepNumber + 1}`, {
tokens: usage.totalTokens,
tools: toolCalls?.map((call) => call.toolName),
});
},
});
console.log(result.text);

Use .stream() for streaming. For Next.js chat UIs, see createAgentUIStreamResponse in the AI SDK agents docs.

Streaming with tools

streamText supports the same tools and stopWhen settings:

import { streamText, stepCountIs } from 'ai';
const result = streamText({
model: openai('gpt-5.5'),
system: TRIAGE_INSTRUCTIONS,
tools: supportTools,
stopWhen: stepCountIs(8),
prompt: 'Customer cus_1042 says they were charged twice for invoice inv_8891.',
onStepFinish: async ({ stepNumber, toolCalls }) => {
console.error(`step ${stepNumber + 1}:`, toolCalls?.map((c) => c.toolName));
},
});
for await (const part of result.textStream) {
process.stdout.write(part);
}

Text streams incrementally. Tool calls run between text segments as the loop progresses.

Production notes

  • Always set stopWhen - do not rely on the default stepCountIs(20) in production without monitoring
  • Cost - each step is another model call; log steps or onStepFinish usage
  • Tool errors - return structured errors from execute instead of throwing when the model should retry or escalate
  • Instructions - keep policy rules in system / instructions, not only in the user prompt
  • Same patterns elsewhere - PR review (listPRsgetCheckssubmitReview) or job-fit scoring use the same loop mechanics with different tools

Demo

Runnable scripts for each section live in the vercel-ai-sdk-agents-demo folder. Get access via code demos.

LLM integration with OpenRouter

June 8, 2026

OpenRouter is a unified API gateway to hundreds of language models from providers such as OpenAI, Anthropic, Google, and Meta. You use one API key and one billing surface, and swap models by changing a provider/model slug. OpenRouter exposes a Chat Completions-compatible HTTP API.

This post shows three Node.js integration paths: the official @openrouter/sdk, the openai package with baseURL, and the Vercel AI SDK with @openrouter/ai-sdk-provider.

For deeper patterns on each stack, see the Chat Completions API, OpenAI Responses API (OpenAI direct only), and Vercel AI SDK posts.

Prerequisites

  • OpenRouter account
  • API key
  • Credits or billing enabled as needed
  • Node.js version 26
  • Install packages for the path you use:
    • @openrouter/sdk (npm i @openrouter/sdk)
    • openai (npm i openai)
    • ai and @openrouter/ai-sdk-provider (npm i ai @openrouter/ai-sdk-provider)

Configuration

Read credentials from the environment in production.

VariablePurpose
OPENROUTER_API_KEYBearer token from OpenRouter settings
OPENROUTER_MODELDefault model slug, for example openai/gpt-5.5
OPENROUTER_SITE_URLOptional site URL sent as HTTP-Referer for rankings on openrouter.ai
OPENROUTER_SITE_TITLEOptional app name sent as X-OpenRouter-Title

Model IDs use the provider/model format, for example openai/gpt-5.5, anthropic/claude-opus-4.8, or google/gemini-3.1-flash-lite. Browse the full catalog at openrouter.ai/models.

The examples below use openai/gpt-5.5, matching the model in the other LLM posts in this series. Override it with OPENROUTER_MODEL when you want a different model.

@openrouter/sdk

OpenRouter's official TypeScript SDK is type-safe and generated from the OpenAPI spec.

Client setup

import { OpenRouter } from '@openrouter/sdk';
const client = new OpenRouter({
apiKey: process.env.OPENROUTER_API_KEY,
httpReferer: process.env.OPENROUTER_SITE_URL,
appTitle: process.env.OPENROUTER_SITE_TITLE,
});

Basic integration

const response = await client.chat.send({
chatRequest: {
model: process.env.OPENROUTER_MODEL ?? 'openai/gpt-5.5',
messages: [
{ role: 'user', content: 'Write a one-sentence bedtime story about a unicorn.' },
],
},
});
console.log(response.choices[0].message.content);

System prompt

Add a system message before the user turn to set tone, format, and role.

const response = await client.chat.send({
chatRequest: {
model: process.env.OPENROUTER_MODEL ?? 'openai/gpt-5.5',
messages: [
{ role: 'system', content: 'Reply in one short sentence. Use plain language.' },
{ role: 'user', content: 'Explain what an LLM is.' },
],
},
});
console.log(response.choices[0].message.content);

Streaming

Set stream: true and read incremental text from choices[0].delta.content.

const stream = await client.chat.send({
chatRequest: {
model: process.env.OPENROUTER_MODEL ?? 'openai/gpt-5.5',
messages: [{ role: 'user', content: 'List three colors.' }],
stream: true,
},
});
process.stdout.write('[stream] ');
for await (const chunk of stream) {
const delta = chunk.choices[0]?.delta?.content;
if (delta) {
process.stdout.write(delta);
}
}
process.stdout.write('\n');

Model switching

Change only the model string to route the same code to a different provider.

const models = ['openai/gpt-5.5', 'google/gemini-3.1-flash-lite'];
for (const model of models) {
const response = await client.chat.send({
chatRequest: {
model,
messages: [{ role: 'user', content: 'Reply with exactly one word: ok.' }],
},
});
console.log(model, '->', response.choices[0].message.content);
}

openai package

If you already use the OpenAI SDK, point it at OpenRouter with baseURL. The request shape matches the Chat Completions API.

Client setup

import OpenAI from 'openai';
const client = new OpenAI({
apiKey: process.env.OPENROUTER_API_KEY,
baseURL: 'https://openrouter.ai/api/v1',
defaultHeaders: {
'HTTP-Referer': process.env.OPENROUTER_SITE_URL,
'X-OpenRouter-Title': process.env.OPENROUTER_SITE_TITLE,
},
});

Basic integration

const completion = await client.chat.completions.create({
model: process.env.OPENROUTER_MODEL ?? 'openai/gpt-5.5',
messages: [
{ role: 'user', content: 'Write a one-sentence bedtime story about a unicorn.' },
],
});
console.log(completion.choices[0].message.content);

System prompt

const completion = await client.chat.completions.create({
model: process.env.OPENROUTER_MODEL ?? 'openai/gpt-5.5',
messages: [
{ role: 'system', content: 'Reply in one short sentence. Use plain language.' },
{ role: 'user', content: 'Explain what an LLM is.' },
],
});
console.log(completion.choices[0].message.content);

Streaming

const stream = await client.chat.completions.create({
model: process.env.OPENROUTER_MODEL ?? 'openai/gpt-5.5',
messages: [{ role: 'user', content: 'List three colors.' }],
stream: true,
});
process.stdout.write('[stream] ');
for await (const chunk of stream) {
const delta = chunk.choices[0]?.delta?.content;
if (delta) {
process.stdout.write(delta);
}
}
process.stdout.write('\n');

For JSON schema output, Markdown-to-HTML, and few-shot prompting, reuse the patterns from the Chat Completions post with the OpenRouter client and model slug above.

Vercel AI SDK

The @openrouter/ai-sdk-provider package exposes OpenRouter models to generateText, streamText, and related helpers from the ai package. See the OpenRouter Vercel AI SDK guide for the full integration reference.

Client setup

import { createOpenRouter } from '@openrouter/ai-sdk-provider';
const openrouter = createOpenRouter({
apiKey: process.env.OPENROUTER_API_KEY,
appUrl: process.env.OPENROUTER_SITE_URL,
appName: process.env.OPENROUTER_SITE_TITLE,
});

The returned provider is callable. Pass a model slug directly: openrouter('openai/gpt-5.5').

Basic integration

import { generateText } from 'ai';
const { text } = await generateText({
model: openrouter(process.env.OPENROUTER_MODEL ?? 'openai/gpt-5.5'),
prompt: 'Write a one-sentence bedtime story about a unicorn.',
});
console.log(text);

System prompt

const { text } = await generateText({
model: openrouter(process.env.OPENROUTER_MODEL ?? 'openai/gpt-5.5'),
system: 'Reply in one short sentence. Use plain language.',
prompt: 'Explain what an LLM is.',
});
console.log(text);

Streaming

import { streamText } from 'ai';
const result = streamText({
model: openrouter(process.env.OPENROUTER_MODEL ?? 'openai/gpt-5.5'),
prompt: 'List three colors.',
});
process.stdout.write('[stream] ');
for await (const part of result.textStream) {
process.stdout.write(part);
}
process.stdout.write('\n');

For structured output, embeddings, and web search, see the Vercel AI SDK post. Those patterns apply when you call OpenAI directly; OpenRouter coverage depends on the model and endpoint.

Demo

Runnable scripts for each integration path live in the openrouter-demo folder. Get access via code demos.

LLM integration with Vercel AI SDK

June 7, 2026

Large language models (LLMs) understand and generate text from prompts. The Vercel AI SDK is a provider-agnostic layer over LLM APIs - core functions are generateText, streamText, and embed. This post uses the OpenAI provider and mirrors the patterns from the OpenAI Responses API post.

For the lower-level openai npm package, see the Chat Completions API and Responses API posts. For multi-tool agents with stopWhen and stepCountIs, see the Building AI agents with Vercel AI SDK post.

Prerequisites

  • OpenAI account
  • Generated API key
  • Enabled billing
  • Node.js version 26
  • ai, @ai-sdk/openai, and zod installed (npm i ai @ai-sdk/openai zod)
  • For Markdown output: marked, dompurify, and jsdom (npm i marked dompurify jsdom)

Client setup

Create an OpenAI provider with your API key (read from the environment in production).

import { createOpenAI } from '@ai-sdk/openai';
const openai = createOpenAI({ apiKey: process.env.OPENAI_API_KEY });

For OpenRouter, use the dedicated @openrouter/ai-sdk-provider package - see the OpenRouter integration post. The same provider can target other hosts that implement a compatible API by setting baseURL and apiKey:

const openai = createOpenAI({
apiKey: process.env.LLM_API_KEY,
baseURL: 'https://your-gateway.example/v1',
});

Many third-party gateways support Chat Completions only. The examples below use openai(model) (Responses API path). If your provider does not support it, switch to openai.chat(model) and skip the web search example.

Basic integration

Pass a string as prompt and read text from the result.

import { generateText } from 'ai';
import { createOpenAI } from '@ai-sdk/openai';
const openai = createOpenAI({ apiKey: process.env.OPENAI_API_KEY });
const { text } = await generateText({
model: openai('gpt-5.5'),
prompt: 'Write a one-sentence bedtime story about a unicorn.',
});
console.log(text);

System prompt

Use the system parameter for stable behavior (tone, format, role). It takes precedence over casual wording in the user message.

const { text } = await generateText({
model: openai('gpt-5.5'),
system: 'Reply in one short sentence. Use plain language.',
prompt: 'Explain what an LLM is.',
});
console.log(text);

Few-shot prompting

Pass prior turns in a messages array with user and assistant roles, then the new user message. Keep task rules in system.

const { text } = await generateText({
model: openai('gpt-5.5'),
system:
'Classify sentiment as exactly one word: positive, negative, or neutral.',
messages: [
{ role: 'user', content: 'I love this!' },
{ role: 'assistant', content: 'positive' },
{ role: 'user', content: 'This is awful.' },
{ role: 'assistant', content: 'negative' },
{ role: 'user', content: 'It is fine I guess.' },
],
});
console.log(text);

Streaming

Use streamText and iterate over textStream for incremental text.

import { streamText } from 'ai';
const result = streamText({
model: openai('gpt-5.5'),
prompt: 'List three colors.',
});
for await (const part of result.textStream) {
process.stdout.write(part);
}

Structured output with JSON schema

Constrain the model to JSON matching your schema via Output.object() and a Zod schema. The SDK validates the result.

import { generateText, Output } from 'ai';
import { z } from 'zod';
const { output } = await generateText({
model: openai('gpt-5.5'),
prompt: 'The film Inception was directed by Christopher Nolan.',
output: Output.object({
schema: z.object({
title: z.string(),
director: z.string(),
}),
schemaName: 'movie_summary',
}),
});
console.log(output.title, output.director);

Markdown output to HTML

Ask for Markdown in system, then convert text to HTML and sanitize before rendering (for example with innerHTML in the browser or when storing HTML).

import { marked } from 'marked';
import { JSDOM } from 'jsdom';
import DOMPurify from 'dompurify';
const purify = DOMPurify(new JSDOM('').window);
const { text } = await generateText({
model: openai('gpt-5.5'),
system: 'Reply in Markdown only. Use a heading and a short bullet list.',
prompt: 'Explain what an LLM is in three bullet points.',
});
const markdown = text;
const html = marked.parse(markdown);
const safeHtml = purify.sanitize(html);

Always run DOMPurify.sanitize on model-generated HTML. The model can emit unsafe markup. Sanitization strips scripts and other dangerous content.

Web search tool

Enable the built-in web search tool when the answer should use current information from the web.

const result = await generateText({
model: openai('gpt-5.5'),
tools: { web_search: openai.tools.webSearch() },
prompt: 'What was a major tech headline this week? Cite sources briefly.',
});
console.log(result.text);

Web search adds latency and tool usage cost. Use a model that supports tools.

Embeddings

Embeddings are numeric vectors that represent the semantic meaning of text. Use them for semantic search, clustering, and RAG.

Pass a single string to embed and read the vector from embedding.

import { embed } from 'ai';
const { embedding } = await embed({
model: openai.embedding('text-embedding-3-small'),
value: 'How do I connect pgvector to PostgreSQL?',
});
console.log(embedding.length);

Pass multiple strings in a values array with embedMany. Results are in the same order as the input.

import { embedMany } from 'ai';
const chunks = [
'pgvector adds vector similarity search to PostgreSQL.',
'LangChain helps split long documents into retrieval-friendly chunks.',
'RAG retrieves context first, then asks an LLM to answer.',
];
const { embeddings } = await embedMany({
model: openai.embedding('text-embedding-3-small'),
values: chunks,
});
console.log(embeddings.length); // 3

For a full RAG flow with pgvector, see the RAG with OpenAI embeddings post.

Demo

Runnable scripts for each section live in the vercel-ai-sdk-demo folder. Get access via code demos.

Building an MCP server with Node.js

June 6, 2026

The Model Context Protocol (MCP) is an open standard for connecting AI hosts (Claude, ChatGPT, Cursor, VS Code, and others) to external context and actions through a structured protocol instead of ad-hoc plugins.

The host runs an MCP client. Your application is the MCP server, exposing tools (actions the model can invoke), resources (read-only data), and optionally prompts (reusable message templates). Communication uses JSON-RPC over a transport such as stdio or HTTP. After the client connects, the server completes an initialization handshake (initializeinitialized). The host discovers capabilities via tools/list, resources/list, and prompts/list, then calls tools or reads resources at runtime.

This post shows how to build a small todo MCP server with Node.js using the official @modelcontextprotocol/sdk package and Zod schemas.

Prerequisites

  • Node.js version 26
  • npm i @modelcontextprotocol/sdk zod
  • Optional for client testing: Claude Desktop and/or ChatGPT (Connectors / Apps)

The stable v1 SDK is @modelcontextprotocol/sdk. A v2 split (@modelcontextprotocol/server, @modelcontextprotocol/client) is in pre-release - this post uses v1, which matches current production tooling.

MCP capabilities - tools, resources, prompts

CapabilityPurposeDemo example
ToolsModel-invoked actions with typed inputsadd_todo, list_todos, mark_todo_done
ResourcesRead-only context the host can fetchtodo://all JSON snapshot
Prompts (optional)Named templates with argumentssummarize-open-todos

Tools are the main integration surface - the model calls them via tools/call. Resources are fetched with resources/read and should stay read-only. Prompts return pre-built messages via prompts/get.

Tool inputs need a schema so clients know parameters. With the TypeScript SDK, pass Zod fields in inputSchema:

import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';
const server = new McpServer({ name: 'todo-mcp-server', version: '1.0.0' });
server.registerTool(
'add_todo',
{
description: 'Add a new todo item',
inputSchema: { title: z.string().min(1) },
},
async ({ title }) => ({
content: [{ type: 'text', text: JSON.stringify({ title, done: false }) }],
})
);

Register a resource at a fixed URI:

server.registerResource(
'all-todos',
'todo://all',
{
title: 'All todos',
mimeType: 'application/json',
},
async (uri) => ({
contents: [
{
uri: uri.href,
mimeType: 'application/json',
text: JSON.stringify([{ id: 1, title: 'Learn MCP', done: false }]),
},
],
})
);

Register an optional prompt:

server.registerPrompt(
'summarize-open-todos',
{
title: 'Summarize open todos',
description: 'Ask the model to summarize open todos',
},
() => ({
messages: [
{
role: 'user',
content: {
type: 'text',
text: 'Summarize my open todos and suggest a priority order.',
},
},
],
})
);

Building the server

Use a factory so the same server definition works with multiple transports:

import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';
const todos = [
{ id: 1, title: 'Learn MCP basics', done: false },
{ id: 2, title: 'Ship demo server', done: false },
];
let nextId = 3;
export function createMcpServer() {
const server = new McpServer({ name: 'todo-mcp-server', version: '1.0.0' });
server.registerTool(
'add_todo',
{
description: 'Add a new todo item',
inputSchema: { title: z.string().min(1) },
},
async ({ title }) => {
const todo = { id: nextId++, title, done: false };
todos.push(todo);
return { content: [{ type: 'text', text: JSON.stringify(todo, null, 2) }] };
}
);
server.registerTool('list_todos', { description: 'List all todo items' }, async () => ({
content: [{ type: 'text', text: JSON.stringify(todos, null, 2) }],
}));
server.registerTool(
'mark_todo_done',
{
description: 'Mark a todo item as done by id',
inputSchema: { id: z.number().int().positive() },
},
async ({ id }) => {
const todo = todos.find((item) => item.id === id);
if (!todo) {
return { content: [{ type: 'text', text: `Todo ${id} not found` }], isError: true };
}
todo.done = true;
return { content: [{ type: 'text', text: JSON.stringify(todo, null, 2) }] };
}
);
// register resource and prompt here (see snippets above)
return server;
}

Tool handlers return { content: [...] }. Set isError: true when a tool fails so the host can surface the error. Resource handlers return { contents: [...] }. Prompt handlers return { messages: [...] }.

The demo uses an in-memory store so you can run it without API keys or a database.

Transports - stdio vs SSE / Streamable HTTP

MCP separates protocol (JSON-RPC messages) from transport (how bytes move between client and server).

Stdio (StdioServerTransport)

The client spawns your server as a child process. JSON-RPC goes over stdin/stdout.

  • Use when: Claude Desktop, Cursor, VS Code, Claude Code, local CLI agents
  • Pros: simplest setup, no ports or firewall rules, no OAuth
  • Cons: one client per process; cloud hosts cannot spawn your local binary
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { createMcpServer } from './create-server.js';
const server = createMcpServer();
await server.connect(new StdioServerTransport());

Write logs to stderr only - stdout is the protocol channel.

Remote HTTP - SSE (legacy) vs Streamable HTTP (current)

Early MCP remote servers used HTTP + SSE: POST for client→server requests, Server-Sent Events for server→client streaming. That transport is deprecated.

New servers should use Streamable HTTP (StreamableHTTPServerTransport). It supports POST request/response, optional SSE for notifications, and session management. The v2 SDK removes server-side SSE entirely; client-side SSE remains for legacy servers.

ScenarioTransport
Claude Desktop, local devstdio
Cursor / VS Code project MCPstdio
ChatGPT Apps / ConnectorsStreamable HTTP over public HTTPS
Legacy SSE-only clientsSSE client transport still exists; prefer Streamable HTTP for new servers

Streamable HTTP entry (stateless):

import { createMcpExpressApp } from '@modelcontextprotocol/sdk/server/express.js';
import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js';
import { createMcpServer } from './create-server.js';
const app = createMcpExpressApp();
const PORT = Number(process.env.PORT) || 3000;
app.post('/mcp', async (req, res) => {
const server = createMcpServer();
const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
await server.connect(transport);
await transport.handleRequest(req, res, req.body);
res.on('close', () => {
transport.close();
server.close();
});
});
app.listen(PORT, () => {
console.error(`MCP server listening on http://127.0.0.1:${PORT}/mcp`);
});

createMcpExpressApp() enables DNS rebinding protection when binding to localhost - recommended for local HTTP servers.

Deploy Streamable HTTP behind HTTPS before exposing it to cloud clients. ChatGPT Connectors also require OAuth 2.1 for production use.

Connecting MCP clients

Claude Desktop (stdio - primary demo path)

Config file on Windows: %APPDATA%\Claude\claude_desktop_config.json. Open it via Settings → Developer → Edit Config.

{
"mcpServers": {
"todo-mcp": {
"command": "node",
"args": ["C:/path/to/demos/ai/mcp-server-nodejs-demo/src/stdio.js"]
}
}
}

Restart Claude Desktop after saving. Claude shows a tool approval UI before executing write operations.

Claude Desktop (remote HTTP)

claude_desktop_config.json validates stdio servers only - do not put a bare url field there expecting it to work. For a public HTTPS MCP server, use Settings → Connectors → Add custom connector. For local HTTP during development, bridge with mcp-remote as a stdio-launched proxy.

ChatGPT (Connectors / Apps)

ChatGPT has no local MCP config file. Register servers in Settings → Apps (or Connectors) with a name, description, and MCP server URL.

Requirements:

  • Public HTTPS endpoint (Streamable HTTP)
  • OAuth 2.1 for production connectors
  • Stdio-only servers need an HTTP wrapper or tunnel before ChatGPT can reach them

ChatGPT shows confirmation modals before write/modify tool calls.

Cursor

Cursor uses .cursor/mcp.json with the same stdio shape as Claude Desktop (command + args).

What else matters

  • Security - validate tool inputs, scope tools narrowly, and never expose secrets in resources.
  • Logging - stderr only for stdio servers; avoid console.log on stdout.
  • Testing - use @modelcontextprotocol/sdk client helpers with StdioClientTransport for smoke tests.
  • Spec evolution - prefer Streamable HTTP over legacy SSE; watch the v2 SDK split when upgrading.

Demo

Runnable scripts for this post live in the mcp-server-nodejs-demo folder in the private demos repository. Get access via code demos.

RAG with OpenAI Embeddings, pgvector and LangChain

June 2, 2026

Retrieval-Augmented Generation (RAG) is a practical pattern: store knowledge as embeddings, retrieve the most relevant chunks with semantic search, then generate an answer grounded in that context.

This guide shows a simple end-to-end flow with OpenAI embeddings, PostgreSQL + pgvector, and LangChain chunking.

Prerequisites

  • OpenAI account
  • Generated API key
  • Enabled billing
  • Node.js version 26
  • PostgreSQL with pgvector extension enabled
  • npm packages: openai, langchain, pg, pgvector

What are embeddings?

Embeddings are numeric vectors that represent the semantic meaning of text. Similar text should produce vectors that are close in vector space.

In practice:

  • Convert document chunks to vectors and store them in pgvector
  • Convert a user question to a vector
  • Run a nearest-neighbor search to find the most relevant chunks

OpenAI client setup

import OpenAI from 'openai';
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

Embedding one input element

Use a single string when embedding a user query.

const response = await client.embeddings.create({
model: 'text-embedding-3-small',
input: 'How do I connect pgvector to PostgreSQL?',
});
const queryEmbedding = response.data[0].embedding;
console.log(queryEmbedding.length);

Embedding multiple input elements

Use an array to embed multiple chunks in one API call.

const chunks = [
'pgvector adds vector similarity search to PostgreSQL.',
'LangChain helps split long documents into retrieval-friendly chunks.',
'RAG retrieves context first, then asks an LLM to answer.',
];
const response = await client.embeddings.create({
model: 'text-embedding-3-small',
input: chunks,
});
const rows = response.data.map((item, index) => ({
text: chunks[index],
embedding: item.embedding,
}));
console.log(rows.length); // 3

Chunking documents with LangChain

Chunking makes retrieval more precise. Instead of embedding one large document, split it into smaller overlapping parts. Start with chunkSize: 800 and chunkOverlap: 120, then adjust based on your document style and answer quality.

import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
const splitter = new RecursiveCharacterTextSplitter({
chunkSize: 800,
chunkOverlap: 120,
});
const docs = await splitter.createDocuments([
`RAG combines retrieval and generation. Store chunks as vectors and fetch similar chunks at query time.`,
]);
console.log(docs.map((doc) => doc.pageContent));

Store embeddings in pgvector

Create a table with a vector column. text-embedding-3-small outputs 1536 dimensions.

CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS rag_chunks (
id BIGSERIAL PRIMARY KEY,
content TEXT NOT NULL,
embedding VECTOR(1536) NOT NULL,
source TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Insert chunk vectors from Node.js:

import pg from 'pg';
import pgvector from 'pgvector/pg';
const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });
await pgvector.registerTypes(pool);
await pool.query(
`INSERT INTO rag_chunks (content, embedding, source)
VALUES ($1, $2, $3)`,
['Chunk content', pgvector.toSql(queryEmbedding), 'notes.md']
);

Semantic search in pgvector

Embed the user question, then retrieve nearest chunks using cosine distance. Lower distance means a closer semantic match. top-k means how many nearest chunks you return (in this query, k=4 with LIMIT 4). You can also use a simple threshold (for example 0.4) to discard weak matches. As a starting point, many setups work well in the 0.35 to 0.45 range for cosine distance, then tune with real questions from your domain.

const searchResult = await pool.query(
`SELECT id, content, source, embedding <=> $1::vector AS distance
FROM rag_chunks
ORDER BY embedding <=> $1::vector
LIMIT 4`,
[pgvector.toSql(queryEmbedding)]
);
const contextChunks = searchResult.rows.map((row) => row.content);

Threshold filtering example:

const DISTANCE_THRESHOLD = 0.4;
const filteredChunks = searchResult.rows
.filter((row) => Number(row.distance) <= DISTANCE_THRESHOLD)
.map((row) => row.content);

If no chunks pass the threshold, skip answer generation and return a fallback message:

if (filteredChunks.length === 0) {
console.log('I do not have enough context to answer this.');
process.exit(0);
}

Generate answer from retrieved context

Use retrieved chunks as grounded context for the final model call.

const context = contextChunks.join('\n\n---\n\n');
const answer = await client.responses.create({
model: 'gpt-5.5',
instructions:
'Answer only from the provided context. If context is insufficient, respond with: I do not have enough context to answer this.',
input: `Context:\n${context}\n\nQuestion: How does pgvector semantic search work?`,
});
console.log(answer.output_text);

Demo

Runnable scripts for this post live in the rag-openai-embeddings-pgvector-demo folder in the private demos repository. Get access via code demos.

Integration with Hugging Face Inference API

June 1, 2026

Hugging Face hosts thousands of open models for NLP, vision, and other tasks. The Inference API (via Inference Providers) lets you call those models over HTTP. The @huggingface/inference package from huggingface.js is the Node.js client.

Prerequisites

  • Hugging Face account
  • Access token with permission to call Inference (create a token of type Read or Fine-grained with inference access)
  • Node.js version 26
  • @huggingface/inference installed (npm i @huggingface/inference)

Some models (especially image generation) are routed through Inference Providers. Enable providers and billing in account settings when a model requires it.

Client setup

Pass your token to InferenceClient. Read it from the environment in production.

import { InferenceClient } from '@huggingface/inference';
const client = new InferenceClient(process.env.HF_TOKEN);

Summarization

Shrink longer text into a short summary. Pick a model trained for summarization and respect its max input length.

const result = await client.summarization({
model: 'facebook/bart-large-cnn',
inputs: `The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was initially criticized by some artists and intellectuals, but it became a global cultural icon of France and one of the most recognizable structures in the world.`,
parameters: {
max_length: 80,
},
});
console.log(result.summary_text);

Text classification

Assign labels with confidence scores. Common use case: sentiment analysis on short text.

const labels = await client.textClassification({
model: 'distilbert-base-uncased-finetuned-sst-2-english',
inputs: 'I really enjoyed this workshop.',
});
for (const { label, score } of labels) {
console.log(label, score.toFixed(4));
}

Text to image

Generate an image from a text prompt. The client returns a Blob; write it to disk or serve it from your app.

import { writeFileSync } from 'node:fs';
const image = await client.textToImage({
provider: 'hf-inference',
model: 'black-forest-labs/FLUX.1-schnell',
inputs: 'A watercolor painting of a fox in an autumn forest',
parameters: {
num_inference_steps: 4,
},
});
const buffer = Buffer.from(await image.arrayBuffer());
writeFileSync('output.png', buffer);

With provider: 'hf-inference', use a model from the hf-inference catalog for that task (for example black-forest-labs/FLUX.1-schnell). Older Hub repos such as stabilityai/stable-diffusion-2-1 may no longer exist or lack provider mapping.

Image models are often slower and may incur provider usage charges.

Demo

Runnable scripts for each task live in the huggingface-inference-api-demo folder. Get access via code demos.

LLM integration with OpenAI Responses API

May 31, 2026

Large language models (LLMs) understand and generate text from prompts. OpenAI exposes models through the Responses API. The official openai npm package is the practical way to call it from Node.js. This post covers common patterns beyond a single prompt string.

For the Chat Completions API (messages, choices[0].message.content), see the dedicated post. For the same patterns with the Vercel AI SDK (generateText, streamText), see the dedicated post.

Prerequisites

  • OpenAI account
  • Generated API key
  • Enabled billing
  • Node.js version 26
  • openai package installed (npm i openai)
  • For Markdown output: marked, dompurify, and jsdom (npm i marked dompurify jsdom)

Client setup

Create a client with your API key (read from the environment in production).

import OpenAI from 'openai';
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

The same SDK can target other hosts that implement a compatible API by setting baseURL and apiKey:

const client = new OpenAI({
apiKey: process.env.LLM_API_KEY,
baseURL: 'https://your-gateway.example/v1',
});

Azure OpenAI uses AzureOpenAI instead. Many third-party gateways support Chat Completions only; the examples below use client.responses.*, so confirm your provider supports the Responses API (especially for tools like web search).

Basic integration

Pass a string as input and read output_text from the response.

const response = await client.responses.create({
model: 'gpt-5.5',
input: 'Write a one-sentence bedtime story about a unicorn.',
});
console.log(response.output_text);

System prompt

Use top-level instructions for stable behavior (tone, format, role). They take precedence over casual wording in the user message.

const response = await client.responses.create({
model: 'gpt-5.5',
instructions: 'Reply in one short sentence. Use plain language.',
input: 'Explain what an LLM is.',
});
console.log(response.output_text);

Few-shot prompting

Pass prior turns as an input array with user and assistant roles, then the new user message. Keep task rules in instructions.

const response = await client.responses.create({
model: 'gpt-5.5',
instructions:
'Classify sentiment as exactly one word: positive, negative, or neutral.',
input: [
{ role: 'user', content: 'I love this!' },
{ role: 'assistant', content: 'positive' },
{ role: 'user', content: 'This is awful.' },
{ role: 'assistant', content: 'negative' },
{ role: 'user', content: 'It is fine I guess.' },
],
});
console.log(response.output_text);

Streaming

Set stream: true and handle response.output_text.delta events for incremental text.

const stream = await client.responses.create({
model: 'gpt-5.5',
input: 'List three colors.',
stream: true,
});
for await (const event of stream) {
if (event.type === 'response.output_text.delta') {
process.stdout.write(event.delta);
}
}

Structured output with JSON schema

Constrain the model to JSON matching your schema via text.format. With strict: true, the output should match the schema.

const response = await client.responses.create({
model: 'gpt-5.5',
input: 'The film Inception was directed by Christopher Nolan.',
text: {
format: {
type: 'json_schema',
name: 'movie_summary',
strict: true,
schema: {
type: 'object',
properties: {
title: { type: 'string' },
director: { type: 'string' },
},
required: ['title', 'director'],
additionalProperties: false,
},
},
},
});
const data = JSON.parse(response.output_text);
console.log(data.title, data.director);

For typed parsing with Zod, you can use client.responses.parse() instead of JSON.parse.

Markdown output to HTML

Ask for Markdown in instructions, then convert output_text to HTML and sanitize before rendering (for example with innerHTML in the browser or when storing HTML).

import { marked } from 'marked';
import { JSDOM } from 'jsdom';
import DOMPurify from 'dompurify';
const purify = DOMPurify(new JSDOM('').window);
const response = await client.responses.create({
model: 'gpt-5.5',
instructions: 'Reply in Markdown only. Use a heading and a short bullet list.',
input: 'Explain what an LLM is in three bullet points.',
});
const markdown = response.output_text;
const html = marked.parse(markdown);
const safeHtml = purify.sanitize(html);

Always run DOMPurify.sanitize on model-generated HTML. The model can emit unsafe markup; sanitization strips scripts and other dangerous content.

Web search tool

Enable the built-in web search tool when the answer should use current information from the web.

const response = await client.responses.create({
model: 'gpt-5.5',
tools: [{ type: 'web_search' }],
include: ['web_search_call.action.sources'],
input: 'What was a major tech headline this week? Cite sources briefly.',
});
console.log(response.output_text);

Web search adds latency and tool usage cost. Use a model that supports tools.

Demo

Runnable scripts for each section live in the openai-responses-api-demo folder. Get access via code demos.