Željko Šević | Node.js Developer

Conversation memory for LangChain agents

June 18, 2026

This post extends the support triage agent from Building AI agents with LangChain into a multi-turn flow: turn 1 looks up the customer and invoice; turn 2 creates the ticket without the user repeating IDs. It is post #5 in the LangChain series, following the overview, loaders/chunking, RAG, and agents posts.

Prerequisites

OpenAI account
Generated API key
Enabled billing
Node.js version 26
Packages from the agents post, plus the checkpoint package:

npm i langchain @langchain/openai @langchain/core @langchain/langgraph-checkpoint zod

OPENAI_API_KEY set in the environment

Mental model

Three related concepts:

Checkpointer - short-term session memory. Saves messages and graph state after each step so the next invoke on the same thread can resume.
thread_id - conversation key passed in configurable. Same ID = same history; different ID = isolated session.
Store - long-term memory across threads (user preferences, facts learned over time). LangGraph stores are separate from checkpointers; this post focuses on checkpointers only.

Typical support flow with memory:

Turn 1 - rep asks to look up cus_1042 and inv_8891; agent calls lookup tools and summarizes findings.
Turn 2 - rep says "create the ticket we discussed"; agent recalls prior tool results and calls create_support_ticket.
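
To make the relationship between a checkpointer and thread_id concrete, here is a toy sketch in plain JavaScript. It is not how LangGraph implements checkpointing: ToyCheckpointer and runTurn are hypothetical names, and a real checkpointer saves full graph state rather than only a message array.

```javascript
// Toy illustration only: a Map keyed by thread_id standing in for a checkpointer.
class ToyCheckpointer {
  constructor() {
    this.threads = new Map(); // thread_id -> array of saved messages
  }
  load(threadId) {
    return this.threads.get(threadId) ?? [];
  }
  save(threadId, messages) {
    this.threads.set(threadId, [...messages]);
  }
}

// Hypothetical helper: restore history, append the new user message, save the result.
function runTurn(checkpointer, threadId, userText) {
  const history = checkpointer.load(threadId);
  const messages = [...history, { role: 'user', content: userText }];
  checkpointer.save(threadId, messages);
  return messages;
}

const memory = new ToyCheckpointer();
runTurn(memory, 'support-cus-1042', 'Look up cus_1042 and inv_8891.');
const sameThread = runTurn(memory, 'support-cus-1042', 'Create the ticket we discussed.');
const otherThread = runTurn(memory, 'another-case', 'Create the ticket we discussed.');

console.log(sameThread.length); // 2: the first turn was restored
console.log(otherThread.length); // 1: a different thread starts empty
```

The second call on the same ID sees the earlier message, while the call on a different ID does not. That is the behavior the rest of this post relies on.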

MemorySaver

For demos and tests, use MemorySaver - an in-memory checkpointer that persists state for the lifetime of the process:

import { MemorySaver } from '@langchain/langgraph-checkpoint';
const checkpointer = new MemorySaver();

State is lost when the Node process exits. That is fine for local scripts; production apps need a durable backend (see below).

Attach a checkpointer to createAgent

Pass the checkpointer when creating the agent. Reuse the same triage tools and instructions from the agents post:

import { createAgent } from 'langchain';
import { MemorySaver } from '@langchain/langgraph-checkpoint';
// supportTools and TRIAGE_INSTRUCTIONS are the triage tools and instructions from the agents post
const agent = createAgent({
  model: 'gpt-5.5',
  tools: supportTools,
  systemPrompt: TRIAGE_INSTRUCTIONS,
  checkpointer: new MemorySaver(),
});

The agent loop is unchanged - the checkpointer hooks into LangGraph beneath createAgent.

First turn - lookup

Pass a stable thread_id in the invoke config:

const threadConfig = { configurable: { thread_id: 'support-cus-1042' } };
const turn1 = await agent.invoke(
  {
    messages: [
      {
        role: 'user',
        content:
          'Look up customer cus_1042 and invoice inv_8891 for a possible duplicate charge. Summarize what you find. Do not create a ticket yet.',
      },
    ],
  },
  threadConfig,
);
console.log(turn1.messages.at(-1)?.content);

Given this prompt, the agent typically calls get_customer and get_invoice, and may also call search_knowledge_base. LangGraph saves the full message history (including tool results) to the checkpointer.

Second turn - follow-up without IDs

Send only the new user message on the same thread_id. Prior context is restored automatically:

const turn2 = await agent.invoke(
  {
    messages: [
      {
        role: 'user',
        content: 'Create the support ticket we discussed.',
      },
    ],
  },
  threadConfig,
);
console.log(turn2.messages.at(-1)?.content);

The agent should call create_support_ticket using customer and invoice details from turn 1 - the user does not repeat cus_1042 or inv_8891.

Read the final answer from turn2.messages, using the same pattern as result.messages in the agents post:

const lastAi = [...turn2.messages]
  .reverse()
  .find((message) => message.type === 'ai');
console.log(lastAi?.content);
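
Message content can be a plain string or an array of content blocks, so a small helper that extracts only the text can be handy. The sketch below works on plain objects with an assumed message shape (type, content, tool_calls); lastAiMessage and messageText are hypothetical names, and the ticket ID in the sample is made up.

```javascript
// Find the most recent AI message without mutating the array.
function lastAiMessage(messages) {
  return messages.findLast((message) => message.type === 'ai');
}

// Content may be a string or an array of content blocks; keep only the text parts.
function messageText(message) {
  if (!message) return '';
  if (typeof message.content === 'string') return message.content;
  return message.content
    .filter((block) => block.type === 'text')
    .map((block) => block.text)
    .join('');
}

// Plain objects standing in for LangChain messages (assumed shape).
const sampleMessages = [
  { type: 'human', content: 'Create the support ticket we discussed.' },
  { type: 'ai', content: '', tool_calls: [{ name: 'create_support_ticket' }] },
  { type: 'tool', content: '{"ticketId":"T-1"}' },
  { type: 'ai', content: [{ type: 'text', text: 'Ticket created.' }] },
];

console.log(messageText(lastAiMessage(sampleMessages))); // Ticket created.
```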

Thread isolation

Different thread_id values do not share history. Two support reps working different cases should use separate thread IDs:

await agent.invoke(
  { messages: [{ role: 'user', content: 'Look up cus_1042.' }] },
  { configurable: { thread_id: 'rep-alice-case-1' } },
);
await agent.invoke(
  { messages: [{ role: 'user', content: 'Create the ticket we discussed.' }] },
  { configurable: { thread_id: 'rep-bob-case-2' } },
);

The second invoke on rep-bob-case-2 has no knowledge of Alice's lookup - Bob's thread starts empty.
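
One way to keep thread IDs consistent is to derive them from the rep and the case in a single place. The helper below is a hypothetical sketch; the naming scheme simply mirrors the IDs used in the example above.

```javascript
// Hypothetical helper: one thread_id per rep and case, so histories never mix.
function threadConfigFor(repId, caseId) {
  if (repId == null || caseId == null) {
    throw new TypeError('repId and caseId are both required');
  }
  return { configurable: { thread_id: `rep-${repId}-case-${caseId}` } };
}

const aliceConfig = threadConfigFor('alice', 1);
const bobConfig = threadConfigFor('bob', 2);

console.log(aliceConfig.configurable.thread_id); // rep-alice-case-1
console.log(bobConfig.configurable.thread_id); // rep-bob-case-2
```

The returned object has the same shape as threadConfig and can be passed as the second argument to agent.invoke.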

Production checkpointers

MemorySaver is process-local and not suitable for production. LangGraph supports durable checkpointers backed by Postgres, SQLite, and other stores through separate packages such as @langchain/langgraph-checkpoint-postgres and @langchain/langgraph-checkpoint-sqlite. Swap the checkpointer implementation; the thread_id API stays the same.

Pick a backend that matches your deployment: Postgres for multi-instance apps, SQLite for single-node services.

Building AI agents with LangChain

June 17, 2026

LangChain agents are built on LangGraph: the model calls tools in a loop until it returns a final answer. The high-level entry point is createAgent - pass a model, tools defined with tool(), and an optional systemPrompt.

This post builds the same support triage agent as the Vercel AI SDK agents and OpenAI Agents SDK posts so you can compare SDKs on one scenario. It follows the LangChain overview for Node.js and fits as post #4 in the LangChain series (after loaders/chunking and the RAG with pgvector pipeline).

Prerequisites

OpenAI account
Generated API key
Enabled billing
Node.js version 26
langchain, @langchain/openai, @langchain/core, and zod installed:

npm i langchain @langchain/openai @langchain/core zod

OPENAI_API_KEY set in the environment

Mental model - turns and the agent loop

A turn is one model generation. In that turn the model either:

returns final text (the run ends), or
returns tool calls (LangChain executes them and starts another turn with the results)

Typical flow for the support triage agent: user question → model calls lookup tools (get_customer, get_invoice, search_knowledge_base) → model creates a ticket or escalates → final answer.

A single turn can include multiple parallel tool calls. To cap how many graph steps run, set recursionLimit in the config object passed as the second argument to invoke or stream; each model generation and each tool batch counts toward the limit.
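
The loop can be sketched in plain JavaScript without any SDK. This is a toy model of the idea, not LangGraph's implementation: scriptedModel is a hypothetical stand-in for a chat model, runAgentLoop is a made-up name, and the step counting follows the description above (one step per model generation, one per tool batch).

```javascript
// Hypothetical stand-in for a chat model: request two lookups first,
// then return final text once tool results are present.
function scriptedModel(messages) {
  const hasToolResults = messages.some((message) => message.role === 'tool');
  if (!hasToolResults) {
    return { toolCalls: [{ name: 'get_customer' }, { name: 'get_invoice' }] };
  }
  return { text: 'Both records found.' };
}

function runAgentLoop(model, tools, userText, maxSteps) {
  const messages = [{ role: 'user', content: userText }];
  let steps = 0;
  while (true) {
    if (++steps > maxSteps) throw new Error('Step limit reached');
    const reply = model(messages); // one model generation = one turn
    if (reply.text !== undefined) return { answer: reply.text, steps };
    if (++steps > maxSteps) throw new Error('Step limit reached');
    // One tool batch: run every requested call and append the results.
    for (const call of reply.toolCalls) {
      messages.push({ role: 'tool', name: call.name, content: tools[call.name]() });
    }
  }
}

const toyTools = {
  get_customer: () => 'customer found',
  get_invoice: () => 'invoice found',
};
const run = runAgentLoop(scriptedModel, toyTools, 'Check cus_1042 and inv_8891', 15);
console.log(run); // { answer: 'Both records found.', steps: 3 }
```

With a limit of 2 the same run throws before the second model generation, which is the kind of guard a step cap provides against runaway loops.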

Defining tools

Use tool() from langchain with a Zod schema, plus name and description so the model knows when to call each tool:

import { tool } from 'langchain';
import { z } from 'zod';
// `invoices` is the demo's in-memory fixture array (see "Support triage scenario")
const getInvoice = tool(
  async ({ invoiceId }) => {
    const invoice = invoices.find((item) => item.id === invoiceId);
    if (!invoice) {
      return { found: false, invoiceId, error: 'Invoice not found' };
    }
    return { found: true, invoice };
  },
  {
    name: 'get_invoice',
    description: 'Look up an invoice by ID, including payment IDs and status',
    schema: z.object({
      invoiceId: z.string().describe('Invoice ID, e.g. inv_8891'),
    }),
  },
);

LangChain uses schema (not Vercel's inputSchema or OpenAI Agents' parameters). The handler receives validated input as the first argument.
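
Because the handler is an ordinary function, its lookup logic can be exercised before it is wrapped in tool(). The sketch below keeps that logic in a plain function; lookupInvoice is a hypothetical name and the fixture values are invented for illustration, since the demo's real fixtures are not shown here.

```javascript
// Hypothetical fixture data, invented for this sketch.
const invoiceFixtures = [
  { id: 'inv_8891', customerId: 'cus_1042', status: 'paid', paymentIds: ['pay_1', 'pay_2'] },
];

// Same branching as the get_invoice handler, kept synchronous for easy testing.
function lookupInvoice({ invoiceId }) {
  const invoice = invoiceFixtures.find((item) => item.id === invoiceId);
  if (!invoice) {
    return { found: false, invoiceId, error: 'Invoice not found' };
  }
  return { found: true, invoice };
}

console.log(lookupInvoice({ invoiceId: 'inv_8891' }).found); // true
console.log(lookupInvoice({ invoiceId: 'inv_0000' }).error); // Invoice not found
```

Returning a structured "not found" result instead of throwing gives the model something it can read and react to, for example by asking the user to confirm the ID.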

createAgent

Wire the model, tools, and triage instructions:

import { createAgent } from 'langchain';
const agent = createAgent({
  model: 'gpt-5.5',
  tools: [getInvoice],
  systemPrompt: `You are a billing support triage agent.
Look up records before recommending refunds or creating tickets.`,
});

model can be a provider string ('gpt-5.5', 'openai:gpt-5.5') or a chat model instance from @langchain/openai.

Invoke

Pass a messages array and read the final answer from result.messages:

const result = await agent.invoke({
  messages: [
    {
      role: 'user',
      content: 'What is the status of invoice inv_8891? Reply in one sentence.',
    },
  ],
});
const lastAi = [...result.messages]
  .reverse()
  .find((message) => message.type === 'ai');
console.log(lastAi?.content);

The last AI message is the agent's final reply after any tool calls complete.

Support triage scenario

Example prompt:

Customer cus_1042 says they were charged twice for invoice inv_8891. What should we do?

A realistic chain:

get_customer - plan tier, open ticket count
get_invoice - amount, status, payment IDs
search_knowledge_base - duplicate-charge and refund policy
create_support_ticket or escalate_to_human - write action or escalation

The demo uses in-memory fixtures (customers, invoices, knowledge-base articles) so scripts run without a database.

Multi-tool agent

import { createAgent } from 'langchain';
import {
  getCustomer,
  getInvoice,
  searchKnowledgeBase,
  createSupportTicket,
  escalateToHuman,
  TRIAGE_INSTRUCTIONS,
} from './tools/index.js';
const agent = createAgent({
  model: 'gpt-5.5',
  tools: [
    getCustomer,
    getInvoice,
    searchKnowledgeBase,
    createSupportTicket,
    escalateToHuman,
  ],
  systemPrompt: TRIAGE_INSTRUCTIONS,
});
const result = await agent.invoke(
  {
    messages: [
      {
        role: 'user',
        content:
          'Customer cus_1042 says they were charged twice for invoice inv_8891. What should we do?',
      },
    ],
  },
  // recursionLimit belongs in the config (second argument), not in the input state
  { recursionLimit: 15 },
);
const answer = [...result.messages]
  .reverse()
  .find((message) => message.type === 'ai');
console.log(answer?.content);

Inspect result.messages for the full trace: human input, AI tool-call messages, tool results, and the final AI reply.
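
A compact summary of the trace is often easier to scan than the raw message objects. The following sketch assumes each message exposes type, content, tool_calls (on AI messages), and name (on tool messages); summarizeTrace is a hypothetical helper and the sample trace is invented.

```javascript
// Summarize a message trace, one line per message.
function summarizeTrace(messages) {
  return messages.map((message) => {
    if (message.type === 'ai' && message.tool_calls?.length) {
      const names = message.tool_calls.map((call) => call.name).join(', ');
      return `ai -> tool calls: ${names}`;
    }
    if (message.type === 'tool') {
      return `tool result: ${message.name}`;
    }
    // Human messages and final AI replies: show the first 60 characters.
    return `${message.type}: ${String(message.content).slice(0, 60)}`;
  });
}

// Invented sample trace using plain objects (assumed shape).
const sampleTrace = [
  { type: 'human', content: 'Customer cus_1042 says they were charged twice.' },
  { type: 'ai', content: '', tool_calls: [{ name: 'get_customer' }, { name: 'get_invoice' }] },
  { type: 'tool', name: 'get_customer', content: '{"found":true}' },
  { type: 'tool', name: 'get_invoice', content: '{"found":true}' },
  { type: 'ai', content: 'Both charges succeeded; open a refund ticket.' },
];

for (const line of summarizeTrace(sampleTrace)) {
  console.log(line);
}
```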

Streaming

agent.stream() yields state updates as the graph runs. Use streamMode: 'values' to receive the full message list after each step:

const stream = await agent.stream(
  {
    messages: [
      {
        role: 'user',
        content:
          'Customer cus_1042 says they were charged twice for invoice inv_8891. What should we do?',
      },
    ],
  },
  { streamMode: 'values', recursionLimit: 15 },
);
let finalMessages = [];
for await (const state of stream) {
  if (state.messages) {
    finalMessages = state.messages;
  }
}
const answer = [...finalMessages]
  .reverse()
  .find((message) => message.type === 'ai');
console.log(answer?.content);

For token-level streaming, use streamMode: 'messages' or streamEvents (see LangGraph streaming).
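
Since 'values' mode delivers the full message list at every step, showing progress usually means working out which messages are new. The sketch below simulates a sequence of states with plain objects rather than real stream output; newMessagesSince is a hypothetical helper and it assumes messages are only ever appended.

```javascript
// Return only the messages added since the previous state (assumes append-only lists).
function newMessagesSince(previousMessages, currentMessages) {
  return currentMessages.slice(previousMessages.length);
}

// Simulated states: each one holds every message produced so far.
const fullRun = [
  { type: 'human', content: 'Status of inv_8891?' },
  { type: 'ai', content: '', tool_calls: [{ name: 'get_invoice' }] },
  { type: 'tool', name: 'get_invoice', content: '{"found":true}' },
  { type: 'ai', content: 'Invoice inv_8891 is paid.' },
];
const simulatedStates = fullRun.map((_, index) => ({ messages: fullRun.slice(0, index + 1) }));

let seenMessages = [];
const addedPerStep = [];
for (const state of simulatedStates) {
  const added = newMessagesSince(seenMessages, state.messages);
  addedPerStep.push(added.map((message) => message.type).join(','));
  seenMessages = state.messages;
}

console.log(addedPerStep); // [ 'human', 'ai', 'tool', 'ai' ]
```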

When to pick LangChain

	LangChain `createAgent`	Vercel AI SDK	OpenAI Agents SDK
Best for	RAG + LCEL + agents in one stack	TypeScript apps already on AI SDK	OpenAI-first agent primitives
Tool definition	`tool()` + Zod `schema`	`tool()` + `inputSchema`	`tool()` + Zod `parameters`
Run API	`agent.invoke` / `agent.stream`	`generateText` + `stopWhen`	`run()` + `maxTurns`
Handoffs / guardrails	Middleware (advanced)	Limited	Built-in
Memory	LangGraph checkpointers	Bring your own	Session helpers

Pick LangChain when document loaders, retrievers, and agents should share one ecosystem. Pick Vercel AI SDK or OpenAI Agents SDK when you want a focused agent layer without the broader LangChain surface.

Document loaders and chunking with LangChain

June 16, 2026

This post covers local file ingestion and chunking in Node.js. For LangChain basics (LCEL, packages, agents), see the LangChain overview post. For the full RAG chain with pgvector, see the RAG with pgvector post.

Prerequisites

Node.js version 26
langchain, @langchain/core, @langchain/classic, and @langchain/textsplitters installed

npm i langchain @langchain/core @langchain/classic @langchain/textsplitters

More loader types (web, cloud, audio) live in standalone integration packages - see the document loader integrations page.

The Document type

Every loader returns Document instances from @langchain/core:

pageContent - the text of the chunk or file
metadata - optional key/value pairs (source path, section, page) used for citations

import { Document } from '@langchain/core/documents';
const doc = new Document({
  pageContent: 'pgvector adds vector search to PostgreSQL.',
  metadata: { source: 'notes/pgvector.txt', section: 'basics' }
});

Load a single file

Use TextLoader for plain text or markdown files:

import { TextLoader } from '@langchain/classic/document_loaders/fs/text';
const loader = new TextLoader('./notes/pgvector.txt');
const docs = await loader.load();
console.log(docs[0].pageContent);
console.log(docs[0].metadata.source);

The loader sets metadata.source to the file path - keep it for citations in RAG answers.

Load a directory

Use DirectoryLoader when you have many files. Map extensions to loader factories:

import { DirectoryLoader } from '@langchain/classic/document_loaders/fs/directory';
import { TextLoader } from '@langchain/classic/document_loaders/fs/text';
const loader = new DirectoryLoader('./notes', {
  '.txt': (path) => new TextLoader(path),
  '.md': (path) => new TextLoader(path)
});
const docs = await loader.load();
console.log(`Loaded ${docs.length} documents`);

PDF, CSV, and JSON loaders are available via other integration packages. This post uses .txt and .md files.

Split documents

Chunking makes retrieval more precise. Instead of embedding one large file, split it into smaller overlapping parts. Pass the docs array from TextLoader or DirectoryLoader to a splitter.

Two parameters matter most:

chunkSize - target maximum size per chunk (characters or tokens, depending on splitter)
chunkOverlap - shared text between adjacent chunks so context is not lost at boundaries

Start with chunkSize: 800 and chunkOverlap: 120, then tune based on document style and answer quality.

import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 800,
  chunkOverlap: 120
});
const chunks = await splitter.splitDocuments(docs);
console.log(chunks.length);
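
To see what chunkSize and chunkOverlap mean in isolation, here is a toy fixed-window chunker. It is only an illustration of the two parameters: chunkWithOverlap is a hypothetical function, and RecursiveCharacterTextSplitter additionally respects separators such as paragraph breaks.

```javascript
// Toy sliding window: each chunk starts (chunkSize - chunkOverlap) characters after the last.
function chunkWithOverlap(text, chunkSize, chunkOverlap) {
  if (chunkOverlap >= chunkSize) {
    throw new RangeError('chunkOverlap must be smaller than chunkSize');
  }
  const step = chunkSize - chunkOverlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // the last window reached the end
  }
  return chunks;
}

console.log(chunkWithOverlap('abcdefghij', 4, 1)); // [ 'abcd', 'defg', 'ghij' ]
```

Each chunk repeats the last character of the previous one ('d', then 'g'). That repetition is the overlap that keeps context across boundaries.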

Splitter comparison

The example above uses RecursiveCharacterTextSplitter, the default for most RAG setups. Alternatives:

Splitter	Best for
`RecursiveCharacterTextSplitter`	Default choice; tries paragraph breaks, then line breaks, then spaces, then single characters
`CharacterTextSplitter`	Splitting on one separator (default `\n\n`) when the text has a simple, regular structure
`TokenTextSplitter`	When chunk limits must match model token budgets

Character-based:

import { CharacterTextSplitter } from '@langchain/textsplitters';
const splitter = new CharacterTextSplitter({
  chunkSize: 800,
  chunkOverlap: 120
});
const chunks = await splitter.splitDocuments(docs);

Token-based:

import { TokenTextSplitter } from '@langchain/textsplitters';
const splitter = new TokenTextSplitter({
  encodingName: 'cl100k_base',
  chunkSize: 200,
  chunkOverlap: 20
});
const chunks = await splitter.splitDocuments(docs);

Use token-based splitting when chunk sizes must respect a token budget, such as an embedding model's input limit or a fixed share of the context window. Character-based recursive splitting is the usual starting point for RAG over prose.
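
The "recursive" part of recursive splitting can be sketched in a few lines: try the coarsest separator first and fall back to finer ones only for pieces that are still too long. This is a simplified illustration under that assumption, with a hypothetical recursiveSplit function; the real splitter also merges small pieces back together and applies overlap, which the sketch leaves out.

```javascript
// Simplified recursive split: coarse separators first, finer ones only when needed.
function recursiveSplit(text, maxLength, separators = ['\n\n', '\n', ' ', '']) {
  if (text.length <= maxLength) return text.length > 0 ? [text] : [];
  const [separator, ...finer] = separators;
  if (separator === undefined || separator === '') {
    // No separator left: cut into fixed-size pieces.
    const pieces = [];
    for (let start = 0; start < text.length; start += maxLength) {
      pieces.push(text.slice(start, start + maxLength));
    }
    return pieces;
  }
  return text
    .split(separator)
    .flatMap((piece) => recursiveSplit(piece, maxLength, finer));
}

console.log(recursiveSplit('aa bb\n\ncccccc', 5)); // [ 'aa bb', 'ccccc', 'c' ]
```

The first paragraph fits and stays whole; the second has no finer separator to use, so it falls through to a hard cut.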

Metadata through the pipeline

Pass metadata when creating documents manually, or rely on loader metadata - splitters preserve it on each chunk:

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 400,
  chunkOverlap: 60
});
const chunks = await splitter.createDocuments(
  ['First paragraph.\n\nSecond paragraph.'],
  [{ source: 'manual', section: 'intro' }]
);
console.log(chunks[0].metadata);

After splitDocuments(docs), each chunk keeps fields like source from the parent document. Use those fields when storing chunks in a vector database or displaying citations.
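
The idea of metadata travelling with every chunk can be shown with a toy splitter over plain objects. splitWithMetadata is a hypothetical function, and chunkIndex is an extra field added for this sketch, not something LangChain sets.

```javascript
// Toy splitter: fixed-size pieces, each carrying a copy of the parent's metadata.
function splitWithMetadata(doc, chunkSize) {
  const chunks = [];
  for (let start = 0; start < doc.pageContent.length; start += chunkSize) {
    chunks.push({
      pageContent: doc.pageContent.slice(start, start + chunkSize),
      // chunkIndex is hypothetical; it shows where per-chunk fields could go.
      metadata: { ...doc.metadata, chunkIndex: chunks.length },
    });
  }
  return chunks;
}

const parentDoc = {
  pageContent: 'pgvector adds vector search to PostgreSQL.',
  metadata: { source: 'notes/pgvector.txt', section: 'basics' },
};
const pieces = splitWithMetadata(parentDoc, 20);

console.log(pieces.map((piece) => piece.metadata.source));
```

Every piece reports the same source, so a citation can point back to the original file no matter which chunk was retrieved.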

Choosing parameters

Short FAQs or API docs - smaller chunkSize (300–500) for precise retrieval
Long guides or blog posts - larger chunkSize (800–1200) to keep sections together
More overlap - helps when answers span chunk boundaries; increases storage and embedding cost
Less overlap - fewer redundant chunks; risk losing context at splits

Tune with real questions from your domain.

LangChain overview for Node.js

June 15, 2026

LangChain.js is a framework for LLM applications in TypeScript and Node.js. It standardizes how you wire prompts, models, tools, document loaders, embeddings, and retrievers into reusable pipelines and agents.

LangChain, Deep Agents, LangGraph, and LangSmith

Project	Role
LangChain	High-level APIs: LCEL chains, `createAgent`, loaders, retrievers
Deep Agents	Batteries-included agent harness: planning, subagents, filesystem, context management
LangGraph	Low-level orchestration; LangChain agents run on LangGraph under the hood
LangSmith	Tracing, debugging, and evaluation for LangChain and LangGraph apps

Use Deep Agents for complex multi-step tasks out of the box. Use LangChain's createAgent when you want a minimal harness you compose with middleware. Reach for LangGraph when you need custom stateful workflows, branching, or fine-grained control over the agent loop.

Packages

Install the core packages first (install guide):

npm i langchain @langchain/core @langchain/openai zod

What each package provides (provider-specific integrations live in their own packages):

langchain - createAgent, tool, and high-level chain helpers
zod - tool input schemas when defining tools with tool()
@langchain/core - prompts, output parsers, Runnable interface, LCEL
@langchain/openai - ChatOpenAI, OpenAIEmbeddings
@langchain/textsplitters - document chunking (used in the RAG post)
Standalone integration packages for other providers and tools (see the integrations page)

For raw API access, see the Chat Completions and OpenAI Responses API posts. For provider-agnostic text and agents, see the Vercel AI SDK and OpenAI Agents SDK posts.

When to use LangChain

Tool	Best for
Raw `openai` package	Minimal calls, full control, least abstraction
Vercel AI SDK	Provider-agnostic `generateText`, streaming, embeddings, tool loops
OpenAI Agents SDK	Official agent loop, handoffs, guardrails
LangChain	Document ingestion, retrievers, LCEL chains, `createAgent`, swappable vector stores

Reach for LangChain when RAG or multi-step LLM pipelines grow beyond a few manual API calls.

Prerequisites

OpenAI account
Generated API key
Enabled billing
Node.js version 26
langchain, @langchain/core, @langchain/openai, and zod installed
OPENAI_API_KEY set in the environment

Core concepts

Document - a chunk of text with optional metadata. Loaders produce Document instances; splitters break long sources into retrieval-friendly pieces.

import { Document } from '@langchain/core/documents';
const doc = new Document({
  pageContent: 'LangChain helps compose LLM pipelines.',
  metadata: { source: 'intro' }
});

Runnable - any component that exposes .invoke(), .stream(), and .batch(). Prompts, models, parsers, and composed chains are all Runnables.

LCEL (LangChain Expression Language) - chain Runnables with .pipe(). Data flows left to right: prompt → model → parser. The same .invoke(), .stream(), and .batch() interface applies to every Runnable in the chain.

import { ChatPromptTemplate } from '@langchain/core/prompts';
import { StringOutputParser } from '@langchain/core/output_parsers';
import { ChatOpenAI } from '@langchain/openai';
const prompt = ChatPromptTemplate.fromMessages([
  ['system', 'Answer in one sentence.'],
  ['human', '{question}']
]);
const model = new ChatOpenAI({ model: 'gpt-5.5' });
const chain = prompt.pipe(model).pipe(new StringOutputParser());
const answer = await chain.invoke({ question: 'What is LangChain?' });
console.log(answer);
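
The left-to-right flow of .pipe() can be illustrated without LangChain. The sketch below is a toy, synchronous stand-in: step is a hypothetical factory, and the "model" simply echoes its input. Real Runnables are asynchronous and also provide stream() and batch().

```javascript
// Toy version of .pipe(): every step has invoke(), and pipe() returns a new combined step.
function step(fn) {
  return {
    invoke: (input) => fn(input),
    pipe: (next) => step((input) => next.invoke(fn(input))),
  };
}

const toyPrompt = step(({ question }) => `Answer in one sentence. ${question}`);
const toyModel = step((text) => ({ content: `echo: ${text}` })); // stand-in for a chat model
const toyParser = step((message) => message.content);

const toyChain = toyPrompt.pipe(toyModel).pipe(toyParser);
console.log(toyChain.invoke({ question: 'What is LCEL?' }));
// echo: Answer in one sentence. What is LCEL?
```

Because the result of pipe() has the same shape as each individual step, chains can be nested and reused like any other step.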

Agents - LangChain's current high-level agent API is createAgent. Pass a model string or chat model, optional tools (with zod schemas), and an optional checkpointer for conversation memory (@langchain/langgraph). For tools and the support triage scenario, see the agents post.

import { createAgent } from 'langchain';
const agent = createAgent({
  model: 'gpt-5.5',
  tools: []
});
const result = await agent.invoke({
  messages: [{ role: 'user', content: 'What is LangChain?' }]
});

Structured output - return typed JSON instead of free text. In LCEL chains, call .withStructuredOutput() on a chat model with a Zod schema:

import { z } from 'zod';
import { ChatPromptTemplate } from '@langchain/core/prompts';
import { ChatOpenAI } from '@langchain/openai';
const schema = z.object({
  answer: z.string(),
  confidence: z.number(),
});
const prompt = ChatPromptTemplate.fromMessages([
  ['system', 'Answer briefly and rate your confidence from 0 to 1.'],
  ['human', '{question}'],
]);
const model = new ChatOpenAI({ model: 'gpt-5.5' }).withStructuredOutput(schema);
const result = await prompt.pipe(model).invoke({ question: 'What is LangChain?' });
console.log(result);

On agents, pass the same schema as responseFormat and read result.structuredResponse:

import { createAgent } from 'langchain';
import { z } from 'zod';
const schema = z.object({ answer: z.string(), confidence: z.number() });
const agent = createAgent({
  model: 'gpt-5.5',
  tools: [],
  responseFormat: schema,
});
const result = await agent.invoke({
  messages: [{ role: 'user', content: 'What is LangChain?' }],
});
console.log(result.structuredResponse);

What LangChain can do

Load and split documents - file and directory loaders, text splitters (see the loaders and chunking post); PDF, HTML, CSV via integration packages
Embeddings and vector stores - OpenAI embeddings with pgvector, Pinecone, Chroma, and others
Retrievers and RAG chains - fetch relevant context, then call a model (see the RAG with pgvector post)
Conversation memory - short-term memory via @langchain/langgraph checkpointers and thread_id (see the agent memory post); long-term memory via stores
Tools and agents - createAgent with tools and middleware; for production agents you may also prefer the Vercel AI SDK agents post or OpenAI Agents SDK post
Structured output - Zod schemas via .withStructuredOutput() on a chat model or responseFormat on createAgent; read parsed objects from the chain result or result.structuredResponse
Observability - trace runs with LangSmith (LANGSMITH_TRACING=true); optional LangSmith Engine monitors traces and flags issues

Streaming and batch

The same LCEL chain supports streaming and batch invocation:

for await (const chunk of await chain.stream({ question: 'What is LCEL?' })) {
  process.stdout.write(chunk);
}
const answers = await chain.batch([
  { question: 'What is a Runnable?' },
  { question: 'What is a retriever?' }
]);

RAG with OpenAI Embeddings, pgvector and LangChain

June 2, 2026

Retrieval-Augmented Generation (RAG) is a practical pattern: store knowledge as embeddings, retrieve the most relevant chunks with semantic search, then generate an answer grounded in that context.

This guide shows an end-to-end RAG flow with LangChain, OpenAI embeddings, PostgreSQL + pgvector, and an LCEL answer chain. For LangChain basics, see the LangChain overview post. For loaders and splitter choice, see the loaders and chunking post.

Prerequisites

OpenAI account
Generated API key
Enabled billing
Node.js version 26
PostgreSQL with pgvector extension enabled
npm packages: @langchain/pgvector, @langchain/openai, @langchain/core, @langchain/textsplitters, langchain, pg

npm i @langchain/pgvector @langchain/openai @langchain/core @langchain/textsplitters langchain pg

What are embeddings?

Embeddings are numeric vectors that represent the semantic meaning of text. Similar text should produce vectors that are close in vector space.
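
"Close in vector space" is commonly measured with cosine similarity. The sketch below computes it for made-up three-dimensional vectors; real embeddings have hundreds or thousands of dimensions, and the numbers here are invented purely for illustration.

```javascript
// Cosine similarity: 1 means same direction, 0 means unrelated (orthogonal).
// Note: a zero vector would produce NaN because of the division by zero.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new RangeError('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Invented vectors standing in for embeddings of three texts.
const refundVector = [0.9, 0.1, 0.0];
const chargebackVector = [0.8, 0.2, 0.1];
const weatherVector = [0.0, 0.1, 0.9];

console.log(cosineSimilarity(refundVector, chargebackVector) > cosineSimilarity(refundVector, weatherVector)); // true
```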

In this pipeline:

Split source documents into chunks
Embed chunks with OpenAIEmbeddings and store them in pgvector via PGVectorStore
Embed the user question at query time and retrieve nearest chunks with a LangChain retriever
Pass retrieved context into an LCEL chain that calls ChatOpenAI

Chunk documents

Chunking makes retrieval more precise. Instead of embedding one large document, split it into smaller overlapping parts. Start with chunkSize: 800 and chunkOverlap: 120, then adjust based on your document style and answer quality.

import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 800,
  chunkOverlap: 120
});
const docs = await splitter.createDocuments(
  ['RAG combines retrieval and generation. Store chunks as vectors and fetch similar chunks at query time.'],
  [{ source: 'notes.md' }]
);

Store chunks in pgvector

Use PGVectorStore from @langchain/pgvector. It creates the table if needed, embeds documents, and stores vectors with metadata.

import pg from 'pg';
import { OpenAIEmbeddings } from '@langchain/openai';
import { PGVectorStore } from '@langchain/pgvector';
const embeddings = new OpenAIEmbeddings({ model: 'text-embedding-3-small' });
const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });
const vectorStore = await PGVectorStore.initialize(embeddings, {
  pool,
  tableName: 'rag_documents',
  columns: {
    idColumnName: 'id',
    vectorColumnName: 'vector',
    contentColumnName: 'content',
    metadataColumnName: 'metadata'
  },
  distanceStrategy: 'cosine'
});
await vectorStore.addDocuments(docs);

Retrieve context

Turn the vector store into a retriever to fetch the top-k relevant chunks for a question:

const retriever = vectorStore.asRetriever({ k: 4 });
const chunks = await retriever.invoke('How does pgvector semantic search work?');
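
Conceptually, top-k retrieval ranks stored vectors by distance to the query vector and keeps the k nearest. The toy below does this in memory with cosine distance; pgvector performs the equivalent search inside PostgreSQL. The rows, vectors, and the topK name are invented for this sketch.

```javascript
// Cosine distance = 1 - cosine similarity; smaller means more similar.
function cosineDistance(a, b) {
  const dot = a.reduce((sum, value, i) => sum + value * b[i], 0);
  const norm = (vector) => Math.sqrt(vector.reduce((sum, value) => sum + value * value, 0));
  return 1 - dot / (norm(a) * norm(b));
}

// Rank rows by distance to the query and keep the k nearest.
function topK(rows, queryVector, k) {
  return rows
    .map((row) => ({ ...row, distance: cosineDistance(row.vector, queryVector) }))
    .sort((left, right) => left.distance - right.distance)
    .slice(0, k);
}

// Invented rows with two-dimensional vectors.
const storedRows = [
  { content: 'Office hours', vector: [0, 1] },
  { content: 'Duplicate charges', vector: [0.9, 0.1] },
  { content: 'Refund policy', vector: [1, 0] },
];
const nearest = topK(storedRows, [1, 0], 2);

console.log(nearest.map((row) => row.content)); // [ 'Refund policy', 'Duplicate charges' ]
```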

RAG chain with LCEL

Wire retrieval and generation with LCEL. The retriever supplies context, and the system prompt instructs the model to answer from that context only.

import { ChatPromptTemplate } from '@langchain/core/prompts';
import { StringOutputParser } from '@langchain/core/output_parsers';
import { RunnablePassthrough, RunnableSequence } from '@langchain/core/runnables';
import { ChatOpenAI } from '@langchain/openai';
const prompt = ChatPromptTemplate.fromMessages([
  [
    'system',
    'Answer only from the provided context. If context is insufficient, say you need more data.'
  ],
  ['human', 'Context:\n{context}\n\nQuestion: {question}']
]);
const model = new ChatOpenAI({ model: 'gpt-5.5' });
const formatDocs = (documents) =>
  documents.map((doc) => doc.pageContent).join('\n\n---\n\n');
const chain = RunnableSequence.from([
  {
    context: retriever,
    question: new RunnablePassthrough()
  },
  (input) => ({
    context: formatDocs(input.context),
    question: input.question
  }),
  prompt,
  model,
  new StringOutputParser()
]);
const answer = await chain.invoke('How does pgvector semantic search work?');
console.log(answer);
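
If answers should cite their sources, one option is a variation of formatDocs that numbers each chunk and includes metadata.source. This is a sketch of that idea, not part of the pipeline above; formatDocsWithSources is a hypothetical name, the sample documents are invented, and the bracketed numbering format is just one possible convention.

```javascript
// Number each chunk and show its source so the model can refer to [1], [2], ...
function formatDocsWithSources(documents) {
  return documents
    .map((doc, index) => {
      const source = doc.metadata?.source ?? 'unknown';
      return `[${index + 1}] (${source})\n${doc.pageContent}`;
    })
    .join('\n\n---\n\n');
}

// Plain objects with the same fields as LangChain Document instances.
const retrievedDocs = [
  { pageContent: 'RAG combines retrieval and generation.', metadata: { source: 'notes.md' } },
  { pageContent: 'pgvector adds vector search to PostgreSQL.', metadata: {} },
];

console.log(formatDocsWithSources(retrievedDocs));
```

To use it, swap it in for formatDocs in the chain and ask in the system prompt for citations by bracket number.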