RAG with OpenAI Embeddings, pgvector and LangChain

Published June 2, 2026 · Last updated June 2, 2026 · 4 min read

Retrieval-Augmented Generation (RAG) is a practical pattern: store knowledge as embeddings, retrieve the most relevant chunks with semantic search, then generate an answer grounded in that context.

This guide shows a simple end-to-end flow with OpenAI embeddings, PostgreSQL + pgvector, and LangChain chunking.

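Architecture at a glance

The pipeline has two phases. Indexing: split documents into chunks, embed each chunk with OpenAI, and store the vectors in PostgreSQL with pgvector. Querying: embed the user question, retrieve the nearest chunks with semantic search, and pass them to the model as context for the answer.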

Prerequisites

  • OpenAI account
  • Generated API key
  • Enabled billing
  • Node.js version 26
  • PostgreSQL with pgvector extension enabled
  • npm packages: openai, langchain, pg, pgvector

What are embeddings?

Embeddings are numeric vectors that represent the semantic meaning of text. Similar text should produce vectors that are close in vector space.

In practice:

  • Convert document chunks to vectors and store them in pgvector
  • Convert a user question to a vector
  • Run a nearest-neighbor search to find the most relevant chunks
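To make "close in vector space" concrete, here is a minimal sketch of cosine distance in plain JavaScript. It is the same measure that pgvector's <=> operator returns later in this post. The function name cosineDistance is illustrative and not part of any library.

```javascript
// Cosine distance = 1 - cosine similarity.
// 0 means same direction, 1 means orthogonal, 2 means opposite.
function cosineDistance(a, b) {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    // Direction is undefined for a zero vector, so the distance is too.
    throw new Error('Cosine distance is undefined for zero vectors');
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineDistance([1, 0], [1, 0])); // 0
console.log(cosineDistance([1, 0], [0, 1])); // 1
console.log(cosineDistance([1, 0], [-1, 0])); // 2
```

Real embeddings have hundreds or thousands of dimensions, but the arithmetic is identical.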

OpenAI client setup

import OpenAI from 'openai';
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

Embedding one input element

Use a single string when embedding a user query.

const response = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'How do I connect pgvector to PostgreSQL?',
});
const queryEmbedding = response.data[0].embedding;
console.log(queryEmbedding.length); // 1536 by default for text-embedding-3-small
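OpenAI documents its embeddings as normalized to length 1. For unit vectors, cosine similarity reduces to a plain dot product. If you want to sanity-check a returned vector before storing it, a small helper is enough. The names l2Norm, isUnitLength and dot below are hypothetical, and the 1e-3 tolerance is an arbitrary assumption.

```javascript
// Euclidean (L2) norm of a vector.
function l2Norm(vector) {
  return Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
}

// True when the vector has length 1 within a small tolerance.
// The default tolerance of 1e-3 is an arbitrary choice for this sketch.
function isUnitLength(vector, tolerance = 1e-3) {
  return Math.abs(l2Norm(vector) - 1) <= tolerance;
}

// For unit-length vectors, cosine similarity equals the dot product.
function dot(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

console.log(isUnitLength([0.6, 0.8])); // true
console.log(isUnitLength([3, 4])); // false
```

Applied to the response above, isUnitLength(queryEmbedding) should return true.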

Embedding multiple input elements

Use an array to embed multiple chunks in one API call.

const chunks = [
  'pgvector adds vector similarity search to PostgreSQL.',
  'LangChain helps split long documents into retrieval-friendly chunks.',
  'RAG retrieves context first, then asks an LLM to answer.',
];
const response = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: chunks,
});
const rows = response.data.map((item, index) => ({
  text: chunks[index],
  embedding: item.embedding,
}));
console.log(rows.length); // 3
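For larger corpora you will usually spread chunks across several requests, because the embeddings endpoint caps how many inputs one call accepts (check the current limit in the API reference). Each item in response.data also carries an index field that points back to its input. Pairing text and vector by that field is sturdier than relying on array order. This is a minimal sketch, where toBatches, pairByIndex and any batch size you choose are hypothetical.

```javascript
// Split an array into consecutive batches of at most `size` items.
function toBatches(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error('Batch size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Pair each input text with its embedding via the `index` field
// carried by every item in an embeddings response.
function pairByIndex(texts, data) {
  return data
    .slice()
    .sort((a, b) => a.index - b.index)
    .map((item) => ({ text: texts[item.index], embedding: item.embedding }));
}

// Usage with a fake array shaped like response.data, deliberately out of order:
const texts = ['a', 'b', 'c'];
const fakeData = [
  { index: 2, embedding: [0, 0, 1] },
  { index: 0, embedding: [1, 0, 0] },
  { index: 1, embedding: [0, 1, 0] },
];
console.log(toBatches(texts, 2)); // [ [ 'a', 'b' ], [ 'c' ] ]
console.log(pairByIndex(texts, fakeData).map((row) => row.text)); // [ 'a', 'b', 'c' ]
```

In a real loop you would call client.embeddings.create once per batch. Pass that batch, not the full list, as texts, because index refers to the position within that single request.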

Chunking documents with LangChain

Chunking makes retrieval more precise: instead of embedding one large document, split it into smaller, overlapping parts. Start with chunkSize: 800 and chunkOverlap: 120 (both measured in characters by default), then adjust based on your document style and answer quality.

// Text splitters ship in the @langchain/textsplitters package (npm install @langchain/textsplitters).
import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 800,
  chunkOverlap: 120,
});
const docs = await splitter.createDocuments([
  `RAG combines retrieval and generation. Store chunks as vectors and fetch similar chunks at query time.`,
]);
console.log(docs.map((doc) => doc.pageContent));
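To see what chunkSize and chunkOverlap mean in practice, here is a deliberately simplified fixed-window splitter. It is not LangChain's algorithm. RecursiveCharacterTextSplitter first tries to break on separators such as blank lines, newlines and spaces, while this sketch just starts a new window every chunkSize - chunkOverlap characters. The function name slidingWindowChunks is hypothetical.

```javascript
// Fixed-size character windows; consecutive chunks share `overlap` characters.
function slidingWindowChunks(text, chunkSize, overlap) {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error('Require chunkSize > 0 and 0 <= overlap < chunkSize');
  }
  const step = chunkSize - overlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a window reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

console.log(slidingWindowChunks('abcdefghij', 4, 1)); // [ 'abcd', 'defg', 'ghij' ]
```

With chunkSize 800 and chunkOverlap 120 the step is 680 characters, so a 2,000-character document yields three windows starting at 0, 680 and 1360.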

Store embeddings in pgvector

Create a table with a vector column whose dimension matches the embedding model: text-embedding-3-small returns 1536 dimensions by default.

CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS rag_chunks (
  id BIGSERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  embedding VECTOR(1536) NOT NULL,
  source TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
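Because the column is declared VECTOR(1536), pgvector rejects vectors of any other length. Validating dimensions in application code first gives a clearer error message. pgvector accepts vectors as text literals like '[1,2,3]', which is what the hypothetical toVectorLiteral helper below produces. EXPECTED_DIMENSIONS is an assumption tied to text-embedding-3-small's default output size.

```javascript
// Matches the VECTOR(1536) column and text-embedding-3-small's default size.
const EXPECTED_DIMENSIONS = 1536;

// Validate an embedding and format it as a pgvector text literal, e.g. "[1,2,3]".
function toVectorLiteral(embedding, expected = EXPECTED_DIMENSIONS) {
  if (!Array.isArray(embedding) || embedding.length !== expected) {
    const got = Array.isArray(embedding) ? embedding.length : typeof embedding;
    throw new Error(`Expected ${expected} dimensions, got ${got}`);
  }
  // NaN and Infinity cannot be stored meaningfully, so reject them early.
  if (!embedding.every(Number.isFinite)) {
    throw new Error('Embedding contains a non-finite value');
  }
  return `[${embedding.join(',')}]`;
}

console.log(toVectorLiteral([0.1, -0.2, 0.3], 3)); // [0.1,-0.2,0.3]
```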

Insert chunk vectors from Node.js:

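import pg from 'pg';
import pgvector from 'pgvector/pg';
const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });
// registerTypes expects a client, so register on every new pool connection.
pool.on('connect', async (client) => {
  await pgvector.registerTypes(client);
});
// Store each chunk with its own embedding (rows comes from the multi-input example).
for (const row of rows) {
  await pool.query(
    `INSERT INTO rag_chunks (content, embedding, source)
     VALUES ($1, $2, $3)`,
    [row.text, pgvector.toSql(row.embedding), 'notes.md']
  );
}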
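Running one INSERT per chunk is simple, but it costs one round trip per row. A common alternative is a single multi-row INSERT with numbered placeholders. The sketch below only builds the query text and parameter array; you would pass the result to pool.query(text, values). buildBatchInsert is a hypothetical name, and vectors are serialized as '[1,2,3]' text literals, a format pgvector accepts as input. Keep batches modest, since PostgreSQL's wire protocol caps a statement at 65,535 bind parameters.

```javascript
// Build one parameterized multi-row INSERT for rag_chunks.
// Each row contributes three parameters: content, embedding, source.
function buildBatchInsert(rows, source) {
  if (rows.length === 0) {
    throw new Error('No rows to insert');
  }
  const values = [];
  const placeholders = rows.map((row, i) => {
    const base = i * 3;
    values.push(row.text, `[${row.embedding.join(',')}]`, source);
    return `($${base + 1}, $${base + 2}::vector, $${base + 3})`;
  });
  const text =
    'INSERT INTO rag_chunks (content, embedding, source) VALUES ' +
    placeholders.join(', ');
  return { text, values };
}

const { text, values } = buildBatchInsert(
  [
    { text: 'first chunk', embedding: [0.1, 0.2] },
    { text: 'second chunk', embedding: [0.3, 0.4] },
  ],
  'notes.md'
);
console.log(text);
// INSERT INTO rag_chunks (content, embedding, source) VALUES ($1, $2::vector, $3), ($4, $5::vector, $6)
console.log(values.length); // 6
```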

Semantic search in pgvector

Embed the user question, then retrieve the nearest chunks by cosine distance, which pgvector computes with the <=> operator. Cosine distance is 1 minus cosine similarity, so it ranges from 0 (same direction) to 2 (opposite), and a lower value means a closer semantic match. Top-k is the number of nearest chunks you return; this query uses k = 4 via LIMIT 4. You can also apply a distance threshold (for example 0.4) to discard weak matches. A value between 0.35 and 0.45 is a reasonable starting point for cosine distance; tune it with real questions from your domain.

const searchResult = await pool.query(
  `SELECT id, content, source, embedding <=> $1::vector AS distance
   FROM rag_chunks
   ORDER BY embedding <=> $1::vector
   LIMIT 4`,
  [pgvector.toSql(queryEmbedding)]
);
const contextChunks = searchResult.rows.map((row) => row.content);
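Conceptually, ORDER BY embedding <=> $1::vector LIMIT 4 scores every row by cosine distance and keeps the four smallest. The in-memory sketch below performs the same exact scan in JavaScript, which can be handy for testing retrieval logic on a few rows without a database. Without an approximate index such as HNSW or IVFFlat, pgvector also does an exact scan; with one, results are approximate. topK and the sample rows are illustrative.

```javascript
// Cosine distance, the same measure pgvector's <=> operator computes.
function cosineDistance(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact top-k: score every row, sort ascending by distance, keep k.
function topK(rows, queryEmbedding, k) {
  return rows
    .map((row) => ({ ...row, distance: cosineDistance(row.embedding, queryEmbedding) }))
    .sort((a, b) => a.distance - b.distance)
    .slice(0, k);
}

const sampleRows = [
  { id: 1, content: 'pgvector setup', embedding: [1, 0] },
  { id: 2, content: 'chunking tips', embedding: [0, 1] },
  { id: 3, content: 'cosine search', embedding: [1, 1] },
];
console.log(topK(sampleRows, [1, 0.1], 2).map((row) => row.id)); // [ 1, 3 ]
```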

Threshold filtering example:

const DISTANCE_THRESHOLD = 0.4;
const filteredChunks = searchResult.rows
  .filter((row) => Number(row.distance) <= DISTANCE_THRESHOLD)
  .map((row) => row.content);

If no chunks pass the threshold, skip answer generation and return a fallback message:

if (filteredChunks.length === 0) {
  console.log('I do not have enough context to answer this.');
  process.exit(0);
}

Generate answer from retrieved context

Use retrieved chunks as grounded context for the final model call.

// Use only chunks that passed the distance threshold, and ask the same question that was embedded for retrieval.
const context = filteredChunks.join('\n\n---\n\n');
const answer = await client.responses.create({
  model: 'gpt-5.5',
  instructions:
    'Answer only from the provided context. If context is insufficient, respond with: I do not have enough context to answer this.',
  input: `Context:\n${context}\n\nQuestion: How do I connect pgvector to PostgreSQL?`,
});
console.log(answer.output_text);
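The retrieve-then-generate steps can be wired into one function. In this sketch the I/O is injected as three async functions (embed, search, generate), so the control flow, including the threshold fallback, can be exercised without network access. answerQuestion and the dependency names are hypothetical. In a real script they would wrap client.embeddings.create, the pgvector query, and client.responses.create.

```javascript
const FALLBACK = 'I do not have enough context to answer this.';

// Orchestrate retrieve-then-generate with injected I/O:
//   deps.embed(text) -> Promise<number[]>
//   deps.search(vector, k) -> Promise<Array<{ content, distance }>>
//   deps.generate(context, question) -> Promise<string>
async function answerQuestion(question, deps, { k = 4, threshold = 0.4 } = {}) {
  const queryEmbedding = await deps.embed(question);
  const rows = await deps.search(queryEmbedding, k);
  const chunks = rows
    .filter((row) => Number(row.distance) <= threshold)
    .map((row) => row.content);
  // Skip the model call entirely when nothing relevant was retrieved.
  if (chunks.length === 0) return FALLBACK;
  const context = chunks.join('\n\n---\n\n');
  return deps.generate(context, question);
}

// Fake dependencies for a dry run.
const fakeDeps = {
  embed: async () => [1, 0],
  search: async () => [
    { content: 'pgvector adds vector similarity search to PostgreSQL.', distance: 0.2 },
    { content: 'Unrelated chunk.', distance: 0.9 },
  ],
  generate: async (context) => `Answer based on: ${context}`,
};

answerQuestion('What does pgvector add?', fakeDeps).then(console.log);
// Answer based on: pgvector adds vector similarity search to PostgreSQL.
```

Injecting the I/O keeps the fallback logic in one place, so a server handler can return FALLBACK instead of exiting the process.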

Demo

Runnable scripts for this post live in the rag-openai-embeddings-pgvector-demo folder in the private demos repository. Get access via code demos.