Supercharging AI Agents with Persistent Vector Storage

When building AI agents, you’ll notice something frustrating: they forget everything as soon as the conversation ends. This is obviously not ideal, and it’s where vector storage comes in. In this post, I’m going to share how you can implement a practical, persistent memory system for your agents.
The Memory Problem
When building agents it makes sense to initially focus on their reasoning capabilities and tool integration. But as soon as you start using these agents for real work, you’ll quickly discover that their ephemeral memory severely limits their usefulness. Users get frustrated repeating information, agents lose track of long-running tasks, and contextual knowledge vanishes between sessions.
What we need is a way to give our agents memory that persists, can be searched semantically, and intelligently fits within the context limits of language models. That’s what I’ll explain in this post.
Why Vector Storage?
Traditional databases work well for structured data, but conversations and knowledge are often unstructured and nuanced. Vector embeddings solve this problem by encoding semantic meaning into numerical vectors, allowing us to find information based on conceptual similarity rather than exact keyword matches.
This is perfect for agent memory because it enables:
- Finding contextually relevant messages even when phrasing differs
- Prioritizing information that’s semantically related to the current query
- Organizing knowledge in a way that mirrors how language models process information
A well-designed vector storage system acts like an extended memory for your agent, dramatically improving its ability to maintain context and access relevant information.
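To make the idea concrete, here is a minimal sketch of the similarity math that underpins every vector store. The helper below is a plain cosine similarity over two embedding vectors; the sentences in the closing comment are just an illustration of why semantic matching beats keyword matching.

// Cosine similarity between two embedding vectors: 1 means same direction (same meaning), 0 means unrelated
const cosineSimilarity = (a: number[], b: number[]): number => {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
};

// "How do I reset my password?" and "I forgot my login credentials" share almost no keywords,
// but their embeddings land close together, so a similarity search still connects them.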
The Architecture of Agent Memory
After experimenting with several approaches, I’ve found that a layered architecture works best for agent memory systems. Let’s break down the key components.
At the core we have the Vector Store Adapter, which provides an abstraction layer over different vector database technologies. This lets you switch between implementations (like ChromaDB, Pinecone, or even a custom solution) without changing your application code.
The Memory Store manages your conversation history, handling the persistence and retrieval of messages. It takes care of embedding generation, provides operations for saving and retrieving messages, and implements memory management policies like FIFO eviction when you reach storage limits.
The Context Selector determines which messages to include in the AI’s context window. It implements selection strategies to pick the most relevant messages while respecting token limits, and typically prioritizes certain message types (like system prompts).
Finally, an Embeddings module generates vector representations of text content. This might use a local model or an external API, but the important part is that it standardizes embedding dimensions for consistent semantic search.
Here’s how to implement each of these components.
Building the Vector Store Layer
The first step is defining a clear interface for our vector store. Here’s an example of what that might look like:
interface VectorMetadata {
  role: string;
  type?: string;
  createdAt?: number;
  tool?: string;
  parameters?: Record<string, any>;
  error?: string;
  [key: string]: any;
}

interface VectorSearchResult {
  id: string;
  score: number;
  metadata: VectorMetadata;
  content: string;
}

interface VectorGetResult {
  id: string;
  vector?: number[];
  metadata: VectorMetadata;
  content: string;
}

interface VectorStoreAdapter {
  addVector(id: string, vector: number[], metadata: VectorMetadata, content: string): Promise<void>;
  getById(ids: string[]): Promise<VectorGetResult[]>;
  search(vector: number[], limit: number): Promise<VectorSearchResult[]>;
  searchByMetadata(filter: Partial<VectorMetadata>, limit: number): Promise<VectorGetResult[]>;
  deleteVector(id: string): Promise<void>;
}
This interface defines how we’ll interact with any vector database. The beauty of this approach is that we can swap implementations without changing anything else in our system.
For example, here’s a simplified implementation using Chroma:
import { ChromaClient, Collection } from 'chromadb';

// Adapter configuration (shape inferred from how it is used below)
interface VectorStoreConfig {
  collectionName: string;
  serverUrl?: string;
  userId?: string;
  conversationId?: string;
}

export const createChromaAdapter = async (
  config: VectorStoreConfig
): Promise<VectorStoreAdapter> => {
  // Initialize the ChromaDB client
  const client = new ChromaClient({ path: config.serverUrl || 'http://localhost:8000' });

  // Create or get a collection, namespaced by user and conversation when provided
  let collectionName = config.collectionName;
  if (config.userId) {
    collectionName = `${collectionName}_${config.userId}`;
    if (config.conversationId) {
      collectionName = `${collectionName}_${config.conversationId}`;
    }
  }

  const collection: Collection = await client.getOrCreateCollection({ name: collectionName });

  return {
    addVector: async (id, vector, metadata, content) => {
      await collection.add({
        ids: [id],
        embeddings: [vector],
        metadatas: [metadata],
        documents: [content],
      });
    },

    getById: async (ids) => {
      const result = await collection.get({ ids });
      return ids.map((id, index) => ({
        id,
        metadata: (result.metadatas?.[index] as VectorMetadata) || {},
        content: (result.documents?.[index] as string) || '',
      }));
    },

    search: async (vector, limit) => {
      const results = await collection.query({
        queryEmbeddings: [vector],
        nResults: limit,
      });
      // Map results to our interface
      return (results.ids?.[0] || []).map((id, i) => ({
        id,
        score: results.distances?.[0]?.[i] || 0,
        metadata: (results.metadatas?.[0]?.[i] as VectorMetadata) || {},
        content: (results.documents?.[0]?.[i] as string) || '',
      }));
    },

    searchByMetadata: async (filter, limit) => {
      // Convert our filter to Chroma's format
      const result = await collection.get({
        where: filter as Record<string, any>,
        limit,
      });
      return (result.ids || []).map((id, i) => ({
        id,
        metadata: (result.metadatas?.[i] as VectorMetadata) || {},
        content: (result.documents?.[i] as string) || '',
      }));
    },

    deleteVector: async (id) => {
      await collection.delete({ ids: [id] });
    },
  };
};
This adapter translates our standard interface to ChromaDB’s specific API, handling the details of connection, collection management, and query formatting.
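To see the abstraction pay off, here is a rough sketch of a second adapter that keeps everything in memory, handy for tests and local prototyping. It implements the same VectorStoreAdapter interface, so nothing else in the system needs to change; the brute-force cosine scan stands in for the approximate nearest-neighbour indexes a real vector database would use.

export const createInMemoryAdapter = (): VectorStoreAdapter => {
  // Everything lives in a simple map; nothing survives a process restart
  const store = new Map<string, { vector: number[]; metadata: VectorMetadata; content: string }>();

  // Same cosine similarity as in the earlier sketch
  const cosine = (a: number[], b: number[]): number => {
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
  };

  return {
    addVector: async (id, vector, metadata, content) => {
      store.set(id, { vector, metadata, content });
    },
    getById: async (ids) =>
      ids.filter((id) => store.has(id)).map((id) => ({ id, ...store.get(id)! })),
    search: async (vector, limit) =>
      [...store.entries()]
        .map(([id, entry]) => ({
          id,
          score: cosine(vector, entry.vector),
          metadata: entry.metadata,
          content: entry.content,
        }))
        .sort((a, b) => b.score - a.score)
        .slice(0, limit),
    searchByMetadata: async (filter, limit) =>
      [...store.entries()]
        .filter(([, entry]) =>
          Object.entries(filter).every(([key, value]) => entry.metadata[key] === value)
        )
        .slice(0, limit)
        .map(([id, entry]) => ({
          id,
          vector: entry.vector,
          metadata: entry.metadata,
          content: entry.content,
        })),
    deleteVector: async (id) => {
      store.delete(id);
    },
  };
};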
The Memory Store
Now let’s implement the memory store that will use our vector adapter:
import path from 'path';
import { promises as fs } from 'fs';

// Memory store configuration (shape inferred from how it is used below)
interface VectorMemoryStoreConfig {
  persistPath: string;
  vectorStore: VectorStoreAdapter;
  maxMessages?: number;
  userId?: string;
  conversationId?: string;
}

interface VectorMemoryStore {
  loadInitialContext(systemPrompt?: string, maxRecentCount?: number): Promise<Message[]>;
  appendMessage(message: Message): Promise<void>;
  updateMessage(message: Message): Promise<void>;
  getFullHistory(): Promise<Message[]>;
  searchSimilar(query: string, limit?: number): Promise<Message[]>;
  searchByMetadata(metadata: Partial<VectorMetadata>, limit?: number): Promise<Message[]>;
  getRecentMessages(count: number, filter?: (msg: Message) => boolean): Promise<Message[]>;
}

export const createVectorMemoryStore = async (
  config: VectorMemoryStoreConfig
): Promise<VectorMemoryStore> => {
  // Create namespaced persist path for file storage
  let sequencePath = config.persistPath;
  if (config.userId && config.conversationId) {
    sequencePath = path.join(config.persistPath, config.userId, config.conversationId);
  } else if (config.userId) {
    sequencePath = path.join(config.persistPath, config.userId);
  }

  // Ensure directory exists
  await fs.mkdir(sequencePath, { recursive: true });

  // Full path to message sequence file
  const messageSequencePath = path.join(sequencePath, 'message_sequence.json');

  // Memory management settings (used by the eviction logic elided below)
  const memoryOptions = {
    maxMessages: config.maxMessages || 1000,
    protectedRoles: ['system'],
  };

  // Load existing sequence if available
  let messageSequence: string[] = [];
  try {
    const data = await fs.readFile(messageSequencePath, 'utf-8');
    messageSequence = JSON.parse(data);
  } catch {
    // No existing sequence yet; start with an empty history
  }

  // Persist the ordered list of message IDs to disk
  const persistSequence = async (): Promise<void> => {
    await fs.writeFile(messageSequencePath, JSON.stringify(messageSequence), 'utf-8');
  };

  // Implementation of core methods using the vector store
  const appendMessage = async (message: Message): Promise<void> => {
    // Generate embedding for the message content
    const content =
      typeof message.content === 'string' ? message.content : JSON.stringify(message.content);
    const embedding = await getEmbedding(content);

    // Add to vector store
    await config.vectorStore.addVector(message.id, embedding, messageToMetadata(message), content);

    // Update sequence and persist
    messageSequence.push(message.id);
    await persistSequence();
  };

  const searchSimilar = async (query: string, limit = 5): Promise<Message[]> => {
    const embedding = await getEmbedding(query);
    const results = await config.vectorStore.search(embedding, limit);
    return results.map((result) => vectorToMessage(result.id, result.content, result.metadata));
  };

  // Rest of implementation...

  return {
    loadInitialContext,
    appendMessage,
    updateMessage,
    getFullHistory,
    searchSimilar,
    searchByMetadata,
    getRecentMessages,
    // ...other methods
  };
};
This memory store handles the core operations of saving, retrieving, and searching messages, using our vector store adapter for the heavy lifting.
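The store above leans on a couple of helpers that aren’t shown: messageToMetadata and vectorToMessage, plus the Message shape used throughout this post. Here is one way they might look; your own message type will almost certainly differ.

// The message shape assumed throughout this post
interface Message {
  id: string;
  role: string;
  type?: string;
  content: string | Record<string, any>;
  createdAt: string;
  tool?: string;
}

// Flatten the fields we want to filter on into vector metadata
const messageToMetadata = (message: Message): VectorMetadata => ({
  role: message.role,
  type: message.type,
  tool: message.tool,
  createdAt: new Date(message.createdAt).getTime(),
});

// Rebuild a Message from what the vector store gives us back
const vectorToMessage = (id: string, content: string, metadata: VectorMetadata): Message => ({
  id,
  role: metadata.role,
  type: metadata.type,
  content,
  createdAt: metadata.createdAt ? new Date(metadata.createdAt).toISOString() : new Date().toISOString(),
  tool: metadata.tool,
});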
Context Selection: Making Smart Choices
The next key component is the context selector, which determines which messages to include in the AI’s context window:
// Selector configuration and the minimal slice of the provider interface needed for token counting
interface ContextSelectorConfig {
  maxMessages?: number;
}

interface Provider {
  countTokens(messages: Message[]): Promise<number>;
}

interface ContextSelector {
  selectMessages(
    messages: Message[],
    options: {
      maxTokens?: number;
      maxMessages?: number;
      requiredIds?: string[];
    },
    provider: Provider
  ): Promise<Message[]>;
}

export const createContextSelector = (config: ContextSelectorConfig): ContextSelector => {
  return {
    selectMessages: async (messages, options, provider) => {
      const maxTokens = options.maxTokens || 4000;
      const maxMessages = options.maxMessages || config.maxMessages || 10;

      // Always include required messages (like system prompts)
      const requiredMessages = options.requiredIds
        ? messages.filter((msg) => options.requiredIds!.includes(msg.id))
        : [];

      // Allocate tokens for required messages
      const usedTokens = await provider.countTokens(requiredMessages);
      let remainingTokens = maxTokens - usedTokens;

      // Filter candidate messages, newest first
      const candidateMessages = messages
        .filter((msg) => !requiredMessages.some((rm) => rm.id === msg.id))
        .sort((a, b) => new Date(b.createdAt).getTime() - new Date(a.createdAt).getTime());

      // Select recent messages that fit in the token budget
      const selectedMessages: Message[] = [];
      for (const message of candidateMessages) {
        if (selectedMessages.length >= maxMessages) break;
        const tokenCount = await provider.countTokens([message]);
        if (tokenCount <= remainingTokens) {
          selectedMessages.push(message);
          remainingTokens -= tokenCount;
        }
      }

      // Combine and sort chronologically
      return [...requiredMessages, ...selectedMessages].sort(
        (a, b) => new Date(a.createdAt).getTime() - new Date(b.createdAt).getTime()
      );
    },
  };
};
This selector prioritizes recent messages while respecting the token limits of the language model. In more sophisticated implementations, you might use semantic similarity to the current query to prioritize relevant messages rather than just taking the most recent ones.
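As a sketch of that idea, you could rank candidate messages by similarity to the current query before applying the token budget, assuming you keep (or re-fetch) each message’s embedding and reuse the cosineSimilarity helper from earlier:

// Rank candidates by similarity to the current query instead of by recency.
// Assumes each candidate carries its stored embedding alongside the message.
const rankBySimilarity = (
  candidates: { message: Message; embedding: number[] }[],
  queryEmbedding: number[]
): Message[] =>
  candidates
    .map((candidate) => ({
      message: candidate.message,
      score: cosineSimilarity(queryEmbedding, candidate.embedding),
    }))
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.message);

// The ranked list then feeds the same token-budget loop as before,
// so the most relevant messages are considered first rather than the most recent.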
Embeddings Generation
The final piece is the embeddings generation module:
export const getEmbedding = async (text: string): Promise<number[]> => {
  try {
    // Primary method: use an external embedding API
    const response = await fetch('https://your-embedding-api.com/embed', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text }),
    });
    if (!response.ok) throw new Error(`Embedding API request failed: ${response.status}`);
    const data = await response.json();
    return data.embedding;
  } catch (error) {
    // Surface the failure to the caller instead of silently returning undefined
    console.error('Error generating embedding', error);
    throw error;
  }
};
In a real implementation, you might use OpenAI’s embeddings API, a local embedding model if cost is a factor, or a specialized embedding service. The important thing is consistency; always use the same embedding model for both storage and retrieval.
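As one concrete option, here is roughly what the same function looks like against OpenAI’s embeddings endpoint; the model name is just an example, and the response parsing assumes a single input:

export const getOpenAIEmbedding = async (text: string): Promise<number[]> => {
  const response = await fetch('https://api.openai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'text-embedding-3-small',
      input: text,
    }),
  });
  if (!response.ok) throw new Error(`Embedding request failed: ${response.status}`);
  const data = await response.json();
  // The response contains one embedding per input; we only sent one
  return data.data[0].embedding;
};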
Putting It All Together
Now let’s see how these components work together in practice. Here’s how you might initialize and use the memory system:
import { randomUUID } from 'crypto';

// Initialize the memory system
const initMemory = async () => {
  // Create the vector store adapter
  const vectorStore = await createChromaAdapter({
    collectionName: 'agent-memory',
    userId: 'user-123',
    conversationId: 'conv-456',
  });

  // Create the memory store
  const memoryStore = await createVectorMemoryStore({
    persistPath: './data/memory',
    vectorStore,
    maxMessages: 1000,
    userId: 'user-123',
    conversationId: 'conv-456',
  });

  // Create the context selector
  const contextSelector = createContextSelector({
    maxMessages: 20,
  });

  return { memoryStore, contextSelector };
};

// Example usage in an agent
const runAgentWithMemory = async (query: string) => {
  const { memoryStore, contextSelector } = await initMemory();
  const provider = createProvider();

  // Load initial context
  const historyMessages = await memoryStore.loadInitialContext(
    'You are a helpful assistant with memory of past conversations.'
  );

  // Select relevant messages for the context
  const selectedMessages = await contextSelector.selectMessages(
    historyMessages,
    { maxTokens: 4000 },
    provider
  );

  // Create user message and add to memory
  const userMessage = {
    id: randomUUID(),
    role: 'user',
    type: 'text',
    content: query,
    createdAt: new Date().toISOString(),
  };
  await memoryStore.appendMessage(userMessage);

  // Process with the agent
  const agent = createAgent({ provider });
  const response = await agent.processQuery([...selectedMessages, userMessage]);

  // Save the assistant's response to memory
  await memoryStore.appendMessage({
    id: randomUUID(),
    role: 'assistant',
    type: 'text',
    content: response,
    createdAt: new Date().toISOString(),
  });

  return response;
};
This example shows the complete flow: initializing the memory system, loading the conversation history, selecting relevant messages, processing the query with the agent, and saving the results for future reference.
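Because both the vector store and the message sequence are persisted, a later session created with the same userId and conversationId picks up where the previous one left off. A hypothetical follow-up might look like this:

// First session: the user shares a preference
await runAgentWithMemory('My favourite deployment window is Tuesday mornings.');

// A later session in a fresh process: the detail is still retrievable,
// because the memory store reads from the persisted vector collection
await runAgentWithMemory('When did I say I prefer to deploy?');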
Advanced Memory Patterns
As you develop your agent memory system further, here are some ideas to continue improving memory management:
- Provide the agent with tools to fetch memories on demand if a user asks about something that is not in the current context.
- Dynamically set the context based on the user request to include all relevant memories, instead of only the most recent interactions.
- Add a time weighting to search results to prioritize more recent information when both older and newer information is semantically relevant (a minimal sketch follows this list).
- Make embedding generation resilient: queue messages for later processing when the embedding API is unavailable, or keep a fallback, whether a local model or a second provider, that produces compatible embeddings.
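For the time-weighting idea above, a minimal sketch might decay each similarity score by the age of the message; the half-life is an arbitrary value you would tune for your use case.

// Decay a similarity score by the message's age so fresher memories win ties
const timeWeightedScore = (
  similarity: number,
  createdAtMs: number,
  halfLifeDays = 30
): number => {
  const ageDays = (Date.now() - createdAtMs) / (1000 * 60 * 60 * 24);
  const decay = Math.pow(0.5, ageDays / halfLifeDays);
  return similarity * decay;
};

// Applied to search results before picking the top matches:
// results.sort((a, b) =>
//   timeWeightedScore(b.score, b.metadata.createdAt ?? 0) -
//   timeWeightedScore(a.score, a.metadata.createdAt ?? 0)
// );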
Forget Me Not
Adding persistent vector storage to your AI agents transforms them from forgetful chatbots into assistants with meaningful long-term memory. This memory architecture with vector store adapters, memory stores, and context selection gives you a flexible foundation that can evolve with your needs.
I’ve implemented variations of this system in recent agent projects, and the improvement in the experience is significant. Users no longer need to repeat themselves, agents maintain context across sessions, and response quality improves noticeably when relevant past information is available.