Supercharging AI Agents with Persistent Vector Storage

When building AI agents, you’ll notice something frustrating: they forget everything as soon as the conversation ends. This is obviously not ideal, and it’s where vector storage comes in. In this post, I’m going to share how you can implement a practical, persistent memory system for your agents.
The Memory Problem
When building agents it makes sense to initially focus on their reasoning capabilities and tool integration. But as soon as you start using these agents for real work, you’ll quickly discover that their ephemeral memory severely limits their usefulness. Users get frustrated repeating information, agents lose track of long-running tasks, and contextual knowledge vanishes between sessions.
What we need is a way to give our agents memory that persists, can be searched semantically, and intelligently fits within the context limits of language models. That’s what I’ll explain in this post.
Why Vector Storage?
Traditional databases work well for structured data, but conversations and knowledge are often unstructured and nuanced. Vector embeddings solve this problem by encoding semantic meaning into numerical vectors, allowing us to find information based on conceptual similarity rather than exact keyword matches.
This is perfect for agent memory because it enables:
- Finding contextually relevant messages even when phrasing differs
- Prioritizing information that’s semantically related to the current query
- Organizing knowledge in a way that mirrors how language models process information
A well-designed vector storage system acts like an extended memory for your agent, dramatically improving its ability to maintain context and access relevant information.
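To make the idea concrete, here is a minimal sketch of the similarity math that underpins every vector store. The helper below is a plain cosine similarity over two embedding vectors; the sentences in the closing comment are just an illustration of why semantic matching beats keyword matching.

// Cosine similarity between two embedding vectors: 1 means same direction (same meaning), 0 means unrelated
const cosineSimilarity = (a: number[], b: number[]): number => {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
};

// "How do I reset my password?" and "I forgot my login credentials" share almost no keywords,
// but their embeddings land close together, so a similarity search still connects them.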
The Architecture of Agent Memory
After experimenting with several approaches, I’ve found that a layered architecture works best for agent memory systems. Let’s break down the key components.
At the core we have the Vector Store Adapter, which provides an abstraction layer over different vector database technologies. This lets you switch between implementations (like ChromaDB, Pinecone, or even a custom solution) without changing your application code.
The Memory Store manages your conversation history, handling the persistence and retrieval of messages. It takes care of embedding generation, provides operations for saving and retrieving messages, and implements memory management policies like FIFO eviction when you reach storage limits.
The Context Selector determines which messages to include in the AI’s context window. It implements selection strategies to pick the most relevant messages while respecting token limits, and typically prioritizes certain message types (like system prompts).
Finally, an Embeddings module generates vector representations of text content. This might use a local model or an external API, but the important part is that it standardizes embedding dimensions for consistent semantic search.
Here’s how to implement each of these components.
Building the Vector Store Layer
The first step is defining a clear interface for our vector store. Here’s an example of what that might look like:
interface VectorMetadata {
  role: string;
  type?: string;
  createdAt?: number;
  tool?: string;
  parameters?: Record<string, any>;
  error?: string;
  [key: string]: any;
}

interface VectorSearchResult {
  id: string;
  score: number;
  metadata: VectorMetadata;
  content: string;
}

interface VectorGetResult {
  id: string;
  vector?: number[];
  metadata: VectorMetadata;
  content: string;
}

interface VectorStoreAdapter {
  addVector(id: string, vector: number[], metadata: VectorMetadata, content: string): Promise<void>;
  getById(ids: string[]): Promise<VectorGetResult[]>;
  search(vector: number[], limit: number): Promise<VectorSearchResult[]>;
  searchByMetadata(filter: Partial<VectorMetadata>, limit: number): Promise<VectorGetResult[]>;
  deleteVector(id: string): Promise<void>;
}
This interface defines how we’ll interact with any vector database. The beauty of this approach is that we can swap implementations without changing anything else in our system.
For example, here’s a simplified implementation using Chroma:
import { ChromaClient, Collection } from 'chromadb';

// Adapter configuration (shape inferred from how it is used below)
interface VectorStoreConfig {
  collectionName: string;
  serverUrl?: string;
  userId?: string;
  conversationId?: string;
}

export const createChromaAdapter = async (
  config: VectorStoreConfig
): Promise<VectorStoreAdapter> => {
  // Initialize the ChromaDB client
  const client = new ChromaClient({ path: config.serverUrl || 'http://localhost:8000' });

  // Create or get a collection, namespaced by user and conversation when provided
  let collectionName = config.collectionName;
  if (config.userId) {
    collectionName = `${collectionName}_${config.userId}`;
    if (config.conversationId) {
      collectionName = `${collectionName}_${config.conversationId}`;
    }
  }

  const collection: Collection = await client.getOrCreateCollection({ name: collectionName });

  return {
    addVector: async (id, vector, metadata, content) => {
      await collection.add({
        ids: [id],
        embeddings: [vector],
        metadatas: [metadata],
        documents: [content],
      });
    },

    getById: async (ids) => {
      const result = await collection.get({ ids });
      return ids.map((id, index) => ({
        id,
        metadata: (result.metadatas?.[index] as VectorMetadata) || {},
        content: (result.documents?.[index] as string) || '',
      }));
    },

    search: async (vector, limit) => {
      const results = await collection.query({
        queryEmbeddings: [vector],
        nResults: limit,
      });
      // Map results to our interface
      return (results.ids?.[0] || []).map((id, i) => ({
        id,
        score: results.distances?.[0]?.[i] || 0,
        metadata: (results.metadatas?.[0]?.[i] as VectorMetadata) || {},
        content: (results.documents?.[0]?.[i] as string) || '',
      }));
    },

    searchByMetadata: async (filter, limit) => {
      // Convert our filter to Chroma's format
      const result = await collection.get({
        where: filter as Record<string, any>,
        limit,
      });
      return (result.ids || []).map((id, i) => ({
        id,
        metadata: (result.metadatas?.[i] as VectorMetadata) || {},
        content: (result.documents?.[i] as string) || '',
      }));
    },

    deleteVector: async (id) => {
      await collection.delete({ ids: [id] });
    },
  };
};
This adapter translates our standard interface to ChromaDB’s specific API, handling the details of connection, collection management, and query formatting.
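To see the abstraction pay off, here is a rough sketch of a second adapter that keeps everything in memory, handy for tests and local prototyping. It implements the same VectorStoreAdapter interface, so nothing else in the system needs to change; the brute-force cosine scan stands in for the approximate nearest-neighbour indexes a real vector database would use.

export const createInMemoryAdapter = (): VectorStoreAdapter => {
  // Everything lives in a simple map; nothing survives a process restart
  const store = new Map<string, { vector: number[]; metadata: VectorMetadata; content: string }>();

  // Same cosine similarity as in the earlier sketch
  const cosine = (a: number[], b: number[]): number => {
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
  };

  return {
    addVector: async (id, vector, metadata, content) => {
      store.set(id, { vector, metadata, content });
    },
    getById: async (ids) =>
      ids.filter((id) => store.has(id)).map((id) => ({ id, ...store.get(id)! })),
    search: async (vector, limit) =>
      [...store.entries()]
        .map(([id, entry]) => ({
          id,
          score: cosine(vector, entry.vector),
          metadata: entry.metadata,
          content: entry.content,
        }))
        .sort((a, b) => b.score - a.score)
        .slice(0, limit),
    searchByMetadata: async (filter, limit) =>
      [...store.entries()]
        .filter(([, entry]) =>
          Object.entries(filter).every(([key, value]) => entry.metadata[key] === value)
        )
        .slice(0, limit)
        .map(([id, entry]) => ({
          id,
          vector: entry.vector,
          metadata: entry.metadata,
          content: entry.content,
        })),
    deleteVector: async (id) => {
      store.delete(id);
    },
  };
};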
The Memory Store
Now let’s implement the memory store that will use our vector adapter:
import path from 'path';
import { promises as fs } from 'fs';

// Memory store configuration (shape inferred from how it is used below)
interface VectorMemoryStoreConfig {
  persistPath: string;
  vectorStore: VectorStoreAdapter;
  maxMessages?: number;
  userId?: string;
  conversationId?: string;
}

interface VectorMemoryStore {
  loadInitialContext(systemPrompt?: string, maxRecentCount?: number): Promise<Message[]>;
  appendMessage(message: Message): Promise<void>;
  updateMessage(message: Message): Promise<void>;
  getFullHistory(): Promise<Message[]>;
  searchSimilar(query: string, limit?: number): Promise<Message[]>;
  searchByMetadata(metadata: Partial<VectorMetadata>, limit?: number): Promise<Message[]>;
  getRecentMessages(count: number, filter?: (msg: Message) => boolean): Promise<Message[]>;
}

export const createVectorMemoryStore = async (
  config: VectorMemoryStoreConfig
): Promise<VectorMemoryStore> => {
  // Create namespaced persist path for file storage
  let sequencePath = config.persistPath;
  if (config.userId && config.conversationId) {
    sequencePath = path.join(config.persistPath, config.userId, config.conversationId);
  } else if (config.userId) {
    sequencePath = path.join(config.persistPath, config.userId);
  }

  // Ensure directory exists
  await fs.mkdir(sequencePath, { recursive: true });

  // Full path to message sequence file
  const messageSequencePath = path.join(sequencePath, 'message_sequence.json');

  // Memory management settings (used by the eviction logic elided below)
  const memoryOptions = {
    maxMessages: config.maxMessages || 1000,
    protectedRoles: ['system'],
  };

  // Load existing sequence if available
  let messageSequence: string[] = [];
  try {
    const data = await fs.readFile(messageSequencePath, 'utf-8');
    messageSequence = JSON.parse(data);
  } catch {
    // No existing sequence yet; start with an empty history
  }

  // Persist the ordered list of message IDs to disk
  const persistSequence = async (): Promise<void> => {
    await fs.writeFile(messageSequencePath, JSON.stringify(messageSequence), 'utf-8');
  };

  // Implementation of core methods using the vector store
  const appendMessage = async (message: Message): Promise<void> => {
    // Generate embedding for the message content
    const content =
      typeof message.content === 'string' ? message.content : JSON.stringify(message.content);
    const embedding = await getEmbedding(content);

    // Add to vector store
    await config.vectorStore.addVector(message.id, embedding, messageToMetadata(message), content);

    // Update sequence and persist
    messageSequence.push(message.id);
    await persistSequence();
  };

  const searchSimilar = async (query: string, limit = 5): Promise<Message[]> => {
    const embedding = await getEmbedding(query);
    const results = await config.vectorStore.search(embedding, limit);
    return results.map((result) => vectorToMessage(result.id, result.content, result.metadata));
  };

  // Rest of implementation...

  return {
    loadInitialContext,
    appendMessage,
    updateMessage,
    getFullHistory,
    searchSimilar,
    searchByMetadata,
    getRecentMessages,
    // ...other methods
  };
};
This memory store handles the core operations of saving, retrieving, and searching messages, using our vector store adapter for the heavy lifting.
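The store above leans on a couple of helpers that aren’t shown: messageToMetadata and vectorToMessage, plus the Message shape used throughout this post. Here is one way they might look; your own message type will almost certainly differ.

// The message shape assumed throughout this post
interface Message {
  id: string;
  role: string;
  type?: string;
  content: string | Record<string, any>;
  createdAt: string;
  tool?: string;
}

// Flatten the fields we want to filter on into vector metadata
const messageToMetadata = (message: Message): VectorMetadata => ({
  role: message.role,
  type: message.type,
  tool: message.tool,
  createdAt: new Date(message.createdAt).getTime(),
});

// Rebuild a Message from what the vector store gives us back
const vectorToMessage = (id: string, content: string, metadata: VectorMetadata): Message => ({
  id,
  role: metadata.role,
  type: metadata.type,
  content,
  createdAt: metadata.createdAt ? new Date(metadata.createdAt).toISOString() : new Date().toISOString(),
  tool: metadata.tool,
});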
Context Selection: Making Smart Choices
The next key component is the context selector, which determines which messages to include in the AI’s context window:
// Selector configuration and the minimal slice of the provider interface needed for token counting
interface ContextSelectorConfig {
  maxMessages?: number;
}

interface Provider {
  countTokens(messages: Message[]): Promise<number>;
}

interface ContextSelector {
  selectMessages(
    messages: Message[],
    options: {
      maxTokens?: number;
      maxMessages?: number;
      requiredIds?: string[];
    },
    provider: Provider
  ): Promise<Message[]>;
}

export const createContextSelector = (config: ContextSelectorConfig): ContextSelector => {
  return {
    selectMessages: async (messages, options, provider) => {
      const maxTokens = options.maxTokens || 4000;
      const maxMessages = options.maxMessages || config.maxMessages || 10;

      // Always include required messages (like system prompts)
      const requiredMessages = options.requiredIds
        ? messages.filter((msg) => options.requiredIds!.includes(msg.id))
        : [];

      // Allocate tokens for required messages
      const usedTokens = await provider.countTokens(requiredMessages);
      let remainingTokens = maxTokens - usedTokens;

      // Filter candidate messages, newest first
      const candidateMessages = messages
        .filter((msg) => !requiredMessages.some((rm) => rm.id === msg.id))
        .sort((a, b) => new Date(b.createdAt).getTime() - new Date(a.createdAt).getTime());

      // Select recent messages that fit in the token budget
      const selectedMessages: Message[] = [];
      for (const message of candidateMessages) {
        if (selectedMessages.length >= maxMessages) break;
        const tokenCount = await provider.countTokens([message]);
        if (tokenCount <= remainingTokens) {
          selectedMessages.push(message);
          remainingTokens -= tokenCount;
        }
      }

      // Combine and sort chronologically
      return [...requiredMessages, ...selectedMessages].sort(
        (a, b) => new Date(a.createdAt).getTime() - new Date(b.createdAt).getTime()
      );
    },
  };
};
This selector prioritizes recent messages while respecting the token limits of the language model. In more sophisticated implementations, you might use semantic similarity to the current query to prioritize relevant messages rather than just taking the most recent ones.
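As a sketch of that idea, you could rank candidate messages by similarity to the current query before applying the token budget, assuming you keep (or re-fetch) each message’s embedding and reuse the cosineSimilarity helper from earlier:

// Rank candidates by similarity to the current query instead of by recency.
// Assumes each candidate carries its stored embedding alongside the message.
const rankBySimilarity = (
  candidates: { message: Message; embedding: number[] }[],
  queryEmbedding: number[]
): Message[] =>
  candidates
    .map((candidate) => ({
      message: candidate.message,
      score: cosineSimilarity(queryEmbedding, candidate.embedding),
    }))
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.message);

// The ranked list then feeds the same token-budget loop as before,
// so the most relevant messages are considered first rather than the most recent.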
Embeddings Generation
The final piece is the embeddings generation module:
export const getEmbedding = async (text: string): Promise<number[]> => {
  try {
    // Primary method: use an external embedding API
    const response = await fetch('https://your-embedding-api.com/embed', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text }),
    });
    if (!response.ok) throw new Error(`Embedding API request failed: ${response.status}`);
    const data = await response.json();
    return data.embedding;
  } catch (error) {
    // Surface the failure to the caller instead of silently returning undefined
    console.error('Error generating embedding', error);
    throw error;
  }
};
In a real implementation, you might use OpenAI’s embeddings API, a local embedding model if cost is a factor, or a specialized embedding service. The important thing is consistency; always use the same embedding model for both storage and retrieval.
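As one concrete option, here is roughly what the same function looks like against OpenAI’s embeddings endpoint; the model name is just an example, and the response parsing assumes a single input:

export const getOpenAIEmbedding = async (text: string): Promise<number[]> => {
  const response = await fetch('https://api.openai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'text-embedding-3-small',
      input: text,
    }),
  });
  if (!response.ok) throw new Error(`Embedding request failed: ${response.status}`);
  const data = await response.json();
  // The response contains one embedding per input; we only sent one
  return data.data[0].embedding;
};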
Putting It All Together
Now let’s see how these components work together in practice. Here’s how you might initialize and use the memory system:
import { randomUUID } from 'crypto';

// Initialize the memory system
const initMemory = async () => {
  // Create the vector store adapter
  const vectorStore = await createChromaAdapter({
    collectionName: 'agent-memory',
    userId: 'user-123',
    conversationId: 'conv-456',
  });

  // Create the memory store
  const memoryStore = await createVectorMemoryStore({
    persistPath: './data/memory',
    vectorStore,
    maxMessages: 1000,
    userId: 'user-123',
    conversationId: 'conv-456',
  });

  // Create the context selector
  const contextSelector = createContextSelector({
    maxMessages: 20,
  });

  return { memoryStore, contextSelector };
};

// Example usage in an agent
const runAgentWithMemory = async (query: string) => {
  const { memoryStore, contextSelector } = await initMemory();
  const provider = createProvider();

  // Load initial context
  const historyMessages = await memoryStore.loadInitialContext(
    'You are a helpful assistant with memory of past conversations.'
  );

  // Select relevant messages for the context
  const selectedMessages = await contextSelector.selectMessages(
    historyMessages,
    { maxTokens: 4000 },
    provider
  );

  // Create user message and add to memory
  const userMessage = {
    id: randomUUID(),
    role: 'user',
    type: 'text',
    content: query,
    createdAt: new Date().toISOString(),
  };
  await memoryStore.appendMessage(userMessage);

  // Process with the agent
  const agent = createAgent({ provider });
  const response = await agent.processQuery([...selectedMessages, userMessage]);

  // Save the assistant's response to memory
  await memoryStore.appendMessage({
    id: randomUUID(),
    role: 'assistant',
    type: 'text',
    content: response,
    createdAt: new Date().toISOString(),
  });

  return response;
};
This example shows the complete flow: initializing the memory system, loading the conversation history, selecting relevant messages, processing the query with the agent, and saving the results for future reference.
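Because both the vector store and the message sequence are persisted, a later session created with the same userId and conversationId picks up where the previous one left off. A hypothetical follow-up might look like this:

// First session: the user shares a preference
await runAgentWithMemory('My favourite deployment window is Tuesday mornings.');

// A later session in a fresh process: the detail is still retrievable,
// because the memory store reads from the persisted vector collection
await runAgentWithMemory('When did I say I prefer to deploy?');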
Advanced Memory Patterns
As you develop your agent memory system further, here are some ideas to continue improving memory management:
- Provide the agent with tools to fetch memories on demand if a user asks about something that is not in the current context.
- Dynamically set the context based on the user request to include all relevant memories, instead of only the most recent interactions.
- Add a time weighting to search results to prioritize more recent information when both older and newer information is semantically relevant (a minimal sketch follows this list).
- Make embedding generation resilient: queue messages for later processing when the embedding API is unavailable, or keep a fallback, whether a local model or a second provider, that produces compatible embeddings.
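For the time-weighting idea above, a minimal sketch might decay each similarity score by the age of the message; the half-life is an arbitrary value you would tune for your use case.

// Decay a similarity score by the message's age so fresher memories win ties
const timeWeightedScore = (
  similarity: number,
  createdAtMs: number,
  halfLifeDays = 30
): number => {
  const ageDays = (Date.now() - createdAtMs) / (1000 * 60 * 60 * 24);
  const decay = Math.pow(0.5, ageDays / halfLifeDays);
  return similarity * decay;
};

// Applied to search results before picking the top matches:
// results.sort((a, b) =>
//   timeWeightedScore(b.score, b.metadata.createdAt ?? 0) -
//   timeWeightedScore(a.score, a.metadata.createdAt ?? 0)
// );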
Forget Me Not
Adding persistent vector storage to your AI agents transforms them from forgetful chatbots into assistants with meaningful long-term memory. This memory architecture with vector store adapters, memory stores, and context selection gives you a flexible foundation that can evolve with your needs.
I’ve implemented variations of this system in recent agent projects, and the improvement in the experience is significant. Users no longer need to repeat themselves, agents maintain context across sessions, and response quality improves noticeably when relevant past information is available.