Frontend Maximalism and LLM Context — Whole State, Selective Context, or Memory?

As we build increasingly sophisticated LLM-powered applications, a fundamental architectural question emerges: should we pass our entire application state as a single JSON document to the LLM, should we carefully curate specific context for each interaction, or should we leverage persistent memory to maintain context across sessions?

This question is familiar to anyone who's read about Frontend Maximalism. Just as that philosophy challenges conventional wisdom about frontend-backend data distribution, LLM integration forces us to reconsider what constitutes "too much" context and whether our instinct to be selective is actually making things worse. And now, persistent memory gives us a third path.

The Parallel to Frontend Maximalism

Frontend Maximalism argues that we should fetch more data upfront and process it locally rather than making repeated backend calls with filtered queries. The reasoning: modern browsers can handle it, the user experience is faster, the code is simpler, and the architecture is more flexible.

LLM integration presents a similar tradeoff, but now with three distinct approaches:

  1. Serialize entire application state into a single JSON document and send it with every LLM request
  2. Leverage persistent memory for stable context and send only dynamic/current state with each request
  3. Carefully select which pieces of state are relevant and construct minimal context for each request

Most engineers' instincts lean toward option 3. It feels wasteful to send the LLM data it doesn't need. But is that instinct serving us well? And does persistent memory (option 2) give us the best of both worlds?

Let's explore each approach.

The Case for Whole State

Let's start with a concrete example. Imagine you're building a project management app with LLM features. Here's what whole-state serialization might look like:

interface AppState {
  user: {
    id: string;
    name: string;
    preferences: UserPreferences;
  };
  projects: Project[];
  tasks: Task[];
  comments: Comment[];
  recentActivity: Activity[];
  ui: {
    selectedProjectId: string | null;
    selectedTaskId: string | null;
    viewMode: 'list' | 'board' | 'timeline';
  };
}

async function queryLLM(prompt: string, state: AppState) {
  const response = await fetch('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    // Auth headers omitted for brevity; in production, route this call through
    // your backend so API keys never ship to the client.
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'claude-sonnet-4-20250514',
      max_tokens: 4000,
      messages: [{
        role: 'user',
        content: `Application State:
${JSON.stringify(state, null, 2)}

User Request: ${prompt}`
      }]
    })
  });

  return response.json();
}

This approach has significant advantages:

1. The LLM Can Make Unexpected Connections

When you ask "What should I focus on today?", you might think the answer only needs tasks and user.preferences. But with complete state, the LLM can notice that:

  • Your selected project has a deadline approaching
  • Multiple team members commented on a blocking task yesterday
  • You've been viewing the timeline view, suggesting planning concerns
  • Recent activity shows a pattern of delayed tasks in one area

The LLM discovers relationships you didn't anticipate. This is exactly analogous to Frontend Maximalism's observation that fetching extra data enables features you wouldn't have considered otherwise.

2. Dramatically Simpler Code

Compare the whole-state approach above to a selective approach:

async function queryLLM(prompt: string, state: AppState) {
  // What context do we need? Let's think...
  const context: Partial<AppState> = { user: state.user };

  // If they're asking about tasks...
  if (prompt.toLowerCase().includes('task')) {
    context.tasks = state.tasks;
  }

  // If they mention a project...
  const projectMention = extractProjectMention(prompt);
  if (projectMention) {
    context.projects = state.projects.filter(p =>
      p.id === projectMention || p.name.includes(projectMention)
    );
  }

  // Should we include recent activity? Maybe?
  // What about comments? Only relevant ones?
  // What if the question is ambiguous?
  // ...this is getting complicated fast

  const response = await fetch(/* ... */);
  return response.json();
}

The selective approach forces you to predict what the LLM needs for every possible query. This prediction logic becomes a maintenance burden and a source of bugs. When you inevitably get it wrong, the LLM gives poor answers and you don't immediately understand why.

3. Debugging Becomes Trivial

With whole state, debugging is straightforward: log the exact JSON you sent to the LLM. With selective context, you also need to log your selection logic, understand why certain data was included or excluded, and reproduce the exact decision tree that ran for that specific prompt.

// With whole state - dead simple
console.log('LLM Input:', JSON.stringify(state, null, 2));

// With selective context - good luck
console.log('LLM Input:', JSON.stringify(context, null, 2));
console.log('Selection logic triggered:', {
  hadTaskMention,
  hadProjectMention,
  includedComments: includedComments.length,
  excludedComments: excludedComments.length,
  reasonForExclusion: /* ... */
});

4. Prompt Engineering Gets Easier

When the LLM has complete context, your prompts can be simpler and more natural. You don't need to explicitly tell the LLM "here are tasks, here are projects, here's how they relate" because that structure is evident in the JSON itself.

The LLM can be instructed once, in a system prompt, about the shape of your application state. Every subsequent user prompt operates on the same consistent context structure.
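
For instance, that structural explanation can live in a system prompt so each user prompt stays short. The sketch below assumes the AppState interface from earlier and mirrors the simplified request shape used above:

// A minimal sketch: describe the state shape once in a system prompt.
// Field lists are assumptions based on the AppState interface above;
// auth headers are omitted as in the earlier example.
const SYSTEM_PROMPT = `
You are embedded in a project management app. Every request includes the full
application state as JSON with this shape:
- user: { id, name, preferences }
- projects, tasks, comments, recentActivity: arrays of domain objects
- ui: { selectedProjectId, selectedTaskId, viewMode }
Use relationships between these collections when answering.
`;

async function queryWithSystemPrompt(prompt: string, state: AppState) {
  const response = await fetch('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'claude-sonnet-4-20250514',
      max_tokens: 4000,
      system: SYSTEM_PROMPT, // the structure is explained once, not per prompt
      messages: [{
        role: 'user',
        content: `Application State:\n${JSON.stringify(state, null, 2)}\n\nUser Request: ${prompt}`
      }]
    })
  });

  return response.json();
}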

The Case Against Whole State

Before you go refactor your entire app, let's talk about the real costs.

Token Economics Matter

LLMs charge per token, and tokens add up fast. If your AppState serializes to 10,000 tokens and you're making 100 LLM requests per user session, that's 1 million tokens per session in input alone. At current pricing:

  • Claude Sonnet: ~$3 per million input tokens
  • So: ~$3 per user session just for context

For a B2C app with thousands of daily active users, this gets expensive. For a B2B app with hundreds of users, it might be totally fine. Know your economics.

// Quick token estimator (rough heuristic: ~4 chars per token)
function estimateTokens(obj: any): number {
  return Math.ceil(JSON.stringify(obj).length / 4);
}

console.log(`State size: ~${estimateTokens(appState)} tokens`);
// State size: ~8,432 tokens
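
Building on that estimator, a rough per-session cost check might look like the sketch below. The pricing constant and request count are assumptions; substitute your own plan and usage figures:

// Rough cost-per-session sketch. Pricing and request volume are assumptions.
const INPUT_COST_PER_MILLION_TOKENS = 3; // ~$3/M input tokens (Claude Sonnet)

function estimateSessionCost(state: AppState, requestsPerSession: number): number {
  const totalInputTokens = estimateTokens(state) * requestsPerSession;
  return (totalInputTokens / 1_000_000) * INPUT_COST_PER_MILLION_TOKENS;
}

console.log(`~$${estimateSessionCost(appState, 100).toFixed(2)} per session in input tokens`);
// e.g. 10,000 tokens x 100 requests => ~$3.00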

Context Windows Have Limits

Even with Claude's 200K context window, you can hit limits if:

  • Your app state is genuinely massive (10,000+ items)
  • You're including binary data or large text fields
  • You want to include conversation history alongside state

This is the genuine "too much data" scenario that Frontend Maximalism acknowledges as a valid exception.

Privacy and Security Boundaries

Just as Frontend Maximalism warns against shipping data the user shouldn't have, LLM integration raises similar concerns. If your state includes:

  • API keys or secrets (never!)
  • Data from other users (carefully consider)
  • Sensitive personal information (be thoughtful about logging)

You need clear boundaries. That said, this is less about whole state vs. selective context and more about properly structuring your state tree. If sensitive data shouldn't go to the LLM, it probably shouldn't be in your frontend state at all.
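
If some fields genuinely must live in frontend state but should never reach the LLM, one option is an explicit deny-list applied at the single choke point that builds the request. A sketch, with hypothetical field names:

// Sketch: strip sensitive fields before any LLM call. The field names here
// (authToken, apiKey, email, ssn) are hypothetical examples.
const LLM_DENYLIST = new Set(['authToken', 'apiKey', 'email', 'ssn']);

function redactForLLM<T>(value: T): T {
  return JSON.parse(
    JSON.stringify(value, (key, val) => (LLM_DENYLIST.has(key) ? undefined : val))
  );
}

// Usage: redact once, right before serialization
const safeState = redactForLLM(appState);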

A Practical Decision Framework

Like most architectural decisions, the answer is "it depends." Here's a framework for deciding; a rough coded version of these thresholds follows the lists.

Start with Whole State If:

  • Your serialized state is under 20,000 tokens
  • You're early in product development and features are evolving rapidly
  • Token costs are acceptable for your business model
  • You want maximum LLM capability and are willing to pay for it
  • Your app state is already well-structured and clean
  • You don't want to manage memory staleness

Graduate to Memory + Current State If:

  • Your application structure has stabilized
  • Token costs are becoming significant but not prohibitive
  • You have clear boundaries between structural and dynamic data
  • Users have multi-session workflows
  • Your state regularly exceeds 20K tokens but most is structural
  • You're willing to manage memory lifecycle (staleness detection, refreshes)

Move to Selective Context If:

  • Token costs are prohibitive even with memory
  • Your state regularly exceeds 50,000 tokens of truly dynamic data
  • You have clear use-case boundaries (e.g., "task planning" vs. "project reporting")
  • You're hitting context window limits
  • Performance of serialization becomes an issue
  • Privacy boundaries require careful filtering
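
Collapsed into code, the framework looks roughly like this. The cutoffs mirror the lists above and are starting points to tune against your own economics, not hard rules:

// Sketch: the decision framework as a heuristic. Thresholds mirror the lists above.
type ContextStrategy = 'whole-state' | 'memory-plus-current' | 'selective';

function chooseStrategy(opts: {
  stateTokens: number;         // e.g. estimateTokens(appState)
  structureIsStable: boolean;  // schema no longer changing week to week
  tokenCostsProhibitive: boolean;
}): ContextStrategy {
  if (opts.tokenCostsProhibitive || opts.stateTokens > 50_000) return 'selective';
  if (opts.structureIsStable && opts.stateTokens > 20_000) return 'memory-plus-current';
  return 'whole-state';
}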

The Hybrid Approach

Consider a tiered strategy:

interface LLMContext {
  core: CoreAppState;        // Always included (~1K tokens)
  extended?: ExtendedState;   // Include for complex queries
  full?: AppState;           // Only for explicit "analyze everything" requests
}

function buildLLMContext(
  prompt: string,
  state: AppState,
  tier: 'core' | 'extended' | 'full' = 'core'
): LLMContext {
  const context: LLMContext = {
    core: {
      user: state.user,
      ui: state.ui,
      recentActivity: state.recentActivity.slice(0, 10)
    }
  };

  if (tier === 'extended' || tier === 'full') {
    context.extended = {
      projects: state.projects,
      tasks: state.tasks.filter(t => t.status !== 'archived')
    };
  }

  if (tier === 'full') {
    context.full = state;
  }

  return context;
}

This gives you an escape hatch without abandoning the simplicity of whole-state approaches.
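
In practice the tier can be chosen per feature, with routine queries staying on the cheap core tier and an explicit "analyze everything" action opting into the full tier:

// Usage sketch: tier chosen per feature
const dailyContext = buildLLMContext("What should I focus on today?", state, 'core');
const deepDiveContext = buildLLMContext("Analyze everything and flag risks", state, 'full');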

Implementation Patterns

If you're going with whole state, here are some patterns that work well:

1. State Normalization is Your Friend

Structure your state to be LLM-friendly:

// Less good - denormalized and verbose
interface VerboseState {
  projects: Array<{
    id: string;
    name: string;
    tasks: Array<{
      id: string;
      title: string;
      assignee: { id: string; name: string; email: string };
      comments: Array<{ /* ... */ }>;
    }>;
  }>;
}

// Better - normalized and concise
interface NormalizedState {
  projects: Record<string, Project>;
  tasks: Record<string, Task>;
  users: Record<string, User>;
  comments: Record<string, Comment>;

  // Relationships as ID arrays
  projectTasks: Record<string, string[]>;
  taskComments: Record<string, string[]>;
}

Normalized state is both more token-efficient and easier for LLMs to process. It's also better for your React app.
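
If your state currently looks like VerboseState, the conversion is mechanical. A sketch assuming the nested shape above (users and comments are flattened the same way):

// Sketch: flatten the nested VerboseState into NormalizedState.
// The casts exist because the full Project/Task/User shapes aren't shown here.
function normalizeState(verbose: VerboseState): NormalizedState {
  const normalized: NormalizedState = {
    projects: {}, tasks: {}, users: {}, comments: {},
    projectTasks: {}, taskComments: {}
  };

  for (const project of verbose.projects) {
    normalized.projects[project.id] = { id: project.id, name: project.name } as Project;
    normalized.projectTasks[project.id] = project.tasks.map(t => t.id);

    for (const task of project.tasks) {
      normalized.tasks[task.id] = { id: task.id, title: task.title } as Task;
      normalized.users[task.assignee.id] = task.assignee as User;
      normalized.taskComments[task.id] = task.comments.map((c: any) => c.id);
    }
  }

  return normalized;
}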

2. Pruning Strategies

Even with whole state, you can be smart about what "whole" means:

function pruneStateForLLM(state: AppState): AppState {
  return {
    ...state,
    tasks: state.tasks.filter(t =>
      t.status !== 'archived' &&
      t.status !== 'deleted'
    ).map(t => ({
      ...t,
      // Remove fields the LLM doesn't benefit from
      internalMetadata: undefined,
      cachedComputations: undefined
    })),
    // Limit activity history
    recentActivity: state.recentActivity.slice(0, 50)
  };
}

3. React Context for LLM Interactions

If you're all-in on whole state, create a dedicated context:

import { createContext, useCallback, useContext, useState } from 'react';

interface LLMContextValue {
  queryLLM: (prompt: string) => Promise<LLMResponse>;
  isLoading: boolean;
  lastError: Error | null;
}

const LLMContext = createContext<LLMContextValue | null>(null);

export function LLMProvider({ children }: { children: React.ReactNode }) {
  const appState = useAppState(); // Your existing state management
  const [isLoading, setIsLoading] = useState(false);
  const [lastError, setLastError] = useState<Error | null>(null);

  const queryLLM = useCallback(async (prompt: string) => {
    setIsLoading(true);
    setLastError(null);

    try {
      const prunedState = pruneStateForLLM(appState);
      const response = await fetch(/* API call with prunedState */);
      return await response.json();
    } catch (error) {
      setLastError(error as Error);
      throw error;
    } finally {
      setIsLoading(false);
    }
  }, [appState]);

  return (
    <LLMContext.Provider value={{ queryLLM, isLoading, lastError }}>
      {children}
    </LLMContext.Provider>
  );
}

export function useLLMContext() {
  const ctx = useContext(LLMContext);
  if (!ctx) throw new Error('useLLMContext must be used inside an LLMProvider');
  return ctx;
}

// Usage in components
function TaskAssistant() {
  const { queryLLM, isLoading } = useLLMContext();

  const handleSuggestNext = async () => {
    const response = await queryLLM("What should I work on next?");
    // LLM has full context, no need to explain what "next" means
  };

  return (/* ... */);
}

When to Reconsider

There are clear signals that whole-state isn't working; they're also easy to instrument, as the sketch after these lists shows:

Cost signals:

  • Token costs exceed 5% of revenue for B2C
  • Token costs exceed 1% of contract value for B2B
  • Your AWS/Anthropic bill is growing faster than your user base

Performance signals:

  • Serialization takes >100ms
  • State size regularly exceeds 100K tokens
  • Users experience lag when triggering LLM features

Quality signals:

  • LLM responses seem confused or inconsistent
  • You're hitting context window limits
  • Prompts require increasingly complex instructions to ignore irrelevant data
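
A small instrumentation sketch for the cost and performance signals, reusing the rough 4-characters-per-token heuristic from earlier:

// Sketch: measure the signals on every LLM call. Thresholds mirror the lists above.
function checkWholeStateSignals(state: AppState) {
  const start = performance.now();
  const serialized = JSON.stringify(state);
  const serializationMs = performance.now() - start;
  const tokens = Math.ceil(serialized.length / 4); // rough heuristic

  if (serializationMs > 100) {
    console.warn(`State serialization took ${serializationMs.toFixed(1)}ms`);
  }
  if (tokens > 100_000) {
    console.warn(`State is ~${tokens} tokens; consider memory or selective context`);
  }

  return { serializationMs, tokens };
}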

The Memory Factor: A Third Path

The landscape shifted significantly with Claude's introduction of persistent memory across conversations. This creates a fundamentally new option beyond whole state or selective context: leveraging memory for stable context while passing only dynamic state.

How Memory Changes the Calculation

Memory allows Claude to retain professional context, work patterns, project details, and preferences across sessions, with project-specific boundaries. For applications integrating Claude, this means:

Stable context lives in memory:

// On first interaction or when structure changes, update memory
async function initializeAppMemory(state: AppState) {
  // explainRelationships() is an app-specific helper that describes how
  // projects, tasks, and comments link together.
  await queryLLM(`
Please remember this application structure for future interactions:

- This is a project management app with projects, tasks, and comments
- Projects have: ${Object.keys(state.projects[0] ?? {}).join(', ')}
- Tasks have: ${Object.keys(state.tasks[0] ?? {}).join(', ')}
- Users can view data in list, board, or timeline modes
- The app follows these relationships: ${explainRelationships()}

Store this as your understanding of the application structure.
  `);
}

// Subsequent interactions only need current state
async function queryWithMemory(prompt: string, currentState: AppState) {
  // Memory already knows the structure
  // We only pass what's changed or what's currently relevant
  const activeTasks = currentState.tasks.filter(t => t.status === 'active');

  return await queryLLM(`
Current app state:
- Selected project: ${currentState.ui.selectedProjectId}
- Active tasks: ${JSON.stringify(activeTasks)}
- Recent activity (last 5): ${JSON.stringify(currentState.recentActivity.slice(0, 5))}

User request: ${prompt}
  `);
}

Dynamic state in each request:

  • Current selections
  • Recently changed items
  • User-specific real-time data

This hybrid approach could dramatically reduce token usage while maintaining rich context.

The Memory-State Tradeoffs

Memory is project-scoped, with each project maintaining its own separate memory space, which aligns well with application boundaries. However, this introduces new considerations:

Benefits:

  • Token efficiency: Send 2-5K tokens of current state instead of 20K full state
  • Structural persistence: Don't re-explain your data model every request
  • User preference retention: "Always show deadlines in GMT-5" lives in memory, not state
  • Cross-session continuity: Memory persists even after the user closes your app

Challenges:

  • Staleness risk: Memory doesn't auto-update when your schema changes
  • Synchronization complexity: You now manage two sources of truth
  • Debugging opacity: Issues might stem from stale memory, not current state
  • Memory scope uncertainty: What exactly has Claude remembered? (one mitigation is sketched below)
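
One way to reduce that uncertainty is to periodically ask the model to report what it has stored and diff the answer against your current schema. A sketch, with the caveat that the response is best-effort rather than a guaranteed dump of memory contents:

// Sketch: ask the model to report its remembered context for manual review.
// Treat the answer as best-effort, not an authoritative memory dump.
async function auditMemory() {
  return await queryLLM(`
Please summarize everything you currently remember about this application:
its data model, relationships, user preferences, and business context.
List each item so we can verify it is still accurate.
  `);
}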

Memory Management Strategies

If you're using memory, you need explicit strategies for keeping it current:

interface MemoryManager {
  version: string; // Track schema version
  lastUpdated: Date;
  shouldRefresh: (state: AppState) => boolean;
}

const memoryManager: MemoryManager = {
  version: '2.1.0',
  lastUpdated: new Date(),
  shouldRefresh: (state) => {
    // Refresh memory if the schema changed
    // (assumes AppState carries a schemaVersion field)
    if (state.schemaVersion !== memoryManager.version) return true;

    // Refresh periodically (weekly)
    const weekAgo = Date.now() - 7 * 24 * 60 * 60 * 1000;
    if (memoryManager.lastUpdated.getTime() < weekAgo) return true;

    // Refresh on structural changes (hadProjects is an assumed app-level flag)
    if (state.projects.length === 0 && state.hadProjects) return true;

    return false;
  }
};

async function queryLLMWithManagedMemory(
  prompt: string,
  state: AppState
) {
  if (memoryManager.shouldRefresh(state)) {
    await refreshMemory(state);
    memoryManager.lastUpdated = new Date();
  }

  return await queryWithMemory(prompt, state);
}

async function refreshMemory(state: AppState) {
  // generateStructuralDescription() is an app-specific helper that renders
  // the current schema as prose.
  await queryLLM(`
MEMORY UPDATE: The application structure has changed.

Please update your memory with this current structure:
${generateStructuralDescription(state)}

This replaces any previous structural information you had stored.
  `);
}

When to Use Each Approach

Use whole state (ignore memory) if:

  • You're early in development and state structure changes daily
  • Your state is small (<20K tokens) and token costs aren't a concern
  • You want maximum reliability and don't want to manage memory staleness
  • You're building single-session experiences

Use memory + current state if:

  • Your application structure is stable
  • Token costs are significant (high usage, large state)
  • You have clear boundaries between structural and dynamic data
  • Users have multi-session workflows
  • You want to leverage user preferences that span sessions

Use selective context if:

  • Token costs are prohibitive even with memory
  • You have extremely large state (>100K tokens)
  • Privacy boundaries require careful filtering
  • You have distinct use cases with minimal overlap

Memory Import/Export: Programmatic Control

Claude supports importing and exporting memories, which opens up interesting possibilities for application developers:

interface MemorySnapshot {
  version: string;
  timestamp: Date;
  structural: StructuralMemory;
  preferences: UserPreferences;
  domain: DomainKnowledge;
}

// exportClaudeMemory / importClaudeMemory / clearMemory stand in for whatever
// export/import workflow you wire up (currently a manual process; see below).
class MemoryBackupService {
  // Periodically export memory for version control
  async backupMemory(): Promise<MemorySnapshot> {
    const memory = await this.exportClaudeMemory();
    return {
      version: APP_VERSION,
      timestamp: new Date(),
      ...memory
    };
  }

  // Restore known-good memory state
  async restoreMemory(snapshot: MemorySnapshot) {
    await this.importClaudeMemory(snapshot);
  }

  // Reset memory on major schema changes
  async resetForNewSchema(state: AppState) {
    await this.clearMemory();
    await this.initialize(state);
  }
}

This capability allows you to:

  • Version control your application's memory state
  • Roll back to known-good memory configurations
  • Test with different memory states
  • Migrate users from other LLM platforms
  • Provide "starter memories" for new users

However, as of now, memory management is primarily through the Claude interface rather than a programmatic API, so this approach requires manual export/import workflows.

The Incognito Mode Consideration

Claude offers incognito mode for conversations that don't save to memory or chat history. This matters for application features that handle sensitive data:

interface LLMQueryOptions {
  useMemory: boolean;
  incognitoMode?: boolean; // For sensitive operations
}

async function queryLLM(
  prompt: string,
  state: AppState,
  options: LLMQueryOptions = { useMemory: true }
) {
  if (options.incognitoMode) {
    // Explicitly pass full context since memory won't be used
    return await queryInIncognito(prompt, state);
  }

  if (options.useMemory) {
    return await queryWithMemory(prompt, state);
  }

  return await queryWithFullState(prompt, state);
}

// Example usage
async function handleSensitiveAnalysis() {
  // Financial projections shouldn't persist in memory
  return await queryLLM(
    "Analyze cash flow for Q4",
    appState,
    { useMemory: false, incognitoMode: true }
  );
}

Memory as an Architectural Layer

The memory feature fundamentally changes how we think about LLM integration architecture. Instead of a stateless request-response model, we now have:

Application State (Mutable, Ephemeral)
  - Current Selection: Project #42
  - Active Tasks: [Task #1, Task #5, ...]
  - Recent Changes: [...]
  - Real-time Data: [...]
        |
        v  [State Serialization]
LLM Request Context (Per-request)
  - Here's what's happening RIGHT NOW
  - Current selections and recent activity
        +
LLM Memory (Persistent, Managed)
  - I know about your app structure
  - I remember your work patterns
  - I recall your preferences
  - I understand your domain
        |
        v
[LLM Response]

This creates a three-tier architecture:

  1. Ephemeral state - What's happening now (each request)
  2. Persistent memory - What the LLM should always know (managed updates)
  3. Application context - Structural knowledge that rarely changes (refresh on schema changes)

Practical Example: Memory-First Design

Here's what a memory-optimized implementation looks like:

// Assumes a private queryLLM(prompt) helper that wraps the raw API call shown
// earlier; omitted here for brevity.
class MemoryAwareLLMService {
  private memoryInitialized = false;
  private memoryVersion = '1.0.0';

  async initialize(state: AppState) {
    if (this.memoryInitialized) return;

    // One-time memory setup
    await this.queryLLM(`
You're assisting with a project management application. Remember:

**Data Model:**
- Projects: collections of related work
- Tasks: individual work items with status, assignee, deadline
- Comments: discussions on tasks
- Activity: audit log of changes

**User Preferences:**
- Uses timeline view for planning
- Prefers tasks grouped by priority
- Works in EST timezone

**Business Context:**
- Team focuses on product development
- Typical sprint is 2 weeks
- We prioritize shipping over perfection

Store this structural context for all future interactions.
    `);

    this.memoryInitialized = true;
  }

  async query(prompt: string, state: AppState) {
    await this.initialize(state);

    // Now only send minimal current state
    const minimalContext = {
      selectedProject: state.projects.find(
        p => p.id === state.ui.selectedProjectId
      ),
      activeTasks: state.tasks.filter(
        t => t.status === 'active'
      ).slice(0, 10),
      recentActivity: state.recentActivity.slice(0, 5)
    };

    return await this.queryLLM(`
Current Context:
${JSON.stringify(minimalContext, null, 2)}

Request: ${prompt}
    `);
  }
}

// Usage
const llmService = new MemoryAwareLLMService();

// First call initializes memory (one-time cost)
await llmService.query("What should I focus on?", state);
// ~5K tokens (setup) + 2K tokens (current state)

// Subsequent calls only send current state
await llmService.query("Show me blocking tasks", state);
// ~2K tokens (current state only)

// Compare to whole-state approach
await queryLLMWithFullState("Show me blocking tasks", state);
// ~20K tokens every single time

The Memory Sweet Spot

Memory is most valuable when:

  • Your application has a stable core structure with dynamic data
  • You have frequent user sessions (memory persists between them)
  • You want to reduce token costs without sacrificing context quality
  • You can manage memory lifecycle (detecting staleness, triggering refreshes)
  • Your users have consistent workflows the LLM can learn

Memory is less valuable when:

  • State structure changes frequently (early development)
  • Sessions are isolated/single-use
  • You need guaranteed consistency (memory can lag behind reality)
  • Token costs aren't a concern

The Broader Principle

The question of whole state vs. selective context vs. memory-augmented state is really a question about where we draw architectural boundaries in an LLM-powered world. Just as Frontend Maximalism observes that we can push more work to the frontend than conventional wisdom suggests, LLM integration may mean we can push more context to the model than our instincts tell us.

The calculus changes when:

  • The consumer (the LLM) is remarkably good at handling large, varied inputs
  • The cost of under-providing context is silent failures and degraded experiences
  • The cost of over-providing context is measurable and predictable (tokens)
  • The complexity of "getting it right" with selective context is high

This isn't an argument that whole state is always right. It's an argument that whole state deserves serious consideration, especially in the early stages of product development when simplicity and flexibility matter most.

Just like Frontend Maximalism, the whole-state approach might not scale forever. But it often scales further than you think. And by the time it doesn't scale, you'll have learned enough about your actual usage patterns to make informed decisions about where to add complexity.

Conclusion: Three Paths, All Valid

If you reject the whole-state or memory-augmented approaches, be rigorous about why. Write down your answers to:

  • How many tokens does our full state serialize to?
  • What's that cost per user session at our expected usage?
  • How stable is our application structure vs. our dynamic data?
  • Could memory handle our structural context while we pass current state?
  • What's the cost of memory management (staleness detection, refresh logic)?
  • How does that compare to our unit economics?
  • What specific problems does selective context solve?
  • What complexity does each approach introduce?

You might find, like many engineers who've seriously considered Frontend Maximalism, that the simple approach works better than you expected. Or you might find genuine constraints that require a more sophisticated architecture. Memory adds a powerful middle ground: reduced token costs while maintaining rich context.

The winning formula for many applications may be:

  1. Start with whole state - simplest, fastest to build
  2. Add memory when state structure stabilizes - better token economics
  3. Add selective context only when costs demand it - highest complexity, use sparingly

Whichever path you choose, seriously considering whole-state LLM integration will probably make your final architecture better, whatever it ends up being.


Remember Gall's Law: A complex system that works is invariably found to have evolved from a simple system that worked. Start simple. Add complexity only when you've measured the need for it.