Context Management

How Grok One-Shot manages conversation context and documentation loading.

Overview

Grok One-Shot uses an efficient on-demand context loading system that balances comprehensive documentation access with token efficiency.

Context Loading Strategy

Traditional Approach (Old System)

Problem with auto-loading everything:

Startup context:
- GROK.md: ~6,400 bytes
- docs-index.md: ~7,600 bytes
- All 49 docs: ~65,000-85,000 tokens

Result: 65k-85k tokens consumed before user sends first message

Issues:

  • Massive token waste on unused documentation
  • Slower startup
  • Higher API costs
  • Context limit reached quickly

Current Approach (Efficient System)

On-demand loading:

Startup context:
- GROK.md: ~6,400 bytes (1,600 tokens)
- docs-index.md: ~7,600 bytes (1,900 tokens)
Total: ~3,500 tokens (95% reduction!)

Runtime:
- AI reads specific docs as needed via Read tool
- Only loads relevant documentation
- User queries load minimal context

Benefits:

  • 94.6-95.8% token reduction at startup
  • Faster startup
  • Lower initial costs
  • Context budget available for actual work

How It Works

Startup Phase

What's loaded:

// src/hooks/use-claude-md.ts
import { readFileSync } from "node:fs";

export function useClaudeMd() {
  const claudeMd = readFileSync("GROK.md", "utf-8");
  const docsIndex = readFileSync("docs-index.md", "utf-8");

  return {
    systemPrompt: `${claudeMd}\n\n${docsIndex}`,
    tokenCount: 3500, // approximate (~1,600 + ~1,900)
  };
}

Result:

  • AI knows project structure (GROK.md)
  • AI knows available documentation (docs-index.md)
  • AI can read specific docs when needed

Runtime Phase

When AI needs specific information:

  1. User asks a question:
     > How do I configure MCP servers?
  2. AI checks docs-index.md:
     AI sees:
     - configuration/settings.md (covers MCP configuration)
     - build-with-claude-code/mcp.md (detailed MCP guide)
  3. AI uses the Read tool:
     await Read({
       file_path: ".agent/docs/claude-code/configuration/settings.md",
     });
  4. AI responds with accurate information:
     To configure MCP servers, edit ~/.grok/settings.json...
     [provides information from settings.md]

Context in Sessions

Session Context Accumulation

Each message adds context:

User message: +tokens (your prompt)
AI response: +tokens (AI's reply)
Tool calls: +tokens (file contents, command outputs)

Example session growth:

Initial: 3,500 tokens (GROK.md + docs-index.md)
After message 1: 5,000 tokens (+1,500)
After message 5: 12,000 tokens
After message 20: 45,000 tokens
After message 50: 90,000 tokens (approaching limit)
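
A rough sketch of how this accumulation can be estimated, using the same 1 token ≈ 4 characters heuristic described under Technical Details below; the Message shape here is an illustrative assumption, not the actual session schema:

// Illustrative message shape; the real session schema may differ
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Rough running total: each message contributes ~(characters / 4) tokens
function estimateSessionTokens(messages: Message[]): number {
  return messages.reduce(
    (total, m) => total + Math.ceil(m.content.length / 4),
    0,
  );
}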

Context Limits

Model context window: 128,000 tokens

Practical considerations:

Good session: 10,000-50,000 tokens
- Enough context for coherent conversation
- Room for file reading and analysis

Large session: 50,000-100,000 tokens
- Still functional but getting expensive
- Consider if all context is needed

Excessive: >100,000 tokens
- Approaching model limit
- Very expensive
- Should start new session
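
These tiers map onto a simple classifier; a minimal sketch with thresholds taken from the list above (the function and label names are illustrative):

// Classify a session by estimated token count (thresholds from the tiers above)
type SessionSize = "good" | "large" | "excessive";

function classifySession(tokens: number): SessionSize {
  if (tokens > 100_000) return "excessive"; // approaching the 128k model limit
  if (tokens > 50_000) return "large"; // functional but getting expensive
  return "good"; // room for coherent conversation and file reading
}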

Monitoring Context

Check token usage:

# During session
Press Ctrl+I

Output:
Token Usage:
Input: 45,230 tokens
Output: 12,450 tokens
Total: 57,680 tokens

From session files:

cat ~/.grok/sessions/latest-session.json | jq '.tokenUsage'
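
The same check can be done programmatically. Only the tokenUsage field is confirmed by the command above, so the surrounding session schema in this sketch is an assumption:

// Read the latest session file and print its token usage
import { readFileSync } from "node:fs";
import { join } from "node:path";
import { homedir } from "node:os";

const sessionPath = join(homedir(), ".grok", "sessions", "latest-session.json");
const session = JSON.parse(readFileSync(sessionPath, "utf-8"));
console.log(session.tokenUsage); // e.g. input/output/total token counts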

Context Optimization

Start New Sessions

When to start fresh:

  • Unrelated task
  • Context > 50k tokens and slowing down
  • No longer need old conversation
  • Want clean slate

How:

# Exit current session
/exit

# Start new
grok

Headless Mode for Simple Queries

Avoid session accumulation:

# Each query is independent
grok -p "list TypeScript files"
grok -p "find TODO comments"
grok -p "check for console.log"

# No context carries over between queries

Be Specific

Bad (loads lots of context):

> Tell me everything about this codebase
[AI reads many files, context explodes]

Good (targeted context):

> Explain how authentication works in src/auth/
[AI reads specific files, context stays manageable]

Advanced Context Techniques

Incremental Exploration

Build context gradually:

Step 1: "What is the overall architecture?"
[AI reads GROK.md, provides overview]

Step 2: "How does the agent system work?"
[AI reads specific agent docs]

Step 3: "Show me the GrokAgent implementation"
[AI reads src/agent/grok-agent.ts]

Benefits:

  • Only loads what's needed
  • Builds understanding progressively
  • Avoids context explosion

Context Pruning (Manual)

Current state: Manual

  • No automatic context pruning yet
  • User must start new session when context is large
  • Future enhancement: automatic context compression

How to prune manually:

# Save important findings
> Summarize what we've learned so far
[Copy summary]

# Start new session
/exit
grok

# Resume with summary
> Continuing from previous session:
[Paste summary]
Now let's...

Feature Status

Implemented

Efficient startup:

  • On-demand doc loading
  • Minimal initial context
  • Fast session start

Context monitoring:

  • Ctrl+I shows token usage
  • Session files track usage
  • Manual inspection available

Session management:

  • Save/restore sessions
  • Session history in ~/.grok/sessions/
  • Manual session control

Partially Implemented

Context awareness:

  • AI understands when context is large
  • Manual pruning via new session
  • No automatic warnings at thresholds

Multi-session workflows:

  • Can start multiple sessions
  • No session linking or merging
  • No cross-session context sharing

Planned Features

Automatic context management:

  • Auto-prune old messages when threshold reached
  • Intelligent context summarization
  • Keep most relevant parts, summarize old parts

Context caching:

  • Cache common docs (settings, quickstart)
  • Reduce repeated API calls
  • Faster responses for frequent questions

Smart context loading:

  • Predict which docs user will need
  • Pre-load related documentation
  • Balance prediction vs token cost

Best Practices

DO

Monitor token usage:

Press Ctrl+I regularly to check context size

Start new sessions for unrelated tasks:

/exit # End current task
grok # Fresh start for new task

Use headless mode for simple queries:

grok -p "quick query" # No session accumulation

Be specific in prompts:

"Analyze authentication in src/auth/"
vs
"Analyze everything"

DON'T

Let sessions grow indefinitely:

# Check tokens
Ctrl+I
# If >50k, consider new session

Load unnecessary files:

# Avoid: "Read all files"
# Better: "Read src/auth/middleware.ts"

Repeat context unnecessarily:

# Session remembers previous messages
# No need to re-explain context

Troubleshooting

High Token Usage

Symptom: Ctrl+I shows >50k tokens

Causes:

  • Long conversation
  • AI read many files
  • Repeated context

Solutions:

# Start new session
/exit
grok

# Or use summary technique
> Summarize findings, then start new session

Slow Responses

Symptom: AI takes long to respond

Possible cause: Large context

Check:

Ctrl+I to see token count
If >80k tokens, a large context is the likely cause

Solution:

# Start fresh session
/exit
grok

Context Confusion

Symptom: AI confuses current task with earlier messages

Cause: Too much context mixing different topics

Solution:

# Start new session for new topic
/exit
grok

# Be explicit
> Focusing on [NEW TOPIC], ignoring previous discussion about [OLD TOPIC]

Real-Time Status Indicators

Grok One-Shot displays real-time context metrics below the input prompt to help users monitor context usage and system state.

Display Format

The status line shows three key metrics in compact format:

1.3k/128.0k (1%) │ 0 files │ 2 msgs

Metric Details

  • Token Usage: current tokens used / maximum context window (percentage)
      • Current: formatted as 1.3k (1,300 tokens)
      • Max: 128.0k (128,000 tokens, Grok's context window)
      • Percent: current usage as a percentage of the maximum
      • Color-coded: green (below 60%), blue (60-80%), yellow (80-90%), red (above 90%); see the sketch after this list
  • Files: number of files currently loaded in workspace context
      • Shows files actively referenced in the conversation
      • Helps monitor context breadth
  • Messages: total number of messages in the current conversation session
      • Includes the system prompt, user messages, and AI responses
      • Indicates conversation length and context depth
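
The color thresholds map to a simple lookup; a minimal sketch, with the function name and exact boundary handling assumed:

// Map the usage percentage to the status-line color (thresholds from above)
function usageColor(percent: number): "green" | "blue" | "yellow" | "red" {
  if (percent >= 90) return "red";
  if (percent >= 80) return "yellow";
  if (percent >= 60) return "blue";
  return "green";
}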

Additional Indicators

When memory pressure is high, additional indicators may appear:

  • Memory Pressure: shows when the system is under memory stress (medium/high/critical)

Usage Tips

  • Monitor token usage to avoid hitting context limits
  • Start new sessions (/exit) when approaching 80% token usage
  • Use Ctrl+I for detailed context information and tooltip
  • Files count helps gauge context specificity

Implementation

These metrics are rendered by the ContextIndicator component in compact mode, providing constant visibility without cluttering the interface.
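
As a rough sketch of how such a compact line could be assembled (formatK and formatStatusLine are hypothetical helpers, not the actual ContextIndicator internals):

// Hypothetical helpers, not the actual ContextIndicator implementation
function formatK(tokens: number): string {
  return `${(tokens / 1000).toFixed(1)}k`;
}

function formatStatusLine(
  tokens: number,
  maxTokens: number,
  files: number,
  msgs: number,
): string {
  const percent = Math.round((tokens / maxTokens) * 100);
  return `${formatK(tokens)}/${formatK(maxTokens)} (${percent}%) │ ${files} files │ ${msgs} msgs`;
}

// formatStatusLine(1300, 128_000, 0, 2)
// => "1.3k/128.0k (1%) │ 0 files │ 2 msgs"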

Technical Details

Implementation

Context loading hook:

// src/hooks/use-claude-md.ts
import { readFileSync } from "node:fs";
import path from "node:path";

const cwd = process.cwd();

export function useClaudeMd(): string {
  const grokMd = readFileSync(path.join(cwd, "GROK.md"), "utf-8");
  const docsIndex = readFileSync(path.join(cwd, "docs-index.md"), "utf-8");
  return `${grokMd}\n\n${docsIndex}`;
}

Session context:

// src/agent/grok-agent.ts
const messages = [
  { role: "system", content: systemPrompt }, // GROK.md + docs-index.md
  ...conversationHistory, // previous messages
  { role: "user", content: userMessage }, // current message
];

Token counting:

// Approximate: 1 token ≈ 4 characters
const estimatedTokens = text.length / 4;
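
For example, the ~14,000 bytes loaded at startup (GROK.md at ~6,400 plus docs-index.md at ~7,600) work out to roughly 3,500 tokens under this heuristic, matching the startup figure quoted earlier.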

Future Enhancements

Automatic compaction:

// Planned
if (totalTokens > COMPACTION_THRESHOLD) {
  const summary = await compactOldMessages(messages);
  messages = [systemPrompt, summary, ...recentMessages];
}
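
One possible shape for compactOldMessages, strictly a sketch since the feature is not implemented; the summarize parameter, KEEP_RECENT knob, and Msg type are all assumptions for illustration:

// Sketch only: fold all but the most recent messages into one summary message
type Msg = { role: string; content: string };

const KEEP_RECENT = 10; // assumed knob: recent messages kept verbatim

async function compactOldMessages(
  messages: Msg[],
  summarize: (text: string) => Promise<string>, // hypothetical LLM call
): Promise<Msg> {
  const old = messages.slice(0, -KEEP_RECENT);
  const transcript = old.map((m) => `${m.role}: ${m.content}`).join("\n");
  return {
    role: "system",
    content: `Summary of earlier conversation:\n${await summarize(transcript)}`,
  };
}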

Context caching:

// Planned
let cachedDocs = cache.get("common-docs");
if (!cachedDocs) {
  cachedDocs = await loadDocs();
  cache.set("common-docs", cachedDocs, TTL);
}

Status: Core functionality implemented; advanced features in progress

Efficient context management ensures fast, cost-effective AI interactions.