Context Management
How Grok One-Shot manages conversation context and documentation loading.
Overview
Grok One-Shot uses an efficient on-demand context loading system that balances comprehensive documentation access with token efficiency.
Context Loading Strategy
Traditional Approach (Old System)
Problem with auto-loading everything:
Startup context:
- GROK.md: ~6,400 bytes
- docs-index.md: ~7,600 bytes
- All 49 docs: ~65,000-85,000 tokens
Result: 65k-85k tokens consumed before user sends first message
Issues:
- Massive token waste on unused documentation
- Slower startup
- Higher API costs
- Context limit reached quickly
Current Approach (Efficient System)
On-demand loading:
Startup context:
- GROK.md: ~6,400 bytes (1,600 tokens)
- docs-index.md: ~7,600 bytes (1,900 tokens)
Total: ~3,500 tokens (95% reduction!)
Runtime:
- AI reads specific docs as needed via Read tool
- Only loads relevant documentation
- User queries load minimal context
Benefits:
- 94.6-95.8% token reduction at startup
- Faster startup
- Lower initial costs
- Context budget available for actual work
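As a quick sanity check on these figures, here is the arithmetic behind them, using the 1 token ≈ 4 characters heuristic from the Technical Details section below:

```typescript
// Back-of-the-envelope check on the startup figures above (1 token ≈ 4 chars).
const grokMdBytes = 6_400;    // GROK.md
const docsIndexBytes = 7_600; // docs-index.md

const startupTokens = (grokMdBytes + docsIndexBytes) / 4; // 3,500 tokens
const reductionLow = 1 - startupTokens / 65_000;  // ≈ 0.946 (94.6%)
const reductionHigh = 1 - startupTokens / 85_000; // ≈ 0.959 (the ~95.8% quoted above)

console.log(`${startupTokens} startup tokens, ${(reductionLow * 100).toFixed(1)}-${(reductionHigh * 100).toFixed(1)}% reduction`);
```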
How It Works
Startup Phase
What's loaded:
// src/hooks/use-claude-md.ts (simplified; see Technical Details for the fuller version)
import { readFileSync } from "node:fs";

export function useClaudeMd() {
  const claudeMd = readFileSync("GROK.md", "utf-8");
  const docsIndex = readFileSync("docs-index.md", "utf-8");
  return {
    systemPrompt: `${claudeMd}\n\n${docsIndex}`,
    tokenCount: 3500, // approximate: ~14 KB of text at ~4 chars/token
  };
}
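The returned systemPrompt string seeds every session as the system message (see Session context under Technical Details below).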
Result:
- AI knows project structure (GROK.md)
- AI knows available documentation (docs-index.md)
- AI can read specific docs when needed
Runtime Phase
When AI needs specific information:
1. User asks a question:
> How do I configure MCP servers?
2. AI checks docs-index.md and sees:
- configuration/settings.md (covers MCP configuration)
- build-with-claude-code/mcp.md (detailed MCP guide)
3. AI uses the Read tool:
await Read({
  file_path: ".agent/docs/claude-code/configuration/settings.md",
});
4. AI responds with accurate information:
To configure MCP servers, edit ~/.grok/settings.json...
[provides information from settings.md]
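A minimal sketch of the lookup step, assuming the index lists one doc per line as `- path (description)` — the real docs-index.md format and matching logic may differ:

```typescript
import { readFileSync } from "node:fs";

// Hypothetical helper: find index entries whose description mentions a topic.
function findRelevantDocs(topic: string): string[] {
  const index = readFileSync("docs-index.md", "utf-8");
  return index
    .split("\n")
    .filter((line) => line.toLowerCase().includes(topic.toLowerCase()))
    .map((line) => line.replace(/^[-*]\s*/, "").split(/\s+/)[0]); // keep the doc path
}

findRelevantDocs("mcp");
// e.g. ["configuration/settings.md", "build-with-claude-code/mcp.md"]
```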
Context in Sessions
Session Context Accumulation
Each message adds context:
User message: +tokens (your prompt)
AI response: +tokens (AI's reply)
Tool calls: +tokens (file contents, command outputs)
Example session growth:
Initial: 3,500 tokens (GROK.md + docs-index.md)
After message 1: 5,000 tokens (+1,500)
After message 5: 12,000 tokens
After message 20: 45,000 tokens
After message 50: 90,000 tokens (approaching limit)
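A sketch of tracking this growth with the 1 token ≈ 4 characters estimate used elsewhere in this doc (the agent's real counter may differ):

```typescript
// Running token estimate for a session (1 token ≈ 4 characters).
const estimate = (text: string) => Math.ceil(text.length / 4);

let sessionTokens = 3_500; // startup: GROK.md + docs-index.md

function addMessage(content: string): number {
  sessionTokens += estimate(content);
  return sessionTokens;
}

addMessage("How do I configure MCP servers?");    // user prompt
addMessage("To configure MCP servers, edit ..."); // AI reply
console.log(sessionTokens); // grows with every message and tool result
```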
Context Limits
Model context window: 128,000 tokens
Practical considerations:
Good session: 10,000-50,000 tokens
- Enough context for coherent conversation
- Room for file reading and analysis
Large session: 50,000-100,000 tokens
- Still functional but getting expensive
- Consider if all context is needed
Excessive: >100,000 tokens
- Approaching model limit
- Very expensive
- Should start new session
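These bands are easy to express as a simple check; the following is an illustrative sketch, not code from the project:

```typescript
type ContextBand = "good" | "large" | "excessive";

// Classify session size against the bands described above.
function classifyContext(totalTokens: number): ContextBand {
  if (totalTokens > 100_000) return "excessive"; // start a new session
  if (totalTokens > 50_000) return "large";      // check whether all context is needed
  return "good";
}

classifyContext(57_680); // "large"
```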
Monitoring Context
Check token usage:
# During session
Press Ctrl+I
Output:
Token Usage:
Input: 45,230 tokens
Output: 12,450 tokens
Total: 57,680 tokens
From session files:
cat ~/.grok/sessions/latest-session.json | jq '.tokenUsage'
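The same check in TypeScript, assuming the session file exposes a tokenUsage object with input and output counts, as the jq query above suggests (the real schema may differ):

```typescript
import { readFileSync } from "node:fs";
import os from "node:os";
import path from "node:path";

// Assumed shape, inferred from the jq query above.
interface TokenUsage {
  input: number;
  output: number;
}

const sessionPath = path.join(os.homedir(), ".grok/sessions/latest-session.json");
const session = JSON.parse(readFileSync(sessionPath, "utf-8"));
const usage: TokenUsage = session.tokenUsage;

console.log(`Total: ${usage.input + usage.output} tokens`);
```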
Context Optimization
Start New Sessions
When to start fresh:
- Unrelated task
- Context > 50k tokens and slowing down
- No longer need old conversation
- Want clean slate
How:
# Exit current session
/exit
# Start new
grok
Headless Mode for Simple Queries
Avoid session accumulation:
# Each query is independent
grok -p "list TypeScript files"
grok -p "find TODO comments"
grok -p "check for console.log"
# No context carries over between queries
Be Specific
Bad (loads lots of context):
> Tell me everything about this codebase
[AI reads many files, context explodes]
Good (targeted context):
> Explain how authentication works in src/auth/
[AI reads specific files, context stays manageable]
Advanced Context Techniques
Incremental Exploration
Build context gradually:
Step 1: "What is the overall architecture?"
[AI reads GROK.md, provides overview]
Step 2: "How does the agent system work?"
[AI reads specific agent docs]
Step 3: "Show me the GrokAgent implementation"
[AI reads src/agent/grok-agent.ts]
Benefits:
- Only loads what's needed
- Builds understanding progressively
- Avoids context explosion
Context Pruning (Manual)
Current state: Manual
- No automatic context pruning yet
- User must start new session when context is large
- Future enhancement: automatic context compression
How to prune manually:
# Save important findings
> Summarize what we've learned so far
[Copy summary]
# Start new session
/exit
grok
# Resume with summary
> Continuing from previous session:
[Paste summary]
Now let's...
Context-Related Features
Implemented
Efficient startup:
- On-demand doc loading
- Minimal initial context
- Fast session start
Context monitoring:
- Ctrl+I shows token usage
- Session files track usage
- Manual inspection available
Session management:
- Save/restore sessions
- Session history in ~/.grok/sessions/
- Manual session control
Partially Implemented
Context awareness:
- AI understands when context is large
- Manual pruning via new session
- No automatic warnings at thresholds
Multi-session workflows:
- Can start multiple sessions
- No session linking or merging
- No cross-session context sharing
Planned Features
Automatic context management:
- Auto-prune old messages when threshold reached
- Intelligent context summarization
- Keep most relevant parts, summarize old parts
Context caching:
- Cache common docs (settings, quickstart)
- Reduce repeated API calls
- Faster responses for frequent questions
Smart context loading:
- Predict which docs user will need
- Pre-load related documentation
- Balance prediction vs token cost
Best Practices
DO
**Monitor token usage:**
Press Ctrl+I regularly to check context size
**Start new sessions for unrelated tasks:**
/exit  # End current task
grok   # Fresh start for new task
**Use headless mode for simple queries:**
grok -p "quick query"  # No session accumulation
**Be specific in prompts:**
"Analyze authentication in src/auth/"
vs
"Analyze everything"
DON'T
**Let sessions grow indefinitely:**
# Check tokens with Ctrl+I
# If >50k, consider a new session
**Load unnecessary files:**
# Avoid: "Read all files"
# Better: "Read src/auth/middleware.ts"
**Repeat context unnecessarily:**
# Session remembers previous messages
# No need to re-explain context
Troubleshooting
High Token Usage
Symptom: Ctrl+I shows >50k tokens
Causes:
- Long conversation
- AI read many files
- Repeated context
Solutions:
# Start new session
/exit
grok
# Or use summary technique
> Summarize findings, then start new session
Slow Responses
Symptom: AI takes long to respond
Possible cause: Large context
Check:
Press Ctrl+I to see the token count.
If it shows >80k tokens, large context is the likely cause.
Solution:
# Start fresh session
/exit
grok
Context Confusion
Symptom: AI confuses current task with earlier messages
Cause: Too much context mixing different topics
Solution:
# Start new session for new topic
/exit
grok
# Be explicit
> Focusing on [NEW TOPIC], ignoring previous discussion about [OLD TOPIC]
Real-Time Status Indicators
Grok One-Shot displays real-time context metrics below the input prompt to help users monitor context usage and system state.
Display Format
The status line shows three key metrics in compact format:
1.3k/128.0k (1%) │ 0 files │ 2 msgs
Metric Details
- **Token Usage**: current tokens used / maximum context window (percentage)
  - Current: formatted as 1.3k (1,300 tokens)
  - Max: 128.0k (128,000 tokens, Grok's context window)
  - Percent: current usage as a percentage of the maximum
  - Color-coded: green (less than 60%), blue (60-80%), yellow (80-90%), red (more than 90%)
- **Files**: number of files currently loaded in workspace context
  - Shows files actively referenced in the conversation
  - Helps monitor context breadth
- **Messages**: total number of messages in the current conversation session
  - Includes the system prompt, user messages, and AI responses
  - Indicates conversation length and context depth
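A sketch of how such a status line could be assembled from these metrics (illustrative only; the actual ContextIndicator component may render it differently):

```typescript
// Format 1300 → "1.3k", 128000 → "128.0k"
const fmt = (n: number) => `${(n / 1000).toFixed(1)}k`;

function statusLine(tokens: number, maxTokens: number, files: number, msgs: number): string {
  const pct = Math.round((tokens / maxTokens) * 100);
  return `${fmt(tokens)}/${fmt(maxTokens)} (${pct}%) │ ${files} files │ ${msgs} msgs`;
}

// Color thresholds from the list above.
function usageColor(pct: number): "green" | "blue" | "yellow" | "red" {
  if (pct >= 90) return "red";
  if (pct >= 80) return "yellow";
  if (pct >= 60) return "blue";
  return "green";
}

statusLine(1_300, 128_000, 0, 2); // "1.3k/128.0k (1%) │ 0 files │ 2 msgs"
```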
Additional Indicators
When memory pressure is high, additional indicators may appear:
- **Memory Pressure**: shown when the system is under memory stress (medium/high/critical)
Usage Tips
- Monitor token usage to avoid hitting context limits
- Start new sessions (/exit) when approaching 80% token usage
- Use Ctrl+I for detailed context information and tooltip
- Files count helps gauge context specificity
Implementation
These metrics are rendered by the ContextIndicator component in compact mode, providing constant visibility without cluttering the interface.
Technical Details
Implementation
Context loading hook:
// src/hooks/use-claude-md.ts
import { readFileSync } from "node:fs";
import path from "node:path";

export function useClaudeMd(): string {
  const cwd = process.cwd();
  const grokMd = readFileSync(path.join(cwd, "GROK.md"), "utf-8");
  const docsIndex = readFileSync(path.join(cwd, "docs-index.md"), "utf-8");
  return `${grokMd}\n\n${docsIndex}`;
}
Session context:
// src/agent/grok-agent.ts
const messages = [
  { role: "system", content: systemPrompt }, // GROK.md + docs-index.md
  ...conversationHistory,                    // Previous messages
  { role: "user", content: userMessage },    // Current message
];
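Because this full messages array is sent with every request, each earlier message and tool result counts toward input tokens on every subsequent turn, which is why session context accumulates as described above.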
Token counting:
// Approximate: 1 token ≈ 4 characters
const estimatedTokens = text.length / 4;
Future Enhancements
Automatic compaction:
// Planned
if (totalTokens > COMPACTION_THRESHOLD) {
  const summary = await compactOldMessages(messages);
  messages = [systemPrompt, summary, ...recentMessages];
}
Context caching:
// Planned
let cachedDocs = cache.get("common-docs");
if (!cachedDocs) {
  cachedDocs = await loadDocs();
  cache.set("common-docs", cachedDocs, TTL);
}
See Also
- Session Management - Session handling
- Settings - Configuration options
- Interactive Mode - Session features
- Data Usage - Privacy and data
Status: Core functionality implemented; advanced features in progress
Efficient context management ensures fast, cost-effective AI interactions.