Code Optimization Prompt Templates
AI prompt templates for optimizing code performance. Identify bottlenecks and improve efficiency.
Overview
Performance optimization is about making informed tradeoffs: speed vs. memory, readability vs. efficiency, development time vs. runtime. These prompts help you identify real bottlenecks (not imagined ones), understand the cost of different approaches, and implement optimizations that matter. The golden rule: measure first, improve second.
Best Practices
Always include profiling data or measurements; optimization without data is guessing
Specify your performance targets (e.g. 'must handle 1000 requests/second')
Describe the scale of data being processed; an approach that's fine at 10 items can fall over at 10 million
Mention your constraints: Can you use more memory? Add caching infrastructure? Change the algorithm entirely?
Include the full hot path code, not just the function you think is slow
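The first tip above is easy to act on before you ever write a prompt. A minimal sketch for capturing profiler output to paste into a prompt, using Python's built-in `cProfile` (the helper name is illustrative, not from any library):

```python
import cProfile
import io
import pstats

def profile_to_text(func, *args):
    """Run func under cProfile and return its stats as text to paste into a prompt."""
    pr = cProfile.Profile()
    pr.enable()
    func(*args)
    pr.disable()
    buf = io.StringIO()
    # Top 10 entries by cumulative time is usually enough context for a prompt
    pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(10)
    return buf.getvalue()

report = profile_to_text(sum, range(100_000))
```

The resulting text includes per-function call counts and cumulative times, which is exactly the kind of evidence the templates below ask for.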
Prompt Templates
1. Performance Analysis
Analyze this code for performance issues:

```[LANGUAGE]
[PASTE CODE]
```

**Context:**
- This code: [WHAT IT DOES]
- Called: [HOW OFTEN - once, per request, in a loop]
- Data size: [TYPICAL AND MAXIMUM INPUT SIZES]

**Current performance:**
- Measured time: [IF KNOWN]
- Target time: [WHAT'S ACCEPTABLE]

**Profiling data (if available):**
```
[PASTE PROFILER OUTPUT]
```

**Constraints:**
- Memory budget: [IF LIMITED]
- Can change: [ALGORITHM / DATA STRUCTURES / EXTERNAL SERVICES]
- Cannot change: [ANY CONSTRAINTS]

Identify:
1. Time complexity of current approach
2. Space complexity
3. Specific bottlenecks
4. Optimization opportunities (prioritized by impact)
5. Tradeoffs of each optimization
Language: Python
Code: Function that finds duplicate entries in a list of user records
Called: Once per API request (p95 response time must be <200ms)
Data size: Usually 1,000 records, max 50,000
Current: 3 seconds for 50K records
Target: Under 500ms for 50K records
## Performance Analysis
### Current Complexity
- **Time:** O(n²), nested loops comparing every record pair
- **Space:** O(n), storing duplicates list
### Bottleneck Identified
```python
# Current: O(n²) - compares every pair
duplicates = []
for i in range(len(records)):
    for j in range(i + 1, len(records)):
        if records[i]['email'] == records[j]['email']:  # ~1.25B comparisons for 50K
            duplicates.append(records[i])
```
For 50,000 records: ~1.25 billion comparisons.
### Optimizations (by impact)
**1. Use a hash set, O(n²) → O(n)** ⭐ Highest Impact
```python
def find_duplicates(records):
    seen = {}
    duplicates = []
    for record in records:
        email = record['email']
        if email in seen:
            duplicates.append(record)
        else:
            seen[email] = record
    return duplicates
```
- **Time:** O(n), single pass with O(1) hash lookups
- **Space:** O(n), hash table storage
- **Expected:** 50K records in ~10ms
**2. If memory-constrained, sort first**
```python
records.sort(key=lambda r: r['email'])  # O(n log n)
duplicates = []
for i in range(1, len(records)):
    if records[i]['email'] == records[i-1]['email']:
        duplicates.append(records[i])
```
- **Time:** O(n log n), dominated by sort
- **Space:** O(1) extra (in-place sort)
- **Tradeoff:** Mutates input, slightly slower than hash approach
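Whichever rewrite you choose, it's worth confirming it returns the same results as the original before trusting the speedup. A quick sketch, assuming a hypothetical record shape with an `email` field:

```python
import random

def find_duplicates_naive(records):
    """O(n^2) baseline: report any record whose email appeared earlier."""
    duplicates = []
    for i, record in enumerate(records):
        for earlier in records[:i]:
            if earlier["email"] == record["email"]:
                duplicates.append(record)
                break
    return duplicates

def find_duplicates_fast(records):
    """O(n) rewrite using a set of seen emails."""
    seen, duplicates = set(), []
    for record in records:
        if record["email"] in seen:
            duplicates.append(record)
        else:
            seen.add(record["email"])
    return duplicates

# Compare both implementations on randomized sample data
random.seed(0)
records = [{"email": f"u{random.randrange(50)}@example.com"} for _ in range(200)]
assert find_duplicates_fast(records) == find_duplicates_naive(records)
```

A handful of randomized comparisons like this catches most off-by-one and ordering bugs introduced during an optimization.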
### Tradeoffs Summary
| Approach | Time | Space | Notes |
|----------|------|-------|-------|
| Current (nested loops) | O(n²) | O(n) | Unusable at scale |
| Hash set | O(n) | O(n) | Best for speed |
| Sort + scan | O(n log n) | O(1) | Best for memory |
**Recommendation:** Use hash set approach. For 50K records:
- Current: ~3,000ms
- With hash set: ~10ms (300x faster)

When code is slower than needed and you want to understand why and how to fix it
- Focus on the algorithm first (O(n²) to O(n)), then micro-optimizations
- A 10x data increase on O(n²) means 100x more work; understand your complexity
2. Memory Optimization
Help me reduce memory usage in this code:

```[LANGUAGE]
[PASTE CODE]
```

**Current memory usage:**
- Peak memory: [IF MEASURED]
- Where it's used: [WHAT'S CONSUMING MEMORY]

**Data characteristics:**
- Input size: [HOW MUCH DATA]
- Data types: [STRINGS, NUMBERS, OBJECTS]
- Lifetime: [HOW LONG DATA NEEDS TO STAY IN MEMORY]

**Constraints:**
- Memory budget: [TARGET]
- Can sacrifice: [SPEED / CONVENIENCE / etc.]

Suggest optimizations for:
1. Data structure choices
2. Processing patterns (streaming, chunking)
3. Memory layout and allocation
4. Cleanup and garbage collection
Language: Node.js
Code: Processing a 2GB CSV file to extract statistics
Current: Runs out of memory (4GB limit) reading the whole file
Data: 50M rows, 10 columns, mostly strings and numbers
Target: Process successfully within 4GB memory
Can sacrifice: Speed (within reason)
## Memory Issue Analysis
**Current approach:** Loading entire 2GB file into memory as an array of objects.
**Why it fails:**
- Raw file: 2GB
- Parsed as JS objects: ~6-8GB (string overhead, object headers, property maps)
- Exceeds 4GB limit
## Optimization Strategies
### 1. Stream Processing (Primary Fix)
Process line by line instead of loading all at once:
```javascript
const fs = require('fs');
const readline = require('readline');

async function processCSV(filepath) {
  const stats = {
    count: 0,
    sum: 0,
    // ... other aggregations
  };
  const fileStream = fs.createReadStream(filepath);
  const rl = readline.createInterface({
    input: fileStream,
    crlfDelay: Infinity,
  });
  for await (const line of rl) {
    const columns = line.split(',');
    // Process one row at a time
    stats.count++;
    stats.sum += parseFloat(columns[3]);
    // Row is garbage collected after each iteration
  }
  return stats;
}
```
**Memory usage:** ~50MB constant (buffer size) regardless of file size.
### 2. Chunked Processing (If you need batches)
```javascript
const fs = require('fs');
const readline = require('readline');

async function* readInChunks(filepath, chunkSize = 10000) {
  const rl = readline.createInterface({
    input: fs.createReadStream(filepath),
    crlfDelay: Infinity,
  });
  let chunk = [];
  for await (const line of rl) {
    chunk.push(parseLine(line)); // parseLine: your CSV row parser
    if (chunk.length >= chunkSize) {
      yield chunk;
      chunk = []; // Release memory
    }
  }
  if (chunk.length > 0) yield chunk;
}

// Usage
for await (const chunk of readInChunks('data.csv')) {
  await processChunk(chunk); // 10K rows at a time
}
```
### 3. Use Typed Arrays for Numbers
If columns are numeric:
```javascript
// Instead of: const values = [1.5, 2.3, 3.1...]
// Use: Float64Array uses 8 bytes per number vs ~24 for JS number in array
const values = new Float64Array(expectedCount);
```
### 4. Avoid String Retention
```javascript
// Bad: Holds reference to entire line
rows.push({ name: columns[0], value: columns[3] });

// Better: Only keep what you need, explicitly copy
rows.push({
  name: columns[0].slice(), // New string, not substring reference
  value: parseFloat(columns[3]) // Number, not string
});
```
### Memory Comparison
| Approach | Peak Memory | Speed |
|----------|-------------|-------|
| Load entire file | 6-8GB (fails) | Fast |
| Stream line by line | ~50MB | Moderate |
| Chunked (10K) | ~200MB | Fast |
| Typed arrays | ~3GB | Fast |
**Recommendation:** Stream processing. You'll use constant memory regardless of file size.

When hitting memory limits, processing large datasets, or running in constrained environments
- Streaming is almost always the answer for large files; process and discard, don't accumulate
- JavaScript objects have significant overhead: a 10-byte string in an object can cost ~60 bytes
3. Algorithm Optimization
Help me find a faster algorithm for this problem:

**Problem:** [DESCRIBE WHAT YOU NEED TO SOLVE]

**Current approach:**
```[LANGUAGE]
[CURRENT IMPLEMENTATION]
```

**Current complexity:** [TIME AND SPACE IF KNOWN]

**Input characteristics:**
- Size: [TYPICAL AND MAXIMUM]
- Properties: [SORTED? UNIQUE? BOUNDED VALUES?]
- Access patterns: [RANDOM / SEQUENTIAL / FREQUENT LOOKUPS]

**Performance requirements:**
- Time target: [ACCEPTABLE LATENCY]
- Can preprocess: [YES/NO - can you do upfront work?]
- Space available: [MEMORY CONSTRAINTS]

Suggest:
1. Better algorithms with their tradeoffs
2. Data structures that enable faster operations
3. Preprocessing opportunities
4. Approximations if exact isn't required
Problem: Find the k closest points to origin from a list of n points
Current: Sort all points by distance, take first k
Complexity: O(n log n) time
Input: n = 10 million points, k = 100
Target: Under 1 second
Can preprocess: No, points change each request
## Algorithm Analysis

**Current approach:** Full sort
- Time: O(n log n) = ~230 million operations for 10M points
- Problem: Sorting all 10M points when you only need 100

## Better Algorithms

### 1. Min-Heap / Priority Queue, O(n log k) ⭐ Recommended

Maintain a max-heap of size k (the k smallest seen so far):

```python
import heapq

def k_closest(points, k):
    # Max-heap of size k (negate distance for max-heap behavior)
    heap = []
    for x, y in points:
        dist = -(x*x + y*y)  # Negative for max-heap
        if len(heap) < k:
            heapq.heappush(heap, (dist, x, y))
        elif dist > heap[0][0]:  # Closer than the farthest in heap
            heapq.heapreplace(heap, (dist, x, y))
    return [(x, y) for (_, x, y) in heap]
```

**Complexity:** O(n log k) = ~66 million operations for n=10M, k=100
**Improvement:** ~3.5x faster than sort

### 2. Quickselect, O(n) average ⭐ Fastest

Partition like quicksort, but only recurse into the side containing k:

```python
import random

def k_closest_quickselect(points, k):
    def distance(point):
        return point[0]**2 + point[1]**2

    def partition(left, right, pivot_idx):
        pivot_dist = distance(points[pivot_idx])
        points[pivot_idx], points[right] = points[right], points[pivot_idx]
        store_idx = left
        for i in range(left, right):
            if distance(points[i]) < pivot_dist:
                points[store_idx], points[i] = points[i], points[store_idx]
                store_idx += 1
        points[right], points[store_idx] = points[store_idx], points[right]
        return store_idx

    left, right = 0, len(points) - 1
    while left < right:
        pivot_idx = random.randint(left, right)
        pivot_idx = partition(left, right, pivot_idx)
        if pivot_idx == k:
            break
        elif pivot_idx < k:
            left = pivot_idx + 1
        else:
            right = pivot_idx - 1
    return points[:k]
```

**Complexity:** O(n) average, O(n²) worst case
**Note:** Modifies input array; not stable

### 3. For Repeated Queries: KD-Tree, O(k log n) per query

If the points are static and you query often:

```python
from scipy.spatial import KDTree

# One-time preprocessing: O(n log n)
tree = KDTree(points)

# Each query: O(k log n)
distances, indices = tree.query([0, 0], k=100)
closest = [points[i] for i in indices]
```

### Comparison

| Algorithm | Time | Space | Notes |
|-----------|------|-------|-------|
| Sort | O(n log n) | O(n) | Simple, stable |
| Heap | O(n log k) | O(k) | Good when k << n |
| Quickselect | O(n) avg | O(1) | Fastest, modifies input |
| KD-Tree | O(n log n) build, O(k log n) query | O(n) | Best for many queries |

**For your case (10M points, k=100, changing data):** Use Quickselect for best average performance, or Heap if you need stability.
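A quick sanity check of the heap approach on a small hand-checkable input (reusing the `k_closest` function shown above):

```python
import heapq

def k_closest(points, k):
    # Max-heap of size k via negated squared distances
    heap = []
    for x, y in points:
        dist = -(x * x + y * y)
        if len(heap) < k:
            heapq.heappush(heap, (dist, x, y))
        elif dist > heap[0][0]:  # Closer than the farthest kept so far
            heapq.heapreplace(heap, (dist, x, y))
    return [(x, y) for (_, x, y) in heap]

points = [(1, 2), (3, 4), (0, 1), (5, 5), (1, 0)]
closest = sorted(k_closest(points, 2))
# (0, 1) and (1, 0) both have squared distance 1, the two smallest
```

Note the result comes back in heap order, not sorted by distance; sort it afterwards if ordering matters.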
When you have a working solution but need better time or space complexity
- Look for sorting when you only need partial order; heaps and quickselect often help
- Consider preprocessing if queries are repeated; trade build time for query speed
4. Caching Strategy
Help me design a caching strategy:

**What to cache:** [DESCRIBE THE DATA/COMPUTATION]

**Access patterns:**
- Read frequency: [HOW OFTEN DATA IS READ]
- Write frequency: [HOW OFTEN DATA CHANGES]
- Key distribution: [HOT KEYS? EVEN DISTRIBUTION?]
- Request patterns: [BURSTY? STEADY?]

**Current performance:**
- Without cache: [CURRENT LATENCY/COST]
- Target: [DESIRED LATENCY]

**Data characteristics:**
- Size per item: [AVERAGE SIZE]
- Total items: [COUNT]
- Staleness tolerance: [HOW OLD CAN CACHED DATA BE]

**Infrastructure:**
- Available: [REDIS / MEMCACHED / IN-MEMORY / CDN]
- Constraints: [MEMORY LIMITS, BUDGET]

Design:
1. Cache structure (key format, value format)
2. Eviction policy
3. Invalidation strategy
4. Cache warming approach
5. Failure handling
What to cache: User profile data fetched from database
Read: 10,000 reads/minute, 90% of reads are for 10% of users
Write: 50 updates/minute
Without cache: 50ms database query
Target: Under 5ms for cache hits
Staleness: 1 minute is acceptable
Infrastructure: Redis available, 1GB memory budget
## Caching Strategy Design
### 1. Cache Structure
**Key format:**
```
user:profile:{user_id}
```
**Value format:** JSON string of profile data
```json
{
"id": "usr_123",
"name": "Jane Doe",
"email": "jane@example.com",
"avatar_url": "...",
"cached_at": 1705312800
}
```
**Why JSON:** Readable, debuggable, supported by all clients. Consider MessagePack if you need smaller payloads.
### 2. TTL and Eviction
```python
# Set with TTL
redis.setex(
    f"user:profile:{user_id}",
    60,  # 60 second TTL (matches staleness tolerance)
    json.dumps(profile)
)
```
**Eviction policy:** `allkeys-lru`
- Given 90% of reads hit 10% of users, LRU naturally keeps hot data
- Cold profiles evict first, hot profiles stay cached
### 3. Invalidation Strategy: Write-Through + TTL
```python
def update_user_profile(user_id, data):
    # Update database
    db.users.update(user_id, data)
    # Immediately update cache
    cache_key = f"user:profile:{user_id}"
    redis.setex(cache_key, 60, json.dumps(data))
```
**Why write-through:**
- Low write frequency (50/min) means write cost is acceptable
- Guarantees cache is never stale for updated users
- TTL handles cases where invalidation fails
### 4. Read Pattern: Cache-Aside
```python
def get_user_profile(user_id):
    cache_key = f"user:profile:{user_id}"
    # Try cache first
    cached = redis.get(cache_key)
    if cached:
        return json.loads(cached)
    # Cache miss: fetch from database
    profile = db.users.find_one(user_id)
    if profile:
        redis.setex(cache_key, 60, json.dumps(profile))
    return profile
```
### 5. Cache Warming
For the hot 10% of users (known from analytics):
```python
def warm_cache():
    """Run on deploy or periodically."""
    hot_user_ids = analytics.get_top_users(limit=1000)
    for user_id in hot_user_ids:
        profile = db.users.find_one(user_id)
        redis.setex(f"user:profile:{user_id}", 60, json.dumps(profile))
```
### 6. Failure Handling
```python
def get_user_profile_resilient(user_id):
    cache_key = f"user:profile:{user_id}"
    try:
        cached = redis.get(cache_key)
        if cached:
            return json.loads(cached)
    except redis.RedisError:
        # Redis down: fall through to database
        logger.warning("Redis unavailable, falling back to DB")
    # Database is source of truth
    return db.users.find_one(user_id)
```
**Circuit breaker** for sustained Redis failures:
```python
cache_circuit = CircuitBreaker(failure_threshold=5, reset_timeout=30)

@cache_circuit
def get_from_cache(key):
    return redis.get(key)
```
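`CircuitBreaker` above is assumed to come from a circuit-breaker library; for illustration only, a minimal in-process sketch of the idea might look like this:

```python
import functools
import time

class CircuitBreaker:
    """Minimal sketch: open after N consecutive failures, retry after a timeout."""

    def __init__(self, failure_threshold=5, reset_timeout=30):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def __call__(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if self.opened_at is not None:
                if time.monotonic() - self.opened_at < self.reset_timeout:
                    # Circuit open: skip the call entirely
                    raise RuntimeError("circuit open")
                self.opened_at = None  # Half-open: allow one trial call
            try:
                result = func(*args, **kwargs)
            except Exception:
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self.opened_at = time.monotonic()
                raise
            self.failures = 0  # Success resets the failure count
            return result
        return wrapper
```

A production library adds thread safety, per-exception filtering, and metrics, so prefer one over rolling your own.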
### Memory Estimation
- Average profile: ~500 bytes
- Hot users (10%): ~100K users
- Memory needed: 100K × 500B = 50MB
- **Well within 1GB budget** (can cache more if needed)
### Expected Performance
| Scenario | Latency |
|----------|--------|
| Cache hit | <5ms |
| Cache miss (DB query) | ~50ms |
| Expected hit rate | 90%+ (hot users) |
| Average latency | ~10ms |

When database or computation costs dominate latency, and data can tolerate some staleness
- Start with simple TTL-based caching; only add complexity (invalidation, warming) when needed
- Monitor hit rates; a cache with a low hit rate is just overhead
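The hit-rate advice above can be wired up with simple counters around your cache reads; a minimal in-memory sketch (class and names are illustrative, not from any library):

```python
class CacheStats:
    """In-memory counters for monitoring cache hit rate."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, hit):
        """Call with True on a cache hit, False on a miss."""
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

# Usage: record each lookup's outcome, then inspect the rate
stats = CacheStats()
for hit in [True, True, True, False]:
    stats.record(hit)
```

In production you'd export these counters to your metrics system instead of keeping them in memory, but the signal is the same: a sustained low hit rate means the cache is costing more than it saves.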
Common Mistakes to Avoid
- Optimizing without measuring: you might be speeding up code that isn't the bottleneck
- Over-optimizing for scale you don't have: complexity has maintenance cost
- Sacrificing readability for marginal performance gains in non-critical paths
Related Templates
Code Review Prompt Templates
AI prompt templates for thorough code reviews. Get comprehensive feedback on code quality, security, and best practices.
Debugging Prompt Templates
AI prompt templates for debugging code. Identify issues, understand errors, and find solutions faster.
Code Documentation Prompt Templates
AI prompt templates for writing code documentation. Create clear comments, READMEs, and API docs.