Code Optimization Prompt Templates

AI prompt templates for optimizing code performance. Identify bottlenecks and improve efficiency.

Overview

Performance optimization is about making informed tradeoffs: speed vs. memory, readability vs. efficiency, development time vs. runtime. These prompts help you identify real bottlenecks (not imagined ones), understand the cost of different approaches, and implement optimizations that matter. The golden rule: measure first, improve second.

Best Practices

1. Always include profiling data or measurements; optimization without data is guessing.
2. Specify your performance targets (e.g. "must handle 1000 requests/second").
3. Describe the scale of data being processed; what's fine at 10 items may be unacceptable at 10 million.
4. Mention your constraints: can you use more memory? Add caching infrastructure? Change the algorithm entirely?
5. Include the full hot path code, not just the function you think is slow.
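Best practice 1 asks for profiling data. In Python, one way to capture profiler output for pasting into the template is `cProfile` (here `hot_path` is a hypothetical stand-in for the code you want profiled):

```python
import cProfile
import io
import pstats

def hot_path(n):
    # Hypothetical stand-in for the code you want profiled
    return sorted(range(n), key=lambda x: -x)

profiler = cProfile.Profile()
profiler.enable()
hot_path(100_000)
profiler.disable()

# Capture the top entries as text, ready to paste into a prompt
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(3)
print(buf.getvalue())
```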

Prompt Templates

1. Performance Analysis

Analyze this code for performance issues:

```[LANGUAGE]
[PASTE CODE]
```

**Context:**
- This code: [WHAT IT DOES]
- Called: [HOW OFTEN - once, per request, in a loop]
- Data size: [TYPICAL AND MAXIMUM INPUT SIZES]

**Current performance:**
- Measured time: [IF KNOWN]
- Target time: [WHAT'S ACCEPTABLE]

**Profiling data (if available):**
```
[PASTE PROFILER OUTPUT]
```

**Constraints:**
- Memory budget: [IF LIMITED]
- Can change: [ALGORITHM / DATA STRUCTURES / EXTERNAL SERVICES]
- Cannot change: [ANY CONSTRAINTS]

Identify:
1. Time complexity of current approach
2. Space complexity
3. Specific bottlenecks
4. Optimization opportunities (prioritized by impact)
5. Tradeoffs of each optimization
Example input:

Language: Python
Code: Function that finds duplicate entries in a list of user records
Called: Once per API request (p95 response time must be <200ms)
Data size: Usually 1000 records, max 50,000
Current: 3 seconds for 50K records
Target: Under 500ms for 50K records

Example response:

## Performance Analysis

### Current Complexity
- **Time:** O(n²), nested loops comparing every record pair
- **Space:** O(n), storing duplicates list

### Bottleneck Identified

```python
# Current: O(n²) - compares every pair
for i in range(len(records)):
    for j in range(i + 1, len(records)):
        if records[i]['email'] == records[j]['email']:  # ~1.25B comparisons for 50K
            duplicates.append(records[i])
```

For 50,000 records: ~1.25 billion comparisons.

### Optimizations (by impact)

**1. Use a hash set, O(n²) → O(n)** ⭐ Highest Impact
```python
def find_duplicates(records):
    seen = {}
    duplicates = []
    for record in records:
        email = record['email']
        if email in seen:
            duplicates.append(record)
        else:
            seen[email] = record
    return duplicates
```
- **Time:** O(n), single pass with O(1) hash lookups
- **Space:** O(n), hash table storage
- **Expected:** 50K records in ~10ms

**2. If memory-constrained, sort first**
```python
records.sort(key=lambda r: r['email'])  # O(n log n)
duplicates = []
for i in range(1, len(records)):
    if records[i]['email'] == records[i-1]['email']:
        duplicates.append(records[i])
```
- **Time:** O(n log n), dominated by sort
- **Space:** O(1) extra (in-place sort)
- **Tradeoff:** Mutates input, slightly slower than hash approach

### Tradeoffs Summary

| Approach | Time | Space | Notes |
|----------|------|-------|-------|
| Current (nested loops) | O(n²) | O(n) | Unusable at scale |
| Hash set | O(n) | O(n) | Best for speed |
| Sort + scan | O(n log n) | O(1) | Best for memory |

**Recommendation:** Use hash set approach. For 50K records:
- Current: ~3,000ms
- With hash set: ~10ms (300x faster)

When to use: when code is slower than needed and you want to understand why and how to fix it.

  • Focus on the algorithm first (O(n²) to O(n)) before micro-optimizations
  • A 10x data increase on an O(n²) algorithm is 100x slower; understand your complexity
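The recommendation above can be sanity-checked empirically; a minimal sketch (hypothetical helper names) comparing the two approaches from the example:

```python
import timeit

def find_dup_quadratic(records):
    # O(n²): compares every pair
    dups = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if records[i] == records[j]:
                dups.append(records[i])
    return dups

def find_dup_linear(records):
    # O(n): single pass with a set
    seen, dups = set(), []
    for r in records:
        if r in seen:
            dups.append(r)
        else:
            seen.add(r)
    return dups

data = list(range(500)) + [1, 2, 3]
assert find_dup_quadratic(data) == find_dup_linear(data) == [1, 2, 3]

# The gap between the two grows quadratically with input size
t_quad = timeit.timeit(lambda: find_dup_quadratic(data), number=5)
t_lin = timeit.timeit(lambda: find_dup_linear(data), number=5)
print(f"quadratic: {t_quad:.4f}s, linear: {t_lin:.4f}s")
```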

2. Memory Optimization

Help me reduce memory usage in this code:

```[LANGUAGE]
[PASTE CODE]
```

**Current memory usage:**
- Peak memory: [IF MEASURED]
- Where it's used: [WHAT'S CONSUMING MEMORY]

**Data characteristics:**
- Input size: [HOW MUCH DATA]
- Data types: [STRINGS, NUMBERS, OBJECTS]
- Lifetime: [HOW LONG DATA NEEDS TO STAY IN MEMORY]

**Constraints:**
- Memory budget: [TARGET]
- Can sacrifice: [SPEED / CONVENIENCE / etc.]

Suggest optimizations for:
1. Data structure choices
2. Processing patterns (streaming, chunking)
3. Memory layout and allocation
4. Cleanup and garbage collection
Example input:

Language: Node.js
Code: Processing a 2GB CSV file to extract statistics
Current: Runs out of memory (4GB limit) reading the whole file
Data: 50M rows, 10 columns, mostly strings and numbers
Target: Process successfully within 4GB memory
Can sacrifice: Speed (within reason)

Example response:

## Memory Issue Analysis

**Current approach:** Loading entire 2GB file into memory as an array of objects.

**Why it fails:** 
- Raw file: 2GB
- Parsed as JS objects: ~6-8GB (string overhead, object headers, property maps)
- Exceeds 4GB limit

## Optimization Strategies

### 1. Stream Processing (Primary Fix)

Process line by line instead of loading all at once:

```javascript
const fs = require('fs');
const readline = require('readline');

async function processCSV(filepath) {
  const stats = {
    count: 0,
    sum: 0,
    // ... other aggregations
  };

  const fileStream = fs.createReadStream(filepath);
  const rl = readline.createInterface({
    input: fileStream,
    crlfDelay: Infinity,
  });

  for await (const line of rl) {
    const columns = line.split(',');
    // Process one row at a time
    stats.count++;
    stats.sum += parseFloat(columns[3]);
    // Each line becomes unreachable after its iteration and can be garbage collected
  }

  return stats;
}
```

**Memory usage:** ~50MB constant (buffer size) regardless of file size.

### 2. Chunked Processing (If you need batches)

```javascript
async function* readInChunks(filepath, chunkSize = 10000) {
  const rl = readline.createInterface({
    input: fs.createReadStream(filepath),
  });

  let chunk = [];
  for await (const line of rl) {
    chunk.push(parseLine(line)); // parseLine: your row parser
    if (chunk.length >= chunkSize) {
      yield chunk;
      chunk = []; // Release memory
    }
  }
  if (chunk.length > 0) yield chunk;
}

// Usage
for await (const chunk of readInChunks('data.csv')) {
  await processChunk(chunk); // 10K rows at a time
}
```

### 3. Use Typed Arrays for Numbers

If columns are numeric:

```javascript
// Instead of: const values = [1.5, 2.3, 3.1...]
// Use: Float64Array uses 8 bytes per number vs ~24 for JS number in array
const values = new Float64Array(expectedCount);
```

### 4. Avoid String Retention

```javascript
// Bad: Holds reference to entire line
rows.push({ name: columns[0], value: columns[3] });

// Better: Only keep what you need, explicitly copy
rows.push({
  name: columns[0].slice(), // New string, not a substring reference
  value: parseFloat(columns[3]) // Number, not string
});
```

### Memory Comparison

| Approach | Peak Memory | Speed |
|----------|-------------|-------|
| Load entire file | 6-8GB (fails) | Fast |
| Stream line by line | ~50MB | Moderate |
| Chunked (10K) | ~200MB | Fast |
| Typed arrays | ~3GB | Fast |

**Recommendation:** Stream processing. You'll use constant memory regardless of file size.

When to use: when hitting memory limits, processing large datasets, or running in constrained environments.

  • Streaming is almost always the answer for large files: process and discard, don't accumulate
  • JavaScript objects have significant overhead; a 10-byte string in an object can cost ~60 bytes
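The same streaming idea carries over to Python; a minimal sketch using `tracemalloc` to confirm that peak memory stays small (the CSV is synthetic and held in memory just for the demo):

```python
import csv
import io
import tracemalloc

def stream_stats(lines):
    # Aggregate row by row; never materialize the whole file
    count, total = 0, 0.0
    for row in csv.reader(lines):
        count += 1
        total += float(row[1])
    return count, total

# Synthetic 10,000-row CSV for the demo
data = io.StringIO("".join(f"row{i},{i}\n" for i in range(10_000)))

tracemalloc.start()
count, total = stream_stats(data)
peak_kb = tracemalloc.get_traced_memory()[1] / 1024
tracemalloc.stop()

print(count, total, f"peak ~{peak_kb:.0f} KiB")
```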

3. Algorithm Optimization

Help me find a faster algorithm for this problem:

**Problem:**
[DESCRIBE WHAT YOU NEED TO SOLVE]

**Current approach:**
```[LANGUAGE]
[CURRENT IMPLEMENTATION]
```

**Current complexity:** [TIME AND SPACE IF KNOWN]

**Input characteristics:**
- Size: [TYPICAL AND MAXIMUM]
- Properties: [SORTED? UNIQUE? BOUNDED VALUES?]
- Access patterns: [RANDOM / SEQUENTIAL / FREQUENT LOOKUPS]

**Performance requirements:**
- Time target: [ACCEPTABLE LATENCY]
- Can preprocess: [YES/NO - can you do upfront work?]
- Space available: [MEMORY CONSTRAINTS]

Suggest:
1. Better algorithms with their tradeoffs
2. Data structures that enable faster operations
3. Preprocessing opportunities
4. Approximations if exact isn't required
Example input:

Problem: Find the k closest points to origin from a list of n points
Current: Sort all points by distance, take first k
Complexity: O(n log n) time
Input: n = 10 million points, k = 100
Target: Under 1 second
Can preprocess: No, points change each request

Example response:

## Algorithm Analysis

**Current approach:** Full sort
- Time: O(n log n) = ~230 million operations for 10M points
- Problem: Sorting all 10M points when you only need 100

## Better Algorithms

### 1. Min-Heap / Priority Queue, O(n log k) ⭐ Recommended

Maintain a max-heap of size k (the k smallest seen so far):

```python
import heapq

def k_closest(points, k):
    # Max-heap of size k (negate distance for max-heap behavior)
    heap = []

    for x, y in points:
        dist = -(x*x + y*y)  # Negative for max-heap

        if len(heap) < k:
            heapq.heappush(heap, (dist, x, y))
        elif dist > heap[0][0]:  # Closer than the farthest in heap
            heapq.heapreplace(heap, (dist, x, y))

    return [(x, y) for (_, x, y) in heap]
```

**Complexity:** O(n log k) = ~66 million operations for n=10M, k=100
**Improvement:** ~3.5x faster than sort
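Python's standard library already packages this heap pattern: `heapq.nsmallest` runs in O(n log k) without sorting all points:

```python
import heapq

points = [(3, 3), (1, 1), (5, 5), (2, 2), (0, 1)]

# k=2 closest to the origin, keyed by squared distance
closest = heapq.nsmallest(2, points, key=lambda p: p[0]**2 + p[1]**2)
print(closest)  # [(0, 1), (1, 1)]
```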

### 2. Quickselect, O(n) average ⭐ Fastest

Partition like quicksort, but only recurse into the side containing k:

```python
import random

def k_closest_quickselect(points, k):
    def distance(point):
        return point[0]**2 + point[1]**2

    def partition(left, right, pivot_idx):
        pivot_dist = distance(points[pivot_idx])
        points[pivot_idx], points[right] = points[right], points[pivot_idx]
        store_idx = left

        for i in range(left, right):
            if distance(points[i]) < pivot_dist:
                points[store_idx], points[i] = points[i], points[store_idx]
                store_idx += 1

        points[right], points[store_idx] = points[store_idx], points[right]
        return store_idx

    left, right = 0, len(points) - 1
    while left < right:
        pivot_idx = random.randint(left, right)
        pivot_idx = partition(left, right, pivot_idx)

        if pivot_idx == k:
            break
        elif pivot_idx < k:
            left = pivot_idx + 1
        else:
            right = pivot_idx - 1

    return points[:k]
```

**Complexity:** O(n) average, O(n²) worst case
**Note:** Modifies input array; not stable

### 3. For Repeated Queries: KD-Tree, O(k log n) per query

If the points are static and you query often:

```python
from scipy.spatial import KDTree

# One-time preprocessing: O(n log n)
tree = KDTree(points)

# Each query: O(k log n)
distances, indices = tree.query([0, 0], k=100)
closest = [points[i] for i in indices]
```

### Comparison

| Algorithm | Time | Space | Notes |
|-----------|------|-------|-------|
| Sort | O(n log n) | O(n) | Simple, stable |
| Heap | O(n log k) | O(k) | Good when k << n |
| Quickselect | O(n) avg | O(1) | Fastest, modifies input |
| KD-Tree | O(n log n) build, O(k log n) query | O(n) | Best for many queries |

**For your case (10M points, k=100, changing data):**
Use Quickselect for best average performance, or Heap if you need stability.

When to use: when you have a working solution but need better time or space complexity.

  • Look for sorting when you only need a partial order; heaps and quickselect often help
  • Consider preprocessing if queries are repeated: trade build time for query speed

4. Caching Strategy

Help me design a caching strategy:

**What to cache:**
[DESCRIBE THE DATA/COMPUTATION]

**Access patterns:**
- Read frequency: [HOW OFTEN DATA IS READ]
- Write frequency: [HOW OFTEN DATA CHANGES]
- Key distribution: [HOT KEYS? EVEN DISTRIBUTION?]
- Request patterns: [BURSTY? STEADY?]

**Current performance:**
- Without cache: [CURRENT LATENCY/COST]
- Target: [DESIRED LATENCY]

**Data characteristics:**
- Size per item: [AVERAGE SIZE]
- Total items: [COUNT]
- Staleness tolerance: [HOW OLD CAN CACHED DATA BE]

**Infrastructure:**
- Available: [REDIS / MEMCACHED / IN-MEMORY / CDN]
- Constraints: [MEMORY LIMITS, BUDGET]

Design:
1. Cache structure (key format, value format)
2. Eviction policy
3. Invalidation strategy
4. Cache warming approach
5. Failure handling
Example input:

What to cache: User profile data fetched from database
Read: 10,000 reads/minute, 90% of reads are for 10% of users
Write: 50 updates/minute
Without cache: 50ms database query
Target: Under 5ms for cache hits
Staleness: 1 minute is acceptable
Infrastructure: Redis available, 1GB memory budget

Example response:

## Caching Strategy Design

### 1. Cache Structure

**Key format:**
```
user:profile:{user_id}
```

**Value format:** JSON string of profile data
```json
{
 "id": "usr_123",
 "name": "Jane Doe",
 "email": "jane@example.com",
 "avatar_url": "...",
 "cached_at": 1705312800
}
```

**Why JSON:** Readable, debuggable, supported by all clients. Consider MessagePack if you need smaller payloads.

### 2. TTL and Eviction

```python
# Set with TTL
redis.setex(
    f"user:profile:{user_id}",
    60,  # 60-second TTL (matches staleness tolerance)
    json.dumps(profile)
)
```

**Eviction policy:** `allkeys-lru`
- Given 90% of reads hit 10% of users, LRU naturally keeps hot data
- Cold profiles evict first, hot profiles stay cached
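The LRU behavior can be illustrated in-process; a minimal `OrderedDict` sketch of the eviction idea (this stands in for Redis's `allkeys-lru`, it is not Redis itself):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("hot", 1)
cache.put("warm", 2)
cache.get("hot")      # touch "hot" so it stays
cache.put("cold", 3)  # evicts "warm", the least recently used
print(list(cache.data))  # ['hot', 'cold']
```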

### 3. Invalidation Strategy: Write-Through + TTL

```python
def update_user_profile(user_id, data):
    # Update database
    db.users.update(user_id, data)

    # Immediately update cache
    cache_key = f"user:profile:{user_id}"
    redis.setex(cache_key, 60, json.dumps(data))
```

**Why write-through:**
- Low write frequency (50/min) means write cost is acceptable
- Guarantees cache is never stale for updated users
- TTL handles cases where invalidation fails

### 4. Read Pattern: Cache-Aside

```python
def get_user_profile(user_id):
    cache_key = f"user:profile:{user_id}"

    # Try cache first
    cached = redis.get(cache_key)
    if cached:
        return json.loads(cached)

    # Cache miss: fetch from database
    profile = db.users.find_one(user_id)
    if profile:
        redis.setex(cache_key, 60, json.dumps(profile))

    return profile
```

### 5. Cache Warming

For the hot 10% of users (known from analytics):

```python
def warm_cache():
    """Run on deploy or periodically."""
    hot_user_ids = analytics.get_top_users(limit=1000)

    for user_id in hot_user_ids:
        profile = db.users.find_one(user_id)
        redis.setex(f"user:profile:{user_id}", 60, json.dumps(profile))
```

### 6. Failure Handling

```python
def get_user_profile_resilient(user_id):
    cache_key = f"user:profile:{user_id}"

    try:
        cached = redis.get(cache_key)
        if cached:
            return json.loads(cached)
    except redis.RedisError:
        # Redis down: fall through to the database
        logger.warning("Redis unavailable, falling back to DB")

    # Database is the source of truth
    return db.users.find_one(user_id)
```

**Circuit breaker** for sustained Redis failures:
```python
cache_circuit = CircuitBreaker(failure_threshold=5, reset_timeout=30)

@cache_circuit
def get_from_cache(key):
    return redis.get(key)
```
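`CircuitBreaker` above is a placeholder, not a specific library. A minimal sketch of what such a decorator might look like (the class name and signature here match the snippet above but are otherwise an assumption):

```python
import time

class CircuitBreaker:
    """Open after N consecutive failures; allow a retry after reset_timeout seconds."""

    def __init__(self, failure_threshold=5, reset_timeout=30):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def __call__(self, fn):
        def wrapper(*args, **kwargs):
            if self.opened_at is not None:
                if time.monotonic() - self.opened_at < self.reset_timeout:
                    raise RuntimeError("circuit open")
                self.opened_at = None  # half-open: allow one trial call
            try:
                result = fn(*args, **kwargs)
            except Exception:
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self.opened_at = time.monotonic()
                raise
            self.failures = 0
            return result
        return wrapper

# Demo: after 2 failures the circuit opens and skips the call entirely
calls = {"n": 0}

@CircuitBreaker(failure_threshold=2, reset_timeout=60)
def flaky():
    calls["n"] += 1
    raise ValueError("backend down")

for _ in range(2):
    try:
        flaky()
    except ValueError:
        pass

try:
    flaky()
except RuntimeError:
    print("circuit open after", calls["n"], "calls")  # circuit open after 2 calls
```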

### Memory Estimation

- Average profile: ~500 bytes
- Hot users (10%): ~100K users
- Memory needed: 100K × 500B = 50MB
- **Well within 1GB budget** (can cache more if needed)

### Expected Performance

| Metric | Value |
|--------|-------|
| Cache hit latency | <5ms |
| Cache miss (DB query) | ~50ms |
| Expected hit rate | 90%+ (hot users) |
| P95 latency | ~10ms |

When to use: when database or computation costs dominate latency and the data can tolerate some staleness.

  • Start with simple TTL-based caching; only add complexity (invalidation, warming) when needed
  • Monitor hit rates; a cache with a low hit rate is just overhead
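Hit-rate monitoring needs nothing fancy; a counter sketch (hypothetical class, not tied to any cache client):

```python
class CacheMetrics:
    """Track cache hit rate; a cache with a low hit rate is just overhead."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, hit):
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

metrics = CacheMetrics()
for hit in (True, True, True, False):
    metrics.record(hit)
print(f"hit rate: {metrics.hit_rate:.0%}")  # hit rate: 75%
```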

Common Mistakes to Avoid

  • Optimizing without measuring: you might be speeding up code that isn't the bottleneck
  • Over-optimizing for scale you don't have: complexity has maintenance cost
  • Sacrificing readability for marginal performance gains in non-critical paths
