Python Generators vs Lists: The Memory Management Guide

Have you ever tried to process a large CSV file and had your Python script crash with a `MemoryError`? You are likely using Lists when you should be using Generators.

In Python, Lists are "Eager." When you create a list, Python reserves memory for every single item immediately. If you have a list with 1 million numbers, Python loads 1 million numbers into RAM.

Generators, on the other hand, are "Lazy." They don't store data. They generate one item at a time, on demand, and then forget it. Whether you are processing 10 items or 10 billion items, a Generator uses the exact same tiny amount of RAM.

The Visual: Think of a List as buying a 12-pack of soda and carrying it all at once. Think of a Generator as a vending machine that gives you one soda every time you press a button.

Square Brackets vs. Parentheses

The easiest way to switch is in Comprehensions. Look at the syntax difference:

```python
import sys

# LIST COMPREHENSION (Eager)
my_list = [x * 2 for x in range(10000)]
print(sys.getsizeof(my_list))  # Output: ~87,000 bytes (Large)

# GENERATOR EXPRESSION (Lazy)
my_gen = (x * 2 for x in range(10000))
print(sys.getsizeof(my_gen))  # Output: ~100 bytes (Tiny)
```

Notice the output? The Generator takes up almost zero space, regardless of the size of the range. If you changed the range to 10 billion, the list would crash your computer. The generator would still be ~100 bytes.
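
If you want to see this for yourself without risking your RAM, here is a quick sketch (the variable names are just for illustration). Because both `range` and the generator expression are lazy, nothing is computed until you ask for it:

```python
import sys
from itertools import islice

# A generator over ten billion values still costs only a tiny, fixed amount
# of memory, because no values exist until you request them.
huge_gen = (x * 2 for x in range(10_000_000_000))
print(sys.getsizeof(huge_gen))  # still roughly 100-200 bytes

# Pull just the first five values on demand; the rest are never created.
print(list(islice(huge_gen, 5)))  # [0, 2, 4, 6, 8]
```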

The Power of `yield`

For more complex logic, we write Generator Functions. Instead of using `return` (which sends back a value and ends the function), we use `yield`.

When Python sees `yield`, it pauses the function, saves its state, and sends the value to the caller. When the function is called again (via `next()` or a loop), it resumes exactly where it left off.

```python
def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

# This loop never crashes memory
for i in infinite_sequence():
    if i > 50:
        break
    print(i)
```

This function generates infinite numbers. You literally cannot do this with a List, because an infinite list would require infinite RAM.
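
If you want to watch the pause-and-resume behavior directly, you can drive a generator by hand with `next()`. This is just an illustrative sketch (the `countdown` function is not from the example above):

```python
def countdown(n):
    # A tiny generator function used only for illustration.
    while n > 0:
        yield n   # pause here, hand n to the caller, remember where we stopped
        n -= 1

gen = countdown(3)
print(next(gen))  # 3 -- runs until the first yield, then pauses
print(next(gen))  # 2 -- resumes right after the yield
print(next(gen))  # 1
# Calling next(gen) again would raise StopIteration: the generator is exhausted.
```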

Comparison: When to Use Which?

Generators aren't always the answer. They have limitations.

| Feature | List `[]` | Generator `()` |
| --- | --- | --- |
| Memory Usage | High (stores all data) | Very Low (stores logic only) |
| Access | Random Access (`list[5]`) | Sequential Only (`next()`) |
| Reusability | Can iterate multiple times | One-time use (exhausted) |
| Speed | Faster for small data | Faster start-up time |

The Gotcha: You cannot index a generator. You cannot say `my_gen[10]`. If you need to jump around the data or use it multiple times, use a List. If you just need to loop through it once (like reading lines in a file), use a Generator.
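
Here is a short sketch of both gotchas in action (the variable names are just placeholders):

```python
my_gen = (x * 2 for x in range(5))

# my_gen[2]           # TypeError: 'generator' object is not subscriptable
print(list(my_gen))   # [0, 2, 4, 6, 8] -- this consumes the generator
print(list(my_gen))   # []              -- already exhausted, nothing left
```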

Real-World Use Case: Reading Logs

If you are reading a 10GB server log file to find error lines:

  • List approach: `lines = file.readlines()` tries to load 10GB into RAM. Crash.
  • Generator approach: Iterate over the file object directly. Python lazy-loads one line at a time. Success (see the sketch below).
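
Here is a minimal sketch of the generator approach. The file name `server.log` and the `"ERROR"` marker are placeholders, not a specific log format:

```python
def error_lines(path):
    # The file object itself is a lazy iterator: Python reads one line,
    # hands it to you, then moves on -- the 10GB file never sits in RAM.
    with open(path) as log_file:
        for line in log_file:
            if "ERROR" in line:   # "ERROR" marker is an assumption
                yield line

# Only one line is ever held in memory at a time.
for line in error_lines("server.log"):   # hypothetical file name
    print(line, end="")
```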

Conclusion

Mastering the `yield` keyword is often the dividing line between a beginner and an intermediate Python developer. It allows you to write scalable code that runs on minimal hardware.

Stop being eager. Be lazy.
