How many bytes is a Python list?
Python is a fantastic programming language. It is also known for being pretty slow, due mostly to its enormous flexibility and dynamic features. For many applications and domains, this is not a problem, given their requirements and the various optimization techniques available. It is less well known that Python object graphs (nested dictionaries of lists, tuples, and primitive types) take a significant amount of memory. This can be a much more severe limiting factor, due to its effects on caching, virtual memory, and multi-tenancy with other programs, and in general because it exhausts the available memory, which is a scarce and expensive resource.
It turns out that it is not difficult to figure out how much memory is actually consumed. In this article, I'll walk you through the intricacies of a Python object's memory management and show how to measure the consumed memory accurately.

I focus solely on CPython—the primary implementation of the Python programming language. The experiments and conclusions here don't apply to other Python implementations like IronPython, Jython, and PyPy. Depending on the Python version, the numbers are sometimes a little different (especially for strings, which are always Unicode), but the concepts are the same. In my case, I am using Python 3.10. As of 1 January 2020, Python 2 is no longer supported, and you should have already upgraded to Python 3.

Hands-On Exploration of Python Memory Usage

First, let's explore a little bit and get a concrete sense of the actual memory usage of Python objects.

The sys.getsizeof() Built-in Function

The standard library's sys module provides the getsizeof() function, which returns the size of an object in bytes.

Measuring the Memory of Python Objects

Let's start with some numeric types:

>>> import sys
>>> sys.getsizeof(5)
28

Interesting. An integer takes 28 bytes.

>>> sys.getsizeof(5.3)
24

Hmm… a float takes 24 bytes.

>>> from decimal import Decimal
>>> sys.getsizeof(Decimal(5.3))
104

Wow. 104 bytes! This really makes you think about whether you want to represent a large number of real numbers as floats or as Decimal objects.

Let's move on to strings and collections:

>>> sys.getsizeof('')
49
>>> sys.getsizeof('1')
50
>>> sys.getsizeof('12')
51
>>> sys.getsizeof('123')
52
>>> sys.getsizeof('1234')
53

OK. An empty string takes 49 bytes, and each additional character adds another byte. That says a lot about the tradeoffs of keeping multiple short strings, where you'll pay the 49-byte overhead for each one, vs. a single long string, where you pay the overhead only once.

The bytes type has an overhead of 33 bytes:

>>> sys.getsizeof(bytes())
33

Let's look at lists.
>>> sys.getsizeof([])
56
>>> sys.getsizeof([1])
64
>>> sys.getsizeof([1, 2])
72
>>> sys.getsizeof([1, 2, 3])
80
>>> sys.getsizeof([1, 2, 3, 4])
88
>>> sys.getsizeof(['a long longlong string'])
64

What's going on? An empty list takes 56 bytes, but each additional int adds just 8 bytes, and a list holding a long string takes just 64 bytes.

The answer is simple. The list doesn't contain the objects themselves. It contains only 8-byte (on 64-bit CPython) references to the actual objects.

>>> sys.getsizeof(())
40
>>> sys.getsizeof((1,))
48
>>> sys.getsizeof((1, 2))
56
>>> sys.getsizeof((1, 2, 3))
64
>>> sys.getsizeof((1, 2, 3, 4))
72
>>> sys.getsizeof(('a long longlong string',))
48

The story is similar for tuples. The overhead of an empty tuple is 40 bytes vs. the 56 of a list. Again, this 16-byte difference per sequence is low-hanging fruit if you have a data structure with a lot of small, immutable sequences.

>>> sys.getsizeof(set())
216
>>> sys.getsizeof(set([1]))
216
>>> sys.getsizeof(set([1, 2, 3, 4]))
216
>>> sys.getsizeof({})
64
>>> sys.getsizeof(dict(a=1))
232
>>> sys.getsizeof(dict(a=1, b=2, c=3))
232

Sets and dictionaries ostensibly don't grow at all when you add items, but note the enormous overhead.

The bottom line is that Python objects have a huge fixed overhead. If your data structure is composed of a large number of collection objects like strings, lists and dictionaries that contain a small number of items each, you pay a heavy toll.

The deep_getsizeof() Function

Now that I've scared you half to death and also demonstrated that sys.getsizeof() can only tell you the shallow size of an object, let's look at a more adequate solution. The deep_getsizeof() function drills down recursively and calculates the actual memory usage of a Python object graph:

from collections.abc import Mapping, Container
from sys import getsizeof

def deep_getsizeof(o, ids):
    """Find the memory footprint of a Python object.

    This is a recursive function that drills down a Python object graph
    like a dictionary holding nested dictionaries with lists of lists
    and tuples and sets.

    The sys.getsizeof function does a shallow size only. It counts each
    object inside a container as a pointer only, regardless of how big
    it really is.

    :param o: the object
    :param ids: a set of the ids of objects that were already visited
    :return: the deep size of the object in bytes
    """
    d = deep_getsizeof
    if id(o) in ids:
        return 0
    r = getsizeof(o)
    ids.add(id(o))
    if isinstance(o, (str, bytes)):
        return r
    if isinstance(o, Mapping):
        return r + sum(d(k, ids) + d(v, ids) for k, v in o.items())
    if isinstance(o, Container):
        return r + sum(d(x, ids) for x in o)
    return r

There are several interesting aspects to this function. It takes into account objects that are referenced multiple times and counts them only once by keeping track of object ids. The other interesting feature of the implementation is that it takes full advantage of the
collections module's abstract base classes. That allows the function to handle very concisely any collection that implements either the Mapping or Container base class, instead of dealing directly with myriad collection types like list, tuple, set, and dict. Let's see it in action:

>>> x = '1234567'
>>> deep_getsizeof(x, set())
56

A string of length 7 takes 56 bytes (49 overhead + 7 bytes for each character).

>>> deep_getsizeof([], set())
56

An empty list takes 56 bytes (just overhead).

>>> deep_getsizeof([x], set())
120

A list that contains the string x takes 120 bytes (56 + 8 + 56).

>>> deep_getsizeof([x, x, x, x, x], set())
152

A list that contains the string x five times takes 152 bytes (56 + 5*8 + 56). The last example shows that deep_getsizeof() counts the string object itself only once, even though it is referenced five times; each extra reference costs just 8 bytes.

Treats or Tricks

It turns out that CPython has several tricks up its sleeve, so the numbers you get from deep_getsizeof() don't fully represent the memory usage of a Python program.

Reference Counting

Python manages memory using reference counting semantics. Once an object is not referenced anymore, its memory is deallocated. But as long as there is a reference, the object will not be deallocated. Things like cyclical references can bite you pretty hard.

Small Objects

CPython manages small objects (less than 256 bytes) in special pools on 8-byte boundaries. There are pools for 1-8 bytes, 9-16 bytes, and all the way up to 249-256 bytes. When an object of size 10 is allocated, it is allocated from the 16-byte pool for objects 9-16 bytes in size. So, even though it contains only 10 bytes of data, it will cost 16 bytes of memory. If you allocate 1,000,000 objects of size 10, you actually use 16,000,000 bytes and not 10,000,000 bytes as you may assume. This 60% extra overhead is obviously not trivial.

Integers

CPython keeps a global list of all the integers in the range -5 to 256. This optimization strategy makes sense because small integers pop up all over the place, and given that each integer takes 28 bytes, it saves a lot of memory for a typical program. It also means that CPython pre-allocates 262 * 28 = 7336 bytes for all these integers, even if you don't use most of them. You can verify it by using the id() function, which returns the identity of an object (in CPython, its memory address). Calling it repeatedly for an integer in the range returns the same address every time.

Here are a few examples within the range:

>>> id(-3)
9788832
>>> id(-3)
9788832
>>> id(-3)
9788832
>>> id(201)
9795360
>>> id(201)
9795360
>>> id(201)
9795360

Here are some examples outside the range:

>>> id(257)
140276939034224
>>> id(301)
140276963839696
>>> id(301)
140276963839696
>>> id(-6)
140276963839696
>>> id(-6)
140276963839696

Python Memory vs. System Memory

CPython is kind of possessive. In many cases, when memory objects in your program are not referenced anymore, they are not returned to the system (e.g. the small objects).
This is good for your program if you allocate and deallocate many objects that belong to the same 8-byte pool, because Python doesn't have to bother the system, which is relatively expensive. But it's not so great if your program normally uses X bytes and under some temporary condition it uses 100 times as much (e.g. parsing and processing a big configuration file only when it starts). Now, that 100X memory may be trapped uselessly in your program, never to be used again, denying the system the chance to allocate it to other programs. The irony is that if you use the multiprocessing module to run multiple instances of your program, you'll severely limit the number of instances you can run on a given machine.

Memory Profiler

To gauge and measure the actual memory usage of your program, you can use the
memory_profiler module. I played with it a little bit, and I'm not sure I trust the results. Using it is very simple. You decorate a function (it could be the main function) with the @profile decorator, and when you run the program, the memory profiler reports line-by-line memory usage:

from memory_profiler import profile

@profile
def main():
    a = []
    b = []
    c = []
    for i in range(100000):
        a.append(5)
    for i in range(100000):
        b.append(300)
    for i in range(100000):
        c.append('123456789012345678901234567890')
    del a
    del b
    del c
    print('Done!')

if __name__ == '__main__':
    main()

Here is the output:

Filename: python_obj.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     3     17.3 MiB     17.3 MiB           1   @profile
     4                                         def main():
     5     17.3 MiB      0.0 MiB           1       a = []
     6     17.3 MiB      0.0 MiB           1       b = []
     7     17.3 MiB      0.0 MiB           1       c = []
     8     18.0 MiB      0.0 MiB      100001       for i in range(100000):
     9     18.0 MiB      0.8 MiB      100000           a.append(5)
    10     18.7 MiB      0.0 MiB      100001       for i in range(100000):
    11     18.7 MiB      0.7 MiB      100000           b.append(300)
    12     19.5 MiB      0.0 MiB      100001       for i in range(100000):
    13     19.5 MiB      0.8 MiB      100000           c.append('123456789012345678901234567890')
    14     18.9 MiB     -0.6 MiB           1       del a
    15     18.2 MiB     -0.8 MiB           1       del b
    16     17.4 MiB     -0.8 MiB           1       del c
    17
    18     17.4 MiB      0.0 MiB           1       print('Done!')

As you can see, there is 17.3 MiB of memory overhead. The reason the memory grows by only about 0.8 MiB per loop—whether appending integers inside the [-5, 256] range, integers outside it, or strings—is that a single object is used in all cases: each literal is created once, and the same object is appended repeatedly. The growth comes from the 100,000 8-byte references each list holds (about 0.76 MiB). It's not clear why the first loop on line 9 adds 0.8 MiB while the second on line 11 adds just 0.7 MiB and the third loop on line 13 adds 0.8 MiB. Finally, when deleting the a, b and c lists, -0.6 MiB is released for a, -0.8 MiB for b, and -0.8 MiB for c.

How to Trace Memory Leaks in Your Python Application With tracemalloc

tracemalloc is a Python module that acts as a debug tool to trace memory blocks allocated by Python. Once tracemalloc is enabled, you can obtain the following information: the traceback to where an object was allocated, statistics on allocated memory blocks per filename and per line number, and the differences between two snapshots (useful for detecting leaks).
Consider the example below:

import tracemalloc

tracemalloc.start()

a = []
b = []
c = []
for i in range(100000):
    a.append(5)
for i in range(100000):
    b.append(300)
for i in range(100000):
    c.append('123456789012345678901234567890')
# del a
# del b
# del c

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics('lineno'):
    print(stat)
    print(stat.traceback.format())

Explanation
When you run the code, the output will be:

['  File "python_obj.py", line 13', "    c.append('123456789012345678901234567890')"]
python_obj.py:11: size=782 KiB, count=1, average=782 KiB
['  File "python_obj.py", line 11', '    b.append(300)']
python_obj.py:9: size=782 KiB, count=1, average=782 KiB
['  File "python_obj.py", line 9', '    a.append(5)']
python_obj.py:5: size=576 B, count=1, average=576 B
['  File "python_obj.py", line 5', '    a = []']
python_obj.py:12: size=28 B, count=1, average=28 B
['  File "python_obj.py", line 12', '    for i in range(100000):']

Conclusion

CPython uses a lot of memory for its objects. It also uses various tricks and optimizations for memory management. By keeping track of your objects' memory usage and being aware of the memory management model, you can significantly reduce the memory footprint of your program.

This post has been updated with contributions from Esther Vaati. Esther is a software developer and writer for Envato Tuts+.

Gigi Sayfan is a principal software architect at Helix, a bioinformatics and genomics start-up. Gigi has been developing software professionally for more than 20 years in domains as diverse as instant messaging, morphing, chip fabrication process control, embedded multimedia applications for game consoles, brain-inspired machine learning, custom browser development, web services for 3D distributed game platforms, IoT sensors, and virtual reality. He has written production code in many programming languages such as Go, Python, C, C++, C#, Java, Delphi, JavaScript, and even Cobol and PowerBuilder, for operating systems such as Windows (3.11 through 7), Linux, Mac OSX, Lynx (embedded), and Sony PlayStation. His technical expertise includes databases, low-level networking, distributed systems, unorthodox user interfaces, and the general software development lifecycle.
How many bytes does a list take?An empty list takes 56 bytes, and each additional int adds just 8 bytes for the reference, while the int object itself takes 28 bytes. A list that contains one long string takes just 64 bytes, because it stores only a reference to the string.
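The arithmetic behind this answer is easy to verify with sys.getsizeof(). A minimal check (the absolute figures assume a 64-bit CPython 3.9+ build; older or 32-bit builds differ slightly):

```python
import sys

# An empty list is pure overhead; each element adds one 8-byte reference.
empty = sys.getsizeof([])       # 56 on 64-bit CPython 3.9+
one = sys.getsizeof([1])        # 64: only the reference is counted
print(one - empty)              # 8 bytes per element

# The int object itself is accounted for separately:
print(sys.getsizeof(1))         # 28 on a 64-bit build
```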
How much memory does a Python list take?When you create a list object, the list object by itself takes 56 bytes of memory on 64-bit CPython 3.9+ (64 bytes on some older versions), and each item adds 8 bytes to the size of the list, because the list holds only references to other objects.
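Because a list stores references rather than the objects themselves, even a huge element adds only 8 bytes to the list itself. A quick sketch (sizes assume a 64-bit CPython build):

```python
import sys

big = 'x' * 1_000_000                 # a string of roughly 1 MB
print(sys.getsizeof(big))             # about 1,000,049 bytes for the string

# Putting it in a list adds just one 8-byte reference to the list:
print(sys.getsizeof([big]) - sys.getsizeof([]))   # 8
```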
How do I check memory list size?In order to determine the size of a list, pass the list object to the getsizeof() function, which on execution returns the size of the list in bytes, including the garbage collector overhead. In the quoted example, list1 and list2 occupy 112 bytes and 104 bytes in memory.
How many bytes is a string Python?Since Python 3, the str type uses a Unicode representation. Unicode strings can take up to 4 bytes per character depending on the characters they contain, which can sometimes be expensive from a memory perspective.
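Since PEP 393 (Python 3.3), CPython picks a 1-, 2-, or 4-byte-per-character representation based on the widest character in the string. A quick way to observe this is to compare a two-character string against a one-character string of the same kind (exact fixed overheads vary by version):

```python
import sys

# Per-character cost = size of a two-character string minus the size of
# a one-character string of the same kind.
for s in ('ab',                       # ASCII: 1 byte per character
          '\u00e9\u00e9',             # Latin-1 kind: 1 byte per character
          '\u2202\u2202',             # BMP kind: 2 bytes per character
          '\U0001f642\U0001f642'):    # astral kind: 4 bytes per character
    print(repr(s), sys.getsizeof(s) - sys.getsizeof(s[0]))
```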