In this article, we will discuss how to achieve multithreading in Python (Python 3.12 and lower) and test the experimental free-threaded feature available in Python 3.13.
Python and GIL (Global Interpreter Lock)
The Global Interpreter Lock (GIL) in Python is a mutex that ensures only one thread executes Python bytecode at a time. Because of the GIL, a process can start many threads, but only one of them runs Python code at any given moment. Let’s look at the different Python implementations and which of them have a GIL.
Python has several implementations:
- CPython – The default and most widely used implementation of Python, written in C
- Jython – A Python implementation written in Java
- IronPython – A Python implementation integrated with the .NET framework
- PyPy – A Python implementation written in RPython, a restricted subset of Python, with a just-in-time (JIT) compiler to improve execution speed
Of these implementations, the following have a GIL:
- CPython
- PyPy
Since Jython and IronPython have no GIL, their threads can run Python code in parallel. This article covers only multithreading in the default CPython implementation.
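To find out which implementation a given interpreter is, the standard library function platform.python_implementation() can be used. The following is a minimal sketch; the helper name describe_interpreter is only for illustration.

```python
import platform
import sys


def describe_interpreter():
    """Return the implementation name and version of the running interpreter."""
    # platform.python_implementation() returns a string such as
    # "CPython", "PyPy", "Jython" or "IronPython".
    name = platform.python_implementation()
    version = ".".join(str(part) for part in sys.version_info[:3])
    return name, version


if __name__ == "__main__":
    name, version = describe_interpreter()
    print(f"Running on {name} {version}")
```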
Types of tasks
Let’s understand the types of tasks Python code needs to execute:
- CPU-bound tasks
- I/O-bound tasks
- Asynchronous tasks
1. CPU-bound tasks
CPU-bound tasks require significant computing power, and their run time depends mainly on the processing speed of the CPU. For example:
- Mathematical computations
- Data processing – sorting data or image processing
2. I/O-bound tasks
I/O-bound tasks are operations whose execution time is determined primarily by waiting on input/output rather than by CPU processing speed. For example:
- Network requests – Making HTTP requests
- File operations – Reading or writing to a file
- Database queries – Fetching data from a database
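Threads remain useful for I/O-bound work even with the GIL, because CPython releases the GIL while a thread is blocked waiting on I/O. The following is a minimal sketch, not a benchmark: it uses time.sleep as a stand-in for a real network or file call, and the helper names fake_io_task and run_in_threads are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fake_io_task(delay):
    """Stand-in for a network or file operation (hypothetical helper)."""
    # time.sleep releases the GIL, much like a blocking I/O call does.
    time.sleep(delay)
    return delay


def run_in_threads(delays):
    """Run all fake I/O tasks in a thread pool and return (results, elapsed)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as executor:
        # map returns the results in the same order as the inputs.
        results = list(executor.map(fake_io_task, delays))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_in_threads([0.5, 0.5, 0.5, 0.5])
    print(f"Results: {results}")
    print(f"Elapsed time : {elapsed:.2f}s")
```

With four waits of 0.5 seconds each, the total elapsed time should be close to 0.5 seconds rather than 2 seconds, since the waits overlap.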
3. Asynchronous Tasks
Python also supports asynchronous programming. It is particularly suited for I/O-bound tasks. Using libraries like asyncio, developers can write non-blocking code that allows other tasks to run while waiting for I/O operations to complete.
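As a rough illustration of the non-blocking style described above, the sketch below runs three simulated I/O waits concurrently on a single thread with asyncio. The coroutine fake_request is a hypothetical placeholder; asyncio.sleep stands in for a real non-blocking network call.

```python
import asyncio
import time


async def fake_request(name, delay):
    """Stand-in for a non-blocking I/O call (hypothetical helper)."""
    # await hands control back to the event loop while this task waits.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # gather schedules the coroutines concurrently on one thread
    # and returns their results in the order they were passed in.
    return await asyncio.gather(
        fake_request("first", 0.3),
        fake_request("second", 0.3),
        fake_request("third", 0.3),
    )


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(main())
    elapsed = time.perf_counter() - start
    print(results)
    print(f"Elapsed time : {elapsed:.2f}s")
```

Because the three waits overlap, the elapsed time should be close to 0.3 seconds rather than 0.9 seconds.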
Multithreading in Python
Let’s write some code to test multithreading in Python. The following code was tested in this environment:
- Processor: Intel Core i7-12700
- Operating system: Windows 11, 64-bit
We use the standard (GIL-enabled) build of Python 3.13 to run this code. Let’s write a CPU-bound task that calculates the nth Fibonacci number and use it to compute the 38th Fibonacci number:
import time

def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

start = time.time()
fibonacci(38)
elapsed = time.time() - start
print(f"Elapsed time : {elapsed:.2f}s")
The output of the above code:
Elapsed time : 3.24s
The Fibonacci calculation takes 3.24 seconds. Next, we run the same calculation on two threads at the same time using ThreadPoolExecutor.
from concurrent.futures import ThreadPoolExecutor
import time

def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

def compute_fibonacci(num):
    start = time.time()
    fibonacci(num)
    elapsed = time.time() - start
    print(f"Elapsed time : {elapsed:.2f}s")

if __name__ == "__main__":
    numbers = [38, 38]
    with ThreadPoolExecutor() as executor:
        for num in numbers:
            executor.submit(
                compute_fibonacci,
                num,
            )
The output of the above code:
Elapsed time : 6.20s
Elapsed time : 6.23s
Each thread took about 6.2 seconds, roughly twice the 3.24 seconds of the single run. Because of the GIL, the two threads take turns instead of running in parallel, so we have to wait about 6.2 seconds for both results.
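The script above prints its timings from inside each thread. To get the computed values back and measure the total wall-clock time instead, the futures returned by submit can be collected. The following is a minimal sketch, using a smaller input (25) so that it finishes quickly; the helper name run_threaded is only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)


def run_threaded(numbers):
    """Compute fibonacci for each number on a thread pool.

    Returns the list of results and the total wall-clock time.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor() as executor:
        futures = [executor.submit(fibonacci, num) for num in numbers]
        # result() blocks until the corresponding task has finished.
        results = [future.result() for future in futures]
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_threaded([25, 25])
    print(f"Results: {results}")
    print(f"Total elapsed time : {elapsed:.2f}s")
```

Comparing the total time for one input against the total time for two inputs gives a quick indication of whether the threads actually ran in parallel.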
Process-based Parallelism
Process-based parallelism is not multithreading; it is an alternative way to achieve parallel execution by running multiple processes, each with its own interpreter and its own GIL. Let’s do the same computation using ProcessPoolExecutor.
from concurrent.futures import ProcessPoolExecutor
import time

def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

def compute_fibonacci(num):
    start = time.time()
    fibonacci(num)
    elapsed = time.time() - start
    print(f"Elapsed time : {elapsed:.2f}s")

if __name__ == "__main__":
    numbers = [38, 38]
    with ProcessPoolExecutor() as executor:
        for num in numbers:
            executor.submit(
                compute_fibonacci,
                num,
            )
The output of the above script:
Elapsed time : 3.24s
Elapsed time : 3.24s
Now the two Fibonacci calculations run in parallel: each finishes in 3.24 seconds, the same as the single run.
Free-threaded Python 3.13
I also tried the experimental free-threaded build of Python 3.13, which can run without the GIL. Here is the result of running the ThreadPoolExecutor version of the Fibonacci code above on the free-threaded build:
Elapsed time : 5.49s
Elapsed time : 5.50s
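The free-threaded build gives a modest improvement: both results arrive after about 5.5 seconds instead of 6.2 seconds. However, each call now takes about 5.5 seconds, compared with 3.24 seconds for a single call on the standard build. This is consistent with the observation that free threading significantly slows down single-threaded code, even though it can improve multithreaded execution.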
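When comparing the two builds, it helps to confirm which one is actually running, since a free-threaded build can still have the GIL switched back on at run time. The following is a minimal sketch: it relies on the Py_GIL_DISABLED build variable and on sys._is_gil_enabled(), which exists only from Python 3.13, and it assumes the GIL is enabled on older versions. The helper name gil_status is only for illustration.

```python
import sys
import sysconfig


def gil_status():
    """Return (free_threaded_build, gil_enabled) for the running interpreter."""
    # Py_GIL_DISABLED is 1 on free-threaded builds; on other builds
    # it is 0 or missing (None).
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() was added in Python 3.13. Assumption: on
    # versions that lack it, the GIL is always enabled.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = check() if check is not None else True
    return free_threaded_build, gil_enabled


if __name__ == "__main__":
    free_threaded_build, gil_enabled = gil_status()
    print(f"Free-threaded build : {free_threaded_build}")
    print(f"GIL enabled : {gil_enabled}")
```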
Conclusion
Because of the GIL, CPU-bound tasks need process-based parallelism to run in parallel. At the time of this writing, the free-threaded build of Python 3.13 is experimental and not ready for production use. In the future, we can expect full multithreading support in Python.