Python Data Science Jobs & Interviews
Your go-to hub for Python and Data Science—featuring questions, answers, quizzes, and interview tips to sharpen your skills and boost your career in the data-driven world.

Admin: @Hussein_Sheikho
#Python #InterviewQuestion #Concurrency #Threading #Multithreading #Programming #IntermediateLevel

Question: How can you use threading in Python to speed up I/O-bound tasks, such as fetching data from multiple URLs simultaneously, and what are the key considerations when using threads?

Answer:

To speed up I/O-bound tasks like fetching data from multiple URLs, you can use Python's threading module to perform concurrent operations. This is effective because threads can wait for I/O (like network requests) without blocking the entire program.

Here’s a detailed example using threading and requests:

import threading
import requests
from time import time

# List of URLs to fetch
urls = [
    'https://httpbin.org/json',
    'https://api.github.com/users/octocat',
    'https://jsonplaceholder.typicode.com/posts/1',
    'https://www.google.com',
]

# Shared list to store results
results = []
lock = threading.Lock()  # To safely append to the shared list

def fetch_url(url: str):
    """Fetches a URL and stores the response metadata."""
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        with lock:
            results.append({
                'url': url,
                'status': response.status_code,
                'length': len(response.text)
            })
    except Exception as e:
        with lock:
            results.append({
                'url': url,
                'status': 'Error',
                'error': str(e)
            })

def fetch_urls_concurrently():
    """Fetches all URLs using multiple threads."""
    start_time = time()

    # Create and start a thread for each URL
    threads = []
    for url in urls:
        thread = threading.Thread(target=fetch_url, args=(url,))
        threads.append(thread)
        thread.start()

    # Wait for all threads to complete
    for thread in threads:
        thread.join()

    end_time = time()
    print(f"Time taken: {end_time - start_time:.2f} seconds")
    print("Results:")
    for result in results:
        print(result)

if __name__ == "__main__":
    fetch_urls_concurrently()

### Explanation:
- **threading.Thread**: Creates a new thread for each URL.
- **target**: The function to run in the thread (fetch_url).
- **args**: Arguments passed to the target function.
- **start()**: Begins execution of the thread.
- **join()**: Waits for the thread to finish before continuing.
- **Lock**: Ensures safe access to shared resources (like results) to avoid race conditions.
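To see why the lock matters, here is a minimal, self-contained sketch (separate from the fetch example above) in which several threads update a shared counter. The `increment` function and thread counts are illustrative choices, not part of the original example; the point is that the read-modify-write in `counter += 1` is not atomic, so the lock makes the final total deterministic.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n: int) -> None:
    """Add to the shared counter n times, holding the lock per update."""
    global counter
    for _ in range(n):
        with lock:  # protects the non-atomic read-modify-write
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 80000 with the lock; without it the total can fall short
```

Dropping the `with lock:` line can silently lose updates under contention, which is exactly the race condition the fetch example guards against when appending to `results`.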

### Key Considerations:
- **GIL (Global Interpreter Lock)**: Python’s GIL limits true parallelism for CPU-bound tasks, but threads work well for I/O-bound ones.
- **Thread Safety**: Use locks or queues when sharing data between threads.
- **Overhead**: Creating too many threads can degrade performance.
- **Timeouts**: Always set timeouts to avoid hanging on slow responses.
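To address the overhead concern, the standard library's `concurrent.futures.ThreadPoolExecutor` caps the number of threads instead of spawning one per URL. A minimal sketch, using a simulated I/O call (`fake_fetch`, a hypothetical stand-in that sleeps instead of hitting the network) so it runs without connectivity:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_fetch(url: str) -> dict:
    """Hypothetical stand-in for requests.get: sleeps to simulate I/O wait."""
    time.sleep(0.2)
    return {"url": url, "status": 200}

urls = [f"https://example.com/page/{i}" for i in range(8)]

start = time.time()
# max_workers bounds the thread count, avoiding per-URL thread overhead
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(fake_fetch, u) for u in urls]
    results = [f.result() for f in as_completed(futures)]
elapsed = time.time() - start

print(len(results), f"{elapsed:.1f}s")  # 8 tasks in two waves of 4, ~0.4s
```

The pool also collects return values through futures, so no shared list or lock is needed, which sidesteps the thread-safety issue entirely.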

This pattern is commonly used in web scraping, API clients, and backend services handling multiple external calls efficiently.

By: @DataScienceQ 🚀