⁉️  Interview question  
What happens when you use __enter__ and __exit__ methods in a context manager that opens a file with mode 'r+' but the file is simultaneously being written to by another process using os.fsync()? How does Python’s internal buffering interact with system-level synchronization mechanisms, and what potential race conditions could arise if the file is not properly closed?
When a file is opened in 'r+' mode, Python's file object keeps its own user-space buffer, while os.fsync() works one level lower: it forces the OS page cache for a file descriptor out to stable storage and does not touch Python's buffer at all. If another process writes to the file and calls fsync() while the context manager is still active, this process may keep reading stale buffered data, and when __exit__ finally flushes and closes the file, its pending writes can land on top of, and partially overwrite, what the other process wrote. The result is not an error but an unpredictable interleaving of the two writers, which is effectively corrupted content. Because fsync() controls durability rather than mutual exclusion, the processes must coordinate explicitly: use advisory file locks (e.g., fcntl.flock) or an atomic write-to-temp-file-and-rename pattern whenever a file is shared across processes.
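As an illustration (a minimal sketch of my own, not part of the original answer; it assumes a POSIX system, and locked_file and the file name are hypothetical), an advisory lock plus an explicit flush/fsync makes the ordering safe:

import fcntl
import os
from contextlib import contextmanager

@contextmanager
def locked_file(path, mode="r+"):
    f = open(path, mode)
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)   # block until we hold the exclusive lock
        yield f
        f.flush()                                # push Python's user-space buffer to the OS
        os.fsync(f.fileno())                     # ask the OS to push its page cache to disk
    finally:
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)   # release the lock before closing
        f.close()

# Usage: any other process that follows the same locking protocol
# cannot interleave its writes with ours.
# with locked_file("shared.txt") as f:
#     data = f.read()
#     f.seek(0)
#     f.write(data.upper())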
#️⃣ tags: #Python #AdvancedPython #FileHandling #ContextManager #Multithreading #RaceCondition #OSInteraction #Buffering #Synchronization #ProgrammingInterview
By: t.iss.one/DataScienceQ🚀 
Q: How can you implement a thread-safe connection-pooling mechanism using Python's sqlite3 with concurrent.futures.ThreadPoolExecutor, while ensuring atomic transactions and handling database schema migrations dynamically? Provide a complete example with error handling and logging.

A:
import sqlite3
import threading
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed
from contextlib import contextmanager

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Database path
DB_PATH = "example.db"

# Schema definition
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    email TEXT UNIQUE NOT NULL
);
"""

# Connection pool with threading
class DatabaseConnectionPool:
    def __init__(self, db_path, max_connections=5):
        self.db_path = db_path
        self.max_connections = max_connections
        self._connections = []
        self._lock = threading.Lock()

    def get_connection(self):
        with self._lock:
            if self._connections:
                return self._connections.pop()
            # check_same_thread=False is required because pooled connections
            # are created in one thread and reused from another
            return sqlite3.connect(self.db_path, check_same_thread=False)

    def release_connection(self, conn):
        with self._lock:
            if len(self._connections) < self.max_connections:
                self._connections.append(conn)
            else:
                conn.close()

    def close_all(self):
        with self._lock:
            for conn in self._connections:
                conn.close()
            self._connections.clear()

@contextmanager
def get_db_connection(pool):
    conn = pool.get_connection()
    try:
        yield conn
    except Exception as e:
        conn.rollback()
        logger.error(f"Database error: {e}")
        raise
    finally:
        pool.release_connection(conn)

def execute_transaction(pool, query, params=None):
    with get_db_connection(pool) as conn:
        cursor = conn.cursor()
        cursor.execute(query, params or ())
        conn.commit()

def create_user(pool, name, email):
    query = "INSERT INTO users (name, email) VALUES (?, ?)"
    try:
        execute_transaction(pool, query, (name, email))
        logger.info(f"User {name} created.")
    except sqlite3.IntegrityError:
        logger.warning(f"Email {email} already exists.")

def fetch_users(pool):
    query = "SELECT id, name, email FROM users"
    with get_db_connection(pool) as conn:
        cursor = conn.cursor()
        cursor.execute(query)
        return cursor.fetchall()

def schema_migration(pool, new_schema):
    with get_db_connection(pool) as conn:
        cursor = conn.cursor()
        cursor.executescript(new_schema)
        conn.commit()
        logger.info("Schema migration applied.")
# Example usage
if __name__ == "__main__":
    # Initialize pool
    pool = DatabaseConnectionPool(DB_PATH)

    # Apply schema
    schema_migration(pool, SCHEMA)

    # Simulate concurrent user creation
    names_emails = [("Alice", "[email protected]"), ("Bob", "[email protected]")]
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = [
            executor.submit(create_user, pool, name, email)
            for name, email in names_emails
        ]
        for future in as_completed(futures):
            try:
                future.result()
            except Exception as e:
                logger.error(f"Task failed: {e}")

    # Fetch results
    users = fetch_users(pool)
    logger.info(f"Users: {users}")

    # Cleanup
    pool.close_all()
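A side note (my addition, not part of the original answer): a sqlite3 connection can also be used as a transaction context manager, which commits on success and rolls back if an exception is raised, keeping multi-statement updates atomic. The update_emails helper and its data are hypothetical:

def update_emails(pool, updates):
    # updates: iterable of (user_id, new_email) pairs
    with get_db_connection(pool) as conn:
        with conn:  # BEGIN ... COMMIT, or ROLLBACK if any statement raises
            for user_id, new_email in updates:
                conn.execute(
                    "UPDATE users SET email = ? WHERE id = ?",
                    (new_email, user_id),
                )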
#Python #SQLite #Database #Multithreading #ThreadSafety #ConnectionPooling #AtomicTransactions #SchemaMigration #Concurrency #Programming #AdvancedPython
By: @DataScienceQ 🚀
Question: How can you use Python’s asyncio and concurrent.futures to efficiently handle both I/O-bound and CPU-bound tasks in a single application, and what are the best practices for structuring such a system?

Answer:
To efficiently handle both I/O-bound (e.g., network requests, file I/O) and CPU-bound (e.g., data processing, math operations) tasks in Python, you should combine asyncio for I/O-bound work with concurrent.futures.ThreadPoolExecutor or ProcessPoolExecutor for CPU-bound tasks. This avoids blocking the event loop and maximizes performance.

Here’s an example:
import asyncio
from concurrent.futures import ProcessPoolExecutor
import aiohttp

# Simulated I/O-bound task (e.g., API call)
async def fetch_url(session, url):
    try:
        async with session.get(url) as response:
            return await response.text()
    except Exception as e:
        return f"Error: {e}"

# Simulated CPU-bound task (e.g., heavy computation)
def cpu_intensive_task(n):
    return sum(i * i for i in range(n))

# Main function using asyncio + a process pool
async def main():
    # I/O-bound tasks with asyncio
    urls = [
        "https://httpbin.org/json",
        "https://httpbin.org/headers",
        "https://httpbin.org/status/200"
    ]

    # Use aiohttp for concurrent HTTP requests
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_url(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
        print("I/O-bound results:", results)

    # CPU-bound tasks with ProcessPoolExecutor
    with ProcessPoolExecutor() as executor:
        # Run CPU-intensive work in separate processes
        futures = [executor.submit(cpu_intensive_task, 1000000) for _ in range(3)]
        cpu_results = [future.result() for future in futures]
        print("CPU-bound results:", cpu_results)

# Run the async main function
if __name__ == "__main__":
    asyncio.run(main())
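As a complement (a hedged sketch of my own, not from the original post): blocking calls that are I/O-bound but synchronous, such as requests.get, can be offloaded to a ThreadPoolExecutor through loop.run_in_executor so they do not stall the event loop. The blocking_fetch helper is hypothetical:

import asyncio
from concurrent.futures import ThreadPoolExecutor
import requests

def blocking_fetch(url):
    # Ordinary blocking call; acceptable inside a worker thread
    return requests.get(url, timeout=10).status_code

async def fetch_many(urls):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as executor:
        # run_in_executor wraps each call in an awaitable, so the event loop stays free
        tasks = [loop.run_in_executor(executor, blocking_fetch, u) for u in urls]
        return await asyncio.gather(*tasks)

# Example: asyncio.run(fetch_many(["https://httpbin.org/status/200"]))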
Explanation:
- asyncio handles I/O-bound tasks asynchronously without blocking the main thread.
- aiohttp is used for efficient HTTP requests.
- ProcessPoolExecutor runs CPU-heavy functions in separate processes (bypassing the GIL).
- Mixing both ensures optimal resource usage: async for I/O, multiprocessing for CPU.

Best practices:
- Use ThreadPoolExecutor for light I/O or blocking code.
- Use ProcessPoolExecutor for CPU-intensive work.
- Avoid mixing async and blocking code directly; always offload CPU tasks.
- Use asyncio.gather() to run multiple coroutines concurrently.

#Python #AsyncIO #Concurrency #Multithreading #Multiprocessing #AdvancedPython #Programming #WebDevelopment #Performance
By: @DataScienceQ 🚀
  #Python #InterviewQuestion #Concurrency #Threading #Multithreading #Programming #IntermediateLevel
Question: How can you use threading in Python to speed up I/O-bound tasks, such as fetching data from multiple URLs simultaneously, and what are the key considerations when using threads?
Answer:
To speed up I/O-bound tasks like fetching data from multiple URLs, you can use Python's threading module to perform concurrent operations. This is effective because threads can wait for I/O (like network requests) without blocking the entire program.

Here’s a detailed example using threading and requests:

import threading
import requests
from time import time

# List of URLs to fetch
urls = [
    'https://httpbin.org/json',
    'https://api.github.com/users/octocat',
    'https://jsonplaceholder.typicode.com/posts/1',
    'https://www.google.com',
]

# Shared list to store results
results = []
lock = threading.Lock()  # To safely append to the shared list

def fetch_url(url: str):
    """Fetches a URL and records its status and response length."""
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        with lock:
            results.append({
                'url': url,
                'status': response.status_code,
                'length': len(response.text)
            })
    except Exception as e:
        with lock:
            results.append({
                'url': url,
                'status': 'Error',
                'error': str(e)
            })

def fetch_urls_concurrently():
    """Fetches all URLs using multiple threads."""
    start_time = time()

    # Create a thread for each URL
    threads = []
    for url in urls:
        thread = threading.Thread(target=fetch_url, args=(url,))
        threads.append(thread)
        thread.start()

    # Wait for all threads to complete
    for thread in threads:
        thread.join()

    end_time = time()
    print(f"Time taken: {end_time - start_time:.2f} seconds")
    print("Results:")
    for result in results:
        print(result)

if __name__ == "__main__":
    fetch_urls_concurrently()
### Explanation:
- **threading.Thread**: Creates a new thread for each URL.
- **target**: The function to run in the thread (fetch_url).
- **args**: Arguments passed to the target function.
- **start()**: Begins execution of the thread.
- **join()**: Waits for the thread to finish before continuing.
- **Lock**: Ensures safe access to shared resources (like results) to avoid race conditions.

### Key Considerations:
- **GIL (Global Interpreter Lock)**: Python’s GIL limits true parallelism for CPU-bound tasks, but threads work well for I/O-bound ones.
- **Thread Safety**: Use locks or queues when sharing data between threads.
- **Overhead**: Creating too many threads can degrade performance.
- **Timeouts**: Always set timeouts to avoid hanging on slow responses.
This pattern is commonly used in web scraping, API clients, and backend services handling multiple external calls efficiently.
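As a variation (my own sketch, not part of the original answer): the same workload can be written with concurrent.futures.ThreadPoolExecutor, which manages the threads and collects results through futures, so no shared list or manual lock is needed. The fetch and fetch_all names are illustrative:

from concurrent.futures import ThreadPoolExecutor, as_completed
import requests

def fetch(url):
    # Runs in a worker thread; the return value travels back through the future
    response = requests.get(url, timeout=10)
    return {'url': url, 'status': response.status_code, 'length': len(response.text)}

def fetch_all(urls, max_workers=4):
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception as e:
                results.append({'url': futures[future], 'status': 'Error', 'error': str(e)})
    return results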
By: @DataScienceQ 🚀