Python Data Science Jobs & Interviews
Your go-to hub for Python and Data Science—featuring questions, answers, quizzes, and interview tips to sharpen your skills and boost your career in the data-driven world.

Admin: @Hussein_Sheikho
Interview question

What does the following code do?
import os
os.makedirs("folder/subfolder", exist_ok=True)

Answer:
Creates the directory 'folder' and its subdirectory 'subfolder', including any missing intermediate directories. Because exist_ok=True is passed, no FileExistsError is raised if the directories already exist.

tags: #python #os #filehandling #coding #programming #directory #makedirs #dev

By: t.iss.one/DataScienceQ 🚀
⁉️ Interview question
What happens when you use __enter__ and __exit__ methods in a context manager that opens a file with mode 'r+' but the file is simultaneously being written to by another process using os.fsync()? How does Python’s internal buffering interact with system-level synchronization mechanisms, and what potential race conditions could arise if the file is not properly closed?

When the file is opened in `'r+'` mode, Python buffers reads and writes in user space, while `os.fsync()` in the other process only forces *that* process's already-written data through the kernel to disk; it knows nothing about Python's buffer. A read served from Python's stale buffer can therefore return data that no longer matches what is on disk. The `__exit__` method flushes and closes the file, but the writes flushed at that point can interleave with the external writer's already-synced bytes, leaving the file inconsistent or corrupted. This is why files shared across processes should be coordinated with atomic operations or advisory locks (e.g., `fcntl`), as sketched below.
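
A minimal sketch of the locking remedy, assuming a POSIX system and a cooperating writer that also takes the lock (advisory locks bind only processes that use them; the file name `shared.txt` is hypothetical):

import fcntl
import os

with open("shared.txt", "r+") as f:
    fcntl.flock(f, fcntl.LOCK_EX)      # block until we hold the exclusive lock
    try:
        data = f.read()
        f.seek(0)
        f.write(data.upper())          # read-modify-write under the lock
        f.flush()                      # push Python's user-space buffer to the OS
        os.fsync(f.fileno())           # ask the OS to push it to stable storage
    finally:
        fcntl.flock(f, fcntl.LOCK_UN)  # release before the file is closed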

#️⃣ tags: #Python #AdvancedPython #FileHandling #ContextManager #Multithreading #RaceCondition #OSInteraction #Buffering #Synchronization #ProgrammingInterview

By: t.iss.one/DataScienceQ 🚀
⁉️ Interview question
How does Python’s mmap module behave when mapping a file that is concurrently being truncated by another process using os.ftruncate()? What are the implications for memory safety, and under what conditions might this lead to segmentation faults or undefined behavior?

When a file is mapped via `mmap` and another process truncates it with `os.ftruncate()`, the mapping itself stays in place, but pages that now lie beyond the new end of file are no longer backed by data. Touching them is undefined behavior: on Linux it typically raises `SIGBUS`, crashing the process much like a segmentation fault. The operating system does not shrink or invalidate the mapping on your behalf, so the program gets no warning before the faulting access. Safe concurrent access therefore requires synchronization, e.g., file locks, or re-checking the file size before dereferencing mapped regions.
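
A minimal defensive sketch, assuming Linux and a non-empty file (the name `data.bin` is hypothetical): snapshot the size once, map exactly that much, and never index past it. This only narrows the window; if the writer truncates below that size anyway, real safety still needs a locking protocol shared with the writer.

import mmap
import os

with open("data.bin", "rb") as f:
    size = os.fstat(f.fileno()).st_size      # size observed at mapping time
    mm = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ)
    try:
        head = mm[:min(16, size)]            # stay inside the mapped length
        print(head)
    finally:
        mm.close()                           # unmap before the fd closes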

#️⃣ tags: #Python #AdvancedPython #FileHandling #MemoryMapping #mmap #ConcurrentProgramming #OS #SystemCalls #UndefinedBehavior #SegmentationFault #FileLocking

By: t.iss.one/DataScienceQ 🚀
⁉️ Interview question
What happens when you use os.fdopen() to wrap a file descriptor that was opened with O_DIRECT flag on a Linux system, and then attempt to read or write using Python’s buffered I/O? How does this affect data consistency and performance?

When a file descriptor opened with `O_DIRECT` is wrapped by `os.fdopen()` with default buffering, Python inserts its own user-space buffer between your code and the kernel. `O_DIRECT` requires buffers, transfer sizes, and file offsets to be aligned to the device block size, and Python's internal buffer gives no such guarantee, so reads and writes can fail with `EINVAL` or behave unpredictably. Even when they succeed, the extra user-space copy erases much of the performance benefit `O_DIRECT` exists to provide, and data sitting in Python's buffer is not on disk until it is flushed. The usual remedy is to disable Python's buffering and manage aligned buffers yourself.
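
A minimal sketch of that remedy, assuming Linux (where `os.O_DIRECT` is defined) and a hypothetical file `data.bin` at least one block long: open raw with `buffering=0` and read into a page-aligned buffer, which an anonymous `mmap` conveniently provides.

import mmap
import os

fd = os.open("data.bin", os.O_RDONLY | os.O_DIRECT)
with os.fdopen(fd, "rb", buffering=0) as f:   # raw I/O: no Python-side buffer
    buf = mmap.mmap(-1, 4096)                 # anonymous mapping is page-aligned
    n = f.readinto(buf)                       # O_DIRECT demands aligned buffers
    print(f"read {n} bytes bypassing the page cache")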

#️⃣ tags: #Python #AdvancedPython #FileHandling #OS #Linux #O_DIRECT #BufferedIO #SystemCalls #Performance #DataConsistency #LowLevelProgramming

By: t.iss.one/DataScienceQ 🚀
⁉️ Interview question
Can you explain the behavior of Python’s shutil.copyfile() when copying a file that is currently being written to by another process, and how does the underlying system call interact with file locks and inodes? What happens if the source file is deleted during the copy?

`shutil.copyfile()` simply opens the source and streams its bytes, so copying a file that is actively being written yields a snapshot of whatever data happens to be read, chunk by chunk, and the result can mix old and new content. If the source is deleted mid-copy on a Unix-like filesystem, the copy can still complete: unlinking removes the directory entry, but the inode and its data persist until every open file descriptor is closed. The copy may nonetheless come out truncated or torn if the writer changes the file size dramatically during the read, and if the source is under mandatory locking, the open or the reads can block or fail with `EACCES`.
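
A minimal sketch of a safer copy under advisory locking, assuming POSIX semantics and a writer that takes `LOCK_EX` around its own writes (file names are hypothetical): a shared lock lets concurrent readers overlap while excluding that writer for the duration of the copy.

import fcntl
import shutil

with open("source.dat", "rb") as src:
    fcntl.flock(src, fcntl.LOCK_SH)    # shared lock: blocks only exclusive writers
    try:
        shutil.copyfile("source.dat", "backup.dat")
    finally:
        fcntl.flock(src, fcntl.LOCK_UN)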

#️⃣ tags: #Python #AdvancedPython #FileHandling #shutil #SystemCalls #FileLocks #Inodes #Unix #ConcurrentWriting #CopyOperation #FileDeletion

By: t.iss.one/DataScienceQ 🚀
⁉️ Interview question
What happens when you use os.link() to create a hard link to a file that is already open in write mode by another process, and how does this affect the file’s inode reference count, data integrity, and potential for race conditions during deletion?

Creating a hard link via `os.link()` adds a new directory entry for the same inode and increments its link count; the data blocks are freed only when the count drops to zero and no process still holds the file open. Because both names refer to the same inode, a writer that truncates or appends through the original name changes what the hard link sees as well; the link never holds an independent snapshot of old data. Deleting the original name while the link exists is safe, since the inode survives, but concurrent writes through either name without synchronization can still interleave and corrupt the logical content, which is the real race-condition risk around deletion and rewriting.
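
A minimal sketch of the reference-count behavior, assuming a POSIX filesystem (file names hypothetical):

import os

with open("original.txt", "w") as f:
    f.write("payload")

print(os.stat("original.txt").st_nlink)   # 1: a single directory entry

os.link("original.txt", "alias.txt")      # new name, same inode
print(os.stat("original.txt").st_nlink)   # 2: the inode now has two links

os.remove("original.txt")                 # unlink one name...
print(os.stat("alias.txt").st_nlink)      # ...back to 1; the data survives

with open("alias.txt") as f:
    print(f.read())                       # 'payload' is still readable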

#️⃣ tags: #Python #AdvancedPython #FileHandling #HardLink #Inode #OS #RaceCondition #DataIntegrity #FileOperations #SystemCalls #Linux #FileDeletion

By: t.iss.one/DataScienceQ 🚀
⁉️ Interview question
What happens when you open a file in Python using the mode `'r+b'` and immediately attempt to write to it without seeking to the end, assuming the file already exists and contains data?

Answer:
Opening a file in `'r+b'` mode gives binary read/write access with the file pointer at position 0. If you write immediately, the bytes **overwrite existing data at the current position**, i.e., from the start of the file, while the remainder of the original content is left in place, so the result silently mixes new and old bytes. The key insight is that `'r+'` does not mean append: **writing begins wherever the pointer sits**, so you must `seek()` explicitly (e.g., to the end) before writing if you want to preserve the existing data. An `OSError` is raised only for problems like locking or denied permissions; the overwrite itself succeeds quietly, which is what makes it a common source of data corruption.
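
A minimal self-contained demonstration (the file name is hypothetical):

import os

with open("demo.bin", "wb") as f:         # create a file with known content
    f.write(b"hello world")

with open("demo.bin", "r+b") as f:
    f.write(b"HEY")                       # pointer starts at 0: this overwrites
    f.seek(0)
    print(f.read())                       # b'HEYlo world', not an append

with open("demo.bin", "r+b") as f:
    f.seek(0, os.SEEK_END)                # move to the end first...
    f.write(b"!")                         # ...so this write appends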

#️⃣ tags: #Python #AdvancedPython #FileHandling #BinaryFiles #FilePointer #DataCorruption #InterviewQuestion

By: t.iss.one/DataScienceQ 🚀
#Python #InterviewQuestion #DataProcessing #FileHandling #Programming #IntermediateLevel

Question: How can you efficiently process large CSV files in Python without loading the entire file into memory, and what are the best practices for handling such scenarios?

Answer:

To process large CSV files efficiently in Python without loading the entire file into memory, you can use generators or stream the data line by line. This approach is especially useful when working with files that exceed available RAM.

Here’s a detailed example using the `csv` module and a generator pattern:

import csv
from typing import Dict, Generator

def read_csv_large_file(file_path: str) -> Generator[Dict, None, None]:
    """
    Generator function to read a large CSV file line by line.
    Yields one row at a time as a dictionary.
    """
    with open(file_path, mode='r', encoding='utf-8') as file:
        reader = csv.DictReader(file)
        for row in reader:
            yield row

def process_large_csv(file_path: str, threshold: int):
    """
    Process a large CSV file, filtering rows based on a condition.
    Example: only process rows where 'age' > threshold.
    """
    total_processed = 0
    valid_rows = []

    for row in read_csv_large_file(file_path):
        try:
            age = int(row['age'])
            if age > threshold:
                valid_rows.append(row)
                total_processed += 1
                # Optional: process the row immediately instead of storing it
                # print(f"Processing: {row}")
        except (ValueError, KeyError):
            continue  # Skip rows with invalid or missing 'age' fields

    print(f"Total valid rows processed: {total_processed}")
    return valid_rows

# Example usage
if __name__ == "__main__":
    file_path = 'large_data.csv'
    result = process_large_csv(file_path, threshold=30)
    print("Processing complete.")

### Explanation:
- **csv.DictReader**: Reads each line of the CSV as a dictionary, allowing access by column name.
- **Generator (read_csv_large_file)**: Yields one row at a time, avoiding memory overload.
- **Memory Efficiency**: No need to load all data into memory; only one row is held at a time.
- **Error Handling**: Skips malformed or missing data gracefully.
- **Scalability**: Suitable for gigabyte-sized files.

This technique is essential in data engineering and analytics roles, where performance and memory efficiency are critical.

By: @DataScienceQ 🚀