Python | Algorithms | Data Structures | Cyber Security

Topic: Python – Reading Images from Datasets and Organizing Them

---

1. Reading Images from Folder Structure

Assuming your dataset folder looks like this:

dataset/
   class1/
      img1.jpg
      img2.jpg
   class2/
      img3.jpg
      img4.jpg

Use os to walk the class folders and OpenCV (cv2) or PIL (Pillow) to read the images. Each folder name serves as the label for the images inside it.
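The folder-as-label convention can be expressed without decoding any pixels. Below is a minimal sketch that uses only the standard library and assumes the layout above. The helper name collect_image_paths and the extension whitelist are illustrative choices, not part of os, OpenCV, or PIL.

```python
from pathlib import Path

# Illustrative whitelist (an assumption); extend it to match your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def collect_image_paths(root):
    """Return a sorted list of (image_path, class_name) pairs.

    Each immediate subfolder of `root` is treated as one class. Files whose
    suffix is not in IMAGE_EXTENSIONS (e.g. .DS_Store) are skipped, as are
    loose files sitting directly in `root`.
    """
    pairs = []
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for img_path in sorted(class_dir.iterdir()):
            if img_path.is_file() and img_path.suffix.lower() in IMAGE_EXTENSIONS:
                pairs.append((str(img_path), class_dir.name))
    return pairs
```

On the tree above, collect_image_paths("dataset") would return pairs such as ("dataset/class1/img1.jpg", "class1"). Sorting makes the order reproducible, which os.listdir() alone does not guarantee.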

---

2. Code Example Using OpenCV

import os
import cv2

dataset_path = "dataset"
data = []
labels = []

for class_name in os.listdir(dataset_path):
    class_dir = os.path.join(dataset_path, class_name)
    if os.path.isdir(class_dir):
        for img_name in os.listdir(class_dir):
            img_path = os.path.join(class_dir, img_name)
            img = cv2.imread(img_path)
            if img is not None:
                data.append(img)
                labels.append(class_name)

print(f"Total images: {len(data)}")
print(f"Total labels: {len(labels)}")
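The two totals confirm that every image received a label, but they hide how the images are spread across classes. A short sketch using collections.Counter reports per-class counts from the labels list built above. The function name class_counts is illustrative.

```python
from collections import Counter


def class_counts(labels):
    """Return {class_name: number_of_images}, most frequent class first."""
    return dict(Counter(labels).most_common())
```

If every file in the example tree is readable, class_counts(labels) should report two images for class1 and two for class2. A strongly uneven result is an early warning of class imbalance.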

---

3. Optional: Resize Images for Uniformity

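target_size = (128, 128)  # cv2.resize expects (width, height)
img = cv2.resize(img, target_size)

Use these lines inside the loop, after the "if img is not None:" check and before data.append(img), so every stored image has the same shape.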

---

4. Using PIL (Pillow) Instead of OpenCV

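import numpy as np
from PIL import Image

img = Image.open(img_path).convert("RGB")  # img_path as in the loop above; PIL uses RGB order
img = img.resize((128, 128))  # PIL also expects (width, height)
img_array = np.array(img)  # shape (128, 128, 3), dtype uint8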

---

5. Organizing Images in a Dictionary

dataset_dict = {}

for class_name in os.listdir(dataset_path):
    class_dir = os.path.join(dataset_path, class_name)
    if os.path.isdir(class_dir):
        dataset_dict[class_name] = []
        for img_name in os.listdir(class_dir):
            img_path = os.path.join(class_dir, img_name)
            img = cv2.imread(img_path)
            if img is not None:
                dataset_dict[class_name].append(img)
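Models usually need integer labels rather than folder names. One way to flatten dataset_dict into parallel lists is to sort the class names and assign indices in that order, which is also how torchvision's ImageFolder numbers its classes. The helper dict_to_lists below is a hypothetical name for this sketch.

```python
def dict_to_lists(dataset_dict):
    """Flatten {class_name: [images]} into (images, label_indices, class_names).

    Class names are sorted so the same folder names always map to the same
    integers, regardless of the order os.listdir() returned them in.
    """
    class_names = sorted(dataset_dict)
    class_to_idx = {name: idx for idx, name in enumerate(class_names)}
    images, labels = [], []
    for name in class_names:
        for img in dataset_dict[name]:
            images.append(img)
            labels.append(class_to_idx[name])
    return images, labels, class_names
```

Keep class_names alongside the model. It is the only way to turn a predicted index back into a folder name.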

---

6. Summary

• Use os.listdir() to iterate dataset directories.

• Read images with cv2.imread() or PIL.Image.open().

• Resize images to a uniform shape for model input.

• Store images and labels in lists or dictionaries for easy access.

---

Exercise

• Extend the code to save the loaded images and labels as NumPy arrays, so later runs can skip decoding the image files.
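A possible starting point for the exercise, assuming every image has already been resized to the same shape (np.stack() raises a ValueError otherwise). np.savez_compressed writes several named arrays into one .npz file, and np.load reads them back. The helper names save_dataset and load_dataset are illustrative.

```python
import numpy as np


def save_dataset(path, data, labels):
    """Save same-shaped images and their string labels to one .npz file."""
    np.savez_compressed(path, images=np.stack(data), labels=np.array(labels))


def load_dataset(path):
    """Load the (images, labels) arrays written by save_dataset()."""
    with np.load(path) as archive:
        return archive["images"], archive["labels"]
```

For example, call save_dataset("dataset.npz", data, labels) after the loop, then images, labels = load_dataset("dataset.npz") in later runs. Labels are stored as a fixed-width string array, so no pickling is needed to read them back.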

---

#Python #ImageProcessing #DatasetHandling #OpenCV #PIL

https://t.iss.one/DataScience4

Topic: Python – Reading Images from Datasets and Organizing Them (Part 2): Using PyTorch and TensorFlow Data Loaders

---

1. Using PyTorch’s `ImageFolder` and `DataLoader`

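torchvision's datasets.ImageFolder loads a dataset organized as one subfolder per class and assigns integer labels in sorted order of the folder names. DataLoader then handles batching and shuffling.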

from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Define transformations (resize, normalize, convert to tensor)
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

dataset = datasets.ImageFolder(root='dataset/', transform=transform)

# Create DataLoader for batching and shuffling
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

# Access class names
class_names = dataset.classes
print(class_names)

---

2. Iterating Through DataLoader

for images, labels in dataloader:
    print(images.shape)  # (batch_size, 3, 128, 128)
    print(labels)
    # Use images and labels for training or validation
    break
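With shuffle=True, DataLoader visits the samples in a new random order each epoch and slices that order into batches. The final batch is smaller when the dataset size is not a multiple of batch_size, unless drop_last=True is passed. The plain-Python sketch below imitates that behavior so the batch shapes are predictable. It is an illustration, not PyTorch's implementation, and iterate_batches is a hypothetical name.

```python
import random


def iterate_batches(indices, batch_size, shuffle=True, drop_last=False, seed=None):
    """Yield lists of dataset indices, roughly the way DataLoader batches them."""
    order = list(indices)
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        if drop_last and len(batch) < batch_size:
            break  # discard the incomplete final batch
        yield batch
```

With 70 images and batch_size=32, the batches hold 32, 32 and 6 samples, so the last images.shape would start with 6 instead of 32.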

---

3. Using TensorFlow `image_dataset_from_directory`

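TensorFlow Keras provides a similar utility, tf.keras.utils.image_dataset_from_directory (exposed as tf.keras.preprocessing.image_dataset_from_directory in older TensorFlow releases).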

import tensorflow as tf

dataset = tf.keras.utils.image_dataset_from_directory(
    'dataset/',
    image_size=(128, 128),
    batch_size=32,
    label_mode='int'  # can be 'categorical', 'binary', or None
)

class_names = dataset.class_names
print(class_names)

for images, labels in dataset.take(1):
    print(images.shape)  # (batch_size, 128, 128, 3): channels-last, float32 values in [0, 255]
    print(labels)
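The layouts differ between the two frameworks. image_dataset_from_directory yields (batch, height, width, channels) with pixel values still in [0, 255], while the PyTorch pipeline above yields (batch, channels, height, width). When arrays move between the two, a transpose is enough. The NumPy sketch below uses a dummy batch, and the function names are illustrative.

```python
import numpy as np


def nhwc_to_nchw(batch):
    """(batch, height, width, channels) -> (batch, channels, height, width)."""
    return np.transpose(batch, (0, 3, 1, 2))


def nchw_to_nhwc(batch):
    """(batch, channels, height, width) -> (batch, height, width, channels)."""
    return np.transpose(batch, (0, 2, 3, 1))


# A dummy TensorFlow-style batch: 32 RGB images of 128x128 pixels.
tf_style = np.zeros((32, 128, 128, 3), dtype=np.float32)
torch_style = nhwc_to_nchw(tf_style)
print(torch_style.shape)  # (32, 3, 128, 128)
```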

---

4. Dataset Splitting

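To split one folder into training and validation sets, call the loader twice with the same validation_split and seed and change only subset, so the two subsets do not overlap: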

train_ds = tf.keras.utils.image_dataset_from_directory(
    'dataset/',
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(128, 128),
    batch_size=32
)

val_ds = tf.keras.utils.image_dataset_from_directory(
    'dataset/',
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=(128, 128),
    batch_size=32
)
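Why must the seed match? Both calls shuffle the file list before slicing it, and the slices only line up as disjoint training and validation parts when both shuffles are identical. The principle can be sketched in plain Python. This mirrors the idea, not Keras's exact file ordering, and split_paths is a hypothetical helper.

```python
import random


def split_paths(paths, validation_split=0.2, seed=123):
    """Deterministically split a list into (training, validation) lists.

    Sorting first and shuffling with a fixed seed means every call with the
    same inputs produces the same split, so the two parts never overlap.
    """
    shuffled = sorted(paths)
    random.Random(seed).shuffle(shuffled)
    num_val = int(validation_split * len(shuffled))
    split_at = len(shuffled) - num_val
    return shuffled[:split_at], shuffled[split_at:]
```

Calling it with a different seed for the validation part would mix the two subsets. Matching seed=123 in both Keras calls above guards against exactly that.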

---

5. Summary

• PyTorch’s ImageFolder + DataLoader offers a quick way to load and batch datasets.

• TensorFlow’s image_dataset_from_directory provides similar high-level dataset loading.

• Both allow easy transformations, batching, and shuffling.

---

Exercise

• Write code that normalizes the images in a TensorFlow dataset by applying a tf.keras.layers.Rescaling(1./255) layer with map().

---

#Python #DatasetHandling #PyTorch #TensorFlow #ImageProcessing

Topic: Python – Reading Images from Datasets and Organizing Them (Part 3): Custom Dataset Class and Data Augmentation

---

1. Creating a Custom Dataset Class (PyTorch)

Sometimes you need more control over how images and labels are loaded and processed. In that case, write a custom dataset class: subclass torch.utils.data.Dataset and implement __len__ (the number of samples) and __getitem__ (one sample by index).

import os
from PIL import Image
from torch.utils.data import Dataset

class CustomImageDataset(Dataset):
    def __init__(self, root_dir, transform=None):
        self.root_dir = root_dir
        self.transform = transform
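        self.image_paths = []
        self.labels = []

        # Only subdirectories are classes; sorting keeps label indices stable
        classes = sorted(d for d in os.listdir(root_dir)
                         if os.path.isdir(os.path.join(root_dir, d)))
        self.class_to_idx = {cls_name: idx for idx, cls_name in enumerate(classes)}

        valid_ext = (".jpg", ".jpeg", ".png", ".bmp")  # adjust to your data
        for cls_name in classes:
            cls_dir = os.path.join(root_dir, cls_name)
            for img_name in sorted(os.listdir(cls_dir)):
                if not img_name.lower().endswith(valid_ext):
                    continue  # skip non-image files such as .DS_Store
                img_path = os.path.join(cls_dir, img_name)
                self.image_paths.append(img_path)
                self.labels.append(self.class_to_idx[cls_name])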

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img_path = self.image_paths[idx]
        image = Image.open(img_path).convert("RGB")
        label = self.labels[idx]

        if self.transform:
            image = self.transform(image)

        return image, label

---

2. Using Data Augmentation with `transforms`

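Data augmentation applies random transformations (here, flips and small rotations) every time an image is loaded. The model therefore sees slightly different versions of each image and tends to generalize better. Apply augmentation to the training set only; validation data normally gets just Resize, ToTensor, and Normalize.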

from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.RandomHorizontalFlip(),  # mirror left-right with probability 0.5
    transforms.RandomRotation(10),      # rotate by a random angle in [-10, 10] degrees
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

Pass this transform to the custom dataset:

dataset = CustomImageDataset(root_dir='dataset/', transform=transform)
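By default, RandomHorizontalFlip() mirrors each image left-to-right with probability 0.5 and leaves the label unchanged. The same idea on a (height, width, channels) NumPy array can be sketched as follows. random_horizontal_flip and its rng argument are illustrative, not torchvision API.

```python
import numpy as np


def random_horizontal_flip(image, p=0.5, rng=None):
    """Mirror an (H, W, C) image along its width axis with probability p."""
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < p:
        return image[:, ::-1, :].copy()  # copy() gives a contiguous array
    return image
```

Because the draw happens on every call, each epoch sees a different mix of original and mirrored images. That is what makes the augmentation effective without storing extra files.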

---

3. Loading Dataset with DataLoader

from torch.utils.data import DataLoader

dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

---

4. Summary

• Custom dataset classes offer flexibility in how data is loaded and labeled.

• Data augmentation techniques such as flipping and rotation can be applied using torchvision transforms.

• Use DataLoader for batching and shuffling during training.

---

Exercise

• Extend the custom dataset to handle grayscale images and apply a random brightness adjustment transform.
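Here is a starting point for the exercise, sketched in NumPy on uint8 arrays so it runs without PyTorch. Replicating a single grayscale channel three times keeps the shape compatible with the 3-channel Normalize above. Brightness is adjusted by multiplying by a random factor and clipping to [0, 255]. Inside the PyTorch pipeline, Image.open(path).convert("RGB") already replicates the channel for grayscale files, and transforms.ColorJitter(brightness=0.2) is a ready-made brightness transform. The helper names below are hypothetical.

```python
import numpy as np


def gray_to_rgb(gray):
    """Stack an (H, W) grayscale array into (H, W, 3) by repeating the channel."""
    return np.repeat(gray[..., np.newaxis], 3, axis=-1)


def random_brightness(image, max_delta=0.2, rng=None):
    """Scale a uint8 image by a random factor in [1 - max_delta, 1 + max_delta]."""
    rng = np.random.default_rng() if rng is None else rng
    factor = rng.uniform(1.0 - max_delta, 1.0 + max_delta)
    # Work in float to avoid uint8 overflow, then clip back to the valid range.
    scaled = image.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)
```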

---

#Python #DatasetHandling #PyTorch #DataAugmentation #ImageProcessing

Topic: 20 Important Python Questions on Reading and Organizing Images from Datasets

---

1. How can you read images from a directory using Python?
Use libraries like OpenCV (cv2.imread) or PIL (Image.open).

2. How do you organize images by class labels if they are stored in subfolders?
Iterate over each subfolder, treat folder names as labels, and map images accordingly.

3. What is the difference between OpenCV and PIL for image reading?
OpenCV returns a NumPy array with channels in BGR order. PIL returns an Image object in RGB (or L for grayscale, RGBA with transparency), which you convert with np.array() for numerical work.

4. How do you resize images before feeding them to a model?
Use cv2.resize() or PIL’s resize() method. Both take the target size as (width, height), not (height, width).

5. What is a good practice to handle different image sizes in datasets?
Resize all images to a fixed size or use data loaders that apply transformations.

6. How do you convert images to NumPy arrays?
In OpenCV, images are already NumPy arrays; with PIL, use np.array(image).

7. How do you normalize images?
Scale pixel values to [0, 1] by dividing by 255, and optionally standardize each channel with (x - mean) / std.

8. How can you load large datasets efficiently?
Use generators or data loaders to load images batch-wise instead of loading all at once.

9. What is `torchvision.datasets.ImageFolder`?
A PyTorch utility to load images from a directory with subfolders as class labels.

10. How do you apply transformations and augmentations during image loading?
Use torchvision.transforms or TensorFlow preprocessing layers.

11. How can you split datasets into training and validation sets?
Use sklearn.model_selection.train_test_split, or loader options such as validation_split and subset in image_dataset_from_directory.

12. How do you handle corrupted or unreadable images during loading?
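With PIL, wrap Image.open() plus .convert() (Image.open() reads pixel data lazily) in try-except and skip files that raise. cv2.imread() does not raise on unreadable files; it returns None, so check for None instead.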

13. How do you batch images for training deep learning models?
Use DataLoader in PyTorch or TensorFlow datasets with batching enabled.

14. What are common image augmentations used during training?
Flips, rotations, scaling, cropping, and color jittering. Normalization usually runs in the same pipeline, but it is preprocessing rather than augmentation.

15. How do you convert labels (class names) to numeric indices?
Create a mapping dictionary from class names to indices.

16. How can you visualize images and labels after loading?
Use matplotlib’s imshow() and show the label as the plot title. Convert OpenCV images from BGR to RGB first, or the colors will look wrong.

17. How do you read images in grayscale?
With OpenCV: cv2.imread(path, cv2.IMREAD_GRAYSCALE). With PIL: Image.open(path).convert("L").

18. How do you save processed images?
Use cv2.imwrite() or the save() method of a PIL Image object.

19. How do you organize dataset information (images and labels) in Python?
Use lists, dictionaries, or pandas DataFrames.

20. How do you handle imbalanced datasets?
Use class weights in the loss function, oversample minority classes (for example with torch.utils.data.WeightedRandomSampler), or undersample majority classes.
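Several of the answers above (3, 7, 12 and 20) reduce to a few lines of NumPy or plain Python. The helpers below are minimal sketches under stated assumptions: images are NumPy arrays in (H, W, C) layout, and load_fn is any caller-supplied reader that returns None or raises on a bad file. All function names are illustrative rather than taken from a library.

```python
import numpy as np


def bgr_to_rgb(image):
    """Answer 3: reverse the channel axis of an (H, W, 3) OpenCV image.

    Same result as cv2.cvtColor(image, cv2.COLOR_BGR2RGB) for 3-channel input;
    ascontiguousarray avoids negative strides that some libraries reject.
    """
    return np.ascontiguousarray(image[..., ::-1])


def normalize(image, mean=None, std=None):
    """Answer 7: scale uint8 pixels to [0, 1], then optionally standardize.

    With per-channel mean/std this computes (x - mean) / std, the formula
    transforms.Normalize applies (there on channels-first tensors).
    """
    x = image.astype(np.float32) / 255.0
    if mean is not None and std is not None:
        x = (x - np.asarray(mean, dtype=np.float32)) / np.asarray(std, dtype=np.float32)
    return x


def load_skipping_bad(paths, load_fn):
    """Answer 12: load every path and collect failures instead of crashing.

    cv2.imread signals failure by returning None; PIL-based readers raise.
    Both cases end up in the `bad` list.
    """
    images, bad = [], []
    for path in paths:
        try:
            img = load_fn(path)
        except Exception:  # e.g. OSError / PIL.UnidentifiedImageError
            img = None
        if img is None:
            bad.append(path)
        else:
            images.append(img)
    return images, bad


def balanced_class_weights(labels):
    """Answer 20: weight_c = n_samples / (n_classes * count_c).

    Rare classes get larger weights, which can be passed to a loss such as
    torch.nn.CrossEntropyLoss through its weight argument.
    """
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))
```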

---

Summary

Mastering image loading and organization is fundamental for effective data preprocessing in computer vision projects.

---

#Python #ImageProcessing #DatasetHandling #OpenCV #DeepLearning
