Topic: Handling Datasets of All Types – Part 1 of 5: Introduction and Basic Concepts
---
1. What is a Dataset?
• A dataset is a collection of related data, often (but not always) organized in rows and columns, used for analysis or for training machine learning models.
---
2. Types of Datasets
• Structured Data: Tables, spreadsheets with rows and columns (e.g., CSV, Excel).
• Unstructured Data: Images, text, audio, video.
• Semi-structured Data: JSON, XML files containing hierarchical data.
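To make the structured vs. semi-structured distinction concrete, here is a minimal pandas sketch using made-up records (the names and fields are illustrative assumptions, not from a real dataset). It flattens nested JSON-style records into a table with pd.json_normalize, which is one common way to turn semi-structured data into rows and columns.

```python
import pandas as pd

# Structured data: every record has the same flat columns (like a CSV row)
structured = pd.DataFrame(
    {"id": [1, 2], "name": ["Ana", "Ben"], "age": [31, 27]}
)

# Semi-structured data: JSON-like records with nesting and optional fields
semi_structured = [
    {"id": 1, "name": "Ana", "address": {"city": "Lisbon", "zip": "1000"}},
    {"id": 2, "name": "Ben", "address": {"city": "Oslo"}},  # no "zip" key
]

# json_normalize flattens nested keys into dotted column names;
# fields missing from a record become NaN
flat = pd.json_normalize(semi_structured)

print(structured.shape)
print(flat.columns.tolist())
print(flat)
```

The nesting disappears into column names such as address.city, and the optional field shows up as a missing value rather than an error.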
---
3. Common Dataset Formats
• CSV (Comma-Separated Values)
• Excel (.xls, .xlsx)
• JSON (JavaScript Object Notation)
• XML (eXtensible Markup Language)
• Images (JPEG, PNG, TIFF)
• Audio (WAV, MP3)
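Several of these formats map directly onto a pandas reader. The sketch below wraps that mapping in a hypothetical helper, load_table (not part of pandas), and assumes the optional packages noted in the comments are installed when reading Excel files. Images and audio are deliberately left out because they are not tabular and need their own libraries.

```python
from pathlib import Path

import pandas as pd

# Hypothetical helper: map a file extension to a pandas reader.
# Only tabular and semi-structured formats are handled; images and audio
# need dedicated libraries (e.g. Pillow, or the standard-library wave
# module for WAV files).
READERS = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,   # requires the openpyxl package
    ".xls": pd.read_excel,    # requires the xlrd package
    ".json": pd.read_json,    # expects table-like JSON, e.g. a list of records
    ".xml": lambda p: pd.read_xml(p, parser="etree"),  # etree avoids needing lxml
}


def load_table(path):
    """Load a file into a DataFrame, choosing the reader by file extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported format: {suffix}")
    return READERS[suffix](path)
```

Usage: df = load_table("data.csv"). Raising on unknown extensions keeps a typo or an image file from silently producing an empty table.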
---
4. Loading Datasets in Python
• Use pandas for structured data:
import pandas as pd
df = pd.read_csv('data.csv')
• Use Python's built-in json module for JSON files:
import json
with open('data.json') as f:
    data = json.load(f)
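Here is a self-contained version of the two loading snippets, using in-memory text (io.StringIO) in place of data.csv and data.json so it runs without any files. The sample values are made up. Note that json.load returns plain Python objects, so the result has to be converted to a DataFrame before calling .shape or .isnull().

```python
import io
import json

import pandas as pd

# In-memory stand-ins for data.csv and data.json (made-up sample data)
csv_file = io.StringIO("id,name,score\n1,Ana,90\n2,Ben,\n3,Cy,78\n")
json_file = io.StringIO('[{"id": 1, "tags": ["a"]}, {"id": 2, "tags": []}]')

# pandas parses CSV straight into a DataFrame; the empty score becomes NaN
df = pd.read_csv(csv_file)

# json.load returns plain Python objects (here a list of dicts), not a DataFrame
data = json.load(json_file)
print(type(data))  # <class 'list'>

# Convert to a DataFrame to get .shape, .head(), .isnull(), etc.
json_df = pd.DataFrame(data)

print(df.shape)       # (3, 3)
print(json_df.shape)  # (2, 2)
```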
---
5. Basic Dataset Exploration
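• Check shape and size:
print(df.shape)  # (number of rows, number of columns)
print(df.size)   # total number of cells (rows × columns)
• Preview data:
print(df.head())  # first 5 rows by default
• Check for missing values:
print(df.isnull().sum())  # missing values per column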
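A small, made-up DataFrame shows what these exploration calls report when some values are missing. The percentage and per-row checks at the end are an optional extension, not part of the original steps.

```python
import numpy as np
import pandas as pd

# Small made-up DataFrame with a few gaps (None / np.nan)
df = pd.DataFrame({
    "age": [25, np.nan, 40, 31],
    "city": ["Paris", "Rome", None, "Oslo"],
    "income": [52000, 61000, np.nan, np.nan],
})

print(df.shape)    # (4, 3): 4 rows, 3 columns
print(df.size)     # 12 cells in total
print(df.head(2))  # first 2 rows; head() defaults to 5

missing = df.isnull().sum()           # missing values per column
print(missing)
print(missing / len(df) * 100)        # share of missing values per column, in %
print(df.isnull().any(axis=1).sum())  # rows with at least one missing value
```

Looking at both the per-column counts and the per-row count helps decide whether to drop rows, drop columns, or impute.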
---
6. Summary
• Understanding dataset types is crucial before processing.
• Loading and exploring datasets helps identify cleaning and preprocessing needs.
---
Exercise
• Load a CSV dataset and a JSON dataset in Python, print their shapes, and identify the missing values in each.
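One possible solution sketch for the exercise: it writes two small made-up files to a temporary directory so it runs anywhere (swap in your own paths). The helper name summarize is hypothetical, and pd.json_normalize is just one choice for turning nested JSON into a table; pd.DataFrame works for flat records.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def summarize(name, df):
    """Print the shape and per-column missing-value counts of a DataFrame."""
    print(f"{name}: shape={df.shape}")
    print(df.isnull().sum(), end="\n\n")


with tempfile.TemporaryDirectory() as tmp:
    # Write two small made-up files; replace with your own data.csv / data.json
    csv_path = Path(tmp) / "data.csv"
    json_path = Path(tmp) / "data.json"
    csv_path.write_text("id,name,score\n1,Ana,90\n2,Ben,\n3,,78\n")
    json_path.write_text(json.dumps([
        {"id": 1, "user": {"name": "Ana", "email": "ana@example.com"}},
        {"id": 2, "user": {"name": "Ben"}},
    ]))

    csv_df = pd.read_csv(csv_path)

    with open(json_path) as f:
        records = json.load(f)
    # Flatten the nested "user" object into user.name and user.email columns
    json_df = pd.json_normalize(records)

    summarize("CSV", csv_df)
    summarize("JSON", json_df)
```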
---
#DataScience #Datasets #DataLoading #Python #DataExploration
The rest of the parts
https://t.iss.one/DataScienceM
🚀 Comprehensive Guide: How to Prepare for a Graph Neural Networks (GNN) Job Interview – 350 Most Common Interview Questions
Read: https://hackmd.io/@husseinsheikho/GNN-interview
#GNN #GraphNeuralNetworks #MachineLearning #DeepLearning #AI #DataScience #PyTorchGeometric #DGL #NodeClassification #LinkPrediction #GraphML
✉️ Our Telegram channels: https://t.iss.one/addlist/0f6vfFbEMdAwODBk
📱 Our WhatsApp channel: https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A
Prepare for Job Interviews.
In DS or AI/ML interviews, you need to be able to explain models, debug them live, and design AI/ML systems from scratch. If you can’t demonstrate this during an interview, expect to hear, “We’ll get back to you.”
This resource is by Chip Huyen, arguably one of the finest authors in the field of AI/ML.
She has written a book covering common ML interview questions, available online.
Target audience: ML engineers, platform engineers, research scientists, and anyone who wants to work in ML but doesn’t yet know the differences among those titles. Check the comment section for links and repos.
📌 link:
https://huyenchip.com/ml-interviews-book/
https://t.iss.one/CodeProgrammer🌟
#JobInterview #MachineLearning #AI #DataScience #MLEngineer #AIInterview #TechCareers #DeepLearning #AICommunity #MLSystems #CareerGrowth #AIJobs #ChipHuyen #InterviewPrep #DataScienceCommunit
👨🏻‍💻 This Python library extracts data that language models can use from complex documents containing tables, images, and charts, including multi-page documents.
📝 Agentic Document Extraction goes beyond common methods such as OCR, which only read text: it also understands the document's structure and the relationships between its parts. For example, it knows which title belongs to which table or image.
✅ Works with PDFs, images, and website links.
☑️ Automatically splits and processes very large documents (up to 1,000 pages).
✔️ Outputs both JSON and Markdown formats.
☑️ Reports the exact location of each section on the page.
✔️ Supports parallel and batch processing.
┌🥵 Agentic Document Extraction
├🌎 Website
└🐱 GitHub Repos
pip install agentic-doc
🌐 #DataScience
➖➖➖➖➖➖➖➖➖➖➖➖➖
https://t.iss.one/CodeProgrammer