#DataScience #SQL #Python #MachineLearning #Statistics #BusinessAnalytics #ProductCaseStudies #DataScienceProjects #InterviewPrep #LearnDataScience #YouTubeLearning #CodingInterview #MLInterview #SQLProjects #PythonForDataScience
Topic: Handling Datasets of All Types, Part 1 of 5: Introduction and Basic Concepts
---
1. What is a Dataset?
• A dataset is a collection of related data, often organized in rows and columns, used for analysis or for training machine learning models.
---
2. Types of Datasets
• Structured Data: Tables and spreadsheets with rows and columns (e.g., CSV, Excel).
• Unstructured Data: Images, text, audio, video.
• Semi-structured Data: JSON and XML files containing hierarchical data.
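To make the difference between semi-structured and structured data concrete, here is a minimal sketch that flattens nested JSON records into table-like rows. The `flatten` helper and the sample records are hypothetical, made up for illustration only.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical semi-structured input: the second record has no "city".
raw = '[{"id": 1, "user": {"name": "Ana", "city": "Lima"}}, {"id": 2, "user": {"name": "Ben"}}]'
records = json.loads(raw)

rows = [flatten(r) for r in records]
# Build a fixed set of columns so every row has the same shape.
columns = sorted({key for row in rows for key in row})
table = [[row.get(col) for col in columns] for row in rows]

print(columns)  # ['id', 'user.city', 'user.name']
for line in table:
    print(line)  # [1, 'Lima', 'Ana'] then [2, None, 'Ben']
```

Fields that are absent in a record show up as `None`, which is how missing values typically appear once hierarchical data is forced into rows and columns.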
---
3. Common Dataset Formats
• CSV (Comma-Separated Values)
• Excel (.xls, .xlsx)
• JSON (JavaScript Object Notation)
• XML (eXtensible Markup Language)
• Images (JPEG, PNG, TIFF)
• Audio (WAV, MP3)
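The post shows loading code for CSV and JSON only. As a rough sketch for XML, Python's standard `xml.etree.ElementTree` module can turn a small document into a list of dictionaries. The `<people>` document and its field names are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; the second person has no <age> element.
xml_text = """<people>
  <person id="1"><name>Ana</name><age>31</age></person>
  <person id="2"><name>Ben</name></person>
</people>"""

root = ET.fromstring(xml_text)

people = []
for person in root.findall("person"):
    people.append({
        "id": int(person.get("id")),     # attribute value, converted to int
        "name": person.findtext("name"),  # text of the <name> child
        "age": person.findtext("age"),    # None when <age> is absent
    })

print(people)
```

Note that element text comes back as strings, so numeric fields such as `age` still need an explicit conversion before analysis.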
---
4. Loading Datasets in Python
• Use pandas for structured data:
import pandas as pd
df = pd.read_csv('data.csv')
• Use the built-in json module for JSON files:
import json
with open('data.json') as f:
    data = json.load(f)
---
5. Basic Dataset Exploration
• Check shape and size:
print(df.shape)
• Preview data:
print(df.head())
• Check for missing values:
print(df.isnull().sum())
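The three checks above can be tried without any file on disk. This sketch assumes pandas is installed and reads a small made-up CSV from an in-memory string; the column names and values are illustrative only.

```python
import io

import pandas as pd

# Hypothetical CSV text: Ben has no age, Cy has no city.
csv_text = """name,age,city
Ana,31,Lima
Ben,,Oslo
Cy,45,
"""

df = pd.read_csv(io.StringIO(csv_text))

shape = df.shape                     # (number of rows, number of columns)
missing = df.isnull().sum()          # missing values per column
missing_share = df.isnull().mean()   # fraction of missing values per column

print(shape)
print(df.head())
print(missing)
print(df.dtypes)  # age is read as float because it contains a missing value
```

Checking `dtypes` alongside the missing counts is useful because a single empty cell is enough to turn an integer column into floats.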
---
6. Summary
• Understanding dataset types is crucial before processing.
• Loading and exploring datasets helps identify cleaning and preprocessing needs.
---
Exercise
• Load a CSV and a JSON dataset in Python, print their shapes, and identify missing values.
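One possible way to approach the exercise, offered as a sketch rather than a reference solution: it assumes pandas is installed, writes two tiny made-up files to a temporary directory so it runs anywhere, and uses a hypothetical `summarize` helper.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def summarize(df):
    """Return the shape and per-column missing-value counts of a DataFrame."""
    missing = {col: int(n) for col, n in df.isnull().sum().items()}
    return {"shape": df.shape, "missing": missing}


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "data.csv"
    json_path = Path(tmp) / "data.json"

    # Hypothetical sample data with one missing value in each file.
    csv_path.write_text("id,score\n1,9.5\n2,\n3,7.0\n")
    json_path.write_text(json.dumps([
        {"id": 1, "tag": "a"},
        {"id": 2, "tag": None},
    ]))

    csv_df = pd.read_csv(csv_path)
    with open(json_path) as f:
        json_df = pd.DataFrame(json.load(f))  # list of records -> rows

csv_summary = summarize(csv_df)
json_summary = summarize(json_df)
print(csv_summary)
print(json_summary)
```

With your own data, replace the temporary files with the real paths; the `summarize` call stays the same for both formats.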
---
#DataScience #Datasets #DataLoading #Python #DataExploration
The rest of the parts
https://t.iss.one/DataScienceM
Comprehensive Guide: How to Prepare for a Graph Neural Networks (GNN) Job Interview (350 Most Common Interview Questions)
Read: https://hackmd.io/@husseinsheikho/GNN-interview
#GNN #GraphNeuralNetworks #MachineLearning #DeepLearning #AI #DataScience #PyTorchGeometric #DGL #NodeClassification #LinkPrediction #GraphML
Our Telegram channels: https://t.iss.one/addlist/0f6vfFbEMdAwODBk
Our WhatsApp channel: https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A
Prepare for Job Interviews.
In DS or AI/ML interviews, you need to be able to explain models, debug them live, and design AI/ML systems from scratch. If you can't demonstrate this during an interview, expect to hear, "We'll get back to you."
The person in the attached image is Chip Huyen. Hopefully you know her; if not, then I can't help you here. She is probably one of the finest authors in the field of AI/ML.
She wrote a book covering common ML interview questions.
Target audience: ML engineers, platform engineers, research scientists, and anyone who wants to do ML but doesn't yet know the differences among those titles. Check the comment section for links and repos.
Link:
https://huyenchip.com/ml-interviews-book/
#JobInterview #MachineLearning #AI #DataScience #MLEngineer #AIInterview #TechCareers #DeepLearning #AICommunity #MLSystems #CareerGrowth #AIJobs #ChipHuyen #InterviewPrep #DataScienceCommunit
https://t.iss.one/CodeProgrammer
This Python library helps you extract usable data for language models from complex files such as tables, images, charts, or multi-page documents.
The idea behind Agentic Document Extraction is that, unlike common methods such as OCR that only read text, it can also understand the structure and the relationships between different parts of a document. For example, it understands which title belongs to which table or image.
• Works with PDFs, images, and website links.
• Can chunk and process very large documents (up to 1000 pages) by itself.
• Outputs both JSON and Markdown formats.
• Even specifies the exact location of each section on the page.
• Supports parallel and batch processing.
Install with pip:
pip install agentic-doc
#DataScience
Each playlist is designed to be simple and understandable for beginners, and then gradually dives deeper into the topics.