Data Science Jupyter Notebooks
11K subscribers
269 photos
31 videos
9 files
726 links
Explore the world of Data Science through Jupyter Notebooks: insights, tutorials, and tools to boost your data journey. Code, analyze, and visualize smarter with every post.
🎥 Ditto: Innovations in Video Editing with AI

Ditto is an advanced platform for generating high-quality data for instruction-based video editing. It combines the power of image and video generators, creating a unique Ditto-1M dataset with one million examples, enabling the training of models like Editto with outstanding results.

🚀 Key highlights:
- Innovative data generation for video editing.
- Unique Ditto-1M dataset with one million examples.
- Efficient model architecture to reduce costs and improve quality.
- Use of an intelligent agent for filtering and quality control.

📌 GitHub: https://github.com/EzioBy/Ditto
🔥 Trending Repository: opentelemetry-collector

📝 Description: OpenTelemetry Collector

🔗 Repository URL: https://github.com/open-telemetry/opentelemetry-collector

🌐 Website: https://opentelemetry.io

📖 Readme: https://github.com/open-telemetry/opentelemetry-collector#readme

📊 Statistics:
🌟 Stars: 5.9K stars
👀 Watchers: 88
🍴 Forks: 1.8K forks

💻 Programming Languages: Go

🏷️ Related Topics:
#monitoring #metrics #telemetry #observability #opentelemetry #open_telemetry


==================================
🧠 By: https://t.iss.one/DataScienceM
🔥 Trending Repository: Web-Dev-For-Beginners

📝 Description: 24 Lessons, 12 Weeks, Get Started as a Web Developer

🔗 Repository URL: https://github.com/microsoft/Web-Dev-For-Beginners

📖 Readme: https://github.com/microsoft/Web-Dev-For-Beginners#readme

📊 Statistics:
🌟 Stars: 92.5K stars
👀 Watchers: 2.7K
🍴 Forks: 14.4K forks

💻 Programming Languages: JavaScript - HTML - CSS - Vue - Python

🏷️ Related Topics:
#javascript #css #html #learning #education #curriculum #tutorials #microsoft_for_beginners


==================================
🧠 By: https://t.iss.one/DataScienceM
🔥 Trending Repository: VoiceInk

📝 Description: Voice-to-text app for macOS to transcribe what you say to text almost instantly

🔗 Repository URL: https://github.com/Beingpax/VoiceInk

🌐 Website: https://tryvoiceink.com

📖 Readme: https://github.com/Beingpax/VoiceInk#readme

📊 Statistics:
🌟 Stars: 2.2K stars
👀 Watchers: 10
🍴 Forks: 276 forks

💻 Programming Languages: Swift

🏷️ Related Topics:
#macos #swift #macos_app


==================================
🧠 By: https://t.iss.one/DataScienceM
🔥 Trending Repository: olmocr

📝 Description: Toolkit for linearizing PDFs for LLM datasets/training

🔗 Repository URL: https://github.com/allenai/olmocr

📖 Readme: https://github.com/allenai/olmocr#readme

📊 Statistics:
🌟 Stars: 14.9K stars
👀 Watchers: 77
🍴 Forks: 1.1K forks

💻 Programming Languages: Python - Shell - HTML

🏷️ Related Topics: Not available

==================================
🧠 By: https://t.iss.one/DataScienceM
This channel is for programmers, coders, and software engineers.

0️⃣ Python
1️⃣ Data Science
2️⃣ Machine Learning
3️⃣ Data Visualization
4️⃣ Artificial Intelligence
5️⃣ Data Analysis
6️⃣ Statistics
7️⃣ Deep Learning
8️⃣ Programming Languages

✅ https://t.iss.one/addlist/8_rRW2scgfRhOTc0

✅ https://t.iss.one/Codeprogrammer
🔥 Trending Repository: cpp-httplib

📝 Description: A C++ header-only HTTP/HTTPS server and client library

🔗 Repository URL: https://github.com/yhirose/cpp-httplib

📖 Readme: https://github.com/yhirose/cpp-httplib#readme

📊 Statistics:
🌟 Stars: 15.2K stars
👀 Watchers: 189
🍴 Forks: 2.5K forks

💻 Programming Languages: C++ - CMake - C - Meson - Makefile - Python

🏷️ Related Topics:
#http #cpp #https #cpp11 #header_only


==================================
🧠 By: https://t.iss.one/DataScienceM
🔥 Trending Repository: MONAI

📝 Description: AI Toolkit for Healthcare Imaging

🔗 Repository URL: https://github.com/Project-MONAI/MONAI

🌐 Website: https://monai.io/

📖 Readme: https://github.com/Project-MONAI/MONAI#readme

📊 Statistics:
🌟 Stars: 7K stars
👀 Watchers: 95
🍴 Forks: 1.3K forks

💻 Programming Languages: Python - C++ - Cuda

🏷️ Related Topics:
#deep_learning #python3 #pytorch #medical_image_computing #medical_image_processing #healthcare_imaging #monai


==================================
🧠 By: https://t.iss.one/DataScienceM
🔥 Trending Repository: jan

📝 Description: Jan is an open source alternative to ChatGPT that runs 100% offline on your computer.

🔗 Repository URL: https://github.com/janhq/jan

🌐 Website: https://jan.ai/

📖 Readme: https://github.com/janhq/jan#readme

📊 Statistics:
🌟 Stars: 38.4K stars
👀 Watchers: 203
🍴 Forks: 2.3K forks

💻 Programming Languages: TypeScript - Rust - Python - JavaScript - Shell - PowerShell

🏷️ Related Topics:
#open_source #self_hosted #gpt #tauri #llm #chatgpt #llamacpp #localai


==================================
🧠 By: https://t.iss.one/DataScienceM
🔥 Trending Repository: mem0

📝 Description: Universal memory layer for AI Agents; Announcing OpenMemory MCP - local and secure memory management.

🔗 Repository URL: https://github.com/mem0ai/mem0

🌐 Website: https://mem0.ai

📖 Readme: https://github.com/mem0ai/mem0#readme

📊 Statistics:
🌟 Stars: 42.1K stars
👀 Watchers: 203
🍴 Forks: 4.5K forks

💻 Programming Languages: Python - TypeScript - MDX - Jupyter Notebook - JavaScript - Shell

🏷️ Related Topics:
#python #application #state_management #ai #memory #chatbots #memory_management #agents #hacktoberfest #ai_agents #long_term_memory #rag #llm #chatgpt #genai


==================================
🧠 By: https://t.iss.one/DataScienceM
โค1
🔥 Trending Repository: WeKnora

📝 Description: LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm.

🔗 Repository URL: https://github.com/Tencent/WeKnora

🌐 Website: https://weknora.weixin.qq.com

📖 Readme: https://github.com/Tencent/WeKnora#readme

📊 Statistics:
🌟 Stars: 6.8K stars
👀 Watchers: 43
🍴 Forks: 778 forks

💻 Programming Languages: Go - Vue - Python - TypeScript - Shell - Less

🏷️ Related Topics:
#agent #golang #multi_tenant #ai #chatbot #evaluation #embeddings #openai #question_answering #chatbots #knowledge_base #semantic_search #reranking #multimodel #rag #vector_search #llm #generative_ai #agentic #ollama


==================================
🧠 By: https://t.iss.one/DataScienceM
🔥 Trending Repository: claude-relay-service

📝 Description: CRS: a self-hosted Claude Code mirror and one-stop open-source relay service that gives unified access to Claude, OpenAI, Gemini, and Droid subscriptions, supports shared ("carpool") accounts to split costs more efficiently, and works seamlessly with native tools.

🔗 Repository URL: https://github.com/Wei-Shaw/claude-relay-service

🌐 Website: https://pincc.ai

📖 Readme: https://github.com/Wei-Shaw/claude-relay-service#readme

📊 Statistics:
🌟 Stars: 4.6K stars
👀 Watchers: 13
🍴 Forks: 769 forks

💻 Programming Languages: JavaScript - Vue - Shell - CSS - Makefile - Dockerfile

🏷️ Related Topics:
#droid #crs #claude #claude_api #gemini_cli #claude_code #codex_cli #claude_proxy #droid_cli #droid2api


==================================
🧠 By: https://t.iss.one/DataScienceM
🔥 Trending Repository: Ventoy

📝 Description: A new bootable USB solution.

🔗 Repository URL: https://github.com/ventoy/Ventoy

🌐 Website: https://www.ventoy.net

📖 Readme: https://github.com/ventoy/Ventoy#readme

📊 Statistics:
🌟 Stars: 71.7K stars
👀 Watchers: 683
🍴 Forks: 4.5K forks

💻 Programming Languages: C - Shell - HTML - C++ - CSS - Makefile

🏷️ Related Topics:
#windows #linux #unix #legacy #usb #multiboot #persistence #bsd #uefi #chromeos #iso_files #secure_boot #unattended #auto_install #bootable_usb


==================================
🧠 By: https://t.iss.one/DataScienceM
🔥 Trending Repository: BettaFish

📝 Description: Weiyu: a multi-agent public opinion analysis assistant that anyone can use. It breaks through information cocoons, restores the full picture of public sentiment, predicts future trends, and supports decision-making. Implemented from scratch, with no dependency on any framework.

🔗 Repository URL: https://github.com/666ghj/BettaFish

📖 Readme: https://github.com/666ghj/BettaFish#readme

📊 Statistics:
🌟 Stars: 2.1K stars
👀 Watchers:
🍴 Forks: 295 forks

💻 Programming Languages: Python - HTML

🏷️ Related Topics:
#nlp #sentiment_analysis #python3 #data_analysis #deep_search #multi_agent_system #agent_framework #public_opinion_analysis #llms #deep_research


==================================
🧠 By: https://t.iss.one/DataScienceM
🔥 Trending Repository: LLaMA-Factory

📝 Description: Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

🔗 Repository URL: https://github.com/hiyouga/LLaMA-Factory

🌐 Website: https://llamafactory.readthedocs.io

📖 Readme: https://github.com/hiyouga/LLaMA-Factory#readme

📊 Statistics:
🌟 Stars: 61.3K stars
👀 Watchers: 295
🍴 Forks: 7.4K forks

💻 Programming Languages: Python

🏷️ Related Topics:
#nlp #agent #ai #transformers #moe #llama #gpt #lora #quantization #gemma #fine_tuning #peft #large_language_models #llm #rlhf #instruction_tuning #qlora #qwen #deepseek #llama3


==================================
🧠 By: https://t.iss.one/DataScienceM
Clean Code Tip:

For reusable setup and teardown logic, you can create your own context managers. Instead of writing a full class with __enter__ and __exit__, use the @contextmanager decorator from the contextlib module for a more concise solution. Put the cleanup code in a finally block after the yield so that it runs even if the body of the with statement raises. 🚀

Example:

import contextlib

# The verbose, class-based way to create a context manager
class DatabaseConnection:
    def __init__(self, db_name):
        self._db_name = db_name
        self._conn = None
        print(f"Initializing connection to {self._db_name}...")

    def __enter__(self):
        print("-> Entering context: Opening connection.")
        self._conn = f"CONNECTION_TO_{self._db_name}"  # Simulate connection
        return self._conn

    def __exit__(self, exc_type, exc_val, exc_tb):
        print("<- Exiting context: Closing connection.")
        self._conn = None  # Simulate closing

print("--- Class-Based Way ---")
with DatabaseConnection("users.db") as conn:
    print(f"   Performing operations with {conn}")


# The clean, Pythonic way using a generator and @contextmanager
@contextlib.contextmanager
def managed_database(db_name):
    print(f"Initializing connection to {db_name}...")
    conn = f"CONNECTION_TO_{db_name}"
    try:
        print("-> Entering context: Yielding connection.")
        yield conn  # The code inside the 'with' block runs here
    finally:
        # This code is guaranteed to run, just like __exit__
        print("<- Exiting context: Closing connection in 'finally'.")
        conn = None  # Simulate closing

print("\n--- @contextmanager Way ---")
with managed_database("products.db") as conn:
    print(f"   Performing operations with {conn}")
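
One property worth seeing in action: the finally block in a @contextmanager generator still runs when the with body raises. Here is a minimal sketch of that behavior. The names managed_resource and events are hypothetical and exist only for this illustration.

```python
import contextlib

events = []  # hypothetical log, used only to record the order of steps


@contextlib.contextmanager
def managed_resource(name):
    events.append(f"open {name}")
    try:
        yield name
    finally:
        # Runs on normal exit and when the 'with' body raises
        events.append(f"close {name}")


try:
    with managed_resource("orders.db") as res:
        raise ValueError("simulated failure")
except ValueError:
    events.append("error handled")

print(events)  # ['open orders.db', 'close orders.db', 'error handled']
```

The resource is closed before the exception reaches the outer except block, which is the same ordering a class-based __exit__ would give.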


โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”
By: @DataScienceN โœจ
#YOLOv8 #ComputerVision #ObjectDetection #Python #AI

Audience Analysis with YOLOv8: Counting People & Estimating Gender Ratios

This lesson demonstrates how to use the YOLOv8 model to perform a computer vision task: analyzing an image of a crowd to count the total number of people and estimate the ratio of men to women.

---

Step 1: Setup and Installation

First, we need to install the necessary libraries: ultralytics for the YOLOv8 model, opencv-python for image manipulation, and cvlib for a simple, pre-trained gender classification model. The command also installs tensorflow, which cvlib expects to be available alongside it.

#Setup #Installation

# Open your terminal or command prompt and run:
pip install ultralytics opencv-python cvlib tensorflow
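
Before moving on, it can help to confirm that the packages are importable. The sketch below uses only the standard library. The REQUIRED mapping and the find_missing helper are illustrative names, not part of any of the installed libraries.

```python
import importlib.util

# pip name -> import name (opencv-python is imported as cv2)
REQUIRED = {
    "ultralytics": "ultralytics",
    "opencv-python": "cv2",
    "cvlib": "cvlib",
    "tensorflow": "tensorflow",
}


def find_missing(packages):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in packages.items()
        if importlib.util.find_spec(module) is None
    ]


missing = find_missing(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

This only checks that each module can be located. It does not import the heavy libraries or verify their versions.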


---

Step 2: Loading Models and Image

We will load the official YOLOv8 model, pre-trained for object detection, and import cvlib, which provides the face and gender detectors used in Step 4. We also need to load the image we want to analyze. Make sure you have an image named crowd.jpg in the same directory.

#DataLoading #Model

import cv2
from ultralytics import YOLO
import cvlib as cv
import numpy as np

# Load the YOLOv8 model (pre-trained on COCO dataset)
model = YOLO('yolov8n.pt')

# Load the image
image_path = 'crowd.jpg' # Make sure this image exists
img = cv2.imread(image_path)

# Check if the image was loaded correctly
if img is None:
    print(f"Error: Could not load image from {image_path}")
else:
    print("Image and YOLOv8 model loaded successfully.")
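
Since cv2.imread returns None instead of raising when a file is missing, we may prefer to fail early with a clear error. A minimal sketch, assuming a hypothetical helper named require_image_path and an illustrative list of extensions (OpenCV supports more formats than these):

```python
from pathlib import Path

# Extensions assumed for this sketch only
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def require_image_path(path):
    """Return the path as a Path object, or raise if it is not a usable image file."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"No image file at {p}")
    if p.suffix.lower() not in IMAGE_SUFFIXES:
        raise ValueError(f"Unexpected image extension: {p.suffix}")
    return p


try:
    require_image_path("crowd.jpg")
    print("crowd.jpg found")
except FileNotFoundError as err:
    print(err)
```

The returned path can then be passed to cv2.imread as a string, for example cv2.imread(str(require_image_path(image_path))).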


---

Step 3: Person Detection with YOLOv8

Now, we'll run the YOLOv8 model on our image to detect all objects and then filter those results to keep only the ones identified as a 'person'.

#PersonDetection #Inference

# Run inference on the image
results = model(img)

# A list to store the bounding boxes of detected people
person_boxes = []

# Process the results
for result in results:
    boxes = result.boxes
    for box in boxes:
        # Get class id and check if it's a person (class 0 in COCO)
        if model.names[int(box.cls)] == 'person':
            # Get bounding box coordinates
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            person_boxes.append((x1, y1, x2, y2))

# Print the total number of people found
total_people = len(person_boxes)
print(f"Total people detected: {total_people}")
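
The loop above keeps every 'person' detection, whatever its confidence or size. Very small or low-confidence boxes rarely contain a usable face, so we might filter them first. The sketch below is library-free. The filter_person_boxes helper, the tuple layout, and both thresholds are assumptions for illustration (with ultralytics, the per-box confidence is available as box.conf).

```python
def filter_person_boxes(detections, min_conf=0.4, min_height=40):
    """Keep confident 'person' boxes tall enough to plausibly contain a face.

    detections: iterable of (class_name, confidence, (x1, y1, x2, y2)).
    """
    kept = []
    for class_name, confidence, (x1, y1, x2, y2) in detections:
        if class_name != "person":
            continue  # not a person
        if confidence < min_conf:
            continue  # detection too uncertain
        if (y2 - y1) < min_height:
            continue  # box too small for face detection
        kept.append((x1, y1, x2, y2))
    return kept


# Made-up detections for demonstration
sample = [
    ("person", 0.91, (10, 20, 110, 320)),
    ("person", 0.30, (200, 40, 260, 200)),
    ("dog", 0.88, (300, 200, 380, 260)),
    ("person", 0.75, (400, 100, 420, 130)),
]
print(filter_person_boxes(sample))  # [(10, 20, 110, 320)]
```

Note that filtering lowers the headcount, so the total should be taken before this step if every detected person is meant to be counted.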


---

Step 4: Gender Classification

For each detected person, we will crop their bounding box from the image. Then, we'll use cvlib to detect a face within that crop and predict the gender. This is a multi-step pipeline.

#GenderClassification #CV

# Counters for male and female
male_count = 0
female_count = 0

# A copy of the original image for drawing results
output_img = img.copy()

# Loop through each person's bounding box
for (x1, y1, x2, y2) in person_boxes:
    # Crop the person from the image
    person_crop = img[y1:y2, x1:x2]

    label = "Unknown"

    try:
        # Detect faces inside the person crop
        faces, face_confidences = cv.detect_face(person_crop, threshold=0.5)

        # If several faces are found, use only the first one
        if len(faces) > 0:
            (startX, startY, endX, endY) = faces[0]
            # Clamp negative coordinates, which would wrap around when slicing
            startX, startY = max(0, startX), max(0, startY)
            face_crop = np.copy(person_crop[startY:endY, startX:endX])

            # Per cvlib's documented usage, detect_gender returns the class
            # labels and one confidence per class, so pick the best class
            (gender_labels, gender_confidences) = cv.detect_gender(face_crop)
            idx = int(np.argmax(gender_confidences))

            if gender_confidences[idx] > 0.6:  # Confidence threshold
                label = gender_labels[idx]
                if label == 'male':
                    male_count += 1
                elif label == 'female':
                    female_count += 1

    except Exception:
        # Sometimes cvlib can fail on small or unclear crops
        label = "Error"

    # Draw bounding box and label on the output image
    color = (0, 255, 0) if label in ["male", "female"] else (0, 0, 255)
    cv2.rectangle(output_img, (x1, y1), (x2, y2), color, 2)
    cv2.putText(output_img, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

print(f"Males detected: {male_count}")
print(f"Females detected: {female_count}")
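
A note on the gender step: as far as we can tell from cvlib's published usage example, detect_gender returns the list of class labels together with one confidence per class, so the winning label has to be selected before it is compared with a threshold. Here is a minimal, library-free sketch of that selection. The pick_gender_label helper and its default threshold are illustrative and not part of cvlib.

```python
def pick_gender_label(labels, confidences, threshold=0.6):
    """Return the highest-scoring label, or 'Unknown' if it is not confident enough."""
    if len(labels) == 0 or len(labels) != len(confidences):
        return "Unknown"
    # Index of the class with the highest confidence
    best = max(range(len(labels)), key=lambda i: confidences[i])
    return labels[best] if confidences[best] > threshold else "Unknown"


# Made-up scores for demonstration
print(pick_gender_label(["male", "female"], [0.18, 0.82]))  # female
print(pick_gender_label(["male", "female"], [0.55, 0.45]))  # Unknown
```

Keeping the selection in one small function makes the threshold easy to tune and keeps low-confidence predictions out of the counts.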


---

Step 5: Displaying Final Results

Finally, we calculate the percentages and display the annotated image along with a summary of our findings.

#Results #Visualization

# Calculate percentages
known_gender_count = male_count + female_count
if known_gender_count > 0:
    male_percentage = (male_count / known_gender_count) * 100
    female_percentage = (female_count / known_gender_count) * 100
else:
    male_percentage = 0
    female_percentage = 0

# Prepare the summary text
summary_text1 = f"Total People: {total_people}"
summary_text2 = f"Men: {male_count} ({male_percentage:.1f}%)"
summary_text3 = f"Women: {female_count} ({female_percentage:.1f}%)"

# Add summary text to the image
cv2.putText(output_img, summary_text1, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (255,255,255), 3)
cv2.putText(output_img, summary_text2, (10, 60), cv2.FONT_HERSHEY_SIMPLEX, 1, (255,255,255), 3)
cv2.putText(output_img, summary_text3, (10, 90), cv2.FONT_HERSHEY_SIMPLEX, 1, (255,255,255), 3)

# Save the final image
cv2.imwrite('crowd_analysis_result.jpg', output_img)
print("\n--- Analysis Complete ---")
print(summary_text1)
print(summary_text2)
print(summary_text3)
print("Result image saved as 'crowd_analysis_result.jpg'")
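
The percentages above are computed only over people whose gender was classified, so it is worth reporting how many people were left unclassified. A small sketch of such a summary follows. The summarize_counts helper and the input numbers are made up for illustration.

```python
def summarize_counts(total_people, male_count, female_count):
    """Summarize classified vs. unclassified people with safe percentages."""
    known = male_count + female_count
    unknown = total_people - known

    def pct(part, whole):
        # Avoid division by zero when nothing was detected or classified
        return 100.0 * part / whole if whole else 0.0

    return {
        "known": known,
        "unknown": unknown,
        "male_pct": pct(male_count, known),
        "female_pct": pct(female_count, known),
        "coverage_pct": pct(known, total_people),
    }


# Made-up counts for demonstration
summary = summarize_counts(total_people=40, male_count=12, female_count=8)
print(summary)
```

A low coverage_pct signals that the ratio rests on a small share of the crowd and should be read with extra caution.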


---

Step 6: Discussion of Results and Limitations

#Discussion #Ethics #AI

- Person Detection Accuracy: YOLOv8 is highly effective at detecting people, but it can struggle with heavy occlusion (people hiding others) or very low-resolution images, potentially leading to an undercount.
- Gender Classification Reliability: The secondary gender classification model (cvlib) is a simplified model. Its accuracy depends heavily on clear, front-facing views of faces. It may fail on side profiles, poor lighting, or small faces.
- Ethical Considerations & Bias: Gender classification from images is an inherently problematic task. These models are trained on datasets that may contain biases and often rely on stereotypical features (e.g., hair length). The model's performance can be worse for certain ethnicities and it cannot account for non-binary gender identities. This tool should be seen as a rough estimator based on visual stereotypes, not a definitive measure of gender.
- Performance: For real-time video, this multi-step process (YOLO -> crop -> face detection -> gender detection) can be slow. A more advanced approach would be to fine-tune a single object detection model on a custom dataset with 'man' and 'woman' classes for much faster and more integrated performance.

โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”
By: @DataScienceN โœจ
🔥 Trending Repository: chef

📝 Description: The only AI app builder that knows backend

🔗 Repository URL: https://github.com/get-convex/chef

🌐 Website: https://chef.convex.dev

📖 Readme: https://github.com/get-convex/chef#readme

📊 Statistics:
🌟 Stars: 2.5K stars
👀 Watchers: 22
🍴 Forks: 497 forks

💻 Programming Languages: TypeScript - JavaScript - CSS

🏷️ Related Topics: Not available

==================================
🧠 By: https://t.iss.one/DataScienceM