Python and command-line tool for gathering text from the Web: web crawling, text extraction, metadata, comments
Languages: #Python
Topics: #article_extractor #corpus #corpus_builder #corpus_tools #crawler
GitHub
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML - adbar/trafilatura
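For a quick feel of the extraction step, here is a minimal sketch, assuming the package is installed with `pip install trafilatura`. The sample HTML and the helper name `extract_main_text` are made up for illustration; `trafilatura.extract` and `trafilatura.fetch_url` are the library's documented entry points. On very short pages the extractor may return `None`, so the result is treated as optional.

```python
from typing import Optional

# Hypothetical sample page, invented for this sketch.
SAMPLE_HTML = """
<html>
  <head><title>Sample article</title></head>
  <body>
    <nav>Home | About | Contact</nav>
    <article>
      <h1>Building a text corpus from the Web</h1>
      <p>Web pages mix the main text with navigation, ads and footers.
      An extractor tries to keep the article body and drop the rest,
      which is useful when assembling a corpus for language research.</p>
      <p>The same tool can also return metadata such as the title,
      and optionally the comments section of a page.</p>
    </article>
    <footer>Copyright notice</footer>
  </body>
</html>
"""


def extract_main_text(html: str) -> Optional[str]:
    """Return the main text of an HTML page, or None if nothing was extracted."""
    # Imported here so the helper can be defined even before the
    # third-party package is installed.
    import trafilatura

    # include_comments=False skips user comment sections.
    return trafilatura.extract(html, include_comments=False)


if __name__ == "__main__":
    print(extract_main_text(SAMPLE_HTML))
    # For a live page, the HTML would be downloaded first, for example:
    #   import trafilatura
    #   html = trafilatura.fetch_url("https://example.org/")  # placeholder URL
    #   if html is not None:
    #       print(extract_main_text(html))
```

The other output formats listed in the repository description (CSV, JSON, XML and so on) are selected through the `output_format` argument of `extract`.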