Python | Algorithms | Data Structures | Cyber Security | Networks
This channel is for Programmers, Coders, Software Engineers.

1) Python
2) Django
3) Python frameworks
4) Data Structures
5) Algorithms
6) DSA

Admin: @Hussein_Sheikho

Ad & Earn money from your channel:
https://telega.io/?r=nikapsOH
💡 Python: Converting Numbers to Human-Readable Words

Transforming numerical values into their word equivalents is useful in many applications: financial reports, check writing, educational software, and accessibility tools. Handling every case from scratch is complex, but Python's num2words library provides a robust, easy solution. Install it with pip install num2words.

from num2words import num2words

# Example 1: Basic integer
number1 = 123
words1 = num2words(number1)
print(f"'{number1}' in words: {words1}")

# Example 2: Larger integer
number2 = 543210
words2 = num2words(number2, lang='en') # Explicitly set language
print(f"'{number2}' in words: {words2}")

# Example 3: Decimal number
number3 = 100.75
words3 = num2words(number3)
print(f"'{number3}' in words: {words3}")

# Example 4: Negative number
number4 = -45
words4 = num2words(number4)
print(f"'{number4}' in words: {words4}")

# Example 5: Number for an ordinal form
number5 = 3
words5 = num2words(number5, to='ordinal')
print(f"Ordinal '{number5}' in words: {words5}")


Code explanation: This script uses the num2words library to convert various integers, decimals, and negative numbers into their English word representations. It also demonstrates how to generate ordinal forms (third instead of three) and explicitly set the output language.
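For intuition, here is a minimal stdlib-only sketch of the lookup-table approach that a converter like num2words builds on, covering only 0–99 (the function name small_number_to_words is illustrative, not part of the library):

```python
# Word tables for 0-19 and the tens; larger numbers are composed from these.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

def small_number_to_words(n):
    """Convert an integer in [0, 99] to English words."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")

print(small_number_to_words(45))  # 'forty-five'
```

num2words extends this same idea to arbitrary magnitudes, decimals, ordinals, and many languages, which is why the library is preferable in practice.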

#Python #TextProcessing #NumberToWords #num2words #DataManipulation

━━━━━━━━━━━━━━━
By: @DataScience4
• Remove stopwords (requires NLTK).
from nltk.corpus import stopwords
# nltk.download('stopwords') # Run once
stop_words = set(stopwords.words('english'))
filtered = [w for w in words if w.lower() not in stop_words]  # 'words' is a token list
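If NLTK is unavailable, the same filtering pattern works with any hand-rolled set; a stdlib-only sketch with a tiny illustrative stopword list (far smaller than NLTK's):

```python
# Tiny illustrative stopword set; NLTK's English list has ~180 entries.
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "in"}

tokens = "The quick brown fox is in the garden".split()
# Case-insensitive membership test, same pattern as the NLTK version above.
kept = [t for t in tokens if t.lower() not in STOP_WORDS]
print(kept)  # ['quick', 'brown', 'fox', 'garden']
```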


VII. Word Normalization (Stemming & Lemmatization)

• Stemming (reduce words to their root form).
from nltk.stem import PorterStemmer
ps = PorterStemmer()
stemmed = ps.stem("running") # 'run'

• Lemmatization (reduce words to their dictionary form).
from nltk.stem import WordNetLemmatizer
# nltk.download('wordnet') # Run once
lemmatizer = WordNetLemmatizer()
lemma = lemmatizer.lemmatize("better", pos="a") # 'good'
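To see why real stemmers need careful rules, here is a toy suffix-stripping sketch (illustrative only; this is not the Porter algorithm, and naive_stem is a made-up helper):

```python
# Naive stemming: chop a known suffix if enough of the word remains.
def naive_stem(word):
    for suffix in ("ing", "ly", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print(naive_stem("running"))  # 'runn' -- over-strips; Porter's rules avoid this
print(naive_stem("boxes"))    # 'box'
```

The 'runn' result shows the kind of over-stripping that PorterStemmer's rewrite rules are designed to prevent, and lemmatization avoids entirely by consulting a dictionary.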


VIII. Advanced NLP Analysis
(Requires pip install spacy and python -m spacy download en_core_web_sm)

• Part-of-Speech (POS) Tagging.
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying a U.K. startup.")
for token in doc: print(token.text, token.pos_)

• Named Entity Recognition (NER).
for ent in doc.ents:
    print(ent.text, ent.label_) # Apple ORG, U.K. GPE

• Get word frequency distribution.
from nltk.probability import FreqDist
from nltk.tokenize import word_tokenize
fdist = FreqDist(word_tokenize("this is a test this is only a test"))
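For simple whitespace-separated text, the standard library's collections.Counter gives the same kind of frequency table with no NLTK download:

```python
from collections import Counter

# str.split() is a crude tokenizer, but fine for plain whitespace-separated text.
tokens = "this is a test this is only a test".split()
freq = Counter(tokens)
print(freq.most_common(2))  # [('this', 2), ('is', 2)]
```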


IX. Text Formatting & Encoding

• Format strings with f-strings.
name = "Alice"
age = 30
message = f"Name: {name}, Age: {age}"

• Pad a string with leading zeros.
number = "42".zfill(5) # '00042'

• Encode a string to bytes.
byte_string = "hello".encode('utf-8')

• Decode bytes to a string.
original_string = byte_string.decode('utf-8')
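A quick round-trip sketch showing that UTF-8 encode/decode is lossless, while encoding to a narrower codec like ASCII with errors='replace' silently loses characters:

```python
text = "café"
data = text.encode("utf-8")            # b'caf\xc3\xa9'
assert data.decode("utf-8") == text    # lossless round trip

# ASCII cannot represent 'é'; errors='replace' substitutes '?'
ascii_lossy = text.encode("ascii", errors="replace")
print(ascii_lossy)  # b'caf?'
```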


X. Text Vectorization
(Requires pip install scikit-learn)

• Create a Bag-of-Words (BoW) model.
from sklearn.feature_extraction.text import CountVectorizer
corpus = ["This is the first document.", "This is the second document."]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)

• Get feature names (the vocabulary).
print(vectorizer.get_feature_names_out())

• Create a TF-IDF model.
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(corpus)
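To demystify what CountVectorizer computes, here is a stdlib-only sketch that mimics its default tokenization (lowercase, tokens of 2+ word characters) and builds the same count matrix by hand:

```python
import re
from collections import Counter

corpus = ["This is the first document.", "This is the second document."]

# Mimic CountVectorizer's defaults: lowercase, keep tokens of 2+ word chars.
def tokenize(doc):
    return re.findall(r"\b\w\w+\b", doc.lower())

# Vocabulary is the sorted set of all tokens; each row counts one document.
vocab = sorted({tok for doc in corpus for tok in tokenize(doc)})
rows = [[Counter(tokenize(doc))[tok] for tok in vocab] for doc in corpus]
print(vocab)  # ['document', 'first', 'is', 'second', 'the', 'this']
print(rows)   # [[1, 1, 1, 0, 1, 1], [1, 0, 1, 1, 1, 1]]
```

TF-IDF then reweights these raw counts so that terms common across all documents (like 'this' and 'the') contribute less than distinguishing terms (like 'first' and 'second').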


XI. More String Utilities

• Center a string within a width.
centered = "Hello".center(20, '-') # '-------Hello--------'

• Check if a string is in title case.
"This Is A Title".istitle() # True

• Find the highest index of a substring.
"test test".rfind("test") # Returns 5

• Split from the right.
"path/to/file.txt".rsplit('/', 1) # ['path/to', 'file.txt']

• Create a character translation table.
table = str.maketrans('aeiou', '12345')
vowels_to_num = "hello".translate(table) # 'h2ll4'

• Remove a specific prefix (Python 3.9+).
"TestCase".removeprefix("Test") # 'Case'

• Remove a specific suffix (Python 3.9+).
"filename.txt".removesuffix(".txt") # 'filename'

• Check for unicode decimal characters.
"½".isdecimal() # False
"123".isdecimal() # True

• Check for unicode numeric characters.
"½".isnumeric() # True
"²".isnumeric() # True
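A common text-processing use of translation tables is stripping punctuation, by mapping every punctuation character to None; a small sketch using string.punctuation as the removal set:

```python
import string

# maketrans with a third argument maps each of those characters to None.
table = str.maketrans("", "", string.punctuation)
clean = "Hello, world! It's me.".translate(table)
print(clean)  # 'Hello world Its me'
```

translate() runs in a single pass, so this is typically faster than chained replace() calls for multi-character cleanup.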


#Python #TextProcessing #NLP #RegEx #NLTK
