Skip to content

raynardj/category

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

category

Categorical transformation for data science

PyPI version Python version License Test PyPI Downloads

Installation

pip install works for this library.

pip install category

Single Category

# using python core
>>> from category import Category
# using rust core, faster
>>> from category.fast import Category 
>>> book = Category(['a', 'b', 'c', 'Category_d', 'e', 'f', 'g', 'h', 'i', 'j'], pad_mst = False)
>>> book.i2c[2]
'c'

>>> book.c2i[['Category_d','f']]
array([3, 5])

You can set pad_mst to True to handle the missing token

# using python core
>>> from category import Category 
# using rust core, faster
>>> from category.fast import Category 
>>> book = Category(['a', 'b', 'c', 'Category_d', 'e', 'f', 'g', 'h', 'i', 'j'], pad_mst = True)
>>> book.i2c[2] # the 1st token is the missing token, not 'a' any more
'b'
>>> book.c2i[['Stranger','Category_d','Unknown','f']]
array([0, 4, 0, 6])

Multi-Category

# using python core
>>> from category import (Category, MultiCategory)
# using rust core, faster
>>> from category.fast import (Category, MultiCategory)
>>> cates = list(f"category{i}" for i in range(1000))
>>> multi_cate = MultiCategory(Category(cates, pad_mst = True))
>>> multi_cate.string_to_index("category42, category108")
array([42, 108])

You can also try to convert a list of strings, containing multicategorical info (which the data input is frequently used in tabular data), to nhot encoded array, and back

>>> nhot = multi_cate.batch_strings_to_nhot(["category42, category108","category999"])
>>> multi_cate.nhot_to_list(nhot)[0]
["category42", "category108"]

Performance

The running speed of this library mostly depends on python dictionary and numpy operations. Though python is a 'slow' language, such application is pretty fast, our own rust alternative is faster, by not by a huge lead

Here we compare the this library with the Rust implementation

References

Releases

No releases published

Packages

No packages published