Helpers
jairomelo edited this page May 20, 2025 · 2 revisions
A series of helper modules has been created to assist with data exploration and wrangling.
The Preprocessing module contains functions for cleaning and standardizing different types of data.
The PreprocessingDates class contains methods for cleaning and standardizing date strings.
- clean_date_strings: Cleans the date strings by replacing all "/" characters with "-" and removing all non-numeric characters (except for the "-" character).
- reverse_date_string: Reverses the date string from either "%d-%m-%Y" or "%m-%Y" to "%Y-%m-%d" or "%Y-%m" respectively.
- standardize_date_strings: Standardizes the date strings from "%d-%m-%Y" to "%Y-%m-%d".
import pandas as pd
from project_code.helpers.Preprocessing import PreprocessingDates

# Sample dates in mixed formats, including one non-date entry
baptism_dates = pd.Series([
    "1800/05/03",
    "1800-05-03",
    "05/1800",
    "1800-05",
    "05-1800",
    "[ilegible]"
])

preprocessor = PreprocessingDates(baptism_dates)

# Replace "/" with "-" and drop every other non-numeric character
cleaned_dates = preprocessor.clean_date_strings()
print(cleaned_dates)

# Reorder the cleaned strings so the year comes first
standardized_dates = preprocessor.standardize_date_strings()
print(standardized_dates)

Output:

0    1800-05-03
1    1800-05-03
2       05-1800
3       1800-05
4       05-1800
5
dtype: object
0    1800-05-03
1    1800-05-03
2       1800-05
3       1800-05
4       1800-05
5
dtype: object
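
For readers who want to see how the behavior described above could be reproduced without the project code, here is a minimal sketch using plain pandas string methods. It is not the implementation of PreprocessingDates. The function names clean_dates_sketch, reverse_date_sketch and standardize_dates_sketch are hypothetical, and the rule used to detect a year-first string (a four-digit first part) is an assumption made for this illustration.

```python
import pandas as pd


def clean_dates_sketch(dates: pd.Series) -> pd.Series:
    """Hypothetical helper: replace "/" with "-" and drop anything that is not a digit or "-"."""
    return (
        dates.astype(str)
        .str.replace("/", "-", regex=False)
        .str.replace(r"[^0-9-]", "", regex=True)
    )


def reverse_date_sketch(date: str) -> str:
    """Hypothetical helper: move a trailing four-digit year to the front.

    "03-05-1800" becomes "1800-05-03" and "05-1800" becomes "1800-05".
    Strings that already start with a four-digit year, and empty strings,
    are returned unchanged (an assumption of this sketch).
    """
    parts = date.split("-")
    year_is_last = len(parts[-1]) == 4 and len(parts[0]) != 4
    if len(parts) in (2, 3) and year_is_last:
        return "-".join(reversed(parts))
    return date


def standardize_dates_sketch(dates: pd.Series) -> pd.Series:
    """Hypothetical helper: clean first, then reorder each string to year-first."""
    return clean_dates_sketch(dates).map(reverse_date_sketch)


sample = pd.Series([
    "1800/05/03", "1800-05-03", "05/1800", "1800-05", "05-1800", "[ilegible]",
])
print(standardize_dates_sketch(sample).tolist())
# ['1800-05-03', '1800-05-03', '1800-05', '1800-05', '1800-05', '']
```

The sketch keeps cleaning and reordering as separate steps so each can be checked on its own. Entries with no digits, such as "[ilegible]", end up as empty strings, which matches the output shown above.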