
Helpers

jairomelo edited this page May 20, 2025 · 2 revisions

Helper Modules

A series of helper modules has been created to assist with data exploration and wrangling.

Preprocessing

The Preprocessing module contains functions for cleaning and standardizing different types of data.

Dates

The PreprocessingDates class contains methods for cleaning and standardizing date strings.

Methods

  • clean_date_strings: Cleans the date strings by replacing all / characters with - and removing all non-numeric characters (except for the - character).
  • reverse_date_string: Reverses the date string from either "%d-%m-%Y" or "%m-%Y" to "%Y-%m-%d" or "%Y-%m" respectively.
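  • standardize_date_strings: Standardizes the date strings to a year-first format, converting "%d-%m-%Y" to "%Y-%m-%d". In the example output on this page, "%m-%Y" strings are likewise converted to "%Y-%m", while strings that are already year-first are left unchanged.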
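
The class's source is not shown on this page, so the following is only a minimal sketch of the behavior the method list describes, written as standalone functions over plain strings. The names clean_date_string and reverse_date_string (singular) are hypothetical and are not part of PreprocessingDates; the rule "reverse only when the last part is a four-digit year" is likewise an assumption about how year-first strings are left alone.

```python
import re


def clean_date_string(value: str) -> str:
    """Replace '/' with '-' and drop every character except digits and '-'."""
    return re.sub(r"[^0-9-]", "", value.replace("/", "-"))


def reverse_date_string(value: str) -> str:
    """Reorder 'dd-mm-YYYY' or 'mm-YYYY' so that the year comes first."""
    parts = value.split("-")
    # Assumption: a string needs reversing only if its last part is a 4-digit year.
    if len(parts) in (2, 3) and len(parts[-1]) == 4:
        return "-".join(reversed(parts))
    return value


raw = ["03/05/1800", "05/1800", "1800-05-03", "[ilegible]"]
cleaned = [clean_date_string(v) for v in raw]
standardized = [reverse_date_string(v) for v in cleaned]
print(standardized)
```

With a pandas Series, the same functions could be applied element-wise through Series.map, for example dates.map(clean_date_string).map(reverse_date_string).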

Example

import pandas as pd
from project_code.helpers.Preprocessing import PreprocessingDates

baptism_dates = pd.Series([
    "1800/05/03",
    "1800-05-03",
    "05/1800",
    "1800-05",
    "05-1800",
    "[ilegible]"
])

preprocessor = PreprocessingDates(baptism_dates)
cleaned_dates = preprocessor.clean_date_strings()
print(cleaned_dates)

standardized_dates = preprocessor.standardize_date_strings()
print(standardized_dates)

Output

0    1800-05-03
1    1800-05-03
2       05-1800
3       1800-05
4       05-1800
5              
dtype: object
0    1800-05-03
1    1800-05-03
2       1800-05
3       1800-05
4       1800-05
5              
dtype: object
