Releases: nao1215/fileprep
Releases · nao1215/fileprep
v0.2.0
Added
- Conditional Required Validators (9caa374): New validators for conditional field requirements
required_if: Required if another field equals a specific valuerequired_unless: Required unless another field equals a specific valuerequired_with: Required if another field is presentrequired_without: Required if another field is not present
- Date/Time Validator (9caa374):
datetimevalidator with custom Go layout format support - Phone Number Validator (9caa374):
e164validator for E.164 international phone number format - Geolocation Validators (9caa374):
latitude(-90 to 90) andlongitude(-180 to 180) validators - UUID Variant Validators (9caa374):
uuid3,uuid4,uuid5for specific UUID versions, andulidfor ULID format - Hexadecimal and Color Validators (9caa374):
hexadecimal,hexcolor,rgb,rgba,hsl,hslavalidators - MAC Address Validator (9caa374):
macvalidator for MAC address format - Advanced Examples (f771f9b): Comprehensive documentation examples
- Complex Data Preprocessing and Validation example with real-world messy data
- Detailed Error Reporting example demonstrating validation error handling
- Benchmark Tests (607b868): Comprehensive benchmark suite for performance testing
Changed
- Performance Improvement (PR #6, 607b868): ~10% faster processing through optimized preprocessing and validation pipeline
- Documentation (f771f9b): Complete update of all README translations (Japanese, Spanish, French, Korean, Russian, Chinese) to match the English version with full feature documentation
v0.1.0
Added
- Initial Release: First stable release of fileprep library
- File Format Support: CSV, TSV, LTSV, Parquet, Excel (.xlsx) with compression support (gzip, bzip2, xz, zstd)
- Preprocessing Tags (
prep): Comprehensive struct tag-based preprocessing- Basic preprocessors:
trim,ltrim,rtrim,lowercase,uppercase,default - String transformation:
replace,prefix,suffix,truncate,strip_html,strip_newline,collapse_space - Character filtering:
remove_digits,remove_alpha,keep_digits,keep_alpha,trim_set - Padding:
pad_left,pad_right - Advanced:
normalize_unicode,nullify,coerce,fix_scheme,regex_replace
- Basic preprocessors:
- Validation Tags (
validate): Compatible with go-playground/validator syntax- Basic validators:
required,boolean - Character type validators:
alpha,alphaunicode,alphaspace,alphanumeric,alphanumunicode,numeric,number,ascii,printascii,multibyte - Numeric comparison:
eq,ne,gt,gte,lt,lte,min,max,len - String validators:
oneof,lowercase,uppercase,eq_ignore_case,ne_ignore_case - String content:
startswith,startsnotwith,endswith,endsnotwith,contains,containsany,containsrune,excludes,excludesall,excludesrune - Format validators:
email,uri,url,http_url,https_url,url_encoded,datauri,uuid - Network validators:
ip_addr,ip4_addr,ip6_addr,cidr,cidrv4,cidrv6,fqdn,hostname,hostname_rfc1123,hostname_port - Cross-field validators:
eqfield,nefield,gtfield,gtefield,ltfield,ltefield,fieldcontains,fieldexcludes
- Basic validators:
- Name-Based Column Binding: Automatic snake_case conversion with
nametag override - filesql Integration: Returns
io.Readerfor direct use with filesql - Detailed Error Reporting: Row and column information for each validation error
Technical Details
- Memory Optimization: In-place record processing, pre-allocated output buffers, streaming parsers for CSV/TSV/LTSV
- XLSX Streaming: Uses excelize streaming API to reduce memory usage for large files
- Parquet Buffer Reuse: Reusable row buffer across row groups to reduce allocations
- Format-Specific Limitations:
- XLSX: Only the first sheet is processed
- LTSV: Maximum line size is 10MB