-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Description
Summary
Implement a new validator that executes sqlExpression from custom TestDefinitions in the rule library. This validator will compile parameterized SQL templates and execute them against SQLAlchemy-based data sources.
Background
PR #24748 added the ability to define custom TestDefinitions with sqlExpression fields, but there is no validator to execute these expressions. This issue implements the execution logic.
Acceptance Criteria
- New validator class
ruleLibrarySqlExpressioncreated - Validator compiles Jinja2 templates with test case parameters
-
{{ table }}resolves to the target table's fully qualified name - Parameter type coercion follows the defined mapping (STRING quoted, INT/NUMBER unquoted, etc.)
- SQL safety validation occurs before execution
- Pass/fail logic: 0 rows returned = pass, 1+ rows = fail (default)
- Proper error handling with TestCaseResult status mapping
- Unit and integration tests
Implementation Details
File Structure
ingestion/src/metadata/data_quality/validations/table/
├── base/
│ └── ruleLibrarySqlExpression.py # Base class with shared logic
└── sqlalchemy/
└── ruleLibrarySqlExpression.py # SQLAlchemy implementation
Validator Discovery
The validator will be auto-discovered by importer.py when a TestDefinition has:
fullyQualifiedName:ruleLibrarySqlExpression
Alternatively, modify the validator builder to check for sqlExpression presence and route to this validator.
Error Handling
| Failure Mode | TestCaseResult Status | Message |
|---|---|---|
| No sqlExpression | Aborted | "TestDefinition has no sqlExpression" |
| Invalid template syntax | Aborted | Jinja2 error message |
| Missing required parameter | Aborted | "Missing parameter: {name}" |
| SQL safety check fails | Aborted | "SQL safety validation failed" |
| SQL syntax error | Aborted | Database error message |
| SQL timeout | Aborted | Timeout error message |
| 0 rows returned | Success | "Query returned 0 row(s)" |
| 1+ rows returned | Failed | "Query returned N row(s)" |
Testing
Unit Tests
# ingestion/tests/unit/data_quality/validations/table/test_rule_library_sql_expression.py
class TestRuleLibrarySqlExpressionValidator:
def test_coerce_string_value(self):
validator = create_validator(...)
assert validator._coerce_value("hello", TestDataType.STRING) == "'hello'"
def test_coerce_string_escapes_quotes(self):
validator = create_validator(...)
assert validator._coerce_value("it's", TestDataType.STRING) == "'it''s'"
def test_coerce_int_value(self):
validator = create_validator(...)
assert validator._coerce_value("42", TestDataType.INT) == "42"
def test_coerce_boolean_true(self):
validator = create_validator(...)
assert validator._coerce_value("true", TestDataType.BOOLEAN) == "TRUE"
def test_coerce_array_value(self):
validator = create_validator(...)
assert validator._coerce_value("a,b,c", TestDataType.ARRAY) == "('a', 'b', 'c')"
def test_template_compilation(self):
# Test {{ table }} and {{ param }} substitution
...
def test_unsafe_sql_rejected(self):
# Test that DROP, DELETE, etc. in template raises error
...Integration Tests
Add integration test that:
- Creates a TestDefinition with sqlExpression
- Creates a TestCase with parameter values
- Runs the test against a test database
- Verifies correct pass/fail based on row count
Metadata
Metadata
Assignees
Labels
No labels
Type
Projects
Status
No status
Status
No status