Skip to content

Create SQL Expression Validator for Rule Library #25482

@TeddyCr

Description

@TeddyCr

Summary

Implement a new validator that executes sqlExpression from custom TestDefinitions in the rule library. This validator will compile parameterized SQL templates and execute them against SQLAlchemy-based data sources.

Background

PR #24748 added the ability to define custom TestDefinitions with sqlExpression fields, but there is no validator to execute these expressions. This issue implements the execution logic.

Acceptance Criteria

  • New validator class ruleLibrarySqlExpression created
  • Validator compiles Jinja2 templates with test case parameters
  • {{ table }} resolves to the target table's fully qualified name
  • Parameter type coercion follows the defined mapping (STRING quoted, INT/NUMBER unquoted, etc.)
  • SQL safety validation occurs before execution
  • Pass/fail logic: 0 rows returned = pass, 1+ rows = fail (default)
  • Proper error handling with TestCaseResult status mapping
  • Unit and integration tests

Implementation Details

File Structure

ingestion/src/metadata/data_quality/validations/table/
├── base/
│   └── ruleLibrarySqlExpression.py      # Base class with shared logic
└── sqlalchemy/
    └── ruleLibrarySqlExpression.py      # SQLAlchemy implementation

Validator Discovery

The validator will be auto-discovered by importer.py when a TestDefinition has:

  • fullyQualifiedName: ruleLibrarySqlExpression

Alternatively, modify the validator builder to check for sqlExpression presence and route to this validator.

Error Handling

Failure Mode TestCaseResult Status Message
No sqlExpression Aborted "TestDefinition has no sqlExpression"
Invalid template syntax Aborted Jinja2 error message
Missing required parameter Aborted "Missing parameter: {name}"
SQL safety check fails Aborted "SQL safety validation failed"
SQL syntax error Aborted Database error message
SQL timeout Aborted Timeout error message
0 rows returned Success "Query returned 0 row(s)"
1+ rows returned Failed "Query returned N row(s)"

Testing

Unit Tests

# ingestion/tests/unit/data_quality/validations/table/test_rule_library_sql_expression.py

class TestRuleLibrarySqlExpressionValidator:

    def test_coerce_string_value(self):
        validator = create_validator(...)
        assert validator._coerce_value("hello", TestDataType.STRING) == "'hello'"

    def test_coerce_string_escapes_quotes(self):
        validator = create_validator(...)
        assert validator._coerce_value("it's", TestDataType.STRING) == "'it''s'"

    def test_coerce_int_value(self):
        validator = create_validator(...)
        assert validator._coerce_value("42", TestDataType.INT) == "42"

    def test_coerce_boolean_true(self):
        validator = create_validator(...)
        assert validator._coerce_value("true", TestDataType.BOOLEAN) == "TRUE"

    def test_coerce_array_value(self):
        validator = create_validator(...)
        assert validator._coerce_value("a,b,c", TestDataType.ARRAY) == "('a', 'b', 'c')"

    def test_template_compilation(self):
        # Test {{ table }} and {{ param }} substitution
        ...

    def test_unsafe_sql_rejected(self):
        # Test that DROP, DELETE, etc. in template raises error
        ...

Integration Tests

Add integration test that:

  1. Creates a TestDefinition with sqlExpression
  2. Creates a TestCase with parameter values
  3. Runs the test against a test database
  4. Verifies correct pass/fail based on row count

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

Projects

Status

No status

Status

No status

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions