-
Notifications
You must be signed in to change notification settings - Fork 16.5k
fix(mcp): use SQLScript for all SQL parsing in execute_sql #36599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(mcp): use SQLScript for all SQL parsing in execute_sql #36599
Conversation
CTE queries (WITH ... AS ... SELECT) were returning empty rows because _is_select_query() only checked for queries starting with "select". CTEs start with "with", causing them to be routed to _process_dml_results() instead of _process_select_results(). Changes: - Updated _is_select_query() to recognize WITH clauses as SELECT queries - Updated _apply_limit() to add LIMIT to CTE queries - Added comprehensive unit tests for CTE query handling
|
CodeAnt AI is reviewing your PR. |
Code Review Agent Run #1a3f82Actionable Suggestions - 0Additional Suggestions - 2
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
Nitpicks 🔍
|
| # CTE queries start with WITH and contain SELECT | ||
| return sql_lower.startswith("select") or sql_lower.startswith("with") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Logic bug: CTEs that start with "WITH" but are not SELECT statements (e.g. "WITH ... INSERT ...") are currently classified as SELECT queries; this will cause the code to call fetch methods on non-SELECT statements and may raise database/DB-API errors. Ensure "WITH" queries are treated as SELECT only when a SELECT is actually present. [logic error]
Severity Level: Minor
| # CTE queries start with WITH and contain SELECT | |
| return sql_lower.startswith("select") or sql_lower.startswith("with") | |
| # Treat CTE queries that start with WITH as SELECT-like only if they contain a SELECT clause | |
| return sql_lower.startswith("select") or ( | |
| sql_lower.startswith("with") and "select" in sql_lower | |
| ) |
Why it matters? ⭐
Correct. The current implementation treats any statement starting with WITH as a SELECT, which is wrong for CTEs followed by DML (e.g., "WITH ... INSERT ..."). The proposed change to require a SELECT token when the SQL starts with WITH fixes a real logic bug and prevents fetch-based processing on non-SELECT statements.
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/mcp_service/sql_lab/sql_lab_utils.py
**Line:** 214:215
**Comment:**
*Logic Error: Logic bug: CTEs that start with "WITH" but are not SELECT statements (e.g. "WITH ... INSERT ...") are currently classified as SELECT queries; this will cause the code to call fetch methods on non-SELECT statements and may raise database/DB-API errors. Ensure "WITH" queries are treated as SELECT only when a SELECT is actually present.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should use SQLScript and SQLStatement from superset/sql/parse.py for all your parsing/AST checks/AST manipulation needs. Something like this should work:
script = SQLScript(sql, database.db_engine_spec.engine)
if script.has_mutation():
_process_select_results(cursor, results, limit)
else:
_process_dml_results(cursor, conn, results)|
|
||
| def test_simple_select(self): | ||
| """Test simple SELECT query is recognized.""" | ||
| assert _is_select_query("SELECT * FROM table1") is True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Tests use identity comparison (is True) to check boolean return values. Using is to compare booleans can be unreliable and is considered a logic error; use a direct assertion or truthiness check instead to avoid surprising failures. [logic error]
Severity Level: Minor
| assert _is_select_query("SELECT * FROM table1") is True | |
| assert _is_select_query("SELECT * FROM table1") |
Why it matters? ⭐
The test currently uses identity comparison (is True) which is brittle: if the implementation ever returns a truthy non-bool (e.g., 1, numpy.bool_), the identity check can fail even though behavior is correct. Replacing with a plain truthiness check is safer and idiomatic for tests.
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/unit_tests/mcp_service/sql_lab/test_sql_lab_utils.py
**Line:** 29:29
**Comment:**
*Logic Error: Tests use identity comparison (`is True`) to check boolean return values. Using `is` to compare booleans can be unreliable and is considered a logic error; use a direct assertion or truthiness check instead to avoid surprising failures.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| ) | ||
| SELECT * FROM cte_name | ||
| """ | ||
| assert _is_select_query(cte_sql) is True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: A multiline CTE test asserts the boolean result using is True; identity comparison should be replaced with a plain assert to reliably check truthiness of the function result. [logic error]
Severity Level: Minor
| assert _is_select_query(cte_sql) is True | |
| assert _is_select_query(cte_sql) |
Why it matters? ⭐
Same as above — identity comparison is unnecessary and can produce false negatives if the function returns a truthy non-bool. Using assert _is_select_query(cte_sql) is clearer and more robust.
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/unit_tests/mcp_service/sql_lab/test_sql_lab_utils.py
**Line:** 47:47
**Comment:**
*Logic Error: A multiline CTE test asserts the boolean result using `is True`; identity comparison should be replaced with a plain assert to reliably check truthiness of the function result.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.|
|
||
| def test_insert_not_select(self): | ||
| """Test INSERT query is not recognized as SELECT.""" | ||
| assert _is_select_query("INSERT INTO table1 VALUES (1)") is False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Tests use is False to check boolean falsehood; identity comparison should be replaced with assert not <expr> to avoid identity vs equality pitfalls and make intent clearer. [logic error]
Severity Level: Minor
| assert _is_select_query("INSERT INTO table1 VALUES (1)") is False | |
| assert not _is_select_query("INSERT INTO table1 VALUES (1)") |
Why it matters? ⭐
Identity comparison to False has the same pitfalls as is True. Using assert not <expr> is the idiomatic and reliable way to assert falsiness in tests.
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/unit_tests/mcp_service/sql_lab/test_sql_lab_utils.py
**Line:** 78:78
**Comment:**
*Logic Error: Tests use `is False` to check boolean falsehood; identity comparison should be replaced with `assert not <expr>` to avoid identity vs equality pitfalls and make intent clearer.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.|
|
||
| def test_update_not_select(self): | ||
| """Test UPDATE query is not recognized as SELECT.""" | ||
| assert _is_select_query("UPDATE table1 SET col = 1") is False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Replace is False boolean identity comparison with assert not in update query test to avoid identity comparison issues and make the assertion semantics correct and idiomatic. [logic error]
Severity Level: Minor
| assert _is_select_query("UPDATE table1 SET col = 1") is False | |
| assert not _is_select_query("UPDATE table1 SET col = 1") |
Why it matters? ⭐
The recommendation is correct and improves test reliability/clarity; using assert not _is_select_query(...) avoids identity pitfalls and matches Python testing best practices.
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/unit_tests/mcp_service/sql_lab/test_sql_lab_utils.py
**Line:** 82:82
**Comment:**
*Logic Error: Replace `is False` boolean identity comparison with `assert not` in update query test to avoid identity comparison issues and make the assertion semantics correct and idiomatic.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.|
CodeAnt AI finished reviewing your PR. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #36599 +/- ##
===========================================
+ Coverage 0 67.76% +67.76%
===========================================
Files 0 637 +637
Lines 0 47103 +47103
Branches 0 5130 +5130
===========================================
+ Hits 0 31921 +31921
- Misses 0 13905 +13905
- Partials 0 1277 +1277
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| # CTE queries start with WITH and contain SELECT | ||
| return sql_lower.startswith("select") or sql_lower.startswith("with") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should use SQLScript and SQLStatement from superset/sql/parse.py for all your parsing/AST checks/AST manipulation needs. Something like this should work:
script = SQLScript(sql, database.db_engine_spec.engine)
if script.has_mutation():
_process_select_results(cursor, results, limit)
else:
_process_dml_results(cursor, conn, results)Per reviewer feedback, replaced simple string-based query detection with proper SQL parsing using SQLScript from superset/sql/parse.py. Changes: - Use SQLScript.has_mutation() to determine if query is SELECT-like - Updated tests to verify SQLScript mutation detection for CTEs - Removed the now-unused _is_select_query helper function
|
CodeAnt AI is running Incremental review |
Per reviewer feedback, now using SQLScript from superset/sql/parse.py for all SQL parsing needs: - validate_sql_query: uses SQLScript.has_mutation() for DML detection and check_functions_present() for disallowed functions - _apply_limit: uses SQLScript to parse and add LIMIT via AST - _execute_query: uses SQLScript.has_mutation() for query type detection This ensures consistent, robust SQL parsing across all operations.
SQLScript now correctly detects DROP TABLE as a mutation and blocks it before execution (improved security). Updated test to expect DML_NOT_ALLOWED error instead of EXECUTION_ERROR.
betodealmeida
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
Summary
Fixes a bug where CTE (Common Table Expression) queries using
WITH ... AS ... SELECTsyntax were returning emptyrowsin the MCPexecute_sqltool response, even though the query executed successfully.Root Cause: Query type detection was using simple string matching, which failed for CTEs that start with
WITH.Changes
Now using
SQLScriptfromsuperset/sql/parse.pyconsistently for all SQL parsing needs:validate_sql_querystartswith()for DML keywordsSQLScript.has_mutation()validate_sql_querySQLScript.check_functions_present()_apply_limitSQLScript.set_limit_value()via AST_execute_queryString matchingSQLScript.has_mutation()Testing Instructions
Run unit tests:
Additional Information