fix(mcp): use SQLScript for all SQL parsing in execute_sql #36599

aminghadersohi · 2025-12-12T22:43:10Z

Summary

Fixes a bug where CTE (Common Table Expression) queries using WITH ... AS ... SELECT syntax were returning empty rows in the MCP execute_sql tool response, even though the query executed successfully.

Root Cause: Query type detection was using simple string matching, which failed for CTEs that start with WITH.

Changes

Now using SQLScript from superset/sql/parse.py consistently for all SQL parsing needs:

Function	Before	After
`validate_sql_query`	String `startswith()` for DML keywords	`SQLScript.has_mutation()`
`validate_sql_query`	String matching for disallowed functions	`SQLScript.check_functions_present()`
`_apply_limit`	String matching + concatenation	`SQLScript.set_limit_value()` via AST
`_execute_query`	~~String matching~~	`SQLScript.has_mutation()`

Testing Instructions

# Before fix: returns {"rows": [], "affected_rows": 1}
# After fix: returns {"rows": [{"val": "test"}], "row_count": 1}

WITH simple AS (SELECT 'test' as val) SELECT val FROM simple

Run unit tests:

pytest tests/unit_tests/mcp_service/sql_lab/test_sql_lab_utils.py -v

Additional Information

Unit tests added (14 tests)
No breaking changes
Follows conventional commits
Uses existing SQL parsing infrastructure per reviewer feedback

CTE queries (WITH ... AS ... SELECT) were returning empty rows because _is_select_query() only checked for queries starting with "select". CTEs start with "with", causing them to be routed to _process_dml_results() instead of _process_select_results(). Changes: - Updated _is_select_query() to recognize WITH clauses as SELECT queries - Updated _apply_limit() to add LIMIT to CTE queries - Added comprehensive unit tests for CTE query handling

codeant-ai-for-open-source · 2025-12-12T22:43:16Z

CodeAnt AI is reviewing your PR.

bito-code-review · 2025-12-12T22:43:21Z

Code Review Agent Run #1a3f82

Actionable Suggestions - 0

Additional Suggestions - 2

superset/mcp_service/sql_lab/sql_lab_utils.py - 2
- Incomplete CTE Detection Logic · Line 162-163
  
  The logic for detecting SELECT-like queries should check that WITH clauses contain 'select' to avoid misclassifying queries like 'WITH cte AS (SELECT 1) INSERT INTO table SELECT * FROM cte' as SELECT queries, which could lead to incorrect LIMIT application and result processing.
- Incomplete CTE Detection Logic · Line 215-215
  
  Similarly, _is_select_query should ensure WITH queries contain 'select' to prevent false positives for non-SELECT statements.

Review Details

Files reviewed - 2 · Commit Range: 930a36b..930a36b
- superset/mcp_service/sql_lab/sql_lab_utils.py
- tests/unit_tests/mcp_service/sql_lab/test_sql_lab_utils.py
Files skipped - 0
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

/review - Manually triggers a full AI review.
/pause - Pauses automatic reviews on this pull request.
/resume - Resumes automatic reviews.
/resolve - Marks all Bito-posted review comments as resolved.
/abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Default Agent You can customize the agent settings here or contact your Bito workspace admin at [email protected].

Documentation & Help

AI Code Review powered by

codeant-ai-for-open-source · 2025-12-12T22:45:49Z

Nitpicks 🔍

🔒 No security issues identified
⚡ Recommended areas for review Multi-statement handling The code appends `LIMIT` to the whole SQL string. For multi-statement payloads (e.g. `WITH ...; INSERT ...`) or SQL with trailing comments, this may add LIMIT to the wrong statement. Consider detecting and modifying only the final SELECT statement. Coarse LIMIT detection _apply_limit uses a simple substring check ("limit" in sql_lower) to decide whether a LIMIT already exists. This will skip adding an outer LIMIT if any inner CTE or comment contains the word "limit". That can leave outer SELECTs unbounded in cases where only inner queries include LIMIT and the outer query lacks one. LIMIT detection `_apply_limit()` checks for the substring `"limit"` anywhere in `sql_lower`, which can produce false positives (matching "unlimited" or identifiers) and false negatives if LIMIT appears in a different statement. Use a word-boundary-aware regex to detect a standalone LIMIT clause. CTE detection `_is_select_query()` treats any SQL that starts with `WITH` as a SELECT/CTE without verifying that a SELECT actually appears inside the statement or accounting for variants like `WITH RECURSIVE` or leading comments. This can misclassify non-CTE statements, or miss CTEs preceded by comments or parentheses. Leading comments not handled _is_select_query strips whitespace but does not account for leading SQL comments (e.g., "-- ..." or "/* ... */"). If a query begins with a comment followed by a WITH/SELECT, startswith checks will fail and the query may be misclassified (not treated as a SELECT-like query).

codeant-ai-for-open-source · 2025-12-12T22:46:59Z

superset/mcp_service/sql_lab/sql_lab_utils.py

+    # CTE queries start with WITH and contain SELECT
+    return sql_lower.startswith("select") or sql_lower.startswith("with")


Suggestion: Logic bug: CTEs that start with "WITH" but are not SELECT statements (e.g. "WITH ... INSERT ...") are currently classified as SELECT queries; this will cause the code to call fetch methods on non-SELECT statements and may raise database/DB-API errors. Ensure "WITH" queries are treated as SELECT only when a SELECT is actually present. [logic error]

Severity Level: Minor ⚠️

Suggested change

# CTE queries start with WITH and contain SELECT

return sql_lower.startswith("select") or sql_lower.startswith("with")

# Treat CTE queries that start with WITH as SELECT-like only if they contain a SELECT clause

return sql_lower.startswith("select") or (

sql_lower.startswith("with") and "select" in sql_lower

)

Why it matters? ⭐

Correct. The current implementation treats any statement starting with WITH as a SELECT, which is wrong for CTEs followed by DML (e.g., "WITH ... INSERT ..."). The proposed change to require a SELECT token when the SQL starts with WITH fixes a real logic bug and prevents fetch-based processing on non-SELECT statements.

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** superset/mcp_service/sql_lab/sql_lab_utils.py **Line:** 214:215 **Comment:** *Logic Error: Logic bug: CTEs that start with "WITH" but are not SELECT statements (e.g. "WITH ... INSERT ...") are currently classified as SELECT queries; this will cause the code to call fetch methods on non-SELECT statements and may raise database/DB-API errors. Ensure "WITH" queries are treated as SELECT only when a SELECT is actually present. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

You should use SQLScript and SQLStatement from superset/sql/parse.py for all your parsing/AST checks/AST manipulation needs. Something like this should work:

script = SQLScript(sql, database.db_engine_spec.engine) if script.has_mutation(): _process_select_results(cursor, results, limit) else: _process_dml_results(cursor, conn, results)

codeant-ai-for-open-source · 2025-12-12T22:46:59Z

tests/unit_tests/mcp_service/sql_lab/test_sql_lab_utils.py

+
+    def test_simple_select(self):
+        """Test simple SELECT query is recognized."""
+        assert _is_select_query("SELECT * FROM table1") is True


Suggestion: Tests use identity comparison (is True) to check boolean return values. Using is to compare booleans can be unreliable and is considered a logic error; use a direct assertion or truthiness check instead to avoid surprising failures. [logic error]

Severity Level: Minor ⚠️

Suggested change

assert _is_select_query("SELECT * FROM table1") is True

assert _is_select_query("SELECT * FROM table1")

Why it matters? ⭐

The test currently uses identity comparison (is True) which is brittle: if the implementation ever returns a truthy non-bool (e.g., 1, numpy.bool_), the identity check can fail even though behavior is correct. Replacing with a plain truthiness check is safer and idiomatic for tests.

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** tests/unit_tests/mcp_service/sql_lab/test_sql_lab_utils.py **Line:** 29:29 **Comment:** *Logic Error: Tests use identity comparison (`is True`) to check boolean return values. Using `is` to compare booleans can be unreliable and is considered a logic error; use a direct assertion or truthiness check instead to avoid surprising failures. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

codeant-ai-for-open-source · 2025-12-12T22:46:59Z

tests/unit_tests/mcp_service/sql_lab/test_sql_lab_utils.py

+        )
+        SELECT * FROM cte_name
+        """
+        assert _is_select_query(cte_sql) is True


Suggestion: A multiline CTE test asserts the boolean result using is True; identity comparison should be replaced with a plain assert to reliably check truthiness of the function result. [logic error]

Severity Level: Minor ⚠️

Suggested change

assert _is_select_query(cte_sql) is True

assert _is_select_query(cte_sql)

Why it matters? ⭐

Same as above — identity comparison is unnecessary and can produce false negatives if the function returns a truthy non-bool. Using assert _is_select_query(cte_sql) is clearer and more robust.

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** tests/unit_tests/mcp_service/sql_lab/test_sql_lab_utils.py **Line:** 47:47 **Comment:** *Logic Error: A multiline CTE test asserts the boolean result using `is True`; identity comparison should be replaced with a plain assert to reliably check truthiness of the function result. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

codeant-ai-for-open-source · 2025-12-12T22:46:59Z

tests/unit_tests/mcp_service/sql_lab/test_sql_lab_utils.py

+
+    def test_insert_not_select(self):
+        """Test INSERT query is not recognized as SELECT."""
+        assert _is_select_query("INSERT INTO table1 VALUES (1)") is False


Suggestion: Tests use is False to check boolean falsehood; identity comparison should be replaced with assert not <expr> to avoid identity vs equality pitfalls and make intent clearer. [logic error]

Severity Level: Minor ⚠️

Suggested change

assert _is_select_query("INSERT INTO table1 VALUES (1)") is False

assert not _is_select_query("INSERT INTO table1 VALUES (1)")

Why it matters? ⭐

Identity comparison to False has the same pitfalls as is True. Using assert not <expr> is the idiomatic and reliable way to assert falsiness in tests.

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** tests/unit_tests/mcp_service/sql_lab/test_sql_lab_utils.py **Line:** 78:78 **Comment:** *Logic Error: Tests use `is False` to check boolean falsehood; identity comparison should be replaced with `assert not <expr>` to avoid identity vs equality pitfalls and make intent clearer. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

codeant-ai-for-open-source · 2025-12-12T22:46:59Z

tests/unit_tests/mcp_service/sql_lab/test_sql_lab_utils.py

+
+    def test_update_not_select(self):
+        """Test UPDATE query is not recognized as SELECT."""
+        assert _is_select_query("UPDATE table1 SET col = 1") is False


Suggestion: Replace is False boolean identity comparison with assert not in update query test to avoid identity comparison issues and make the assertion semantics correct and idiomatic. [logic error]

Severity Level: Minor ⚠️

Suggested change

assert _is_select_query("UPDATE table1 SET col = 1") is False

assert not _is_select_query("UPDATE table1 SET col = 1")

Why it matters? ⭐

The recommendation is correct and improves test reliability/clarity; using assert not _is_select_query(...) avoids identity pitfalls and matches Python testing best practices.

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** tests/unit_tests/mcp_service/sql_lab/test_sql_lab_utils.py **Line:** 82:82 **Comment:** *Logic Error: Replace `is False` boolean identity comparison with `assert not` in update query test to avoid identity comparison issues and make the assertion semantics correct and idiomatic. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

codeant-ai-for-open-source · 2025-12-12T22:47:07Z

CodeAnt AI finished reviewing your PR.

codecov · 2025-12-12T22:52:06Z

Codecov Report

❌ Patch coverage is 0% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.76%. Comparing base (edcb385) to head (6992af9).
⚠️ Report is 102 commits behind head on master.

Files with missing lines	Patch %	Lines
superset/mcp_service/sql_lab/sql_lab_utils.py	0.00%	20 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           master   #36599       +/-   ##
===========================================
+ Coverage        0   67.76%   +67.76%     
===========================================
  Files           0      637      +637     
  Lines           0    47103    +47103     
  Branches        0     5130     +5130     
===========================================
+ Hits            0    31921    +31921     
- Misses          0    13905    +13905     
- Partials        0     1277     +1277

Flag	Coverage Δ
hive	`43.52% <0.00%> (?)`
mysql	`66.83% <0.00%> (?)`
postgres	`66.88% <0.00%> (?)`
presto	`47.15% <0.00%> (?)`
python	`67.73% <0.00%> (?)`
sqlite	`66.59% <0.00%> (?)`
unit	`100.00% <ø> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

betodealmeida · 2025-12-13T01:21:41Z

superset/mcp_service/sql_lab/sql_lab_utils.py

+    # CTE queries start with WITH and contain SELECT
+    return sql_lower.startswith("select") or sql_lower.startswith("with")


You should use SQLScript and SQLStatement from superset/sql/parse.py for all your parsing/AST checks/AST manipulation needs. Something like this should work:

script = SQLScript(sql, database.db_engine_spec.engine) if script.has_mutation(): _process_select_results(cursor, results, limit) else: _process_dml_results(cursor, conn, results)

Per reviewer feedback, replaced simple string-based query detection with proper SQL parsing using SQLScript from superset/sql/parse.py. Changes: - Use SQLScript.has_mutation() to determine if query is SELECT-like - Updated tests to verify SQLScript mutation detection for CTEs - Removed the now-unused _is_select_query helper function

codeant-ai-for-open-source · 2025-12-13T16:55:37Z

CodeAnt AI is running Incremental review

Per reviewer feedback, now using SQLScript from superset/sql/parse.py for all SQL parsing needs: - validate_sql_query: uses SQLScript.has_mutation() for DML detection and check_functions_present() for disallowed functions - _apply_limit: uses SQLScript to parse and add LIMIT via AST - _execute_query: uses SQLScript.has_mutation() for query type detection This ensures consistent, robust SQL parsing across all operations.

SQLScript now correctly detects DROP TABLE as a mutation and blocks it before execution (improved security). Updated test to expect DML_NOT_ALLOWED error instead of EXECUTION_ERROR.

betodealmeida

Awesome!

pull-request-size bot added the size/L label Dec 12, 2025

dosubot bot added the change:backend Requires changing the backend label Dec 12, 2025

codeant-ai-for-open-source bot added the size:L This PR changes 100-499 lines, ignoring generated files label Dec 12, 2025

codeant-ai-for-open-source bot reviewed Dec 12, 2025

View reviewed changes

betodealmeida requested changes Dec 13, 2025

View reviewed changes

aminghadersohi changed the title ~~fix(mcp): handle CTE queries in execute_sql tool~~ fix(mcp): use SQLScript for all SQL parsing in execute_sql Dec 13, 2025

test(mcp): update SQL injection test for SQLScript behavior

6992af9

SQLScript now correctly detects DROP TABLE as a mutation and blocks it before execution (improved security). Updated test to expect DML_NOT_ALLOWED error instead of EXECUTION_ERROR.

betodealmeida approved these changes Dec 15, 2025

View reviewed changes

sadpandajoe added the merge-if-green If approved and tests are green, please go ahead and merge it for me label Dec 16, 2025

rusackas merged commit e3e6b0e into apache:master Dec 21, 2025
78 of 79 checks passed

dosubot bot mentioned this pull request Jan 7, 2026

[6.0.0] Superset SQLab+starrocks many statement does't support #36939

Open

3 tasks

		# CTE queries start with WITH and contain SELECT
		return sql_lower.startswith("select") or sql_lower.startswith("with")

-    # CTE queries start with WITH and contain SELECT
-    return sql_lower.startswith("select") or sql_lower.startswith("with")
+    # Treat CTE queries that start with WITH as SELECT-like only if they contain a SELECT clause
+    return sql_lower.startswith("select") or (
+        sql_lower.startswith("with") and "select" in sql_lower
+    )

	assert _is_select_query("SELECT * FROM table1") is True
	assert _is_select_query("SELECT * FROM table1")

	assert _is_select_query(cte_sql) is True
	assert _is_select_query(cte_sql)

	assert _is_select_query("INSERT INTO table1 VALUES (1)") is False
	assert not _is_select_query("INSERT INTO table1 VALUES (1)")

	assert _is_select_query("UPDATE table1 SET col = 1") is False
	assert not _is_select_query("UPDATE table1 SET col = 1")

fix(mcp): use SQLScript for all SQL parsing in execute_sql #36599

fix(mcp): use SQLScript for all SQL parsing in execute_sql #36599

Uh oh!

Conversation

aminghadersohi commented Dec 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Testing Instructions

Additional Information

Uh oh!

codeant-ai-for-open-source bot commented Dec 12, 2025

Uh oh!

bito-code-review bot commented Dec 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review Agent Run #1a3f82

Uh oh!

codeant-ai-for-open-source bot commented Dec 12, 2025

Nitpicks 🔍

Uh oh!

codeant-ai-for-open-source bot Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

betodealmeida Dec 13, 2025

Choose a reason for hiding this comment

Uh oh!

codeant-ai-for-open-source bot Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

codeant-ai-for-open-source bot Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

codeant-ai-for-open-source bot Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

codeant-ai-for-open-source bot Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

codeant-ai-for-open-source bot commented Dec 12, 2025

Uh oh!

codecov bot commented Dec 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

betodealmeida Dec 13, 2025

Choose a reason for hiding this comment

Uh oh!

codeant-ai-for-open-source bot commented Dec 13, 2025

Uh oh!

betodealmeida left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

aminghadersohi commented Dec 12, 2025 •

edited

Loading

bito-code-review bot commented Dec 12, 2025 •

edited

Loading

codecov bot commented Dec 12, 2025 •

edited

Loading