Skip to content

Conversation

@aminghadersohi
Copy link
Contributor

@aminghadersohi aminghadersohi commented Dec 12, 2025

Summary

Fixes a bug where CTE (Common Table Expression) queries using WITH ... AS ... SELECT syntax were returning empty rows in the MCP execute_sql tool response, even though the query executed successfully.

Root Cause: Query type detection was using simple string matching, which failed for CTEs that start with WITH.

Changes

Now using SQLScript from superset/sql/parse.py consistently for all SQL parsing needs:

Function Before After
validate_sql_query String startswith() for DML keywords SQLScript.has_mutation()
validate_sql_query String matching for disallowed functions SQLScript.check_functions_present()
_apply_limit String matching + concatenation SQLScript.set_limit_value() via AST
_execute_query String matching SQLScript.has_mutation()

Testing Instructions

# Before fix: returns {"rows": [], "affected_rows": 1}
# After fix: returns {"rows": [{"val": "test"}], "row_count": 1}

WITH simple AS (SELECT 'test' as val) SELECT val FROM simple

Run unit tests:

pytest tests/unit_tests/mcp_service/sql_lab/test_sql_lab_utils.py -v

Additional Information

  • Unit tests added (14 tests)
  • No breaking changes
  • Follows conventional commits
  • Uses existing SQL parsing infrastructure per reviewer feedback

CTE queries (WITH ... AS ... SELECT) were returning empty rows because
_is_select_query() only checked for queries starting with "select".
CTEs start with "with", causing them to be routed to _process_dml_results()
instead of _process_select_results().

Changes:
- Updated _is_select_query() to recognize WITH clauses as SELECT queries
- Updated _apply_limit() to add LIMIT to CTE queries
- Added comprehensive unit tests for CTE query handling
@codeant-ai-for-open-source
Copy link
Contributor

CodeAnt AI is reviewing your PR.

@bito-code-review
Copy link
Contributor

bito-code-review bot commented Dec 12, 2025

Code Review Agent Run #1a3f82

Actionable Suggestions - 0
Additional Suggestions - 2
  • superset/mcp_service/sql_lab/sql_lab_utils.py - 2
    • Incomplete CTE Detection Logic · Line 162-163
      The logic for detecting SELECT-like queries should check that WITH clauses contain 'select' to avoid misclassifying queries like 'WITH cte AS (SELECT 1) INSERT INTO table SELECT * FROM cte' as SELECT queries, which could lead to incorrect LIMIT application and result processing.
    • Incomplete CTE Detection Logic · Line 215-215
      Similarly, _is_select_query should ensure WITH queries contain 'select' to prevent false positives for non-SELECT statements.
Review Details
  • Files reviewed - 2 · Commit Range: 930a36b..930a36b
    • superset/mcp_service/sql_lab/sql_lab_utils.py
    • tests/unit_tests/mcp_service/sql_lab/test_sql_lab_utils.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Default Agent You can customize the agent settings here or contact your Bito workspace admin at [email protected].

Documentation & Help

AI Code Review powered by Bito Logo

@dosubot dosubot bot added the change:backend Requires changing the backend label Dec 12, 2025
@codeant-ai-for-open-source codeant-ai-for-open-source bot added the size:L This PR changes 100-499 lines, ignoring generated files label Dec 12, 2025
@codeant-ai-for-open-source
Copy link
Contributor

Nitpicks 🔍

🔒 No security issues identified
⚡ Recommended areas for review

  • Multi-statement handling
    The code appends LIMIT to the whole SQL string. For multi-statement payloads (e.g. WITH ...; INSERT ...) or SQL with trailing comments, this may add LIMIT to the wrong statement. Consider detecting and modifying only the final SELECT statement.

  • Coarse LIMIT detection
    _apply_limit uses a simple substring check ("limit" in sql_lower) to decide whether a LIMIT already exists.
    This will skip adding an outer LIMIT if any inner CTE or comment contains the word "limit". That can leave
    outer SELECTs unbounded in cases where only inner queries include LIMIT and the outer query lacks one.

  • LIMIT detection
    _apply_limit() checks for the substring "limit" anywhere in sql_lower, which can produce false positives (matching "unlimited" or identifiers) and false negatives if LIMIT appears in a different statement. Use a word-boundary-aware regex to detect a standalone LIMIT clause.

  • CTE detection
    _is_select_query() treats any SQL that starts with WITH as a SELECT/CTE without verifying that a SELECT actually appears inside the statement or accounting for variants like WITH RECURSIVE or leading comments. This can misclassify non-CTE statements, or miss CTEs preceded by comments or parentheses.

  • Leading comments not handled
    _is_select_query strips whitespace but does not account for leading SQL comments (e.g., "-- ..." or "/* ... */").
    If a query begins with a comment followed by a WITH/SELECT, startswith checks will fail and the query may be
    misclassified (not treated as a SELECT-like query).

Comment on lines 214 to 215
# CTE queries start with WITH and contain SELECT
return sql_lower.startswith("select") or sql_lower.startswith("with")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Logic bug: CTEs that start with "WITH" but are not SELECT statements (e.g. "WITH ... INSERT ...") are currently classified as SELECT queries; this will cause the code to call fetch methods on non-SELECT statements and may raise database/DB-API errors. Ensure "WITH" queries are treated as SELECT only when a SELECT is actually present. [logic error]

Severity Level: Minor ⚠️

Suggested change
# CTE queries start with WITH and contain SELECT
return sql_lower.startswith("select") or sql_lower.startswith("with")
# Treat CTE queries that start with WITH as SELECT-like only if they contain a SELECT clause
return sql_lower.startswith("select") or (
sql_lower.startswith("with") and "select" in sql_lower
)
Why it matters? ⭐

Correct. The current implementation treats any statement starting with WITH as a SELECT, which is wrong for CTEs followed by DML (e.g., "WITH ... INSERT ..."). The proposed change to require a SELECT token when the SQL starts with WITH fixes a real logic bug and prevents fetch-based processing on non-SELECT statements.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** superset/mcp_service/sql_lab/sql_lab_utils.py
**Line:** 214:215
**Comment:**
	*Logic Error: Logic bug: CTEs that start with "WITH" but are not SELECT statements (e.g. "WITH ... INSERT ...") are currently classified as SELECT queries; this will cause the code to call fetch methods on non-SELECT statements and may raise database/DB-API errors. Ensure "WITH" queries are treated as SELECT only when a SELECT is actually present.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use SQLScript and SQLStatement from superset/sql/parse.py for all your parsing/AST checks/AST manipulation needs. Something like this should work:

script = SQLScript(sql, database.db_engine_spec.engine)
if script.has_mutation():
    _process_select_results(cursor, results, limit)
else:
    _process_dml_results(cursor, conn, results)


def test_simple_select(self):
"""Test simple SELECT query is recognized."""
assert _is_select_query("SELECT * FROM table1") is True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Tests use identity comparison (is True) to check boolean return values. Using is to compare booleans can be unreliable and is considered a logic error; use a direct assertion or truthiness check instead to avoid surprising failures. [logic error]

Severity Level: Minor ⚠️

Suggested change
assert _is_select_query("SELECT * FROM table1") is True
assert _is_select_query("SELECT * FROM table1")
Why it matters? ⭐

The test currently uses identity comparison (is True) which is brittle: if the implementation ever returns a truthy non-bool (e.g., 1, numpy.bool_), the identity check can fail even though behavior is correct. Replacing with a plain truthiness check is safer and idiomatic for tests.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/unit_tests/mcp_service/sql_lab/test_sql_lab_utils.py
**Line:** 29:29
**Comment:**
	*Logic Error: Tests use identity comparison (`is True`) to check boolean return values. Using `is` to compare booleans can be unreliable and is considered a logic error; use a direct assertion or truthiness check instead to avoid surprising failures.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

)
SELECT * FROM cte_name
"""
assert _is_select_query(cte_sql) is True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: A multiline CTE test asserts the boolean result using is True; identity comparison should be replaced with a plain assert to reliably check truthiness of the function result. [logic error]

Severity Level: Minor ⚠️

Suggested change
assert _is_select_query(cte_sql) is True
assert _is_select_query(cte_sql)
Why it matters? ⭐

Same as above — identity comparison is unnecessary and can produce false negatives if the function returns a truthy non-bool. Using assert _is_select_query(cte_sql) is clearer and more robust.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/unit_tests/mcp_service/sql_lab/test_sql_lab_utils.py
**Line:** 47:47
**Comment:**
	*Logic Error: A multiline CTE test asserts the boolean result using `is True`; identity comparison should be replaced with a plain assert to reliably check truthiness of the function result.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.


def test_insert_not_select(self):
"""Test INSERT query is not recognized as SELECT."""
assert _is_select_query("INSERT INTO table1 VALUES (1)") is False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Tests use is False to check boolean falsehood; identity comparison should be replaced with assert not <expr> to avoid identity vs equality pitfalls and make intent clearer. [logic error]

Severity Level: Minor ⚠️

Suggested change
assert _is_select_query("INSERT INTO table1 VALUES (1)") is False
assert not _is_select_query("INSERT INTO table1 VALUES (1)")
Why it matters? ⭐

Identity comparison to False has the same pitfalls as is True. Using assert not <expr> is the idiomatic and reliable way to assert falsiness in tests.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/unit_tests/mcp_service/sql_lab/test_sql_lab_utils.py
**Line:** 78:78
**Comment:**
	*Logic Error: Tests use `is False` to check boolean falsehood; identity comparison should be replaced with `assert not <expr>` to avoid identity vs equality pitfalls and make intent clearer.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.


def test_update_not_select(self):
"""Test UPDATE query is not recognized as SELECT."""
assert _is_select_query("UPDATE table1 SET col = 1") is False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Replace is False boolean identity comparison with assert not in update query test to avoid identity comparison issues and make the assertion semantics correct and idiomatic. [logic error]

Severity Level: Minor ⚠️

Suggested change
assert _is_select_query("UPDATE table1 SET col = 1") is False
assert not _is_select_query("UPDATE table1 SET col = 1")
Why it matters? ⭐

The recommendation is correct and improves test reliability/clarity; using assert not _is_select_query(...) avoids identity pitfalls and matches Python testing best practices.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/unit_tests/mcp_service/sql_lab/test_sql_lab_utils.py
**Line:** 82:82
**Comment:**
	*Logic Error: Replace `is False` boolean identity comparison with `assert not` in update query test to avoid identity comparison issues and make the assertion semantics correct and idiomatic.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

@codeant-ai-for-open-source
Copy link
Contributor

CodeAnt AI finished reviewing your PR.

@codecov
Copy link

codecov bot commented Dec 12, 2025

Codecov Report

❌ Patch coverage is 0% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.76%. Comparing base (edcb385) to head (6992af9).
⚠️ Report is 102 commits behind head on master.

Files with missing lines Patch % Lines
superset/mcp_service/sql_lab/sql_lab_utils.py 0.00% 20 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #36599       +/-   ##
===========================================
+ Coverage        0   67.76%   +67.76%     
===========================================
  Files           0      637      +637     
  Lines           0    47103    +47103     
  Branches        0     5130     +5130     
===========================================
+ Hits            0    31921    +31921     
- Misses          0    13905    +13905     
- Partials        0     1277     +1277     
Flag Coverage Δ
hive 43.52% <0.00%> (?)
mysql 66.83% <0.00%> (?)
postgres 66.88% <0.00%> (?)
presto 47.15% <0.00%> (?)
python 67.73% <0.00%> (?)
sqlite 66.59% <0.00%> (?)
unit 100.00% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment on lines 214 to 215
# CTE queries start with WITH and contain SELECT
return sql_lower.startswith("select") or sql_lower.startswith("with")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use SQLScript and SQLStatement from superset/sql/parse.py for all your parsing/AST checks/AST manipulation needs. Something like this should work:

script = SQLScript(sql, database.db_engine_spec.engine)
if script.has_mutation():
    _process_select_results(cursor, results, limit)
else:
    _process_dml_results(cursor, conn, results)

Per reviewer feedback, replaced simple string-based query detection with
proper SQL parsing using SQLScript from superset/sql/parse.py.

Changes:
- Use SQLScript.has_mutation() to determine if query is SELECT-like
- Updated tests to verify SQLScript mutation detection for CTEs
- Removed the now-unused _is_select_query helper function
@codeant-ai-for-open-source
Copy link
Contributor

CodeAnt AI is running Incremental review

Per reviewer feedback, now using SQLScript from superset/sql/parse.py
for all SQL parsing needs:

- validate_sql_query: uses SQLScript.has_mutation() for DML detection
  and check_functions_present() for disallowed functions
- _apply_limit: uses SQLScript to parse and add LIMIT via AST
- _execute_query: uses SQLScript.has_mutation() for query type detection

This ensures consistent, robust SQL parsing across all operations.
@aminghadersohi aminghadersohi changed the title fix(mcp): handle CTE queries in execute_sql tool fix(mcp): use SQLScript for all SQL parsing in execute_sql Dec 13, 2025
SQLScript now correctly detects DROP TABLE as a mutation and blocks
it before execution (improved security). Updated test to expect
DML_NOT_ALLOWED error instead of EXECUTION_ERROR.
Copy link
Member

@betodealmeida betodealmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

@sadpandajoe sadpandajoe added the merge-if-green If approved and tests are green, please go ahead and merge it for me label Dec 16, 2025
@rusackas rusackas merged commit e3e6b0e into apache:master Dec 21, 2025
78 of 79 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

change:backend Requires changing the backend merge-if-green If approved and tests are green, please go ahead and merge it for me size/L size:L This PR changes 100-499 lines, ignoring generated files

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants