Contributor

Copilot AI commented Dec 4, 2025

Why make this change?

Closes #2704

Vector-type columns can be large and are rarely read directly. This adds configuration to omit them by default, improving performance and reducing payload size while allowing explicit opt-in.

What is this change?

Adds an include-vector-fields-by-default property to the data-source configuration (MSSQL only):

  • DataSource object model: New IncludeVectorFieldsByDefault property (default: false)
  • JSON schema: Property definition, with validation that allows true only for MSSQL
  • DataSource converter: Read/write support, tracking whether the user provided the value so that it is serialized only when set
  • CLI: --data-source.include-vector-fields-by-default option in dab configure
  • Validation: Runtime validation rejects the option for non-MSSQL database types
  • ColumnDefinition: Added IsVectorType property to track vector columns
  • MsSqlMetadataProvider: Detects vector columns by checking whether DATA_TYPE is "vector"
  • AuthorizationResolver: Filters vector columns out of the allowed column lists when IncludeVectorFieldsByDefault is false
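The converter's user-provided tracking can be sketched as a conditional write. This is a minimal illustration, not the PR's DataSourceConverterFactory: the WriteDataSource helper is hypothetical and writes only two properties with System.Text.Json's Utf8JsonWriter. The point is that include-vector-fields-by-default is emitted only when the user set it, so a config that never mentioned the key does not gain it on round-trip.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

// Hypothetical helper: serializes a trimmed-down data-source object and
// writes include-vector-fields-by-default only when the user provided it.
string WriteDataSource(string databaseType, bool includeVectorFields, bool userProvided)
{
    using var stream = new MemoryStream();
    using (var writer = new Utf8JsonWriter(stream))
    {
        writer.WriteStartObject();
        writer.WriteString("database-type", databaseType);
        if (userProvided)
        {
            // Round-trip the explicit value; otherwise omit it and let the default (false) apply.
            writer.WriteBoolean("include-vector-fields-by-default", includeVectorFields);
        }
        writer.WriteEndObject();
    } // disposing the writer flushes it into the stream

    return Encoding.UTF8.GetString(stream.ToArray());
}

Console.WriteLine(WriteDataSource("mssql", false, userProvided: false)); // {"database-type":"mssql"}
Console.WriteLine(WriteDataSource("mssql", false, userProvided: true));  // {"database-type":"mssql","include-vector-fields-by-default":false}
```

Tracking "provided" separately from the value is what distinguishes an explicit false from an absent key, since both read back as false.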

When include-vector-fields-by-default is false (default):

  1. Vector columns are detected during schema discovery and marked with IsVectorType = true.
  2. Vector columns are filtered out of the allowed columns during authorization resolution.
  3. The GraphQL schema excludes vector fields, because they are not among the allowed columns.
  4. REST queries exclude vector columns, for the same reason.
  5. MCP tools exclude vector columns, because they use the same authorization resolution.
{
  "data-source": {
    "database-type": "mssql",
    "connection-string": "...",
    "include-vector-fields-by-default": false
  }
}
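Steps 1 and 2 amount to a filter over the discovered columns. The sketch below assumes a simple name-to-IsVectorType dictionary in place of ColumnDefinition, and ResolveAllowedColumns is a hypothetical name, not the AuthorizationResolver method the PR changes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for ColumnDefinition: column name -> IsVectorType.
var columns = new Dictionary<string, bool>
{
    ["id"] = false,
    ["title"] = false,
    ["embedding"] = true, // marked as a vector column during schema discovery
};

// Keep every column when the data source opts in; otherwise drop the
// columns flagged as vectors before they reach the allowed-column set.
IEnumerable<string> ResolveAllowedColumns(IReadOnlyDictionary<string, bool> isVectorByColumn, bool includeVectorFields) =>
    isVectorByColumn
        .Where(column => includeVectorFields || !column.Value)
        .Select(column => column.Key);

Console.WriteLine(string.Join(", ", ResolveAllowedColumns(columns, includeVectorFields: false)));
Console.WriteLine(string.Join(", ", ResolveAllowedColumns(columns, includeVectorFields: true)));
```

Because GraphQL, REST, and MCP all read the same allowed-column set, filtering once at this point is what lets steps 3 to 5 follow without separate changes in each surface.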

How was this tested?

  • Integration Tests
  • Unit Tests

Added 4 test cases:

  • Setting the option to true and to false for MSSQL
  • Validation failure for MySQL and PostgreSQL

Sample Request(s)

# Enable vector fields in results (MSSQL only)
dab configure --data-source.include-vector-fields-by-default true

# Disable vector fields (default behavior)
dab configure --data-source.include-vector-fields-by-default false
Original prompt

This section details the original issue you should resolve

<issue_title>[Enh]: Omit Vector from results</issue_title>
<issue_description>## What?

Vector-type columns are omitted from results by default.

Behavior

  • Equivalent to using the exclude permission property, but applied globally.
  • Applies only to mssql data sources (for now).

Why?

Vector columns can be large and are rarely read directly. Omitting them by default improves performance and reduces payload size, while still allowing explicit opt-in.

How?

  • Add an include-vector-fields-by-default property to the configuration.
  • Update the JSON schema (true is valid only when database-type is mssql).
  • Update dab validate (true is valid only when database-type is mssql).
  • Add dab configure --data-source.include-vector-fields-by-default to the CLI.
  • REST: Omit vector fields by default; include when explicitly selected.
  • GraphQL: Omit vector fields by default; include when explicitly requested.
  • MCP: Omit vector fields by default; include when explicitly requested.

Configuration

{
  "data-source": {
    "include-vector-fields-by-default": false // default
  }
}

Command Line

dab configure --data-source.include-vector-fields-by-default false
dab configure --data-source.include-vector-fields-by-default true
</issue_description>

## Comments on the Issue (you are @copilot in this section)

<comments>
</comments>



- Add IncludeVectorFieldsByDefault property to DataSource record (default: false)
- Add UserProvidedIncludeVectorFieldsByDefault flag for serialization control
- Update DataSourceConverterFactory to read/write the new property
- Update JSON schema with include-vector-fields-by-default property and validation
- Add --data-source.include-vector-fields-by-default CLI option to ConfigureOptions
- Update ConfigGenerator to handle the new CLI option with MSSQL validation
- Add validation in RuntimeConfigValidator for non-MSSQL database types
- Add unit tests for the new CLI configure option

Co-authored-by: JerryNixon <1749983+JerryNixon@users.noreply.github.com>
Copilot AI changed the title "[WIP] Update configuration to omit vector fields by default" to "Add include-vector-fields-by-default property to DataSource configuration" Dec 4, 2025
Copilot AI requested a review from JerryNixon December 4, 2025 03:48
@JerryNixon
Contributor

@copilot The PR doesn't modify:

  • REST query builders to exclude vector columns
  • GraphQL schema generation to omit vector fields
  • Column selection logic in metadata providers
  • Permission/projection handling for vector types

The configuration can be set, but has no effect on actual query results.

Contributor Author

Copilot AI commented Dec 4, 2025

@copilot The PR doesn't modify:

  • REST query builders to exclude vector columns
    ...

You're correct. This PR implements only the configuration infrastructure for include-vector-fields-by-default. The actual functionality to omit vector columns from query results requires additional implementation:

  1. Vector type detection in MsSqlMetadataProvider (SQL Server's vector type is returned as an empty string in the DATA_TYPE column, similar to UDT types)
  2. ColumnDefinition enhancement to track whether a column is a vector type
  3. REST/GraphQL/MCP filtering to exclude vector columns when IncludeVectorFieldsByDefault is false
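For item 1, detection reduces to a type-name check once a name is available. This sketch assumes the provider has a type name to compare: the PR description says DATA_TYPE is "vector", while the comment above says it comes back as an empty string, so which catalog field supplies the name is deliberately left open. IsVectorType is a hypothetical helper.

```csharp
using System;

// Hypothetical helper: treat a column as a vector when its reported type
// name is "vector" (case-insensitive). Which catalog field supplies the
// name is an assumption; see the note on DATA_TYPE above.
bool IsVectorType(string? typeName) =>
    string.Equals(typeName?.Trim(), "vector", StringComparison.OrdinalIgnoreCase);

Console.WriteLine(IsVectorType("vector"));   // True
Console.WriteLine(IsVectorType("nvarchar")); // False
Console.WriteLine(IsVectorType(""));         // False: an empty DATA_TYPE is not recognized
```

The last call illustrates the caveat in the comment: if DATA_TYPE really is empty for vector columns, a check on that field alone classifies them as ordinary columns, and the type name would have to be read from somewhere else.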

I'll work on implementing the complete functionality. The vector type detection and filtering logic needs to be added to the metadata provider and query builders.

- Add IsVectorType property to ColumnDefinition for tracking vector columns
- Update MsSqlMetadataProvider to detect vector type columns during schema discovery
- Modify AuthorizationResolver.ResolveEntityDefinitionColumns to filter out vector columns when IncludeVectorFieldsByDefault is false
- Update SetEntityPermissionMap to pass includeVectorFields setting from DataSource config

Co-authored-by: JerryNixon <1749983+JerryNixon@users.noreply.github.com>
@aaronburtle aaronburtle self-assigned this Dec 15, 2025
@JerryNixon JerryNixon marked this pull request as ready for review January 21, 2026 21:51
Copilot AI review requested due to automatic review settings January 21, 2026 21:51
Contributor

Copilot AI left a comment


Pull request overview

Adds an MSSQL-only runtime configuration switch to omit vector-type columns from API results by default, reducing payload size and improving performance, with an opt-in to include them.

Changes:

  • Introduces include-vector-fields-by-default in the DataSource object model, CLI (dab configure), JSON schema, and runtime validation.
  • Detects MSSQL vector columns during metadata discovery and tracks them on ColumnDefinition.
  • Filters vector columns out of authorization-resolved allowed column sets when the setting is disabled.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/Core/Services/MetadataProviders/MsSqlMetadataProvider.cs | Marks columns as vector type based on MSSQL DATA_TYPE. |
| src/Core/Configurations/RuntimeConfigValidator.cs | Adds MSSQL-only validation for the new config flag. |
| src/Core/Authorization/AuthorizationResolver.cs | Excludes vector columns from resolved columns when configured. |
| src/Config/ObjectModel/DataSource.cs | Adds IncludeVectorFieldsByDefault plus user-provided tracking flag. |
| src/Config/DatabasePrimitives/DatabaseObject.cs | Adds ColumnDefinition.IsVectorType metadata flag. |
| src/Config/Converters/DataSourceConverterFactory.cs | Reads/writes the new property and conditionally serializes it. |
| src/Cli/ConfigGenerator.cs | Implements CLI update behavior and MSSQL-only enforcement. |
| src/Cli/Commands/ConfigureOptions.cs | Adds CLI option --data-source.include-vector-fields-by-default. |
| src/Cli.Tests/ConfigureOptionsTests.cs | Adds unit tests for MSSQL enable/disable and non-MSSQL rejection. |
| schemas/dab.draft.schema.json | Adds schema definition and MSSQL-only constraint (when true). |
Comments suppressed due to low confidence (1)

src/Core/Authorization/AuthorizationResolver.cs:326

  • Vector filtering is only applied when resolving “all columns” (wildcard/implicit includes). If entityAction.Fields.Include explicitly lists a vector column, it bypasses ResolveEntityDefinitionColumns(..., includeVectorFields) and the vector column will still be allowed/returned even when IncludeVectorFieldsByDefault is false. If the goal is to omit vector columns globally unless the data-source setting is true, consider filtering vector columns out of explicit include lists (or out of allowedColumns) as well.
                        else
                        {
                            operationToColumn.Included = entityAction.Fields.Include;
                        }
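One way to act on this suggestion is to run the same vector filter over explicit include lists. In this sketch, FilterIncluded and the sample column names are hypothetical; whether an explicit include should override the data-source default is a design decision the sketch does not settle.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical inputs: the entity's vector columns and a permission whose
// explicit "include" list names one of them.
var vectorColumns = new HashSet<string> { "embedding" };
var explicitInclude = new[] { "id", "embedding" };

// Apply the same vector filter to explicit includes that the wildcard path
// already gets, unless the data source opts in to vector fields.
HashSet<string> FilterIncluded(IEnumerable<string> included, ISet<string> vectors, bool includeVectorFields) =>
    includeVectorFields
        ? new HashSet<string>(included)
        : new HashSet<string>(included.Where(column => !vectors.Contains(column)));

Console.WriteLine(string.Join(", ", FilterIncluded(explicitInclude, vectorColumns, includeVectorFields: false))); // id
```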


Comment on lines +145 to +149
case "include-vector-fields-by-default":
if (reader.TokenType is JsonTokenType.True or JsonTokenType.False)
{
includeVectorFieldsByDefault = reader.GetBoolean();
userProvidedIncludeVectorFieldsByDefault = true;

Copilot AI Jan 21, 2026


include-vector-fields-by-default parsing currently falls through silently for invalid inputs (e.g., non-boolean token types, or a string that doesn’t parse to bool), which can hide config mistakes and leaves UserProvidedIncludeVectorFieldsByDefault false. Consider throwing a JsonException when the value isn’t a boolean or a parseable boolean string (similar to how options rejects unexpected values).
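A stricter read along these lines might look like the following. ReadIncludeVectorFields is a hypothetical standalone helper that parses a single JSON value with Utf8JsonReader; inside the real converter the reader would already be positioned on the property value, so only the token checks carry over.

```csharp
using System;
using System.Text;
using System.Text.Json;

// Hypothetical reader: accepts a JSON boolean or a "true"/"false" string and
// throws JsonException for anything else instead of silently keeping the default.
bool ReadIncludeVectorFields(string json)
{
    var reader = new Utf8JsonReader(Encoding.UTF8.GetBytes(json));
    reader.Read(); // advance to the value token

    if (reader.TokenType is JsonTokenType.True or JsonTokenType.False)
    {
        return reader.GetBoolean();
    }

    // bool.TryParse is case-insensitive, so "False" and "false" both parse.
    if (reader.TokenType == JsonTokenType.String && bool.TryParse(reader.GetString(), out bool parsed))
    {
        return parsed;
    }

    throw new JsonException(
        $"include-vector-fields-by-default must be a boolean, but found {reader.TokenType}.");
}

Console.WriteLine(ReadIncludeVectorFields("true"));      // True
Console.WriteLine(ReadIncludeVectorFields("\"False\"")); // False
```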

Copilot uses AI. Check for mistakes.
Comment on lines +191 to +195
// Write health if present
if (value.Health is not null)
{
writer.WritePropertyName("health");
JsonSerializer.Serialize(writer, value.Health, options);

Copilot AI Jan 21, 2026


health is written whenever value.Health is non-null, but DatasourceHealthCheckConfig’s converter only writes JSON when UserProvidedEnabled is true. If Health is non-null with UserProvidedEnabled == false (e.g., the config contains "health": null), serialization can write a property name with no value, or throw. Consider writing the health property only when the health config has user-provided content (or ensure the health converter writes an explicit null or object in the non-user-provided case).

Suggested change
- // Write health if present
- if (value.Health is not null)
- {
- writer.WritePropertyName("health");
- JsonSerializer.Serialize(writer, value.Health, options);
+ // Write health only if user provided health configuration
+ if (value.Health is DatasourceHealthCheckConfig healthConfig && healthConfig.UserProvidedEnabled)
+ {
+ writer.WritePropertyName("health");
+ JsonSerializer.Serialize(writer, healthConfig, options);

Comment on lines +120 to +124
// include-vector-fields-by-default is only valid for mssql database type
if (dataSource.IncludeVectorFieldsByDefault && dataSource.DatabaseType != DatabaseType.MSSQL)
{
HandleOrRecordException(new DataApiBuilderException(
message: "The include-vector-fields-by-default option is only valid for MSSQL database type.",

Copilot AI Jan 21, 2026


Runtime validation only rejects this option for non-MSSQL when the value is true. If the intent is “option is only valid/applicable for MSSQL” (as the message says and as the CLI enforces), consider validating based on UserProvidedIncludeVectorFieldsByDefault instead, so configs that explicitly set it (even to false) on non-MSSQL are rejected consistently.
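The suggested check can be sketched as a function of the database type and the user-provided flag. ValidateIncludeVectorFields is hypothetical, and it compares the database-type string from the config rather than the DatabaseType enum used in the PR, so the example stays self-contained.

```csharp
using System;

// Hypothetical check: reject the option on non-MSSQL sources whenever the
// user set it explicitly, regardless of whether the value is true or false.
string? ValidateIncludeVectorFields(string databaseType, bool userProvided)
{
    bool isMsSql = string.Equals(databaseType, "mssql", StringComparison.OrdinalIgnoreCase);
    return userProvided && !isMsSql
        ? "The include-vector-fields-by-default option is only valid for MSSQL database type."
        : null;
}

Console.WriteLine(ValidateIncludeVectorFields("postgresql", userProvided: true) ?? "valid");
Console.WriteLine(ValidateIncludeVectorFields("mssql", userProvided: true) ?? "valid"); // valid
```

Under this rule a non-MSSQL config that explicitly sets the key to false is rejected as well, which is the consistency with the CLI that the comment asks for.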

  // that are accessible to the user.
  HashSet<string> allowedColumns = new();
- IEnumerable<string> allTableColumns = ResolveEntityDefinitionColumns(entityName, metadataProvider);
+ IEnumerable<string> allTableColumns = ResolveEntityDefinitionColumns(entityName, metadataProvider, includeVectorFields);

Copilot AI Jan 21, 2026


This assignment to allTableColumns is useless, since its value is never read.

Suggested change
- IEnumerable<string> allTableColumns = ResolveEntityDefinitionColumns(entityName, metadataProvider, includeVectorFields);


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Enh]: Omit Vector from results

3 participants