Proposal: SDK Autoconfiguration #4856

@herin049

Background

My team at work has been using the OpenTelemetry Python SDK in production for the past year. One issue we have consistently run into is managing the configuration of the SDK across multiple services. While the auto-instrumentation library, custom configurators, distros, etc. have helped, many services still have specific instrumentation requirements, so we end up either writing custom SDK initialization code or relying heavily on environment variables to configure the SDK. We have considered writing our own configuration library to address this, but before doing so, I wanted to see whether there is interest in adding a configuration system to the OpenTelemetry Python SDK itself.

I am aware of previous discussions about configuration in the OpenTelemetry Python SDK (e.g., #1035 and #663). However, these discussions seem to have been dormant for a while, and the proposed solutions did not seem sufficiently general.

Proposal: SDK Autoconfiguration

Summary

This proposal introduces a YAML-based declarative configuration system for the OpenTelemetry Python SDK. This approach allows users to configure the SDK pipeline through a single configuration file, eliminating the need to declare a large number of environment variables or write boilerplate initialization code.

Design Goals

A key goal of this proposal is to create a configuration format that is familiar to Python developers and OpenTelemetry users. The draft configuration format draws inspiration from both Python's built-in logging configuration format and the OpenTelemetry Collector's configuration format.

Concepts borrowed from Python's Logging Configuration

Python's logging.config module supports dictionary-based configuration with internal references using the cfg:// and ext:// URI schemes. This proposal adopts the same pattern:

  • cfg:// references other objects defined within the configuration file (e.g., cfg://service.traces.exporters.console)
  • ext:// references external Python objects by their full name (e.g., ext://sys.stdout)
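For reference, the standard library already contains this resolution logic. The sketch below uses logging.config.BaseConfigurator, the helper class that dictConfig builds on, to resolve one reference of each kind. The dictionary contents are made up for illustration and are not part of the proposed format.

```python
import logging.config
import sys

# Illustrative dictionary; these keys are not part of the proposed format.
config = {
    "handlers": {
        "console": {"level": "INFO", "stream": "ext://sys.stdout"},
    },
}

resolver = logging.config.BaseConfigurator(config)

# cfg:// walks the dictionary that was handed to the configurator.
level = resolver.convert("cfg://handlers.console.level")

# ext:// imports the named object by its dotted path.
stream = resolver.convert("ext://sys.stdout")

print(level, stream is sys.stdout)
```

Strings without a recognized scheme are returned unchanged by convert, so ordinary values are unaffected.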

Concepts borrowed from OpenTelemetry Collector Configuration

The OpenTelemetry Collector is also configured through YAML, and its format is already familiar to many OpenTelemetry users. This proposal aligns with several of its conventions:

  • Named component instances using the type/name pattern (e.g., batch/console, otlp/primary)
  • Similar terminology for exporters, processors, and other components
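A minimal sketch of how a type/name key could be split. The ComponentId type and parse_component_id function are hypothetical names used only for illustration, and treating the name part as optional (so that a bare "console" is valid) is an assumption based on the draft format in this proposal.

```python
from typing import NamedTuple, Optional


class ComponentId(NamedTuple):
    """Hypothetical holder for the two halves of a type/name key."""
    type: str
    name: Optional[str]


def parse_component_id(key: str) -> ComponentId:
    """Split 'batch/console' into ('batch', 'console'); 'console' has no name."""
    type_part, separator, name_part = key.partition("/")
    if not type_part or (separator and not name_part):
        raise ValueError(f"invalid component id: {key!r}")
    return ComponentId(type_part, name_part if separator else None)


print(parse_component_id("otlp/primary"))
print(parse_component_id("console"))
```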

Motivation

Reducing Excessive Environment Variable Usage

Currently, configuring the OpenTelemetry Python SDK often relies on numerous environment variables:

export OTEL_SERVICE_NAME=test-service
export OTEL_TRACES_EXPORTER=otlp
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_EXPORTER_OTLP_COMPRESSION=gzip
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.5
# ...

A configuration file would provide an alternative to defining this many environment variables.

Reducing Boilerplate Code

Beyond environment variables, configuring the OpenTelemetry Python SDK often requires writing several lines of initialization code when utilizing multiple signals. Users must manually instantiate providers, exporters, and processors, and wire them together.

A declarative configuration file partially addresses this by describing the pipeline as data, which minimizes boilerplate code.

Proposed Configuration Format (Draft)

The following is a draft configuration format intended to illustrate the approach and receive feedback. The specific structure, field names and conventions are open for discussion and refinement if the team chooses to move forward.

version: 1
service:
  resource:
    detectors:
      - otel 
      - process
      - os
      - host
    attributes:
      service.name: scratch-8-service
      service.version: 1.0.0
      service.environment: production
  traces:
    sampler: always_on
    id_generator: random 
    exporters:
      console:
      otlp:
        endpoint: "${env:MY_OTEL_TRACE_EXPORTER_OTLP_ENDPOINT}"
        compression: "gzip"
    processors:
      batch/console:
        span_exporter: cfg://service.traces.exporters.console
        max_queue_size: 2048
        schedule_delay_millis: 5000
        max_export_batch_size: 512 
        export_timeout_millis: 30000
      batch/otlp:
        span_exporter: cfg://service.traces.exporters.otlp
        max_queue_size: 2048
        schedule_delay_millis: 5000
        max_export_batch_size: 512
        export_timeout_millis: 30000
  metrics:
    exporters:
      console:
      otlp:
        endpoint: "${env:MY_OTEL_METRIC_EXPORTER_OTLP_ENDPOINT}"
        compression: "gzip"
    readers:
      periodic/console:
        exporter: cfg://service.metrics.exporters.console
        export_interval_millis: 60000
      periodic/otlp:
        exporter: cfg://service.metrics.exporters.otlp
        export_interval_millis: 60000
  logs:
    exporters:
      console:
      otlp:
        endpoint: "${env:MY_OTEL_LOG_EXPORTER_OTLP_ENDPOINT}"
        compression: "gzip"
    processors:
      batch/console:
        exporter: cfg://service.logs.exporters.console
      batch/otlp:
        exporter: cfg://service.logs.exporters.otlp

Key Features

Environment Variable Substitution and Extensible Providers

The ${env:VARIABLE_NAME} syntax allows sensitive or environment-specific values to be injected at runtime:

endpoint: "${env:MY_OTEL_TRACE_EXPORTER_OTLP_ENDPOINT}"

This substitution syntax follows the one defined by the OpenTelemetry Collector and is designed to be extensible. In the future, additional configuration providers could be supported:

# Environment variables
endpoint: "${env:OTEL_ENDPOINT}"

# File based secrets
api_key: "${file:/run/secrets/otel_api_key}"

# HashiCorp Vault
api_key: "${vault:secret/data/otel#api_key}"

# AWS Secrets Manager
api_key: "${secretsmanager:arn:aws:secretsmanager:us-west-2:123456789012:secret:otel}"

# HTTP endpoint
config_value: "${http:https://config-server/otel/settings}"

...

This extensibility has numerous benefits as demonstrated in the OpenTelemetry Collector configuration specification.
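To make the extensibility concrete, here is a minimal sketch of scheme-based substitution with a provider registry. The names PROVIDERS and substitute are hypothetical, and only the env and file schemes are implemented. The other schemes shown above would each need their own provider function.

```python
import os
import re
from pathlib import Path
from typing import Callable, Dict

# Matches ${scheme:value}; the value may itself contain colons (ARNs, URLs).
_PLACEHOLDER = re.compile(r"\$\{(?P<scheme>[a-z]+):(?P<value>[^}]+)\}")


def _from_env(name: str) -> str:
    return os.environ[name]


def _from_file(path: str) -> str:
    return Path(path).read_text().strip()


# Hypothetical registry: adding a scheme means adding one entry here.
PROVIDERS: Dict[str, Callable[[str], str]] = {
    "env": _from_env,
    "file": _from_file,
}


def substitute(text: str, providers: Dict[str, Callable[[str], str]] = PROVIDERS) -> str:
    """Replace every ${scheme:value} placeholder using the matching provider."""

    def replace(match: "re.Match[str]") -> str:
        scheme = match["scheme"]
        if scheme not in providers:
            raise ValueError(f"no provider registered for scheme {scheme!r}")
        return providers[scheme](match["value"])

    return _PLACEHOLDER.sub(replace, text)


os.environ.setdefault("OTEL_ENDPOINT", "http://localhost:4317")
print(substitute("${env:OTEL_ENDPOINT}"))
```

Failing loudly on an unknown scheme, rather than leaving the placeholder in place, is a design choice in this sketch. It avoids passing a literal "${vault:...}" string to an exporter.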

Internal References (cfg://)

Borrowed from Python's logging.config module, the cfg:// scheme enables referencing other objects defined within the configuration. This feature is essential for wiring components together. For example, span processors almost always require a span exporter, as seen below:

  ... 
  traces:
    exporters:
      console/basic:
    processors:
      batch/console:
        span_exporter: cfg://service.traces.exporters.console/basic
  ...
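A minimal sketch of how such a reference could be resolved against the parsed configuration. The function name resolve_cfg is hypothetical, and the placeholder dictionary stands in for a constructed exporter. Because type/name keys contain a slash, the sketch splits the path on dots only; this also means keys that themselves contain dots cannot be addressed this way.

```python
from typing import Any, Mapping

CFG_PREFIX = "cfg://"


def resolve_cfg(ref: str, root: Mapping[str, Any]) -> Any:
    """Follow a cfg:// reference through nested mappings, one dotted key at a time."""
    if not ref.startswith(CFG_PREFIX):
        raise ValueError(f"not a cfg:// reference: {ref!r}")
    node: Any = root
    for key in ref[len(CFG_PREFIX):].split("."):
        try:
            node = node[key]
        except (KeyError, TypeError):
            raise KeyError(f"{ref!r}: no entry named {key!r}") from None
    return node


# Parsed form of the fragment above; the exporter value is a placeholder.
config = {
    "service": {
        "traces": {
            "exporters": {"console/basic": {"placeholder": "exporter"}},
            "processors": {
                "batch/console": {
                    "span_exporter": "cfg://service.traces.exporters.console/basic",
                },
            },
        },
    },
}

reference = config["service"]["traces"]["processors"]["batch/console"]["span_exporter"]
exporter = resolve_cfg(reference, config)
print(exporter)
```

A real loader would presumably return the constructed exporter instance rather than its raw configuration, so that two processors referencing the same exporter share one object.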

External References (ext://)

Python's logging configuration also supports ext:// for referencing external Python objects. This could be adopted to allow referencing custom components not registered via entry points:

  ...
  traces:
    exporters:
      console/stdout:
        output: ext://sys.stdout
      console/stderr:
        output: ext://sys.stderr
    processors:
      batch/stdout:
        span_exporter: cfg://service.traces.exporters.console/stdout
      batch/stderr: 
        span_exporter: cfg://service.traces.exporters.console/stderr
    ...

This syntax would be familiar to Python developers and provides an alternative for advanced use cases.
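A minimal sketch of ext:// resolution, assuming Python 3.9 or later so that pkgutil.resolve_name is available. The function name resolve_ext is hypothetical.

```python
import os
import pkgutil
import sys
from typing import Any

EXT_PREFIX = "ext://"


def resolve_ext(ref: str) -> Any:
    """Import and return the object named by an ext:// reference."""
    if not ref.startswith(EXT_PREFIX):
        raise ValueError(f"not an ext:// reference: {ref!r}")
    # resolve_name imports the longest importable module prefix and then
    # walks the remaining attributes, e.g. 'sys' followed by 'stdout'.
    return pkgutil.resolve_name(ref[len(EXT_PREFIX):])


print(resolve_ext("ext://sys.stderr") is sys.stderr)
print(resolve_ext("ext://os.path.join") is os.path.join)
```

Since ext:// can import arbitrary modules, a loader may want to treat configuration files that use it as trusted input only.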

Named Instances

Components can be named using the type/name convention (e.g., batch/console, periodic/otlp), enabling multiple instances of the same component type with different configurations.

The type portion of the name will be resolved depending on the context, using the entry points already defined in the OpenTelemetry Python SDK and contrib packages. For example, when the configuration loader encounters batch/console under traces.processors, it:

  1. Looks up the opentelemetry_traces_processor entry point group
    [project.entry-points.opentelemetry_traces_processor]
    batch = "opentelemetry.sdk.trace.export:BatchSpanProcessor"
    simple = "opentelemetry.sdk.trace.export:SimpleSpanProcessor"
    my_custom = "mycompany.telemetry:CustomSpanProcessor"
  2. Finds the entry registered as batch
  3. Instantiates that class with the provided configuration and the instance name console

This provides an unambiguous way to reference specific component types in a concise format and utilizes the existing plugin architecture.

Open Question: Configuration Mapping Strategy

One design question I am unsure about is how configuration values should map to SDK object construction. Two alternatives I have considered are:

Option A: Direct Constructor Mapping (Proposed)

Configuration keys map directly to constructor keyword arguments. For example, for BatchSpanProcessor:

class BatchSpanProcessor(SpanProcessor):
    def __init__(
        self,
        span_exporter: SpanExporter,
        max_queue_size: int | None = None,
        schedule_delay_millis: float | None = None,
        max_export_batch_size: int | None = None,
        export_timeout_millis: float | None = None,
    ):
        ...

Configuration:

batch/console:
  span_exporter: cfg://service.traces.exporters.console
  max_queue_size: 2048
  schedule_delay_millis: 5000
  max_export_batch_size: 512 
  export_timeout_millis: 30000
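A minimal sketch of what direct constructor mapping could look like in the loader. The instantiate function is a hypothetical name, and FakeBatchProcessor is a stand-in used so the example runs without the SDK installed. Checking keys against the signature first lets the loader report a misspelled option by name.

```python
import inspect
from typing import Any, Mapping


def instantiate(cls: type, options: Mapping[str, Any]) -> Any:
    """Pass configuration keys straight through as constructor keyword arguments."""
    parameters = inspect.signature(cls).parameters
    accepts_any = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in parameters.values()
    )
    unknown = sorted(key for key in options if key not in parameters)
    if unknown and not accepts_any:
        raise TypeError(f"{cls.__name__} does not accept: {', '.join(unknown)}")
    return cls(**options)


class FakeBatchProcessor:
    """Stand-in with a subset of the keyword names shown above."""

    def __init__(self, span_exporter, max_queue_size=None, schedule_delay_millis=None):
        self.span_exporter = span_exporter
        self.max_queue_size = max_queue_size
        self.schedule_delay_millis = schedule_delay_millis


processor = instantiate(
    FakeBatchProcessor,
    {"span_exporter": "resolved exporter", "max_queue_size": 2048},
)
print(processor.max_queue_size)
```

The sketch also shows the coupling listed under Cons: renaming a constructor parameter immediately changes which configuration keys are accepted.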

Pros:

  • Mirrors the API directly
  • No additional abstraction layer to maintain
  • Easy for users familiar with the SDK to understand

Cons:

  • Tightly couples configuration schema to constructor signatures
  • Constructor changes become breaking changes to configuration format
  • Cannot be applied to classes without suitable constructors (e.g. SynchronousMultiSpanProcessor)

Option B: Explicit Configuration Classes

Introduce separate entry points that define configuration schemas and handle object construction. For example:

[project.entry-points.opentelemetry_traces_processor_cfg]
batch = "opentelemetry.sdk.trace.export.config:BatchSpanProcessorConfig"

Where BatchSpanProcessorConfig might look like:

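from __future__ import annotations  # keeps the ConfigContext annotation unevaluated

from dataclasses import dataclass

from opentelemetry.sdk.trace.export import BatchSpanProcessor, SpanExporter

# ConfigContext is a hypothetical type that the configuration loader would provide.
@dataclass
class BatchSpanProcessorConfig:
    span_exporter: SpanExporter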
    max_queue_size: int = 2048
    schedule_delay_millis: float = 5000.0
    max_export_batch_size: int = 512
    export_timeout_millis: float = 30000.0
    
    @classmethod
    def from_config(cls, config: dict, context: ConfigContext) -> "BatchSpanProcessorConfig":
        """Parse and validate configuration."""
        # Custom validation, transformation, deprecation handling
        return cls(**config)
    
    def build(self) -> BatchSpanProcessor:
        """Construct the SDK object."""
        return BatchSpanProcessor(
            span_exporter=self.span_exporter,
            max_queue_size=self.max_queue_size,
            schedule_delay_millis=self.schedule_delay_millis,
            max_export_batch_size=self.max_export_batch_size,
            export_timeout_millis=self.export_timeout_millis,
        )
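To illustrate the "custom validation, transformation, deprecation handling" comment above, here is a minimal sketch using a stand-in class so it runs without the SDK. The BatchSettings class and the schedule_delay rename are hypothetical and exist only to show how a configuration class could absorb a renamed key.

```python
import warnings
from dataclasses import dataclass, fields
from typing import Any, Mapping

# Hypothetical rename, invented purely to illustrate deprecation handling.
RENAMED_KEYS = {"schedule_delay": "schedule_delay_millis"}


@dataclass
class BatchSettings:
    """Stand-in configuration class with two of the fields shown above."""

    max_queue_size: int = 2048
    schedule_delay_millis: float = 5000.0

    @classmethod
    def from_config(cls, config: Mapping[str, Any]) -> "BatchSettings":
        values = dict(config)
        # Transformation: accept the old key name but warn about it.
        for old, new in RENAMED_KEYS.items():
            if old in values:
                warnings.warn(
                    f"{old!r} is deprecated; use {new!r}",
                    DeprecationWarning,
                    stacklevel=2,
                )
                values.setdefault(new, values.pop(old))
        # Validation: reject keys the schema does not define.
        unknown = sorted(set(values) - {f.name for f in fields(cls)})
        if unknown:
            raise ValueError(f"unknown keys: {', '.join(unknown)}")
        return cls(**values)
```

With direct constructor mapping, the same rename would break existing configuration files outright; here the old key keeps working for a deprecation period.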

Pros:

  • Decouples configuration schema from constructors
  • Supports configuration transformations
  • Can provide schema introspection for documentation/tooling

Cons:

  • Additional abstraction layer to write and maintain
  • Higher barrier for third-party contributions

Would you like to implement a fix?

Yes
