[RFC] Proposal: Architecture Modernization – Native C++ Kernels, Model Hub & Lazy Dependencies #3934

### What kind of feature would you like to request?

Additional function parameters / changed functionality / changed defaults?

### Please describe your wishes

I am the creator of **[PerturbLab](https://github.com/krkawzq/PerturbLab)**, a comprehensive perturbation analysis framework that I recently built from scratch in just **3 days** (verifiable via the commit history). Based on this experience, I propose three architectural enhancements for Scanpy:

## 1. Native C++ Backends (It is painless now)

I propose moving high-performance kernels to **Native C++**.

* **Feasibility:** With modern CI/CD (GitHub Actions + `cibuildwheel`), building and publishing cross-platform binary wheels can be fully automated.
* **Maintenance:** With AI-assisted coding, writing C++ kernels is no longer a burden. Maintainers can focus on algorithm design while much of the implementation work is automated.
* **Proof:** In `PerturbLab/kernels`, I implemented sparse matrix operators in pure C++ that significantly outperform equivalent Numba implementations.
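As a minimal sketch of how a native kernel could be wired in without making the compiled code mandatory, the Python below tries to import a compiled extension and falls back to a pure-NumPy implementation when no wheel is available. The extension module `_native_kernels` and the function `csr_row_sums` are hypothetical names chosen for illustration; they are not existing PerturbLab or Scanpy APIs. The fallback uses NumPy rather than Numba only to keep the sketch dependency-light.

```python
import numpy as np
from scipy import sparse

# Hypothetical compiled extension (e.g. a C++ kernel shipped as a binary
# wheel); the module and function names are assumptions for illustration.
try:
    from _native_kernels import csr_row_sums as _native_csr_row_sums
except ImportError:  # no compiled extension installed on this platform
    _native_csr_row_sums = None


def _csr_row_sums_numpy(indptr, data, n_rows):
    """Pure-NumPy fallback: sum the stored values of each CSR row."""
    # Number of stored entries in each row.
    counts = np.diff(indptr)
    # Row index of every stored entry, in storage order.
    row_ids = np.repeat(np.arange(n_rows), counts)
    # Weighted bincount adds each stored value to its row's total;
    # minlength keeps trailing empty rows in the output.
    return np.bincount(row_ids, weights=data[: indptr[-1]], minlength=n_rows)


def csr_row_sums(X):
    """Row sums of X, using the native kernel if present, else NumPy."""
    X = sparse.csr_matrix(X)
    if _native_csr_row_sums is not None:
        return _native_csr_row_sums(X.indptr, X.data, X.shape[0])
    return _csr_row_sums_numpy(X.indptr, X.data, X.shape[0])
```

Because both code paths consume the raw CSR arrays (`indptr`, `data`), a C++ kernel exposed through a binding layer such as pybind11 or nanobind could be swapped in without changing the public function's signature.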

## 2. Dependency Hygiene: Lazy Imports & Vendoring

* **Lazy Loading:** Heavy submodules (especially those requiring `torch` or specific plotting libraries) should use lazy imports.
* **Vendoring:** Small utility functions should be "vendored" (copied into the Scanpy codebase) rather than pulled in by adding full package dependencies.
* **Benefit:** This keeps the core lightweight and prevents "dependency hell."
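One standard-library way to implement lazy loading is the `importlib.util.LazyLoader` recipe from the Python `importlib` documentation. The sketch below wraps that recipe in a helper; the name `lazy_import` is a hypothetical choice for illustration, not an existing Scanpy function.

```python
import importlib.util
import sys
from types import ModuleType


def lazy_import(name: str) -> ModuleType:
    """Return module `name`, deferring execution of its code until first use.

    Based on the LazyLoader recipe in the importlib documentation.
    """
    # Reuse a module that has already been imported (eagerly or lazily).
    if name in sys.modules:
        return sys.modules[name]
    # For a top-level name, finding the spec does not execute the module.
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # With LazyLoader, exec_module only prepares the module; the real
    # import runs when an attribute is first accessed.
    loader.exec_module(module)
    return module
```

For example, a plotting or deep-learning submodule could write `torch = lazy_import("torch")` at module level, so importing the package pays only for the spec lookup and PyTorch loads on first use. Note that a package that is not installed is reported when `lazy_import` is called, not on first use. For lazily exposing whole submodules of a package, a module-level `__getattr__` (PEP 562) is the other common standard-library option.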

## 3. A "Transformers-like" Model Hub

I propose adding a standardized `sc.models` interface.

* In `perturblab/models`, I implemented a unified registry to manage, download, and deploy models (e.g., scGPT, GEARS) with a consistent API (`config`, `model`, `io`).
* Scanpy is the ideal place to standardize this for the community.
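To make the registry idea concrete, here is a minimal sketch of a name-based model registry with a serializable config. All names (`ModelConfig`, `BaseModel`, `register_model`, `list_models`, `load_model`, `ToyLinear`) are hypothetical and do not reproduce PerturbLab's actual API. Downloading and caching weights is omitted; in a real hub that would live in a separate `io` layer.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Dict, List, Type


@dataclass
class ModelConfig:
    """Serializable hyperparameters for one model instance."""
    name: str
    params: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ModelConfig":
        return cls(**json.loads(text))


class BaseModel:
    """Common base class that every registered model inherits."""

    def __init__(self, config: ModelConfig) -> None:
        self.config = config


_REGISTRY: Dict[str, Type[BaseModel]] = {}


def register_model(name: str) -> Callable[[Type[BaseModel]], Type[BaseModel]]:
    """Class decorator that adds a model class to the registry under `name`."""

    def decorator(cls: Type[BaseModel]) -> Type[BaseModel]:
        if name in _REGISTRY:
            raise ValueError(f"model {name!r} is already registered")
        _REGISTRY[name] = cls
        return cls

    return decorator


def list_models() -> List[str]:
    """Names of all registered models, sorted alphabetically."""
    return sorted(_REGISTRY)


def load_model(name: str, **params: Any) -> BaseModel:
    """Instantiate the model registered as `name` from keyword parameters."""
    if name not in _REGISTRY:
        raise KeyError(f"unknown model {name!r}; available: {list_models()}")
    return _REGISTRY[name](ModelConfig(name=name, params=params))


@register_model("toy_linear")
class ToyLinear(BaseModel):
    """Stand-in for a real model such as scGPT: scales inputs by a weight."""

    def predict(self, values: List[float]) -> List[float]:
        weight = self.config.params.get("weight", 1.0)
        return [weight * v for v in values]
```

Usage: `load_model("toy_linear", weight=2.0).predict([1.0, 2.0])` returns `[2.0, 4.0]`. Keeping the config as a plain, JSON-serializable dataclass means any registered model can be saved and rebuilt from its config alone, which is the property a shared hub interface would rely on.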

### Alternative Solutions

Continuing to rely solely on Numba/Python for everything limits the potential for extreme optimization and prevents the ecosystem from making effective use of low-level hardware acceleration (CUDA/C++).

### Additional Context

My repository **[krkawzq/PerturbLab](https://github.com/krkawzq/PerturbLab)** serves as a proof of concept for this architecture. It demonstrates that a strictly typed, high-performance (C++-backed), and modular system can be built rapidly.

I am happy to discuss contributing the C++ kernels or the Model Hub design to help push this initiative forward. "The lower the level, the better the performance."
