Backend path selection
Path selection starts from the MLX device and the public operation. The Python layer builds semantic objects first: sparse tensors, coordinate keys, kernel relations, neighbor relations, point/voxel maps, or packed quantized weights. The native layer then evaluates those objects on CPU or Metal.
Decision order

| Layer | Predicate source | Effect |
|---|---|---|
| Device | | Selects CPU primitive evaluation or Metal primitive evaluation. |
| Operation family | Public function/module call | Chooses convolution, pooling, coordinates, point/voxel, entropy, or feature execution. |
| Semantic relation | | Defines row connectivity before backend kernels are considered. |
| Storage | Dense floating arrays or | Selects floating or packed int4/int8 convolution/linear execution. |
| Shape and dtype | Channel count, kernel volume, layout, feature dtype, coordinate dtype | Selects a specialized route, a generic route, or a validation error. |
| Metal capability | TensorOps capability tier | Allows TensorOps routes on devices reporting neural acceleration. |

The public operation owns semantics. A backend route can change how the sum or reduction is evaluated, but it must evaluate the same relation contract.
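
The ordering in the table can be pictured as a short chain of checks. The sketch below is illustrative only: `Request`, `decide`, and the string labels are hypothetical names rather than library API, and mapping the GPU device to Metal evaluation is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical summary of one public call (not a library type)."""
    device: str        # "cpu" or "gpu" (assumed device labels)
    family: str        # e.g. "convolution", "pooling"
    quantized: bool    # packed int4/int8 weights instead of dense floats
    shape_ok: bool     # a specialized shape/dtype predicate matched
    tensorops: bool    # the Metal capability predicate matched


def decide(req: Request) -> list:
    """Walk the layers in table order and record each decision."""
    trace = ["backend=" + ("metal" if req.device == "gpu" else "cpu")]
    trace.append("family=" + req.family)
    # The relation is fixed before any backend kernel is considered.
    trace.append("relation=built")
    trace.append("storage=" + ("packed" if req.quantized else "floating"))
    trace.append("shape=" + ("specialized" if req.shape_ok else "generic"))
    # Capability is consulted last and only matters on Metal.
    use_tensorops = req.device == "gpu" and req.shape_ok and req.tensorops
    trace.append("tensorops=" + ("yes" if use_tensorops else "no"))
    return trace
```

Every step after `relation=built` only changes how the result is computed; none of them changes which rows are connected, which is the sense in which the public operation owns semantics.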
Capability tiers

Metal TensorOps capability is computed by the native capability helper:

| Tier | Native predicate | Routes enabled |
|---|---|---|
| | macOS availability/device family check fails | Classic CPU/Metal routes only. |
| | Apple GPU family supports the baseline tensor API but not neural acceleration tier | Some sorted quantized implicit-GEMM routes may be considered; neural accelerator-only TensorOps routes are not preferred. |
| | Device reports the neural-acceleration family | TensorOps sorted fp16 convolution and TensorOps quantized contraction can be selected when shape predicates also match. |

The capability check is a route predicate, not a user-visible device API. Use `backend_info()` for compiled backend availability; use benchmarks to observe the performance of the route selected for a public input.
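
As a minimal sketch of how a three-level tier can gate routes, assuming the tiers are ordered: the names `Tier`, `UNAVAILABLE`, `BASELINE`, and `NEURAL` and both helper functions are hypothetical, chosen here for illustration, and do not come from the library.

```python
from enum import IntEnum


class Tier(IntEnum):
    # Hypothetical labels for the three rows of the capability table.
    UNAVAILABLE = 0  # availability/device family check fails
    BASELINE = 1     # baseline tensor API, no neural acceleration
    NEURAL = 2       # device reports the neural-acceleration family


def tensorops_fp16_allowed(tier: Tier, shape_matches: bool) -> bool:
    """TensorOps sorted fp16 convolution needs the top tier and a shape match."""
    return tier == Tier.NEURAL and shape_matches


def quantized_implicit_gemm_considered(tier: Tier) -> bool:
    """Sorted quantized implicit-GEMM may be considered from the baseline tier up."""
    return tier >= Tier.BASELINE
```

Treating the tier as an ordered value keeps each route predicate a single comparison, and a failed predicate simply leaves the classic CPU/Metal routes in play.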
Route predicates by operation

| Operation | Fast-path predicates | Fallback or alternate route |
|---|---|---|
| | | Feature matrix multiplication; quantized weights use quantized matmul. |
| | Non-pointwise or explicit target support | Builds |
| | Odd kernel, stride 1, no coordinate expansion | Builds a |
| Transposed convolution | | Builds transposed or generative relation; sorted implicit-GEMM is not selected because the relation kind is not |
| Local pooling | | Metal/CPU sparse reduction kernels. |
| Global pooling | | MLX scatter/reduction over batch row groups. |
| Point/voxel utilities | | Native coordinate kernels for quantization, maps, and interpolation. |
| Sparse algebra | Shared coordinate identity or explicit join policy | Native alignment helpers plus MLX feature arithmetic. |

Convolution route equations
Relation convolution evaluates:
Pooling evaluates:
where reduce is sum, max, or average. Average pooling divides by the sparse
edge count for the output row.
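
A plain-Python reference can make the two evaluations concrete. It assumes the conventional sparse form, in which a relation is a list of `(input_row, kernel_index, output_row)` edges: convolution adds the product of input row `i` and weight matrix `k` into output row `o` for every edge, and pooling reduces the input rows that share an output row. The function names, the edge layout, and the zero fill for output rows without edges are assumptions made for illustration; this is not the native kernel code.

```python
def relation_convolution(features, weights, edges, num_out):
    """Sum x[i] @ W[k] into output row o for every edge (i, k, o).

    features: list of input rows, each a list of C_in floats.
    weights:  weights[k] is a C_in x C_out matrix (list of lists).
    """
    c_out = len(weights[0][0])
    out = [[0.0] * c_out for _ in range(num_out)]
    for i, k, o in edges:
        for c, x in enumerate(features[i]):
            for d in range(c_out):
                out[o][d] += x * weights[k][c][d]
    return out


def relation_pooling(features, edges, num_out, reduce="sum"):
    """Reduce the input rows connected to each output row."""
    width = len(features[0])
    groups = [[] for _ in range(num_out)]
    for i, _k, o in edges:
        groups[o].append(features[i])
    out = []
    for rows in groups:
        if not rows:
            out.append([0.0] * width)  # assumed fill for unconnected rows
        elif reduce == "max":
            out.append([max(col) for col in zip(*rows)])
        elif reduce == "sum":
            out.append([sum(col) for col in zip(*rows)])
        elif reduce == "average":
            # Divide by the sparse edge count of this output row.
            out.append([sum(col) / len(rows) for col in zip(*rows)])
        else:
            raise ValueError("reduce must be 'sum', 'max', or 'average'")
    return out
```

A backend route is free to reorder or batch these loops, for example by sorting edges by kernel index, as long as the result matches this edge-wise definition.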
Validation boundaries

The route selector validates public contracts before native execution:

- Metal sparse convolution and pooling require `int32` coordinates.
- Floating convolution supports `float16` and `float32` feature/weight matrices on native routes.
- Local pooling currently accepts `float32` features.
- Packed quantized convolution requires int4 or int8 packed `uint32` weights, scale/bias arrays matching feature dtype, and group size in `{32, 64, 128}`.
- Sorted fp16 implicit-GEMM routes require relation metadata that can produce the sorted implicit-GEMM view.

When a specialized route predicate fails, execution uses the more general route for the same public operation if the public input is otherwise valid.
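
The difference between a contract violation and a failed route predicate can be sketched as follows. This is an illustration under stated assumptions: `validate_quantized_conv`, `pick_route`, the dtype strings, and the use of `ValueError` are hypothetical and are not taken from the library.

```python
ALLOWED_BITS = {4, 8}
ALLOWED_GROUP_SIZES = {32, 64, 128}


def validate_quantized_conv(coord_dtype, feature_dtype, weight_dtype,
                            scale_dtype, bits, group_size, on_metal=True):
    """Raise ValueError when a documented public contract is violated."""
    if on_metal and coord_dtype != "int32":
        raise ValueError("Metal sparse convolution requires int32 coordinates")
    if bits not in ALLOWED_BITS:
        raise ValueError("packed weights must be int4 or int8")
    if weight_dtype != "uint32":
        raise ValueError("packed weights must be stored as uint32")
    if scale_dtype != feature_dtype:
        raise ValueError("scale/bias dtype must match the feature dtype")
    if group_size not in ALLOWED_GROUP_SIZES:
        raise ValueError("group size must be 32, 64, or 128")


def pick_route(specialized_predicate_ok: bool) -> str:
    """A failed route predicate is not an error; it selects the general route."""
    return "specialized" if specialized_predicate_ok else "general"
```

Validation runs first and rejects invalid input outright, while `pick_route` only ever chooses between two valid ways of evaluating the same operation.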