Backend reference

The backend reference explains how public operations select internal execution routes. It is written for users diagnosing behavior and for maintainers changing kernels. It deliberately avoids presenting internal route names as public APIs.

Backend layers

mlx-lattice has three relevant layers:

Public semantic layer

Python objects and operations such as SparseTensor, conv3d, pool3d, voxelize, and sparse_add.

Route-selection layer

Shared policy code that decides, based on dtype, shape, relation metadata, weight layout, and device capability, whether an operation can use a specialized route.

Backend implementation layer

CPU and Metal code that executes the selected route. Metal code may further use classic threadgroup kernels, TensorOps, packed quantized kernels, or specialized sorted relation kernels.

The public semantic layer does not depend on backend-specific filenames or kernel names. The route-selection layer owns capability and shape predicates. Backend implementation files may specialize further, but only below explicit route contracts.
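The split between route selection and backend implementation can be illustrated with a minimal sketch. Everything in it is hypothetical: the RouteRequest fields, the select_route function, the route labels, and the specific thresholds are assumptions made for illustration and are not mlx-lattice names or its actual policy. The sketch only shows the idea that a single shared predicate inspects dtype, shape, relation metadata, weight layout, and device capability, and that anything failing the predicate falls back to a generic route.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteRequest:
    """Hypothetical summary of what a route predicate might inspect."""
    dtype: str             # e.g. "float16" or "float32"
    channels: int          # shape information visible at the public layer
    relation_sorted: bool  # example of relation metadata
    weight_layout: str     # e.g. "dense" or "packed_quantized"
    device: str            # "cpu" or "metal"


def select_route(req: RouteRequest) -> str:
    """Return an illustrative route label (not a real mlx-lattice route name)."""
    # Device capability: in this sketch only Metal has specialized routes.
    if req.device != "metal":
        return "generic"
    # Weight layout: packed quantized weights get their own route.
    if req.weight_layout == "packed_quantized":
        return "specialized_quantized"
    # Relation metadata, dtype, and shape predicates combined (assumed rule).
    if (
        req.relation_sorted
        and req.dtype in ("float16", "float32")
        and req.channels % 8 == 0
    ):
        return "specialized_sorted"
    # Anything that fails the predicates uses the generic route.
    return "generic"
```

Under these assumptions, public code never names a route. It passes tensors to an operation, and a predicate of this kind decides which backend implementation runs.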

Reading benchmark results

Benchmark results are interpreted along public input axes:

  • active rows N;

  • data distribution, such as isolated, plane, grid, or dense block;

  • channel count;

  • kernel geometry;

  • dtype or quantized weight layout;

  • device backend.

Internal route names are useful for maintainer diagnostics, but they are not a stable benchmark axis. The benchmark suite measures public operations and lets route selection make the same decision that normal user code receives.
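A small sketch may help show what keying results by public axes means in practice. The BenchmarkCase type, its field names, and the group_by_public_axes helper are hypothetical and are not part of the mlx-lattice benchmark suite. The internal route label is carried along only as a diagnostic and is deliberately ignored when grouping.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class BenchmarkCase:
    """Hypothetical record of the public axes listed above."""
    active_rows: int             # active rows N
    distribution: str            # "isolated", "plane", "grid", or "dense block"
    channels: int                # channel count
    kernel: Tuple[int, int, int] # kernel geometry
    dtype: str                   # dtype or quantized weight layout
    device: str                  # device backend


def group_by_public_axes(
    samples: Iterable[Tuple[BenchmarkCase, float, str]],
) -> Dict[BenchmarkCase, List[float]]:
    """Group timing samples by public case, ignoring the internal route label."""
    grouped: Dict[BenchmarkCase, List[float]] = defaultdict(list)
    for case, millis, _route_label in samples:
        # The route label is diagnostic only and never becomes part of the key.
        grouped[case].append(millis)
    return dict(grouped)
```

With this shape, two runs of the same public case land in the same group even if route selection picked different internal routes for them, which keeps comparisons stable when routes are renamed or re-tuned.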

Backend-to-API map

Backend topic          Public API              Semantic reference
---------------------  ----------------------  --------------------------
Dispatch policy        Diagnostics API         Backend path selection
Sparse convolution     Convolution operations  Coordinates and relations
Sparse pooling         Pooling operations      Coordinates and relations
Quantized inference    Quantized weights       Quantization routes
Point/voxel utilities  Point/voxel operations  Point/voxel backend routes