flashinfer.comm.pack_strided_memory
- flashinfer.comm.pack_strided_memory(ptr: int, segment_size: int, segment_stride: int, num_segments: int, dtype: dtype, dev_id)
Pack GPU memory into a PyTorch tensor with specified stride.
- Parameters:
ptr – GPU memory address obtained from cudaMalloc
segment_size – Memory size of each segment in bytes
segment_stride – Stride between the starts of consecutive segments, in bytes
num_segments – Number of segments
dtype – PyTorch data type for the resulting tensor
dev_id – CUDA device ID
- Returns:
PyTorch tensor that references the provided memory
Note
This function creates a new DLPack capsule each time it’s called, even with the same pointer. Each capsule is consumed only once.
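To make the byte-based parameters concrete, the sketch below computes the element-wise layout that a strided view over such memory could have. It is not FlashInfer's implementation: `strided_layout` and `segment_offsets` are hypothetical helpers, and the 2-D shape `(num_segments, segment_size // itemsize)` is an assumption made for illustration, as is the requirement that both byte sizes be multiples of the element size.

```python
def strided_layout(segment_size, segment_stride, num_segments, itemsize):
    """Hypothetical helper (not part of FlashInfer).

    Converts byte-based segment parameters into a shape and strides
    expressed in elements, assuming a 2-D (segment, element) layout.
    """
    # Assumption: both byte quantities divide evenly into whole elements.
    if segment_size % itemsize or segment_stride % itemsize:
        raise ValueError("segment_size and segment_stride must be multiples of itemsize")
    shape = (num_segments, segment_size // itemsize)
    strides = (segment_stride // itemsize, 1)
    return shape, strides


def segment_offsets(segment_stride, num_segments):
    """Byte offset of each segment's start relative to the base pointer."""
    return [i * segment_stride for i in range(num_segments)]


# Example: 4 segments of 256 bytes, spaced 1024 bytes apart,
# viewed with a 2-byte element type such as float16.
shape, strides = strided_layout(
    segment_size=256, segment_stride=1024, num_segments=4, itemsize=2
)
print(shape, strides)              # (4, 128) (512, 1)
print(segment_offsets(1024, 4))    # [0, 1024, 2048, 3072]
```

In this example only the first 256 bytes of every 1024-byte stride belong to a segment; the bytes in between are skipped by the view rather than copied.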