flashinfer.comm.pack_strided_memory

flashinfer.comm.pack_strided_memory(ptr: int, segment_size: int, segment_stride: int, num_segments: int, dtype: dtype, dev_id)

Pack strided GPU memory into a PyTorch tensor: num_segments segments of segment_size bytes each, spaced segment_stride bytes apart.

Parameters:
  • ptr – GPU memory address obtained from cudaMalloc

  • segment_size – Size of each segment, in bytes

  • segment_stride – Stride between the starts of consecutive segments, in bytes

  • num_segments – Number of segments

  • dtype – PyTorch data type for the resulting tensor

  • dev_id – CUDA device ID

Returns:

PyTorch tensor that references the provided memory

Note

Each call creates a new DLPack capsule, even when the pointer is the same as in a previous call. A DLPack capsule can be consumed only once.
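
Example

The byte-based parameters can be easier to reason about with a small host-side sketch. The code below does not call flashinfer or touch GPU memory; it uses a bytearray as a stand-in for device memory and only illustrates the addressing arithmetic. The helpers segment_layout and read_segments are hypothetical names introduced for illustration, and the 2-D shape (num_segments, segment_size // itemsize) with strides (segment_stride // itemsize, 1) is an assumption about how the resulting tensor is laid out, not something this page states.

```python
import struct


def segment_layout(segment_size, segment_stride, num_segments, itemsize):
    """Hypothetical helper: 2-D shape and element strides implied by the byte parameters.

    Assumes the view is (num_segments, elements_per_segment); this layout is an
    assumption for illustration, not taken from the flashinfer documentation.
    """
    if segment_size % itemsize or segment_stride % itemsize:
        raise ValueError("segment_size and segment_stride must be multiples of itemsize")
    shape = (num_segments, segment_size // itemsize)
    strides = (segment_stride // itemsize, 1)  # in elements, not bytes
    return shape, strides


def read_segments(buf, segment_size, segment_stride, num_segments, fmt="f"):
    """Hypothetical helper: copy each segment out of a host buffer.

    Segment i starts at byte offset i * segment_stride and spans segment_size bytes.
    """
    itemsize = struct.calcsize("<" + fmt)
    (rows, cols), _ = segment_layout(segment_size, segment_stride, num_segments, itemsize)
    out = []
    for i in range(rows):
        start = i * segment_stride
        out.append(list(struct.unpack_from(f"<{cols}{fmt}", buf, start)))
    return out


# Host-side stand-in for device memory: 3 segments of two float32 values
# (8 bytes each), with each segment starting 16 bytes after the previous one.
buf = bytearray(3 * 16)
for i in range(3):
    struct.pack_into("<2f", buf, i * 16, float(i), float(i) + 0.5)

shape, strides = segment_layout(segment_size=8, segment_stride=16, num_segments=3, itemsize=4)
rows = read_segments(buf, segment_size=8, segment_stride=16, num_segments=3)
print(shape, strides)
print(rows)
```

In this sketch the 8 bytes of padding after each segment are skipped, which is the effect a stride larger than the segment size has. Unlike the sketch, which copies values out of the buffer, pack_strided_memory returns a tensor that references the original memory.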