Modeling of the thickness of the sensor#

In this notebook we will re-use the experiment done at ID28 and previously calibrated and model in 3D the detector.

This detector is a Pilatus 1M with a 450µm thick silicon sensor. Let’s first have a look at the absorption coefficients of this sensor material:

Reference absorption coeficients are available from: https://physics.nist.gov/PhysRefData/XrayMassCoef/ElemTab/z14.html

First we retieve the results of the previous step, then calculate the absorption efficiency:

[1]:

%matplotlib inline
# use `widget` instead of `inline` for better user-experience. `inline` allows to store plots into notebooks.

[2]:

import time
import os
from matplotlib.pyplot import subplots
import numpy
from scipy.sparse import save_npz
import pyFAI
import pyFAI.units
import pyFAI.detectors
from pyFAI.detectors.sensors import Si_MATERIAL
import json
from scipy.sparse import csr_matrix, linalg
try:
    import cpuinfo
except ImportError:
    cpu = ""
else:
    cpu = cpuinfo.cpuinfo.get_cpu_info().get("brand_raw")

start_time = time.perf_counter()
print(f"Using pyFAI vesrion: {pyFAI.version} on a {cpu} with {os.cpu_count()} threads")
with open("id28.json") as f:
    calib = json.load(f)

thickness = 450e-6
wavelength = calib["wavelength"]
dist = calib["param"][calib["param_names"].index("dist")]
poni1 = calib["param"][calib["param_names"].index("poni1")]
poni2 = calib["param"][calib["param_names"].index("poni2")]
energy = pyFAI.units.hc / (wavelength * 1e10)
print(f"wavelength: {wavelength:.3e}m,\t "
      f"dist: {dist:.3e}m,\t "
      f"poni1: {poni1:.3e}m,\t "
      f"poni2: {energy:.3e}m,\t "
      f"energy: {energy:.3f}keV")
mask = numpy.load("mask.npy").astype(numpy.int8)

Using pyFAI vesrion: 2025.10.0-dev0 on a AMD Ryzen Threadripper PRO 3975WX 32-Cores with 64 threads
wavelength: 6.968e-11m,  dist: 2.845e-01m,       poni1: 8.865e-02m,      poni2: 1.779e+01m,      energy: 17.793keV

Absorption coefficient at 17.8 keV#

[3]:

print(
    f"µ = {Si_MATERIAL.mu(energy, unit='cm'):.1f} cm^-1 "
    f"hence absorption efficiency for 450µm: {100 * Si_MATERIAL.absorbance(energy, thickness):.1f} %"
)
mu = Si_MATERIAL.mu(energy, unit="m")

µ = 16.4 cm^-1 hence absorption efficiency for 450µm: 52.3 %

[4]:

from pyFAI.detectors.sensors import Si_MATERIAL

depth = numpy.linspace(0, 1000, 100)
res = [1 - Si_MATERIAL.absorbance(energy, d, "µm") for d in depth]
fig, ax = subplots()
ax.plot(depth, res, "-")
ax.set_xlabel("Depth (µm)")
ax.set_ylabel("Residual signal")
ax.set_title(f"Silicon @ {energy:.1} keV");

../../../_images/usage_tutorial_ThickDetector_raytracing_5_0.png

This is consistent with: henke.lbl.gov or web-docs.gsi.de

Now we can model the detector

Modeling of the detector:#

The detector is represented as a 2D array of voxels. Let vox, voy, and voz denote the dimensions of the detector along the three axes.

[5]:

detector = pyFAI.detector_factory(calib["detector"])
print(detector)

vox = detector.pixel2  # this is not a typo
voy = detector.pixel1  # x <--> axis 2
voz = thickness

print(f"Voxel size: (x:{vox}, y:{voy}, z:{voz})")

Detector Pilatus 1M     PixelSize= 172µm, 172µm  BottomRight(3)
Voxel size: (x:0.000172, y:0.000172, z:0.00045)

The intensity grabbed in this voxel is the triple integral of the absorbed signal coming from this pixel or from the neighboring ones.

There are 3 ways to perform this intergral: * Volumetric analytic integral. Looks feasible with a change of variable in the depth * Slice per slice, the remaining intensity depand on the incidence angle + pixel splitting between neighbooring pixels * raytracing: the decay can be solved analytically for each ray, one has to throw many ray to average out the signal.

For sake of simplicity, this integral will be calculated numerically using this raytracing algorithm. http://www.cse.yorku.ca/~amana/research/grid.pdf

Knowing the input position for a X-ray on the detector and its propagation vector, this algorithm allows us to calculate the length of the path in all voxel it crosses in a fairly efficient way.

To speed up the calculation, we will use a few tricks: * One ray never crosses more than 16 pixels, which is reasonable considering the incidance angle * we use numba to speed-up the calculation of loops in python * We will allocate the needed memory by chuncks of 1 million elements

[6]:

! pip install numba cython

Requirement already satisfied: numba in /users/kieffer/.venv/py313/lib/python3.13/site-packages (0.62.1)
Requirement already satisfied: cython in /users/kieffer/.venv/py313/lib/python3.13/site-packages (3.0.12)
Requirement already satisfied: llvmlite<0.46,>=0.45.0dev0 in /users/kieffer/.venv/py313/lib/python3.13/site-packages (from numba) (0.45.1)
Requirement already satisfied: numpy<2.4,>=1.22 in /users/kieffer/.venv/py313/lib/python3.13/site-packages (from numba) (2.3.1)

Raytracing using numba#

[7]:

BLOCK_SIZE = 1 << 20  # 1 million
BUFFER_SIZE = 16
BIG = numpy.finfo(numpy.float32).max

Now that we are able to perform raytracing for any ray entering the detector, we can calculate the contribution to the neighboring pixels, using the absorption law (the length traveled is already known). To average-out the signal, we will sample a few dozens of rays per pixel to get an approximation of the volume integral.

Now we need to store the results so that this transformation can be represented as a sparse matrix multiplication:

b = M.a

Where b is the recorded image (blurred) and a is the “perfect” signal. M being the sparse matrix where every pixel of a gives a limited number of contribution to b.

Each pixel in b is represented by one line in M and we store the indices of a of interest with the coefficients of the matrix. So if a pixel i,j contributes to (i,j), (i+1,j), (i+1,j+1), there are only 3 elements in the line. This is advantageous for storage.

We will use the CSR sparse matrix representation: https://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_row_.28CSR.2C_CRS_or_Yale_format.29 where there are 3 arrays: * data: containing the actual non zero values * indices: for a given line, it contains the column number of the associated data (at the same indice) * idptr: this array contains the index of the start of every line.

[8]:

from numba.experimental import jitclass
from numba import int8, int32, int64, float32, float64

spec = [
    ("vox", float64),
    ("voy", float64),
    ("voz", float64),
    ("mu", float64),
    ("dist", float64),
    ("poni1", float64),
    ("poni2", float64),
    ("width", int64),
    ("height", int64),
    ("mask", int8[:, :]),
    ("sampled", int64),
    ("data", float32[:]),
    ("indices", int32[:]),
    ("idptr", int32[:]),
]


@jitclass(spec)
class ThickDetector(object):
    "Calculate the point spread function as function of the geometry of the experiment"

    def __init__(self, vox, voy, thickness, mask, mu, dist, poni1, poni2):
        """Constructor of the class:

        :param vox, voy: detector pixel size in the plane
        :param thickness: thickness of the sensor in meters
        :param mask:
        :param mu: absorption coefficient of the sensor material
        :param dist: sample detector distance as defined in the geometry-file
        :param poni1, poni2: coordinates of the PONI as defined in the geometry
        """
        self.vox = vox
        self.voy = voy
        self.voz = thickness
        self.mu = mu
        self.dist = dist
        self.poni1 = poni1
        self.poni2 = poni2
        self.width = mask.shape[-1]
        self.height = mask.shape[0]
        self.mask = mask
        self.sampled = 0
        self.data = numpy.zeros(BLOCK_SIZE, dtype=numpy.float32)
        self.indices = numpy.zeros(BLOCK_SIZE, dtype=numpy.int32)
        self.idptr = numpy.zeros(self.width * self.height + 1, dtype=numpy.int32)

    def calc_one_ray(self, entx, enty):
        """For a ray entering at position (entx, enty), with a propagation vector (kx, ky,kz),
        calculate the length spent in every voxel where energy is deposited from a bunch of photons entering the detector
        at a given position and and how much energy they deposit in each voxel.

        Direct implementation of http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.42.3443&rep=rep1&type=pdf

        :param entx, enty: coordinate of the entry point in meter (2 components, x,y)
        :return: coordinates voxels in x, y and length crossed when leaving the associated voxel
        """
        array_x = numpy.empty(BUFFER_SIZE, dtype=numpy.int32)
        array_x[:] = -1
        array_y = numpy.empty(BUFFER_SIZE, dtype=numpy.int32)
        array_y[:] = -1
        array_len = numpy.empty(BUFFER_SIZE, dtype=numpy.float32)

        # normalize the input propagation vector
        kx = entx - self.poni2
        ky = enty - self.poni1
        kz = self.dist
        n = numpy.sqrt(kx * kx + ky * ky + kz * kz)
        kx /= n
        ky /= n
        kz /= n

        step_X = -1 if kx < 0.0 else 1
        step_Y = -1 if ky < 0.0 else 1

        X = int(entx / self.vox)
        Y = int(enty / self.voy)

        if kx > 0.0:
            t_max_x = ((entx // self.vox + 1) * (self.vox) - entx) / kx
        elif kx < 0.0:
            t_max_x = ((entx // self.vox) * (self.vox) - entx) / kx
        else:
            t_max_x = BIG

        if ky > 0.0:
            t_max_y = ((enty // self.voy + 1) * (self.voy) - enty) / ky
        elif ky < 0.0:
            t_max_y = ((enty // self.voy) * (self.voy) - enty) / ky
        else:
            t_max_y = BIG

        # Only one case for z as the ray is travelling in one direction only
        t_max_z = self.voz / kz

        t_delta_x = abs(self.vox / kx) if kx != 0 else BIG
        t_delta_y = abs(self.voy / ky) if ky != 0 else BIG
        t_delta_z = self.voz / kz

        finished = False
        last_id = 0
        array_x[last_id] = X
        array_y[last_id] = Y

        while not finished:
            if t_max_x < t_max_y:
                if t_max_x < t_max_z:
                    array_len[last_id] = t_max_x
                    last_id += 1
                    X += step_X
                    array_x[last_id] = X
                    array_y[last_id] = Y
                    t_max_x += t_delta_x
                else:
                    array_len[last_id] = t_max_z
                    last_id += 1
                    finished = True
            else:
                if t_max_y < t_max_z:
                    array_len[last_id] = t_max_y
                    last_id += 1
                    Y += step_Y
                    array_x[last_id] = X
                    array_y[last_id] = Y
                    t_max_y += t_delta_y
                else:
                    array_len[last_id] = t_max_z
                    last_id += 1
                    finished = True
            if last_id >= array_len.size - 1:
                print("resize arrays")
                old_size = len(array_len)
                new_size = (old_size // BUFFER_SIZE + 1) * BUFFER_SIZE
                new_array_x = numpy.empty(new_size, dtype=numpy.int32)
                new_array_x[:] = -1
                new_array_y = numpy.empty(new_size, dtype=numpy.int32)
                new_array_y[:] = -1
                new_array_len = numpy.empty(new_size, dtype=numpy.float32)
                new_array_x[:old_size] = array_x
                new_array_y[:old_size] = array_y
                new_array_len[:old_size] = array_len
                array_x = new_array_x
                array_y = new_array_y
                array_len = new_array_len
        return array_x[:last_id], array_y[:last_id], array_len[:last_id]

    def one_pixel(self, row, col, sample):
        """calculate the contribution of one pixel to the sparse matrix and populate it.

        :param row: row index of the pixel of interest
        :param col: column index of the pixel of interest
        :param sample: Oversampling rate, 10 will cast 10x10 ray per pixel

        :return: the extra number of pixel allocated
        """
        if self.mask[row, col]:
            return (
                numpy.empty(0, dtype=numpy.int32),
                numpy.empty(0, dtype=numpy.float32),
            )

        counter = 0
        tmp_size = 0
        last_buffer_size = BUFFER_SIZE
        tmp_idx = numpy.empty(last_buffer_size, dtype=numpy.int32)
        tmp_idx[:] = -1
        tmp_coef = numpy.zeros(last_buffer_size, dtype=numpy.float32)

        pos = row * self.width + col
        start = self.idptr[pos]
        for i in range(sample):
            posx = (col + 1.0 * i / sample) * vox
            for j in range(sample):
                posy = (row + 1.0 * j / sample) * voy
                array_x, array_y, array_len = self.calc_one_ray(posx, posy)

                rem = 1.0
                for i in range(array_x.size):
                    x = array_x[i]
                    y = array_y[i]
                    l = array_len[i]
                    if (x < 0) or (y < 0) or (y >= self.height) or (x >= self.width):
                        break
                    elif self.mask[y, x]:
                        continue
                    idx = x + y * self.width
                    dos = numpy.exp(-self.mu * l)
                    value = rem - dos
                    rem = dos
                    for j in range(last_buffer_size):
                        if tmp_size >= last_buffer_size:
                            # Increase buffer size
                            new_buffer_size = last_buffer_size + BUFFER_SIZE
                            new_idx = numpy.empty(new_buffer_size, dtype=numpy.int32)
                            new_coef = numpy.zeros(new_buffer_size, dtype=numpy.float32)
                            new_idx[:last_buffer_size] = tmp_idx
                            new_idx[last_buffer_size:] = -1
                            new_coef[:last_buffer_size] = tmp_coef
                            last_buffer_size = new_buffer_size
                            tmp_idx = new_idx
                            tmp_coef = new_coef

                        if tmp_idx[j] == idx:
                            tmp_coef[j] += value
                            break
                        elif tmp_idx[j] < 0:
                            tmp_idx[j] = idx
                            tmp_coef[j] = value
                            tmp_size += 1
                            break
        return tmp_idx[:tmp_size], tmp_coef[:tmp_size]

    def calc_csr(self, sample):
        """Calculate the CSR matrix for the whole image
        :param sample: Oversampling factor
        :return: CSR matrix
        """
        size = self.width * self.height
        allocated_size = BLOCK_SIZE
        idptr = numpy.zeros(size + 1, dtype=numpy.int32)
        indices = numpy.zeros(allocated_size, dtype=numpy.int32)
        data = numpy.zeros(allocated_size, dtype=numpy.float32)
        self.sampled = sample * sample
        pos = 0
        start = 0
        for row in range(self.height):
            for col in range(self.width):
                line_idx, line_coef = self.one_pixel(row, col, sample)
                line_size = line_idx.size
                if line_size == 0:
                    new_size = 0
                    pos += 1
                    idptr[pos] = start
                    continue

                stop = start + line_size

                if stop >= allocated_size:
                    new_buffer_size = allocated_size + BLOCK_SIZE
                    new_idx = numpy.zeros(new_buffer_size, dtype=numpy.int32)
                    new_coef = numpy.zeros(new_buffer_size, dtype=numpy.float32)
                    new_idx[:allocated_size] = indices
                    new_coef[:allocated_size] = data
                    allocated_size = new_buffer_size
                    indices = new_idx
                    data = new_coef

                indices[start:stop] = line_idx
                data[start:stop] = line_coef
                pos += 1
                idptr[pos] = stop
                start = stop

        last = idptr[-1]
        self.data = data
        self.indices = indices
        self.idptr = idptr
        return (self.data[:last] / self.sampled, indices[:last], idptr)

[9]:

nbthick = ThickDetector(
    vox, voy, thickness=thickness, mu=mu, dist=dist, poni1=poni1, poni2=poni2, mask=mask
)
%time nbthick.calc_csr(1)

CPU times: user 3.95 s, sys: 236 ms, total: 4.19 s
Wall time: 4.2 s

[9]:

(array([0., 0., 0., ..., 0., 0., 0.], shape=(1902583,), dtype=float32),
 array([      2,       2,       4, ..., 1023180, 1023181, 1023182],
       shape=(1902583,), dtype=int32),
 array([      0,       0,       0, ..., 1902581, 1902582, 1902583],
       shape=(1023184,), dtype=int32))

[10]:

%time nbthick.calc_csr(4)

CPU times: user 3.68 s, sys: 7.78 ms, total: 3.69 s
Wall time: 3.68 s

[10]:

(array([0.17449115, 0.10933631, 0.17465518, ..., 0.24611568, 0.03161056,
        0.24604346], shape=(3410359,), dtype=float32),
 array([      2,       2,       3, ..., 1023181, 1023182, 1023182],
       shape=(3410359,), dtype=int32),
 array([      0,       0,       0, ..., 3410356, 3410358, 3410359],
       shape=(1023184,), dtype=int32))

Same implementation using Cython#

Cython is an ahead of time compiler for Python code. Thus it requires a compiler (gcc, msvc, …) needs to be installed on your computer and available. This is usually already the case under linux but requires some work under windows or macos.

[11]:

%load_ext Cython

[12]:

%%cython --compile-args=-fopenmp --link-args=-fopenmp
#cython: embedsignature=True, language_level=3, binding=True
#cython: boundscheck=False, wraparound=False, cdivision=True, initializedcheck=False,
## This is for developping:
## cython: profile=True, warn.undeclared=True, warn.unused=True, warn.unused_result=False, warn.unused_arg=True
##

import cython
import numpy
from libc.math cimport sqrt, exp
from cython.parallel import prange
from libc.stdint cimport int8_t, uint8_t, int16_t, uint16_t, \
                         int32_t, uint32_t, int64_t, uint64_t

ctypedef double float64_t
ctypedef float float32_t

cdef int32_t BUFFER_SIZE = 16
cdef float64_t BIG = numpy.finfo(numpy.float32).max


cdef class CythonThickDetector:
    "Calculate the point spread function as function of the geometry of the experiment"
    cdef:
        public float64_t vox
        public float64_t voy
        public float64_t voz
        public float64_t mu
        public float64_t dist
        public float64_t poni1
        public float64_t poni2
        public int oversampling
        public int buffer_size
        public int width
        public int height
        public int size
        public int8_t[:, ::1] mask

    def __init__(self,
                 float64_t vox,
                 float64_t voy,
                 float64_t thickness,
                 mask,
                 float64_t mu,
                 float64_t dist,
                 float64_t poni1,
                 float64_t poni2,
                 int buffer_size=BUFFER_SIZE):
        """Constructor of the class:

        :param vox, voy: detector pixel size in the plane
        :param thickness: thickness of the sensor in meters
        :param mask:
        :param mu: absorption coefficient of the sensor material
        :param dist: sample detector distance as defined in the geometry-file
        :param poni1, poni2: coordinates of the PONI as defined in the geometry
        """
        self.vox = float(vox)
        self.voy = float(voy)
        self.voz = float(thickness)
        self.mu = float(mu)
        self.dist = float(dist)
        self.poni1 = float(poni1)
        self.poni2 = float(poni2)
        self.width = int(mask.shape[1])
        self.height = int(mask.shape[0])
        self.size = self.width*self.height
        self.mask = numpy.ascontiguousarray(mask, numpy.int8)
        self.oversampling = 1
        self.buffer_size = int(buffer_size)

    def calc_one_ray(self, entx, enty):
        """For a ray entering at position (entx, enty), with a propagation vector (kx, ky,kz),
        calculate the length spent in every voxel where energy is deposited from a bunch of photons entering the detector
        at a given position and and how much energy they deposit in each voxel.

        Direct implementation of http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.42.3443&rep=rep1&type=pdf

        :param entx, enty: coordinate of the entry point in meter (2 components, x,y)
        :return: coordinates voxels in x, y and length crossed when leaving the associated voxel
        """

        cdef:
            int last_id
            float64_t _entx = float(entx)
            float64_t _enty = float(enty)
            int32_t[::1] array_x = numpy.empty(self.buffer_size, dtype=numpy.int32)
            int32_t[::1] array_y = numpy.empty(self.buffer_size, dtype=numpy.int32)
            float32_t[::1] array_len = numpy.empty(self.buffer_size, dtype=numpy.float32)
        with nogil:
            last_id = self._calc_one_ray(_entx, _enty,
                                         array_x, array_y, array_len)
        if last_id>self.buffer_size:
            raise RuntimeError(f"Temporary buffer size ({last_id}) larger than expected ({self.buffer_size})")
        return (numpy.asarray(array_x[:last_id]),
                numpy.asarray(array_y[:last_id]),
                numpy.asarray(array_len[:last_id]))

    cdef int _calc_one_ray(self,
                           float64_t entx,
                           float64_t enty,
                           int32_t[::1] array_x,
                           int32_t[::1] array_y,
                           float32_t[::1] array_len
                          )noexcept nogil:
        """Return number of entries in the array_x[:last_id], array_y[:last_id], array_len[:last_id]"""
        cdef:
            float64_t kx, ky, kz, n, t_max_x, t_max_y, t_max_z, t_delta_x, t_delta_y#, t_delta_z
            int step_X, step_Y, X, Y, last_id
            bint finished

        # reset arrays
        array_x[:] = -1
        array_y[:] = -1
        array_len[:] = 0.0

        # normalize the input propagation vector
        kx = entx - self.poni2
        ky = enty - self.poni1
        kz = self.dist
        n = sqrt(kx*kx + ky*ky + kz*kz)
        kx /= n
        ky /= n
        kz /= n

        step_X = -1 if kx<0.0 else 1
        step_Y = -1 if ky<0.0 else 1

        X = int(entx/self.vox)
        Y = int(enty/self.voy)

        if kx>0.0:
            t_max_x = ((entx//self.vox+1)*(self.vox)-entx)/ kx
        elif kx<0.0:
            t_max_x = ((entx//self.vox)*(self.vox)-entx)/ kx
        else:
            t_max_x = BIG

        if ky>0.0:
            t_max_y = ((enty//self.voy+1)*(self.voy)-enty)/ ky
        elif ky<0.0:
            t_max_y = ((enty//self.voy)*(self.voy)-enty)/ ky
        else:
            t_max_y = BIG

        #Only one case for z as the ray is travelling in one direction only
        t_max_z = self.voz / kz

        t_delta_x = abs(self.vox/kx) if kx!=0 else BIG
        t_delta_y = abs(self.voy/ky) if ky!=0 else BIG
        # t_delta_z = self.voz/kz

        finished = False
        last_id = 0
        array_x[last_id] = X
        array_y[last_id] = Y

        while not finished:
            if t_max_x < t_max_y:
                if t_max_x < t_max_z:
                    array_len[last_id] = t_max_x
                    last_id = last_id + 1
                    X = X + step_X
                    array_x[last_id] = X
                    array_y[last_id] = Y
                    t_max_x = t_max_x + t_delta_x
                else:
                    array_len[last_id] = t_max_z
                    last_id = last_id + 1
                    finished = True
            else:
                if t_max_y < t_max_z:
                    array_len[last_id] = t_max_y
                    last_id = last_id +1
                    Y = Y + step_Y
                    array_x[last_id] = X
                    array_y[last_id] = Y
                    t_max_y = t_max_y + t_delta_y
                else:
                    array_len[last_id] = t_max_z
                    last_id = last_id +1
                    finished = True
            if last_id>=self.buffer_size:
                return self.buffer_size
        return last_id


    def one_pixel(self, row, col, sample=0):
        """calculate the contribution of one pixel to the sparse matrix and populate it.

        :param row: row index of the pixel of interest
        :param col: column index of the pixel of interest
        :param sample: Oversampling rate, 10 will cast 10x10 ray per pixel
        """
        cdef:
            int entries = 0
            int _row = int(row)
            int _col = int(col)
            int32_t[::1] tmp_idx = numpy.empty(self.buffer_size, dtype=numpy.int32)
            float32_t[::1] tmp_coef = numpy.empty(self.buffer_size, dtype=numpy.float32)
            int32_t[::1] array_x = numpy.empty(self.buffer_size, dtype=numpy.int32)
            int32_t[::1] array_y = numpy.empty(self.buffer_size, dtype=numpy.int32)
            float32_t[::1] array_len = numpy.empty(self.buffer_size, dtype=numpy.float32)

        if  sample:
            self.oversampling = sample
        with nogil:
            entries = self._one_pixel(_row, _col, tmp_idx, tmp_coef, array_x, array_y, array_len)
        if entries<self.buffer_size:
            return (numpy.asarray(tmp_idx)[:entries],numpy.asarray(tmp_coef)[:entries])
        else:
            raise RuntimeError(f"Pixel produced {entries} voxels, limited to {self.buffer_size}. Increase Buffer size ! in constructor")

    cdef int _one_pixel(self, int row, int col,
                       int32_t[::1] tmp_idx,
                       float32_t[::1] tmp_coef,
                       int32_t[::1] array_x,
                       int32_t[::1] array_y,
                       float32_t[::1] array_len
                    )noexcept nogil:
        "return the number of elements in tmp_idx/tmp_coef which are to be used"

        cdef:
            int i, j, tmp_size, last_buffer_size, n, x, y, idx
            float64_t rem, l, posx, posy, value, dos

        if self.mask[row, col]:
            return 0

        tmp_size = 0
        last_buffer_size = tmp_idx.shape[0]
        tmp_idx[:] = -1
        tmp_coef[:] = 0.0

        for i in range(self.oversampling):
            posx = (col+1.0*i/self.oversampling)*self.vox
            for j in range(self.oversampling):
                posy = (row+1.0*j/self.oversampling)*self.voy
                n = self._calc_one_ray(posx, posy, array_x, array_y, array_len)
                rem = 1.0
                for i in range(min(n, self.buffer_size)):
                    x = array_x[i]
                    y = array_y[i]
                    l = array_len[i]
                    if (x<0) or (y<0) or (y>=self.height) or (x>=self.width):
                        break
                    elif (self.mask[y, x]):
                        continue
                    idx = x + y*self.width
                    dos = exp(-self.mu*l)
                    value = rem - dos
                    rem = dos
                    for j in range(self.buffer_size):
                        if tmp_idx[j] == idx:
                            tmp_coef[j] = tmp_coef[j] + value
                            break
                        elif tmp_idx[j] < 0:
                            tmp_idx[j] = idx
                            tmp_coef[j] = value
                            tmp_size = tmp_size + 1
                            break
                    if tmp_size >= self.buffer_size:
                            break
        return tmp_size

    def calc_csr(self, sample=0, int threads=0):
        """Calculate the content of the sparse matrix for the whole image
        :param sample: Oversampling factor
        :param threads: number of threads to be used
        :return: spase matrix?
        """
        cdef:
            int pos, i, current, next, size
            int32_t[::1] sizes, indptr = numpy.zeros(self.size+1, dtype=numpy.int32)
            int32_t[:, ::1] indices
            float32_t[:, ::1] data
            int32_t[::1] csr_indices
            float32_t[::1] csr_data
            int32_t[:, ::1] array_x
            int32_t[:, ::1] array_y
            float32_t[:, ::1] array_len

        if sample:
            self.oversampling = sample
        self.oversampling = sample
        data = numpy.zeros((self.size, self.buffer_size), dtype=numpy.float32)
        indices = numpy.zeros((self.size, self.buffer_size),dtype=numpy.int32)
        sizes = numpy.zeros(self.size, dtype=numpy.int32)

        #single threaded version:
        array_x = numpy.empty((self.size, self.buffer_size), dtype=numpy.int32)
        array_y = numpy.empty((self.size, self.buffer_size), dtype=numpy.int32)
        array_len = numpy.empty((self.size, self.buffer_size), dtype=numpy.float32)

        for pos in prange(self.size, num_threads=threads, nogil=True):
            size = sizes[pos] = self._one_pixel(pos//self.width, pos%self.width,
                                             indices[pos], data[pos],
                                             array_x[pos], array_y[pos], array_len[pos])
        size = numpy.sum(sizes)
        csr_indices = numpy.empty(size, numpy.int32)
        csr_data = numpy.empty(size, numpy.float32)
        current = 0
        for i in range(self.size):
            size = sizes[i]
            next = current + size
            indptr[i+1] = next
            csr_indices[current:next] = indices[i,:size]
            csr_data[current:next] = data[i,:size]
            current = next
        return (numpy.asarray(csr_data)/(self.oversampling*self.oversampling),
                numpy.asarray(csr_indices),
                numpy.asarray(indptr))

[13]:

cythick = CythonThickDetector(
    172e-6,
    172e-6,
    thickness=thickness,
    mu=mu,
    dist=dist,
    poni1=poni1,
    poni2=poni2,
    mask=mask,
)
print("Performance of Cython implementation:")
%time cythick.calc_csr(4, threads=1)

Performance of Cython implementation:
CPU times: user 1.01 s, sys: 47.9 ms, total: 1.05 s
Wall time: 1.05 s

[13]:

(array([0.17449115, 0.10933631, 0.17465518, ..., 0.35667214, 0.05905784,
        0.35653958], shape=(3328700,), dtype=float32),
 array([      2,       2,       3, ..., 1023181, 1023182, 1023182],
       shape=(3328700,), dtype=int32),
 array([      0,       0,       0, ..., 3328697, 3328699, 3328700],
       shape=(1023184,), dtype=int32))

[14]:

# Comparison with Numba, Cython is usually slightly faster
print("Performance of Numba implementation:")
%time nbthick.calc_csr(4)

Performance of Numba implementation:
CPU times: user 3.65 s, sys: 28 ms, total: 3.68 s
Wall time: 3.67 s

[14]:

(array([0.17449115, 0.10933631, 0.17465518, ..., 0.24611568, 0.03161056,
        0.24604346], shape=(3410359,), dtype=float32),
 array([      2,       2,       3, ..., 1023181, 1023182, 1023182],
       shape=(3410359,), dtype=int32),
 array([      0,       0,       0, ..., 3410356, 3410358, 3410359],
       shape=(1023184,), dtype=int32))

Validation of the CSR matrix obtained:#

For this we will build a simple 2D image with one pixel in a regular grid and calculate the effect of the transformation calculated previously on it.

[15]:

%time csr = csr_matrix(cythick.calc_csr(10))

CPU times: user 11.1 s, sys: 403 ms, total: 11.5 s
Wall time: 358 ms

[16]:

dummy_image = numpy.ones(mask.shape, dtype="float32")
dummy_image[::5, ::5] = 10

dummy_blurred = csr.T.dot(dummy_image.ravel()).reshape(mask.shape)
fix, ax = subplots(2, 2, figsize=(8, 8))
ax[0, 0].imshow(dummy_image)
ax[0, 0].set_title("Original image")
ax[0, 1].imshow(dummy_blurred)
ax[0, 1].set_title("Convolved image (i.e. blurred)")
ax[1, 1].imshow(csr.dot(dummy_blurred.ravel()).reshape(mask.shape))
ax[1, 1].set_title("Retro-projected of the blurred")
ax[0, 0].set_xlim(964, 981)
ax[0, 0].set_ylim(0, 16)
ax[0, 1].set_xlim(964, 981)
ax[0, 1].set_ylim(0, 16)
ax[1, 1].set_xlim(964, 981)
ax[1, 1].set_ylim(0, 16);

../../../_images/usage_tutorial_ThickDetector_raytracing_23_0.png

Pseudo inverse with positivitiy constrain and poissonian noise (MLEM)#

The MLEM algorithm was initially developed within the framework of reconstruction of images in emission tomography [Shepp and Vardi, 1982], [Vardi et al., 1985], [Lange and Carson, 1984]. Nowadays, this algorithm is employed in numerous tomographic reconstruction problems and often associated to regularization techniques. It is based on the iterative maximization of the log-likelihood function.

[18]:

def iterMLEM_scipy(F, M, R):
    "Implement one step of MLEM"
    # res = F * (R.T.dot(M))/R.dot(F)# / M.sum(axis=-1)
    norm = 1 / R.T.dot(numpy.ones_like(F))
    cor = R.T.dot(M / R.dot(F))
    res = norm * F * cor
    res[numpy.isnan(res)] = 1.0
    return res


def deconv_MLEM(csr, data, thres=0.2, maxiter=1000):
    R = csr.T
    msk = data < 0
    img = data.astype("float32")
    img[msk] = 0.0  # set masked values to 0, negative values could induce errors
    M = img.ravel()
    # F0 = numpy.random.random(data.size)#M#
    F0 = R.T.dot(M)
    F1 = iterMLEM_scipy(F0, M, R)
    delta = abs(F1 - F0).max()
    for i in range(maxiter):
        if delta < thres:
            break
        F2 = iterMLEM_scipy(F1, M, R)
        delta = abs(F1 - F2).max()
        if i % 100 == 0:
            print(i, delta)
        F1 = F2
        i += 1
    print(i, delta)
    return F2.reshape(img.shape)


%time res = deconv_MLEM(csr, dummy_blurred, 1e-4)

/tmp/ipykernel_3870469/3638533218.py:4: RuntimeWarning: divide by zero encountered in divide
  norm = 1 / R.T.dot(numpy.ones_like(F))
/tmp/ipykernel_3870469/3638533218.py:5: RuntimeWarning: invalid value encountered in divide
  cor = R.T.dot(M / R.dot(F))
/tmp/ipykernel_3870469/3638533218.py:6: RuntimeWarning: invalid value encountered in multiply
  res = norm * F * cor

0 1.7501588
100 0.0014371872
200 0.00016927719
228 9.930134e-05
CPU times: user 2.66 s, sys: 19.9 ms, total: 2.67 s
Wall time: 2.67 s

[19]:

fix, ax = subplots(2, 2, figsize=(8, 8))
ax[0, 0].imshow(dummy_image)
ax[0, 1].imshow(dummy_blurred)
ax[1, 1].imshow(csr.dot(dummy_blurred.ravel()).reshape(mask.shape))
ax[0, 0].set_xlim(964, 981)
ax[0, 0].set_ylim(0, 16)
ax[0, 0].set_title("Original image")
ax[0, 1].set_xlim(964, 981)
ax[0, 1].set_ylim(0, 16)
ax[0, 1].set_title("Convolved image (i.e. blurred)")
ax[1, 1].set_xlim(964, 981)
ax[1, 1].set_ylim(0, 16)
ax[1, 1].set_title("Retro-projected of the blurred")
# ax[1,0].set_title("Corrected image");
ax[1, 0].imshow(res)
ax[1, 0].set_xlim(964, 981)
ax[1, 0].set_ylim(0, 16)
ax[1, 0].set_title("Corrected image: MLEM");

../../../_images/usage_tutorial_ThickDetector_raytracing_28_0.png

Performance measurements … multi-threaded#

[20]:

perfs = {}
for i in range(8):
    j = 1 << i
    print(f"{j:3d} threads: ", end="")
    perfs[(j, cythick.buffer_size)] = %timeit -o -n1 -r1 cythick.calc_csr(8, threads=j)

  1 threads: 3.47 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
  2 threads: 1.87 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
  4 threads: 1.07 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
  8 threads: 679 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
 16 threads: 516 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
 32 threads: 392 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
 64 threads: 325 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
128 threads: 312 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

[21]:

buf = 32
cythick = CythonThickDetector(
    172e-6,
    172e-6,
    thickness=thickness,
    mu=mu,
    dist=dist,
    poni1=poni1,
    poni2=poni2,
    mask=mask,
    buffer_size=buf,
)
cythick.calc_csr(1)
for i in range(8):
    j = 1 << i
    print(f"{j:3d} threads: ", end="")
    perfs[(j, cythick.buffer_size)] = %timeit -o -n1 -r1 cythick.calc_csr(8, threads=j)

  1 threads: 3.55 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
  2 threads: 1.93 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
  4 threads: 1.1 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
  8 threads: 685 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
 16 threads: 532 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
 32 threads: 402 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
 64 threads: 347 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
128 threads: 337 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

[22]:

fig, ax = subplots(1,2, figsize=(12,6))
t = [i[0] for i in perfs if i[1] == 16]
y16 = [v.best for k,v in perfs.items() if k[1] == 16]
y32 = [v.best for k,v in perfs.items() if k[1] == 32]
p16 = [8e-6*cythick.size/v.best for k,v in perfs.items() if k[1] == 16]
p32 = [8e-6*cythick.size/v.best for k,v in perfs.items() if k[1] == 32]
ax[0].plot(t, y16, label="16 voxel/ray buffer")
ax[0].plot(t, y32, label="32 voxel/ray buffer")
ax[0].set_title("Execution time")
ax[0].set_xlabel("Number of threads")
ax[0].set_ylabel("Runtime (s)")
ax[1].plot(t, p16, label="16 voxel/ray")
ax[1].plot(t, p32, label="32 voxel/ray")
ax[1].set_xlabel("Number of threads")
ax[1].set_ylabel("Millions of rays per second")
ax[1].set_title(cpu)
ax[0].legend();

../../../_images/usage_tutorial_ThickDetector_raytracing_32_0.png

Conclusion of the raytracing part:#

We are able to simulate the path and the absorption of the photon in the thickness of the detector. Numba/Cython helped substentially to make the raytracing calculation much faster. The signal of each pixel is indeed spread on the neighboors, depending on the position of the PONI and this effect can be inverted using sparse-matrix pseudo-inversion. The MLEM can garanteee that the total signal is conserved and that no pixel gets negative value.

We will now save this sparse matrix to file in order to be able to re-use it in next notebook. But before saving it, it makes sense to spend some time in generating a high quality sparse matrix in throwing thousand rays per pixel in a grid of 64x64 (4 billions rays launched).

[23]:

%time pre_csr = cythick.calc_csr(128, threads=os.cpu_count())
hq_csr = csr_matrix(pre_csr)
save_npz("csr.npz", hq_csr)

CPU times: user 20min 57s, sys: 961 ms, total: 20min 58s
Wall time: 22.7 s

[24]:

print(f"Total execution time: {time.perf_counter() - start_time:.3f} s")

Total execution time: 61.854 s