Configuration & Tips
The dtmm.conf module provides a few options for custom configuration, optimization, and tuning. The package relies heavily on numba-optimized code, which is compiled with the default numba compilation options. For fine-tuning you can use the custom compilation options that numba provides (see numba compilation options). There are also a few numba-related environment variables that you can set in addition to the numba compilation options. These are explained below.
Verbosity
By default, compute functions do not print to stdout. You can enable printing of a progress bar and messages with:
>>> import dtmm
>>> dtmm.conf.set_verbose(1) #level 1 messages
0
>>> dtmm.conf.set_verbose(2) #level 2 messages (more info)
1
To disable verbosity, set verbose level to zero:
>>> dtmm.conf.set_verbose(0) #disable printing to stdout
2
Note
The setter functions in the dtmm.conf module return the previously defined setting.
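Because each setter returns the previous value, a setting can be changed temporarily and then restored. The following is a minimal sketch of that pattern, not part of dtmm: `temporary_setting` and `fake_set_verbose` are hypothetical names, and the stand-in setter exists only so the example runs without the package installed.

```python
from contextlib import contextmanager


@contextmanager
def temporary_setting(setter, value):
    """Apply value via setter, then restore the previous value on exit.

    Assumes setter(value) returns the previously defined setting,
    as described for the dtmm.conf setter functions.
    """
    previous = setter(value)
    try:
        yield previous
    finally:
        setter(previous)


# Hypothetical stand-in for a setter, used only to keep the sketch runnable.
_state = {"verbose": 0}


def fake_set_verbose(level):
    previous = _state["verbose"]
    _state["verbose"] = level
    return previous


with temporary_setting(fake_set_verbose, 2) as old_level:
    inside_level = _state["verbose"]  # level while the block runs
restored_level = _state["verbose"]    # level after the block exits
```

With dtmm installed, passing `dtmm.conf.set_verbose` as the setter should behave the same way, provided it follows the return convention described in the note.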
Numba multithreading
Most computationally expensive numerical algorithms are implemented using @vectorize or @guvectorize and can be compiled with the target="parallel" option. By default, parallel execution is disabled for two reasons. First, in parallel mode the functions have to be compiled at runtime, which adds significant compilation overhead when importing the package. Second, automatic parallelization of vectorized functions is a new feature in numba; according to the numba documentation it is still experimental and not supported on all platforms.
You can enable the parallel target for numba functions by setting the DTMM_TARGET_PARALLEL environment variable. This has to be set prior to importing the package.
>>> import os
>>> os.environ["DTMM_TARGET_PARALLEL"] = "1"
>>> import dtmm #parallel enabled dtmm
Another option is to modify the configuration file (see below). Depending on the number of cores in your system, you should be able to notice an increase in the computation speed.
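Since the variable only takes effect if it is set before the package is imported, it can help to guard against setting it too late. The sketch below is an illustration under that assumption; `set_env_before_import` is a hypothetical helper and not part of dtmm.

```python
import os
import sys


def set_env_before_import(name, value, module="dtmm"):
    """Set an environment variable only if `module` is not imported yet.

    Returns True if the variable was set, or False if the module was
    already imported (the variable is then left untouched).
    """
    if module in sys.modules:
        return False
    os.environ[name] = str(value)
    return True


# Request the parallel target; this only succeeds before dtmm is imported.
was_set = set_env_before_import("DTMM_TARGET_PARALLEL", 1)
```

If `was_set` is False, the interpreter has to be restarted (or the variable set in the shell) for the option to apply.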
Numba cache
Numba allows caching of compiled functions. If the DTMM_TARGET_PARALLEL environment variable is not defined, all compiled functions are by default cached and stored in your home directory for faster import. For debugging purposes, you can enable or disable caching with the DTMM_NUMBA_CACHE environment variable. To disable caching (enabled by default):
>>> os.environ["DTMM_NUMBA_CACHE"] = "0"
Cached files are stored in .dtmm/numba_cache in the user’s home directory. You can remove this folder to force recompilation. You can also enable or disable caching in the configuration file (see below).
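Removing the cache folder can be scripted. The following is a minimal sketch that assumes the cache location stated above; `clear_numba_cache` is a hypothetical helper, not a dtmm function.

```python
import shutil
from pathlib import Path


def clear_numba_cache(cache_dir=None):
    """Delete the numba cache folder; return True if it was removed.

    The default path is an assumption based on the location given in
    the text (.dtmm/numba_cache in the home directory).
    """
    if cache_dir is None:
        cache_dir = Path.home() / ".dtmm" / "numba_cache"
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return False  # nothing to remove
    shutil.rmtree(cache_dir)
    return True
```

Calling `clear_numba_cache()` with no argument deletes the default folder, so the next import of the package should trigger recompilation.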
FFT optimization
The package was intended to work with the mkl_fft FFT library. Stock numpy and scipy provide no in-place FFT transforms, and their FFT implementation is not optimized. Although the package works without the Intel library, you are advised to install mkl_fft for best performance.
You can select the FFT library ("mkl_fft", "numpy", or "scipy") with the following:
>>> dtmm.conf.set_fftlib("mkl_fft")
'mkl_fft'
For mkl_fft there is an additional optimization step. Intel’s FFT implementation is multithreaded within a single FFT computation. This works well for large arrays, but gives only a very small speedup for smaller arrays (say 256x256 and smaller). In a light transmission calculation, four 2D FFT and four 2D IFFT computations are performed per layer for each wavelength, polarization, or ray direction. Instead of parallelizing each transform internally, it is better to run all of these transforms in parallel.
FFT functions in dtmm.fft can be parallelized using a ThreadPool. By default, this parallelization is disabled. You can enable ThreadPool parallelization of FFTs with:
>>> dtmm.conf.set_nthreads(4)
1
It is important that you disable MKL’s multithreading by setting the MKL_NUM_THREADS environment variable to "1", or, if you have mkl-service installed, try:
>>> import mkl
>>> mkl.set_num_threads(1)
2
You will need to experiment with these settings a little. Depending on the size of the field_data and the number of cores, the ThreadPool version may run faster or slower than the mkl_fft version. If you are not sure what to use, stick with stock MKL threading and the default setting of:
>>> dtmm.conf.set_nthreads(1)
4
Note
Creating a ThreadPool in Python adds some overhead (a few milliseconds). Multithreading makes sense only if the computational complexity is high enough. MKL’s threading works well for large arrays, but for a large number of computations on small arrays (as in multi-ray computations) ThreadPool should be faster.
Default threading options can also be set in the configuration file (see below).
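The idea of running many independent transforms in parallel, rather than parallelizing each one, can be sketched as follows. This is an illustration only and not how dtmm.fft is implemented: `naive_dft` is a slow pure-Python stand-in for a real FFT call, and `transform_batch` is a hypothetical helper. A pure-Python transform will not actually run faster in threads because of the interpreter lock; the benefit is assumed to come from compiled FFT routines that release it.

```python
import cmath
from multiprocessing.pool import ThreadPool


def naive_dft(signal):
    """Slow O(n^2) discrete Fourier transform, a stand-in for a real FFT."""
    n = len(signal)
    return [
        sum(signal[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def transform_batch(transform, arrays, nthreads=1):
    """Apply transform to every array, optionally across a thread pool."""
    if nthreads <= 1:
        # Sequential path: no pool is created, so no pool overhead.
        return [transform(a) for a in arrays]
    with ThreadPool(nthreads) as pool:
        # Each array is handed to a worker thread as one independent task.
        return pool.map(transform, arrays)


batch = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
serial = transform_batch(naive_dft, batch, nthreads=1)
threaded = transform_batch(naive_dft, batch, nthreads=2)
```

Both calls return the same results in the same order; only the scheduling differs. Skipping the pool for `nthreads=1` mirrors the point in the note that creating a ThreadPool has a fixed cost.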
Precision
By default, computation is performed in double precision. You may disable double precision if you are low on memory or want to gain some computation speed.
>>> os.environ["DTMM_DOUBLE_PRECISION"] = "0"
You can also use the fastmath option in numba compilation to gain a small speedup at the cost of reduced computation accuracy when using MKL.
>>> os.environ["DTMM_FASTMATH"] = "1"
Default values can also be set in the configuration file (see below).
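To judge whether single precision is worth it for memory reasons, a rough estimate of the array size helps. The sketch below assumes complex-valued data stored as complex128 (16 bytes per element) in double precision and complex64 (8 bytes) in single precision. The array shape is a hypothetical example and `field_nbytes` is not a dtmm function.

```python
from math import prod


def field_nbytes(shape, double_precision=True):
    """Bytes needed for a complex array of the given shape."""
    # complex128 uses 16 bytes per element, complex64 uses 8.
    bytes_per_element = 16 if double_precision else 8
    return prod(shape) * bytes_per_element


# Hypothetical shape: wavelengths, polarizations, components, height, width.
shape = (81, 2, 4, 512, 512)
double_mib = field_nbytes(shape, double_precision=True) / 2**20   # 2592.0
single_mib = field_nbytes(shape, double_precision=False) / 2**20  # 1296.0
```

Under these assumptions, switching to single precision halves the memory needed for such an array.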
DTMM cache
The DTMM package uses a results cache internally. You can disable caching of results with:
>>> dtmm.conf.set_cache(0)
1
If you are running out of memory you should probably disable caching. To clear cached data you can call:
>>> dtmm.conf.clear_cache()
The default option can also be set in the configuration file (see below).
DTMM configuration file
You can also edit the configuration file .dtmm/dtmm.ini in the user’s home directory to define default settings. This file is automatically generated from a template if it does not exist in the directory. To regenerate the default configuration file, remove the existing file and import the library in Python.
[core]
#: max beta parameter used in calculations. Should be 0 < betamax
betamax = 0.8
#: smoothness parameter used in reflection calculation in 4x4 method. Should be 0 < smooth
smooth = 0.1
#: specifies if computation results are being cached or not
cache = yes
#: specifies whether double precision is used in calculations:
double_precision = yes
[transfer]
#: default effective data 0 - isotropic, 1 - uniaxial, 2 - biaxial
eff_data = 0
#: default input refractive index, defaults to n_cover.
#nin =
#: default output refractive index, defaults to n_cover.
#nout =
#: either 2x2 or 4x4
method = "2x2"
#: how many passes to perform (set this to > 1 if you want to compute reflections also).
npass = 1
#: diffraction quality (0,1,2... or -1 for full diffraction).
diffraction = 1
#: reflection mode, either 0, 1 or 2 or comment out to let the algorithm choose the best mode
#reflection = 2
[viewer]
#: default cmf function or path to tabulated cmf data used in field_viewer.
cmf = CIE1931
#: specifies whether to show ticks or not, comment out or leave empty for auto.
#show_ticks =
#: specifies whether to show scale bar, you must have matplotlib.scale_bar installed.
show_scalebar = no
#: specifies whether to show sliders in the viewer.
show_sliders = yes
#: specifies whether to convert RGB to gray.
gray = no
#: specifies whether to apply gamma or not, or set the gamma as float.
gamma = yes
#gamma = 2.
#: cover glass refractive index used in pom_viewer.
n_cover = 1.5
#: cover glass thickness. Set to zero or comment out to disable cover glass.
d_cover = 0.
#: specifies whether oil immersion microscope is being used or not.
immersion = no
#: numerical aperture of the objective. Should be lower than 1 for non-immersion objectives.
NA = 0.7
[numba]
#: are compiled numba functions cached or not.
cache = yes
#: should we compile with multithreading support ('target = parallel' option).
parallel = no
#: should numba use 'fastmath = True' option.
fastmath = no
[fft]
#: fft library used for fft, can be mkl_fft, numpy, scipy, comment out to use default library.
#fftlib =
#: should we use python's threading for fft.
parallel = no
#: number of threads used if parallel mode is activated. Uncomment it and set to desired value.
#: number of threads is defined automatically if not defined below.
#nthreads =
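The file uses standard INI syntax, so it can be inspected with the Python standard library. The sketch below parses a shortened copy of the template with `configparser`. It only shows how such a file can be read and makes no claim about how dtmm itself loads it.

```python
import configparser

# Shortened excerpt of the template above, embedded so the sketch is runnable.
SAMPLE = """
[core]
#: max beta parameter used in calculations. Should be 0 < betamax
betamax = 0.8
cache = yes

[numba]
cache = yes
parallel = no

[fft]
#fftlib =
parallel = no
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

betamax = parser.getfloat("core", "betamax")         # numeric option
numba_cache = parser.getboolean("numba", "cache")    # "yes" is parsed as True
fft_parallel = parser.getboolean("fft", "parallel")  # "no" is parsed as False
# Commented-out options are absent, so a fallback value is needed.
fftlib = parser.get("fft", "fftlib", fallback=None)
```

To inspect the real file, `parser.read(Path.home() / ".dtmm" / "dtmm.ini")` (with `from pathlib import Path`) can be used in place of `read_string`, assuming the file location given above.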