v4.3 release notes #774

Open
TysonRayJones wants to merge 33 commits into main from devel

Conversation

TysonRayJones (Member) commented Jun 3, 2026

(Ignore the diff; using the commit history to prepare the release notes)

Overview

This release accelerates few-qubit, CPU and multi-GPU simulation (on some platforms), adds control and performance tuning utilities for MPI and GPU superusers, adds randomised Trotter simulation, and reduces the likelihood of QuEST symbols colliding with other software stacks.

Optimisations

  • QuEST's simulation of few-qubit systems has been accelerated, by reducing memory transfer overheads.
  • QuEST's CPU and multithreaded backend has been accelerated (at all scales) on specific platforms / compilers, by switching to custom complex arithmetic.
  • QuEST's detection of CUDA-aware MPI has been expanded to Cray MPICH systems, greatly accelerating multi-GPU simulation on those platforms.

New features

  • Added experimental initCustomMpiQuESTEnv() and initCustomMpiCommQuESTEnv() functions which permit users to retain control of MPI during QuEST simulation, and to dedicate only a sub-communicator (e.g. only some MPI processes) to QuEST. Users can now also disable QuEST distribution through initCustomQuESTEnv() while still using MPI themselves, even when QuEST was compiled with MPI.
  • Added an experimental setQuESTNumGpuThreadsPerBlock() function to override QuEST's GPU parallelisation granularity at runtime, permitting simple performance tuning. This accompanies a getter (get...), an environment variable QUEST_DEFAULT_NUM_GPU_THREADS_PER_BLOCK to override the parallelisation at launch time, and a CMake option of the same name to override at build-time.
  • Added sortPauliStrSumLexicographic() and sortPauliStrSumMagnitude() to reorder the terms within a PauliStrSum, affecting the numerical accuracy of Trotterisation.
  • The Trotter functions now accept an additional permuteTerms boolean. When true, every Trotter repetition randomises the ordering of the Trotter terms, often improving numerical accuracy. The affected functions are listed under API breaks below.
  • The QFT functions now accept an inverse boolean, to apply the inverse QFT.
  • Improved the CMake build:
    • Added QUEST_INSTALL_BINARIES to enable including examples and user-source in the QuEST installation directory.
    • Added a warning when CMake configures an (unoptimised) non-release build.
    • Removed brittle, platform-specific compiler flags, which existed only to work around a since-resolved performance problem.
    • Added QUEST_ prefix to (almost) all CMake options, to avoid collision with other projects (see API breaks below).
    • During compilation of the tests, the compiled executable is no longer run (for Catch2 test discovery), improving ease of compilation on supercomputers (e.g. systems with distinct job-submission and job-run nodes).

API breaks

  • As above, the Trotter functions now accept an additional permuteTerms boolean. This affects:
    • apply(Multi)(State)(Controlled)TrotterizedPauliStrSumGadget()
    • applyTrotterizedNonUnitaryPauliStrSumGadget()
    • applyTrotterized(Unitary|Imaginary|Noisy)TimeEvolution()
  • As above, the QFT functions now accept an inverse boolean. This affects:
    • apply(Full)QuantumFourierTransform()
  • A superfluous numControls argument was removed from the std::vector (C++ only) overloads of applyMultiStateControlledSqrtSwap() and applyMultiStateControlledCompMatr2().
  • All debug functions now explicitly contain the word QuEST (for example, setQuESTSeeds()) as the second word, indicated by [QuEST] below. This affects functions:
    • (set|get)[QuEST]Seeds(ToDefault)()
    • get[QuEST]NumSeeds()
    • set[QuEST]InputErrorHandler()
    • set[QuEST]Validation(On|Off)()
    • set[QuEST]ValidationEpsilon(ToDefault)()
    • set[QuEST]MaxNumReportedItems()
    • set[QuEST]MaxNumReportedSigFigs()
    • set[QuEST]NumReportedNewlines()
    • set[QuEST]ReportedPauli(Chars|StrStyle)()
    • get[QuEST]GpuCacheSize()
    • clear[QuEST]GpuCache()
    • get[QuEST]EnvironmentString()
  • All environment variables now begin with QUEST_, as indicated by [QUEST_] below. This affects environment variables:
    • [QUEST_]PERMIT_NODES_TO_SHARE_GPU
    • [QUEST_]DEFAULT_VALIDATION_EPSILON
    • [QUEST_]TEST_NUM_QUBITS_IN_QUREG
    • [QUEST_]TEST_MAX_NUM_QUBIT_PERMUTATIONS
    • [QUEST_]TEST_MAX_NUM_SUPEROP_TARGETS
    • [QUEST_]TEST_NUM_MIXED_DEPLOYMENT_REPETITIONS
    • TEST_ALL_DEPLOYMENTS has become QUEST_TEST_TRY_ALL_DEPLOYMENTS
  • All CMake options have been renamed to begin with QUEST_ or USER_ (to disambiguate whether they relate to the QuEST library or the user's optional source files), and several have been made more explicit. The changes are:
    • USER_SOURCE -> USER_SOURCE_NAMES
    • OUTPUT_EXE -> USER_OUTPUT_EXE_NAME
    • LIB_NAME -> QUEST_OUTPUT_LIB_NAME
    • VERBOSE_LIB_NAME -> QUEST_APPEND_CONFIG_TO_LIB_NAME
    • FLOAT_PRECISION -> QUEST_FLOAT_PRECISION
    • BUILD_EXAMPLES -> QUEST_BUILD_EXAMPLES
    • ENABLE_MULTITHREADING -> QUEST_ENABLE_OMP
    • ENABLE_DISTRIBUTION -> QUEST_ENABLE_MPI
    • ENABLE_TESTING -> QUEST_BUILD_TESTS
    • DOWNLOAD_CATCH2 -> QUEST_TESTS_DOWNLOAD_CATCH2
    • [QUEST_]ENABLE_CUDA
    • [QUEST_]ENABLE_CUQUANTUM
    • [QUEST_]ENABLE_HIP
    • [QUEST_]ENABLE_DEPRECATED_API
    • [QUEST_]DISABLE_DEPRECATION_WARNINGS

Minor changes

  • The output of reportQuESTEnv() has been rearranged and reordered.
  • setSeeds (now called setQuESTSeeds()) validates that its given list of integers is non-null.
  • The int fields of the QuESTEnv struct (such as isMultithreaded) have been changed to type bool (exposed by <stdbool.h> in C).
  • Added extended examples set_num_gpu_threads.(c|cpp) and user_owned_(sub)mpi.(c|cpp)

Patches

  • #729 Restored compatibility with ROCm 6 and beyond, and removed the need for -Ofast to be applied to CPU subroutines.
  • #699 Patched an overflow bug in the GPU (Thrust) backend affecting simulations using 64 GiB or more of memory on a single GPU (i.e. a non-distributed 32-qubit statevector, or 16-qubit density matrix). Formerly, the below functions would induce a crash, set all amplitudes to zero, or output zero.
    • (set|create)FullStateDiagMatrFromPauliStrSum()
    • setQuregToPauliStrSum()
    • calcTotalProb()
    • calcProbOf(Multi)QubitOutcome()
    • calcFidelity()
    • calcExpecPauliStr()
    • calcExpecPauliStrSum()
    • calcExpecFullStateDiagMatr()
    • apply(Multi)QubitProjector()
    • initRandomPureState()
  • #693 Restored compatibility with CUDA 13 (solving compilation failure).
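
The overflow class behind #699 can be illustrated with a minimal, standalone sketch. It is not QuEST or Thrust code: the function names are hypothetical, and the 16 bytes per amplitude assumes double-precision complex amplitudes. It shows why the amplitude count of a 32-qubit statevector needs a 64-bit integer, and what happens when that count is squeezed into 32 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Number of amplitudes in a statevector of numQubits qubits, held in a
// 64-bit integer so the count stays representable beyond 30 qubits.
std::int64_t numAmps64(int numQubits) {
    assert(numQubits >= 0 && numQubits < 63);
    return std::int64_t{1} << numQubits;
}

// Memory needed for the statevector, ASSUMING 16 bytes per amplitude
// (two 8-byte doubles); other precisions would change this figure.
std::int64_t numBytesAssumingDoublePrecision(int numQubits) {
    return numAmps64(numQubits) * 16;
}

// True when the amplitude count fits in a signed 32-bit integer.
bool countFitsInInt32(int numQubits) {
    return numAmps64(numQubits) <= std::numeric_limits<std::int32_t>::max();
}

// The value left over when a 64-bit count is truncated to 32 bits;
// the upper 32 bits are simply discarded.
std::uint32_t truncateTo32Bits(std::int64_t count) {
    return static_cast<std::uint32_t>(count);
}
```

Under these assumptions, a 30-qubit count still fits in 32 bits, a 31-qubit count does not, and the 32-qubit count truncates to zero, which is consistent with the "all amplitudes zero" symptom described above.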

Notable internal changes

  • The CPU and GPU backends no longer use std::complex arithmetic overloads, and instead make use of custom cpu_qcomp and gpu_qcomp types.
  • The internal functions no longer pass qubit lists as copies of heap-based std::vector<int>. They now instead use List64 (a custom, stack-based, light-weight, fixed-capacity list) and pass constant references thereof (ConstList64) where possible.
  • Internal MPI calls now use a dedicated MPI communicator rather than MPI_COMM_WORLD to avoid collisions with user messages.
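
To make the idea of a stack-based, fixed-capacity list concrete, here is a minimal sketch. It is not the actual List64: the class name FixedList64, its members, the capacity of 64 (guessed from the name), and the helper getBitMask() are all hypothetical. It only shows the general technique of storing elements inline, so that no heap allocation occurs, and passing the list by constant reference.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-capacity list of qubit indices. All storage lives
// inside the object itself, so creating one never touches the heap.
class FixedList64 {
public:
    static constexpr std::size_t capacity = 64;

    void push(int value) {
        assert(length < capacity);  // a fixed-capacity list cannot grow
        items[length++] = value;
    }

    int operator[](std::size_t index) const {
        assert(index < length);
        return items[index];
    }

    std::size_t size() const { return length; }

private:
    std::array<int, capacity> items{};
    std::size_t length = 0;
};

// Example consumer which accepts the list by constant reference (avoiding
// a copy) and builds a bitmask with one bit set per listed qubit.
std::uint64_t getBitMask(const FixedList64& qubits) {
    std::uint64_t mask = 0;
    for (std::size_t i = 0; i < qubits.size(); i++) {
        assert(qubits[i] >= 0 && qubits[i] < 64);
        mask |= std::uint64_t{1} << qubits[i];
    }
    return mask;
}
```

The trade-off in a design like this is a hard upper bound on the list length in exchange for avoiding allocation on every call.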

New contributors

This release contained contributions from new contributors:

otbrown and others added 30 commits October 21, 2025 10:11
gpu_thrust.cuh: removed thrust::[unary|binary]_function which has been removed from CCCL.
* Simplify installation path configuration

Removed unnecessary path normalization and appending for installation.

* Updates CMake config for conditional installs

Modifies CMakeLists to conditionally build shared libraries and
install binaries only at the top-level project. Introduces the
INSTALL_BINARIES option to control the inclusion of example
binaries in the installation process. Corrects a typo from
'RATH' to 'RPATH' for build configurations.
* docs/cmake.md: fixed formatting of non-default options for mt and distribution

* cmake: wrapped user source install in if(INSTALL_BINARIES)

* docs/cmake.md: added INSTALL_BINARIES option
* gpu_thrust.cuh: modified initial thrust counting iterator declarations to use long long to avoid overflow at >30 qubits. Fixes #698.

* patched test of rightapplyCompMatr distributed validation

The operation validation tests previously always used a statevector to test the "targeted amps fit in node" validation, though the rightapply*() functions cannot accept statevectors, only density matrices. Because the "was given a density matrix" validation happens before the "targeted amps fit in node" validation, the intended error was pre-empted by the earlier, unintended one.

Now, we are careful to pass a density matrix Qureg to the validation of "targeted amps fit in node" when triggered by a function which 'right-applies' (and is ergo only compatible with density matrices).

* changed literals to defensive type

---------

Co-authored-by: Tyson Jones <tyson.jones.input@gmail.com>
* Implement PauliStrSum random permutations inspired by [arXiv:1805.08385](https://arxiv.org/abs/1805.08385)

* Add randomisation to Trotter functions

* Document random Pauli permutations for Trotterisation

* Add test for Trotter randomisation
* Added unit tests requested by Quantum Motion

* tests/unit/trotterisation.cpp: updated time evo calls to new API

* tests/unit/trotterisation.cpp: updated authorlist

* Fixed valgrind errors

* tests/unit/trotterisation.cpp: tuned floating-point comparison epsilon to account for worst-case scenario which is single precision, single thread

* added get-arbitrary-qureg test util

since it will be used frequently by new input validation

* removed Qureg creation in validation tests

so that test failures do not cause memory leaks and e.g. add to valgrind noise. Tests now instead use getArbitraryCachedStatevec() or getArbitraryCachedDensmatr() to obtain an existing qureg with an arbitrary deployment.

* restoring missing-validation comments

since the validation for these functions wasn't added. Such functions have additional tests to their tested counterparts; for example, validating that matrix elements are non-zero when given a negative exponent

* fixing test category

* added missing operation validation tests

* fixed indentation

* making spacing consistent

and adding a missing Hermiticity validation to applyTrotterizedUnitaryTimeEvolution test

* added warning about untested deployments

* removed defunct signature

* patching C++ validation err msg

Previously, an error message of the C++ API was not substituting in values for its placeholder variables. This affected the C++ variants of the below functions when passing vectors for the targets and outcome parameters of mismatching length:
- calcProbOfMultiQubitOutcome
- leftapplyMultiQubitProjector
- rightapplyMultiQubitProjector
- applyMultiQubitProjector
- applyForcedMultiQubitMeasurement

* added missing C++ API signatures

* added C++-API validation tests

* updated doc warnings

* added Vasco to Trotter API authorlist

* merged Tyson's patches

---------

Co-authored-by: Tyson Jones <tyson.jones.input@gmail.com>
Co-authored-by: Maurice Jamieson <m.jamieson@epcc.ed.ac.uk>
Co-authored-by: Oliver Thomson Brown <otbrown@users.noreply.github.com>
Co-authored-by: Oliver Thomson Brown <8394906+otbrown@users.noreply.github.com>
CI updated to use the latest AMD ROCm install instructions, which as of this commit correspond to ROCm 7.2.

---------

Co-authored-by: Oliver Thomson Brown <8394906+otbrown@users.noreply.github.com>
Remove numControls argument from applyMultiStateControlledSqrtSwap overloaded definition taking std::vector<int>

(cherry picked from commit 9c20792)

Co-authored-by: D-Exposito <dexposito@cesga.es>
* tests/unit/trotterisation.cpp: updated to use REQUIRE_AGREE and cached statevecs and densmats, and both permutePaulis options

* tests/utils/compare.hpp/cpp: added setters for test epsilon

* tests/unit/trotterisation.cpp: adjusted test epsilon for quad precision imaginary time evolution tests

* tests/unit/trotterisation.cpp: moved unitary time evo test to REQUIRE_AGREE

* tests/utils/cache.hpp/cpp: added additional utilities for creating and destroying temp caches (which I guess makes them not caches?) with a set number of qubits

* tests/unit/trotterisation.cpp: updated unitary time evo test to test across deployments

* tests/unit/trotterisation.cpp: reduced number of qubits and increased number of steps to admit the possibility of testing density matrices too

* tests/unit/trotterisation.cpp: added density matrix tests

* reduce test precision

to lazily pass CPU clang quad-precision

* skip Trotter tests in paid CI

* changing varname convention

* renaming cache funcs

---------

Co-authored-by: Oliver Thomson Brown <8394906+otbrown@users.noreply.github.com>
Co-authored-by: Tyson Jones <tyson.jones.input@gmail.com>
---------

Co-authored-by: Oliver Thomson Brown <otbrown@users.noreply.github.com>
Formerly, the Trotter functions (such as applyTrotterizedPauliStrSumGadget()), when passed permutePaulis=true, would randomly permute the order of the passed PauliStrSum, mutating it and affecting the outputs of subsequent functions like reportPauliStrSum(). The function also contained superfluous memory allocs/copies equal in size to the PauliStrSum.

Now, the PauliStrSum is never mutated, and an internally allocated ordering list keeps track of the randomised permutation. We also updated the doc, renamed permutePaulis to permuteTerms, and improved validation. Note that 'permuteTerms' had not yet reached main/release, so these changes do not need to be documented in the v4.3 release notes.
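
The non-mutating approach can be sketched generically as below. This is not the QuEST implementation: the function names are hypothetical, the terms are reduced to a plain list of real coefficients, and the choice of std::mt19937 is an assumption made purely for illustration. The point is that only a separate list of indices is shuffled, so the caller's terms are never reordered or copied.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <random>
#include <vector>

// Returns a random visiting order (a permutation of 0..numTerms-1).
std::vector<std::size_t> getRandomTermOrder(std::size_t numTerms, std::mt19937& rng) {
    std::vector<std::size_t> order(numTerms);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::shuffle(order.begin(), order.end(), rng);
    return order;
}

// Visits every term exactly once in a freshly randomised order. The terms
// are taken by const reference, so the caller's list is never mutated.
void forEachTermInRandomOrder(
    const std::vector<double>& coeffs,
    std::mt19937& rng,
    const std::function<void(double)>& applyTerm
) {
    for (std::size_t ind : getRandomTermOrder(coeffs.size(), rng))
        applyTerm(coeffs[ind]);
}
```

Calling forEachTermInRandomOrder() once per repetition would give each repetition its own ordering while leaving the original term list intact for later reporting.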
Created cpu_qcomp and gpu_qcomp (from a shared base_qcomp) to avoid std::complex arithmetic operators in hot loops which caused performance issues. Removed all prior compiler flags and related scaffolding attempting to mitigate the performance issue.

Also gave MSVC build the params `/Zc:preprocessor -Xcompiler=/Zc:preprocessor /bigobj` as needed for compilation of the unit tests on my windows machines.
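
As a rough illustration of what replacing std::complex operators with a custom type can look like, here is a minimal sketch. It is not cpu_qcomp or gpu_qcomp: the PlainComplex type and scaleAmps() are hypothetical, and double precision is assumed. The arithmetic is the textbook formula with no special handling of NaN or infinity, which is the kind of extra work some compilers emit for std::complex multiplication unless relaxed floating-point flags are passed; whether that was the exact cause of the issue in this commit is not stated here.

```cpp
#include <vector>

// Hypothetical plain complex type with textbook arithmetic.
struct PlainComplex {
    double re;
    double im;
};

constexpr PlainComplex operator+(PlainComplex a, PlainComplex b) {
    return {a.re + b.re, a.im + b.im};
}

// (a.re + i a.im)(b.re + i b.im), expanded directly with no NaN/inf recovery.
constexpr PlainComplex operator*(PlainComplex a, PlainComplex b) {
    return {a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re};
}

// A stand-in for a hot loop: multiplies every amplitude by the same factor.
void scaleAmps(std::vector<PlainComplex>& amps, PlainComplex factor) {
    for (PlainComplex& amp : amps)
        amp = amp * factor;
}
```

For example, multiplying 1+2i by 3+4i with these operators gives -5+10i.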
This is to circumvent the std::vector performance overheads visible in few-qubit simulation (responsible for a performance regression from v3; see #720), and also so that qubit lists can be passed directly to CUDA kernels without conversion (as explored in #739).
Optimisations include:
- Adopted SmallView (const SmallList&) to avoid superfluous SmallList copies
- Made internally created matrices static
- Change accelerator dynamic function vectors to static arrays
- Exit all validators early when validation is disabled

Additional cleanup includes:
- Tidied accelerator macros (replaced param-specific macros like "numCtrls" and "numTargs" with "param")
- Fill ctrlStates vectors with default before localiser
- Renamed getBitsFromInteger to setToBitsOfInteger
- Adopted const in bitwise.hpp to better express intent

Note that the naming of SmallList and SmallView will be subsequently changed to List64 and ConstList64
such that they all begin with QUEST, but some have additional changes
so that we can compile MPI tests on systems which cannot actually run with MPI, because they are missing an MPI or UCX library file, as is witnessed in the CI (when compiling with MPICH). It's also generally irksome to trigger an execution of the test binary (which itself initialises QuEST) during build when on an HPC platform with distinct submit and compute nodes.
* Added ENABLE_SUBCOMM build option

* Moved from MPI_COMM_WORLD to mpiQuestComm

* Decided passing *MPI_Comm was probably overly cautious, and updated function name to comm_getMpiComm

* environment.cpp: added methods to reset rank and numNodes, and reporting for subcomm compiled

* comm_config.hpp/cpp: added comm_setMpiComm

* CMakeLists.txt: PUBLIC MPI::MPI_CXX turned out to be unhelpful, even for SubComm, because of course it enforces CXX

* Added new custom QuESTEnv initialiser which allows users to positively declare that they take ownership of MPI

* validation.cpp: updated comm_end call

* comm_config.hpp: added config.h include so COMPILE_MPI is actually defined

* subcommunicator.h/cpp: implemented QuESTEnv initialiser with custom MPI_Comm

* CMake: added subcommunicator.cpp

* comm_config.hpp: added missing config.h include...

* comm_config.cpp: explicitly initialise mpiCommQuest to MPI_COMM_NULL, updated setComm for init only workflow

* quest.h: added subcommunicator header

* CMake: added MPI to application binaries when SUBCOMM is enabled

* comm_routines.cpp: post Irecv before Isend which probably won't do anything but it makes MPI library implementers less nervous

* tests: added new env test for initCustomMpiQuESTEnv

* Added error throws to comm_config to cover new scenarios of badness with user owned MPI

* subcommunicator.cpp: updated var names to match QuEST style

* tests/unit/initialisations.cpp: slightly modified setQuregAmps test to avoid unexpected test failure due to range checking when compiled in Debug configuration

* Updated validation in comm_setMpiComm

Co-authored-by: iarejula-bsc <inigo.arejula@bsc.es>

* userOwnsMpi int->bool

* comm_config.cpp: corrected call to MPI_Comm_free

* subcommunicator.cpp: userOwnsMpi int->bool

* subcommunicator.cpp: added comm_isInit guard around comm_setMpiComm

* environment.cpp: USER_OWNS_MPI -> userOwnsMpi

* comm_init: fixed case where useDistrib = 0 and userOwnsMpi = true

* comm_init: moved (recently) misplaced MPI_Init

* AUTHORS.txt: added iarejula-bsc

* Added placeholder docstrings to new initialisers

* docs/cmake.md: added ENABLE_SUBCOMM to list of QuEST CMake vars

* Newly added COMPILE_MPI -> QUEST_COMPILE_MPI

* ENABLE_SUBCOMM -> QUEST_ENABLE_SUBCOMM

* CMake: corrected OpenMP and subcommunicator pre-processor definitions

---------

Co-authored-by: Oliver Thomson Brown <8394906+otbrown@users.noreply.github.com>
Co-authored-by: iarejula-bsc <inigo.arejula@bsc.es>
to reduce the likelihood of users printing from non-root nodes interrupting QuEST root output. This is not bullet-proof; we sync the active communicator rather than MPI_COMM_WORLD so the user-controlled non-participating processes may still be printing. Furthermore, even if all processes participate, some may have outstanding non-root prints that are not aggregated to the user screen by the time MPI_Barrier finishes. But these syncs greatly reduce the chance of corruption, and are effectively free!
This enables Cray MPICH platforms to leverage GPU-awareness, greatly accelerating distributed GPU simulation.

Co-authored-by: JPRichings <james.richings@ed.ac.uk>
Important changes:
- permit user initialisation of MPI when QuEST is not distributed
- changed QuESTEnv fields from int to bool (e.g. isMultithreaded)
- add user-input validation for custom MPI calls
- disambiguated comm_config.cpp concepts of "MPI is initialised" (comm_isMpiInit) from "QuEST communication is active" (comm_isActive)
- refactored comm_config.cpp flow, especially related to pre-quest-init flow (during validation)
- added Oliver's custom-MPI examples (from #712)
- moved new API functions to experimental.h
- tweaked reportQuESTEnv output grouping
Added:
- QUEST_DEFAULT_NUM_GPU_THREADS_PER_BLOCK CMake option
- QUEST_DEFAULT_NUM_GPU_THREADS_PER_BLOCK environment variable
- setQuESTNumGpuThreadsPerBlock() API function
- getQuESTNumGpuThreadsPerBlock() API function
- set_num_gpu_threads examples in examples/extended
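
The layering of a build-time default beneath a launch-time environment variable can be sketched as follows. This is not how QuEST reads or validates the variable, which these notes do not describe: the helper names, the fallback value of 128, and the rule of ignoring malformed or non-positive values are all assumptions. Only the environment variable name is taken from the list above.

```cpp
#include <cstddef>
#include <cstdlib>
#include <exception>
#include <string>

// Hypothetical stand-in for a value baked in at build time (e.g. via CMake).
constexpr int BUILD_TIME_THREADS_PER_BLOCK = 128;

// Parses text as a positive integer, returning fallback when the text is
// absent, malformed, out of range, or not strictly positive.
int parsePositiveIntOr(const char* text, int fallback) {
    if (text == nullptr)
        return fallback;
    try {
        const std::string str(text);
        std::size_t numParsed = 0;
        int value = std::stoi(str, &numParsed);
        bool wholeStringParsed = (numParsed == str.size());
        return (wholeStringParsed && value > 0) ? value : fallback;
    } catch (const std::exception&) {
        return fallback;  // std::stoi throws on non-numeric or oversized input
    }
}

// Launch-time value: the environment variable wins when set and valid,
// otherwise the build-time default is used.
int getLaunchTimeThreadsPerBlock() {
    return parsePositiveIntOr(
        std::getenv("QUEST_DEFAULT_NUM_GPU_THREADS_PER_BLOCK"),
        BUILD_TIME_THREADS_PER_BLOCK);
}
```

A runtime setter such as setQuESTNumGpuThreadsPerBlock() would then sit on top of this, replacing whichever default was chosen at launch.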

---------

Co-authored-by: Oliver Thomson Brown <8394906+otbrown@users.noreply.github.com>
Co-authored-by: Tyson Jones <tyson.jones.input@gmail.com>
TysonRayJones and others added 3 commits June 1, 2026 22:32
Beware this included removing the superfluous `numControls` argument from the C++-only `std::vector` overload of `applyMultiStateControlledCompMatr2`, which is technically a teeny tiny API break ¯\_(ツ)_/¯
…de new validation (#771)

Updated number of seeds test to use a valid pointer and added a separate NULL pointer test.
test_free.yml: added Release config to ctest commands (#773)
@TysonRayJones
Member Author

@otbrown @JPRichings Draft of release notes above, feel free to edit directly!

@otbrown
Collaborator

otbrown commented Jun 3, 2026

Added a note about the change from MPI_COMM_WORLD and that the change in datatypes restores AMD GPU support.

5 participants