International Association for Cryptologic Research

International Association
for Cryptologic Research

CryptoDB

Georg Land

Publications

Year
Venue
Title
2024
TCHES
HAETAE: Shorter Lattice-Based Fiat-Shamir Signatures
We present HAETAE (Hyperball bimodAl modulE rejecTion signAture schemE), a new lattice-based signature scheme. Like the NIST-selected Dilithium signature scheme, HAETAE is based on the Fiat-Shamir with Aborts paradigm, but our design choices target an improved complexity/compactness compromise that is highly relevant for many space-limited application scenarios. We primarily focus on reducing signature and verification key sizes so that signatures fit into one TCP or UDP datagram while preserving a high level of security against a variety of attacks. As a result, our scheme has signature and verification key sizes up to 39% and 25% smaller, respectively, compared than Dilithium. We provide a portable, constanttime reference implementation together with an optimized implementation using AVX2 instructions and an implementation with reduced stack size for the Cortex-M4. Moreover, we describe how to efficiently protect HAETAE against implementation attacks such as side-channel analysis, making it an attractive candidate for use in IoT and other embedded systems.
2024
TCHES
Correction Fault Attacks on Randomized CRYSTALS-Dilithium
After NIST’s selection of Dilithium as the primary future standard for quantum-secure digital signatures, increased efforts to understand its implementation security properties are required to enable widespread adoption on embedded devices. Concretely, there are still many open questions regarding the susceptibility of Dilithium to fault attacks. This is especially the case for Dilithium’s randomized (or hedged) signing mode, which, likely due to devastating implementation attacks on the deterministic mode, was selected as the default by NIST.This work takes steps towards closing this gap by presenting two new key-recovery fault attacks on randomized/hedged Dilithium. Both attacks are based on the idea< of correcting faulty signatures after signing. A successful correction yields the value of a secret intermediate that carries information on the key. After gathering many faulty signatures and corresponding correction values, it is possible to solve for thesigning key via either simple linear algebra or lattice-reduction techniques. Our first attack extends a previously published attack based on an instruction-skipping fault to the randomized setting. Our second attack injects faults in the matrix A, which is part of the public key. As such, it is not sensitive to side-channel leakage and has, potentially for this reason, not seen prior analysis regarding faults.We show that for Dilithium2, the attacks allow key recovery with as little as 1024 and 512 faulty signatures, with each signature generated by injecting a single targeted fault. We also demonstrate how our attacks can be adapted to circumvent several popular fault countermeasures with a moderate increase in the computational runtime and the number of required faulty signatures. These results are verified using both simulated faults and clock glitches on an ARM-based standard microcontroller. The presented attacks demonstrate that also randomized Dilithium can be subject to diverse fault attacks, that certain countermeasures might be easily bypassed, and that potential fault targets reach beyond side-channel sensitive operations. Still, many further operations are likely also susceptible, implying the need for increased analysis efforts in the future.
2023
TCHES
Gadget-based Masking of Streamlined NTRU Prime Decapsulation in Hardware
Streamlined NTRU Prime is a lattice-based Key Encapsulation Mechanism (KEM) that is, together with X25519, the default algorithm in OpenSSH 9. Based on lattice assumptions, it is assumed to be secure also against attackers with access to< large-scale quantum computers. While Post-Quantum Cryptography (PQC) schemes have been subject to extensive research in recent years, challenges remain with respect to protection mechanisms against attackers that have additional side-channel information, such as the power consumption of a device processing secret data. As a countermeasure to such attacks, masking has been shown to be a promising and effective approach. For public-key schemes, including any recent PQC schemes, usually, a mixture of Boolean and arithmetic techniques is applied on an algorithmic level. Our generic hardware implementation of Streamlined NTRU Prime decapsulation, however, follows an idea that until now was assumed to be solely applicable efficiently to symmetric cryptography: gadget-based masking. The hardware design is transformed into a secure implementation by replacing each gate with a composable secure gadget that operates on uniform random shares of secret values. In our work, we show the feasibility of applying this approach also to PQC schemes and present the first Public-Key Cryptography (PKC) – pre- and post-quantum – implementation masked with the gadget-based approach considering several trade-offs and design choices. By the nature of gadget-based masking, the implementation can be instantiated at arbitrary masking order. We synthesize our implementation both for Artix-7 Field-Programmable Gate Arrays (FPGAs) and 45nm Application-Specific Integrated Circuits (ASICs), yielding practically feasible results regarding the area, randomness requirement, and latency. We verify the side-channel security of our implementation using formal verification on the one hand, and practically using Test Vector Leakage Assessment (TVLA) on the other. Finally, we also analyze the applicability of our concept to Kyber and Dilithium, which will be standardized by the National Institute of Standards and Technology (NIST).
2023
TCHES
A Tale of Snakes and Horses: Amplifying Correlation Power Analysis on Quadratic Maps
We study the success probabilities of two variants of Correlation Power Analysis (CPA) to retrieve multiple secret bits. The target is a permutation-based symmetric cryptographic construction with a quadratic map as an S-box. More precisely, we focus on the nonlinear mapping X used in the Xoodoo and Keccak-p permutations, which is affine equivalent to the nonlinear mapping of Ascon. We thus consider three-bit and five-bit S-boxes. Our leakage model is the difference in power consumption of register cells before and after one round. It reflects that, >in hardware, the aforesaid cryptographic algorithms are usually implemented by deploying a round-based architecture. The power consumption difference depends on whether the targeted bits in the register flip. In particular, we describe two attacks based on the CPA methodology. First, we start with a standard CPA approach, for which, to the best of our knowledge, we are the first to point out the differences between attacking a three-bit and a five-bit S-box. For CPA, the highest correlation coefficient is the most likely secret hypothesis. We improve on this with our novel combined Correlation Power Analysis (combined CPA), or Snake attack, which uses quadratic map cryptanalysis (e.g., of the function X) to achieve a better attack in terms of the number of traces required and computational complexity. For the Snake attack, sums of absolute or squared values of correlation coefficients are used to determine the most likely guess. As a result, we effectively show that our proposed Snake attack can recover the secret in n22 (or n22+n) intermediate results, compared to 22n for the CPA, where n is the length of the targeted S-Box. We collected power measurements from a hardware setup to demonstrate practical attack success probabilities according to the rank of the correct secret hypothesis both for Xoodoo and Keccak-p. In addition, we explain our success probabilities thanks to the Henery model developed for horse races. In short, after performing 16,896 attacks, the Snake attack or combined CPA on Xoodoo consistently recovers the correct secret bits with each attack using 43,860 traces on average and with only 12 correlations, compared to 61,380 traces for the standard CPA attack with 64 correlations. The Snake attack requires about one-fifth as many correlation values as the standard CPA. For Keccak-p, the difference is more drastic: the Snake attack recovers the secret bits invariably after 771,600 traces with just 20 correlations. In contrast, the standard CPA attack operates on 1,024 correlations (about fifty times more than the Snake attack) and requires 1,223,400 traces.