Shihe Ma

CryptoDB

Shihe Ma

Publications

Year

Venue

Title

2025

CIC

Ultra Low-Latency Block Cipher uLBC Abstract

Guoxiao Liu Qingyuan Yu Liyuan Tang Shihe Ma Congming Wei Keting Jia Lingyue Qin Xiaoyang Dong Yantian Shen

<p>In recent years, there has been a growing interest in low-latency ciphers. Since the first low-latency block cipher PRINCE was proposed at ASIACRYPT 2012, many low-latency primitives sprung up, such as Midori, MANTIS, QARMA and SPEEDY. Some ciphers, like SPEEDY and Orthros, introduce bit permutations to achieve reduced delay. However, this approach poses a challenge in evaluating the resistance against some cryptanalysis, especially differential and linear attacks. SPEEDY-7-192, was fully broken by Boura et.al. using differential attack, for example. In this paper, we manage to propose a novel low-latency block cipher, which guarantees security against differential and linear attacks. Revisiting the permutation technique used in Orthros, we investigate the selection of nibble permutations and propose a method for selecting them systematically rather than relying on random search. Our new nibble permutation method ensures the existence of impossible differential and differential trails for up to 8 rounds, while the nibble permutations for both branches of Orthros may lead to a 9-round impossible differential trail. Furthermore, we introduce a new approach for constructing low-latency coordinate functions for 4-bit S-boxes, which involves a more precise delay computation compared to traditional methods based solely on circuit depth. The new low-latency primitive uLBC we propose, is a family of 128-bit block ciphers, with three different versions of key length, respectively 128-bit and 256-bit key, as well as a 384-bit tweakey version with variable-length key. According to the key length, named uLBC-128, uLBC-256 and uLBC-384t. Our analysis shows that uLBC-128 exhibits lower latency and area requirements compared to ciphers such as QARMA9-128 and Midori128. On performance, uLBC-128 has excellent AT performance, the best performance except SPEEDY-6, and even the best performance in UMC 55nm in our experiments. </p>

2024

EUROCRYPT

Accelerating BGV Bootstrapping for Large $p$ Using Null Polynomials Over $\mathbb{Z}_{p^e}$ Abstract

Shihe Ma Tairong Huang Anyu Wang Xiaoyun Wang

The BGV scheme is one of the most popular FHE schemes for computing homomorphic integer arithmetic. The bootstrapping technique of BGV is necessary to evaluate arbitrarily deep circuits homomorphically. However, the BGV bootstrapping performs poorly for large plaintext prime $p$ due to its digit removal procedure exhibiting a computational complexity of at least $O(\sqrt{p})$. In this paper, we propose optimizations for the digit removal procedure with large $p$ by leveraging the properties of null polynomials over the ring $\mathbb{Z}_{p^e}$. Specifically, we demonstrate that it is possible to construct low-degree null polynomials based on two observations of the input to the digit removal procedure: 1) the support size of the input can be upper-bounded by $(2B+1)^2$; 2) the size of the lower digits to be removed can be upper-bounded by $B$. Here $B$ can be controlled within a narrow interval $[22,23]$ in our parameter selection, making the degree of these null polynomials much smaller than $p$ for large values of $p$. These low-degree null polynomials can significantly reduce the polynomial degrees during homomorphic digit removal, thereby decreasing both running time and capacity consumption. Theoretically, our optimizations reduce the computational cost of extracting a single digit from $O(\sqrt{pe})$ (by Chen and Han) or $O(\sqrt{p}\sqrt[4]{e})$ (by Geelen et al.) to $\min(2B+1,\sqrt{\lceil e/t\rceil(2B+1)})$ for some $t\ge 1$. We implement and benchmark our method on HElib with $p=17,127,257,8191$ and $65537$. With our optimized digit removal, we achieve a bootstrapping throughput $1.38\sim151$ times that in HElib, with the speedup increasing with the value of $p$. For $p=65537$, we accelerate the digit removal step by 80 times and reduce the bootstrapping time from more than 12 hours to less than 14 minutes.

2024

ASIACRYPT

Faster BGV Bootstrapping for Power-of-two Cyclotomics through Homomorphic NTT Abstract

Shihe Ma Tairong Huang Anyu Wang Xiaoyun Wang

Power-of-two cyclotomics is a popular choice when instantiating the BGV scheme because of its efficiency and compliance with the FHE standard. However, in power-of-two cyclotomics, the linear transformations in BGV bootstrapping cannot be decomposed into sub-transformations for acceleration with existing techniques. Thus, they can be highly time-consuming when the number of slots is large, degrading the advantage brought by the SIMD property of the plaintext space. By exploiting the algebraic structure of power-of-two cyclotomics, this paper derives explicit decomposition of the linear transformations in BGV bootstrapping into NTT-like sub-transformations, which are highly efficient to compute homomorphically. Moreover, multiple optimizations are made to evaluate homomorphic linear transformations, including modified BSGS algorithms, trade-offs between level and time, and specific simplifications for thin and general bootstrapping. We implement our method on HElib. With the number of slots ranging from 4096 to 32768, we obtain a 2.4x$\sim$55.1x improvement in bootstrapping throughput, compared to previous works or the naive approach.

2023

TCHES

Fast and Accurate: Efficient Full-Domain Functional Bootstrap and Digit Decomposition for Homomorphic Computation Abstract

Shihe Ma Tairong Huang Anyu Wang Qixian Zhou Xiaoyun Wang

The functional bootstrap in FHEW/TFHE allows for fast table lookups on ciphertexts and is a powerful tool for privacy-preserving computations. However, the functional bootstrap suffers from two limitations: the negacyclic constraint of the lookup table (LUT) and the limited ability to evaluate large-precision LUTs. To overcome the first limitation, several full-domain functional bootstraps (FDFB) have been developed, enabling the evaluation of arbitrary LUTs. Meanwhile, algorithms based on homomorphic digit decomposition have been proposed to address the second limitation. Although these algorithms provide effective solutions, they are yet to be optimized. This paper presents four new FDFB algorithms and two new homomorphic decomposition algorithms that improve the state-of-the-art. Our FDFB algorithms reduce the output noise, thus allowing for more efficient and compact parameter selection. Across all parameter settings, our algorithms reduce the runtime by up to 39.2%. Our homomorphic decomposition algorithms also run at 2.0x and 1.5x the speed of prior algorithms. We have implemented and benchmarked all previous FDFB and homomorphic decomposition algorithms and our methods in OpenFHE.