International Association for Cryptologic Research

International Association
for Cryptologic Research

CryptoDB

Chi Zhang

Publications

Year
Venue
Title
2024
TCHES
Optimized Hardware-Software Co-Design for Kyber and Dilithium on RISC-V SoC FPGA
Kyber and Dilithium are both lattice-based post-quantum cryptography (PQC) algorithms that have been selected for standardization by the American National Institute of Standards and Technology (NIST). NIST recommends them as two primary algorithms to be implemented for most use cases. As the applications of RISC-V processors move from specialized scenarios to general scenarios, efficient implementations of PQC algorithms on general-purpose RISC-V platforms are required. In this work, we present an optimized hardware-software co-design for Kyber and Dilithium on the industry’s first RISC-V System-on-Chip (SoC) Field Programmable Gate Array (FPGA) platform. The performance of both algorithms is enhanced through the utilization of hardware acceleration and software optimization, while a certain level of flexibility is still maintained. The polynomial arithmetic operations in Kyber and Dilithium are accelerated by the customized accelerators. We employ a unified high-level architecture to depict their shared characteristics and design dedicated underlying modular multipliers to explore their distinctive features. The hashing functions are optimized using RISC-V assembly instructions, resulting in improved performance and reduced code size without additional hardware resources. For other operations involving matrices and vectors, we present a multi-core acceleration scheme based on the multi-core RISC-V Microprocessor Sub-System (MSS). Combining these acceleration and optimization methods, experimental results show that the overall performance of Kyber and Dilithium across different security levels improves by 3 to 5 times, while the utilized FPGA resources account for less than 5% of the total resources provided by the platform.
2024
TCHES
Trace Copilot: Automatically Locating Cryptographic Operations in Side-Channel Traces by Firmware Binary Instrumenting
A common assumption in side-channel analysis is that the attacker knows the cryptographic algorithm implementation of the victim. However, many labsetting studies implicitly extend this assumption to the knowledge of the source code, by inserting triggers to measure, locate or align the Cryptographic Operations (CO) in the trace. For real-world attacks, the source code is typically unavailable, which poses a challenge for locating the COs thus reducing the effectiveness of many methods. In contrast, obtaining the (partial) binary firmware is more prevalent in practical attacks on embedded devices. While binary code theoretically encapsulates necessary information for side-channel attacks on software-implemented cryptographic algorithms, there is no systematic study on leveraging this information to facilitate side-channel analysis. This paper introduces a novel and general framework that utilizes binary information for the automated locating of COs on side-channel traces. We first present a mechanism that maps the execution flow of binary instructions onto the corresponding side-channel trace through a tailored static binary instrumentation process, thereby transforming the challenge of locating COs into one of tracing cryptographic code execution within the binary. For the latter, we propose a method to retrieve binary instruction addresses that are equivalent to the segmenting boundaries of the COs within side-channel traces. By identifying the mapping points of these instructions on the trace, we can obtain accurate segmentation labeling for the sidechannel data. Further, by employing the well-labeled side-channel segments obtained on a profiling device, we can readily identify the locations of COs within traces collected from un-controllable target devices. We evaluate our approach on various devices and cryptographic software, including a real-world secure boot program. The results demonstrate the effectiveness of our method, which can automatically locate typical COs, such as AES or ECDSA, in raw traces using only the binary firmware and a profiling device. Comparison experiments indicate that our method outperforms existing techniques in handling noisy or jittery traces and scales better to complex COs. Performance evaluation confirms that the runtime and storage overheads of the proposed approach are practical for real-world deployment.
2021
TCHES
Pay Attention to Raw Traces: A Deep Learning Architecture for End-to-End Profiling Attacks 📺
With the renaissance of deep learning, the side-channel community also notices the potential of this technology, which is highly related to the profiling attacks in the side-channel context. Many papers have recently investigated the abilities of deep learning in profiling traces. Some of them also aim at the countermeasures (e.g., masking) simultaneously. Nevertheless, so far, all of these papers work with an (implicit) assumption that the number of time samples in raw traces can be reduced before the profiling, i.e., the position of points of interest (PoIs) can be manually located. This is arguably the most challenging part of a practical black-box analysis targeting an implementation protected by masking. Therefore, we argue that to fully utilize the potential of deep learning and get rid of any manual intervention, the end-to-end profiling directly mapping raw traces to target intermediate values is demanded.In this paper, we propose a neural network architecture that consists of encoders, attention mechanisms and a classifier, to conduct the end-to-end profiling. The networks built by our architecture could directly classify the traces that contain a large number of time samples (i.e., raw traces without manual feature extraction) while whose underlying implementation is protected by masking. We validate our networks on several public datasets, i.e., DPA contest v4 and ASCAD, where over 100,000 time samples are directly used in profiling. To our best knowledge, we are the first that successfully carry out end-to-end profiling attacks. The results on the datasets indicate that our networks could get rid of the tricky manual feature extraction. Moreover, our networks perform even systematically better (w.r.t. the number of traces in attacks) than those trained on the reduced traces. These validations imply our approach is not only a first but also a concrete step towards end-to-end profiling attacks in the side-channel context.
2021
TCHES
Cross-Device Profiled Side-Channel Attack with Unsupervised Domain Adaptation 📺
Deep learning (DL)-based techniques have recently proven to be very successful when applied to profiled side-channel attacks (SCA). In a real-world profiled SCA scenario, attackers gain knowledge about the target device by getting access to a similar device prior to the attack. However, most state-of-the-art literature performs only proof-of-concept attacks, where the traces intended for profiling and attacking are acquired consecutively on the same fully-controlled device. This paper reminds that even a small discrepancy between the profiling and attack traces (regarded as domain discrepancy) can cause a successful single-device attack to completely fail. To address the issue of domain discrepancy, we propose a Cross-Device Profiled Attack (CDPA), which introduces an additional fine-tuning phase after establishing a pretrained model. The fine-tuning phase is designed to adjust the pre-trained network, such that it can learn a hidden representation that is not only discriminative but also domain-invariant. In order to obtain domain-invariance, we adopt a maximum mean discrepancy (MMD) loss as a constraint term of the classic cross-entropy loss function. We show that the MMD loss can be easily calculated and embedded in a standard convolutional neural network. We evaluate our strategy on both publicly available datasets and multiple devices (eight Atmel XMEGA 8-bit microcontrollers and three SAKURA-G evaluation boards). The results demonstrate that CDPA can improve the performance of the classic DL-based SCA by orders of magnitude, which significantly eliminates the impact of domain discrepancy caused by different devices.