Research Statement - Dr. Ahmad Al Badawi

The agenda behind fifteen years of work on fully homomorphic encryption, privacy-preserving machine learning, and the silicon to make them practical.

Private computation should be a default, not a feature.

Most of today’s compute runs in the clear. Training data sits unencrypted in a cluster, inference servers see raw inputs, and analytics pipelines need column-level access. As models scale and inference moves into regulated industries (healthcare, finance, defense), the assumption that data can be exposed during computation becomes harder to justify.

Fully homomorphic encryption is the most general tool we have for keeping data encrypted throughout its lifecycle, including during computation. The question is no longer whether FHE works. It does. The question is whether it can be made fast, ergonomic, and infrastructure-ready.

My work has been a sustained attack on that gap, from three angles.

1. Algorithms

The RNS variants of the BFV scheme (IEEE TETC) that I implemented delivered a speedup of roughly two orders of magnitude over CPU baselines and are now standard building blocks in every modern FHE library. The first homomorphic CNN on GPUs (HCNN) (IEEE TETC) showed that encrypted deep learning is not a thought experiment but a measurable benchmark. CareNets (NeurIPS 2019) pushed that result to high-resolution images with a compact packing scheme that fits CNN inputs, weights, and activations into HE ciphertexts. It delivered over 32x speedup, 45x better memory efficiency, and a 5851x reduction in transferred message size on encrypted 96x96 and 256x256 retinal images, all within 3% of the plaintext accuracy. PrivFT (IEEE Access) extended the line into text, training and serving private classification on encrypted documents.

2. Systems

Algorithms alone do not close the deployment gap. My systems work started with the first single-GPU CUDA implementation of FV/BFV (IACR TCHES 2018), the foundational GPU-FHE result that set the baseline for every accelerated FHE implementation since. The multi-GPU extension (IEEE TPDS) then scaled FHE workloads across GPU clusters, turning encrypted training and inference into a parallel workload rather than a CPU bottleneck. As a co-author of PALISADE and its successor OpenFHE, I helped build the open-source infrastructure that most FHE research and product work today depends on. As technical lead on DARPA DPRIVE, I drove a $15M effort from concept to a 12 nm, 1 GHz ASIC design for homomorphic ML, with a custom ISA designed around FHE. The architecture and methodology are documented in TREBUCHET (GOMACTech 2025). From compilers to GPU clusters to silicon, the systems work is the rest of the answer.

3. Applications

The goal of the applications work is to expand what can be deployed with FHE, not just what can be demonstrated.

In healthcare, cross-institution oncology analysis under multiparty HE (PNAS) showed that federated learning on real cancer-center data is technically achievable end-to-end, and CKKS-based private pathological assessment (Springer BioData Mining, 2024) extended the same machinery to encrypted pathology classification: SVM inference on encrypted patient data paired with a compact feature-extraction pipeline runs in seconds at 128-bit security and matches the accuracy of plaintext baselines.

In financial-sector analytics, FHSVM (Neural Computing and Applications, 2022) ran homomorphic SVM inference for anti-money-laundering classification on encrypted Bitcoin-transaction datasets. Using novel CKKS packing strategies and a parallel implementation, it achieved roughly 1.25 s prediction latency on multi-core CPUs at 128-bit security with zero accuracy loss versus the plaintext model.

On the LLM side, POLARIS is the open-source, model-preserving reference framework I introduced for CKKS-based private LLM inference: encrypted BERT-Tiny and BERT-Mini under GPU acceleration, evaluated on standard, unmodified architectures without retraining or activation-function substitution. It is intended as a proof-of-concept and a shared baseline for the community, paired with the Private LLM Card System (PLCS) for standardized reporting of framework configurations and results across the field.

Cancer-center collaborations, federated learning on encrypted gradients, intrusion detection on encrypted telemetry, and private LLM inference on encrypted prompt embeddings are not toy demos. They are the cases that show what production-grade private computation actually looks like.

The next decade

The frontier is private inference for large generative models. CKKS-based frameworks have now demonstrated end-to-end encrypted inference on standard, unmodified LLMs from BERT-Tiny up to Llama-3-8B (8 billion parameters), so the approach is algorithmically feasible. However, my recent systematization of knowledge (SoK on private LLM inference under approximate HE) identifies a runtime gap of roughly four orders of magnitude between encrypted and plaintext inference as the primary barrier to practical use. Encrypted inference will not be operationally practical for human-facing applications until that gap is narrowed.

Closing it is not a single-discipline problem. It requires progress on all three pillars at once:

Algorithms: FHE-friendly approximations of transformer building blocks (softmax, GELU, layer norm), packing layouts for linear and non-linear blocks, and bootstrapping schedules tuned to attention and MLP rather than generic worst-case loops.
Systems: compilers, hybrid execution stacks, GPU acceleration, and silicon that turn the algorithmic gains into wall-clock latency a user will tolerate.
Applications: regulated, latency-tolerant use cases (healthcare diagnostics, federated analytics, multi-party machine learning, secure inference for defense) where the privacy guarantee is worth a real latency budget today, and where deployment exercises the rest of the stack.
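To make the "FHE-friendly approximations" item concrete: CKKS evaluates only additions and multiplications, so a non-polynomial function such as GELU has to be replaced by a polynomial on a bounded input interval. The following is a minimal plaintext sketch of that idea using a Chebyshev least-squares fit in NumPy. The interval [-4, 4], the degree 8, and the helper names are assumptions chosen for illustration; they are not taken from POLARIS or from any of the works cited here.

```python
import math
import numpy as np
from numpy.polynomial import chebyshev as C

def gelu(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

# Illustrative assumptions, not parameters from any cited paper.
INTERVAL = 4.0  # approximate on [-INTERVAL, INTERVAL]
DEGREE = 8      # higher degree lowers the error but costs more multiplications

def fit_gelu_poly(interval=INTERVAL, degree=DEGREE, samples=2001):
    # Fit in the scaled variable t = x / interval, which lies in [-1, 1].
    t = np.cos(np.pi * (np.arange(samples) + 0.5) / samples)  # Chebyshev nodes
    return C.chebfit(t, gelu(interval * t), degree)

def poly_gelu(x, coeffs, interval=INTERVAL):
    # Evaluate the fitted polynomial; under HE this would be done on ciphertexts.
    return C.chebval(np.asarray(x) / interval, coeffs)

coeffs = fit_gelu_poly()
grid = np.linspace(-INTERVAL, INTERVAL, 1001)
max_err = float(np.max(np.abs(poly_gelu(grid, coeffs) - gelu(grid))))
print(f"degree={DEGREE}, max abs error on [-{INTERVAL}, {INTERVAL}]: {max_err:.4f}")
```

The trade-off the sketch exposes is the one the algorithms pillar has to manage: a wider interval or a tighter error bound needs a higher degree, and a higher degree consumes more multiplicative depth and therefore more bootstrapping.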

A 10x improvement on each of the three pillars (algorithms, systems, applications) compounds to a 1000x reduction in runtime overhead, closing most of the four-orders-of-magnitude gap and leaving encrypted inference roughly 10x slower than plaintext. That overhead is well within what latency-tolerant production workloads can absorb (overnight analytics, federated training rounds, regulated medical and financial diagnostics, batch scoring, asynchronous LLM serving), and small enough that the privacy guarantee starts to dominate the cost calculus rather than being dominated by it. None of the three 10x gains is out of reach individually; the open question is whether the field can deliver them in a coordinated way rather than each pillar shipping a 10x in isolation.
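The arithmetic behind that estimate can be checked in a few lines. This is a back-of-the-envelope sketch that assumes the three gains are independent and multiply cleanly, which real stacks will only approximate; the function name is hypothetical.

```python
import math

def residual_slowdown(gap_orders, per_pillar_gains):
    """Slowdown left after independent multiplicative gains are applied."""
    total_gain = math.prod(per_pillar_gains)
    return 10 ** gap_orders / total_gain

gap_orders = 4  # "roughly four orders of magnitude" from the text
gains = {"algorithms": 10, "systems": 10, "applications": 10}

remaining = residual_slowdown(gap_orders, gains.values())
print(f"combined gain: {math.prod(gains.values())}x, residual slowdown: {remaining:.0f}x")
# prints "combined gain: 1000x, residual slowdown: 10x"
```

If only two pillars deliver, the same calculation leaves a 100x slowdown, which is why the coordination question matters.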

I work across all three pillars and I am looking for collaborators (labs, startups, program committees) who want to make encrypted AI practical.
