ILV: Iterative Latent Volumes for Fast and
Accurate Sparse-View CT Reconstruction

Seungryong Lee1, Woojeong Baek2, Joosang Lee1, Eunbyung Park2†
1Sungkyunkwan University 2Yonsei University

Reconstruction quality across axial, sagittal, and coronal views. All baselines are optimization-based methods that require substantial reconstruction time: IntraTomo (9m 26s), NAF (5m 04s), SAX-NeRF (1h 35m), and R²-Gaussian (17m 37s). In contrast, our method (ILV) completes reconstruction in under a second (0.76s) while achieving significantly higher quality.

Abstract

A long-term goal in CT imaging is to achieve fast and accurate 3D reconstruction from sparse-view projections, thereby reducing radiation exposure, lowering system cost, and enabling timely imaging in clinical workflows. Recent feed-forward approaches have shown strong potential toward this overarching goal, yet their results still suffer from artifacts and loss of fine details. In this work, we introduce Iterative Latent Volumes (ILV), a feed-forward framework that integrates data-driven priors with classical iterative reconstruction principles to overcome key limitations of prior feed-forward models in sparse-view CBCT reconstruction.
At its core, ILV constructs an explicit 3D latent volume that is repeatedly updated by conditioning on multi-view X-ray features and the learned anatomical prior, enabling the recovery of fine structural details beyond the reach of prior feed-forward models. In addition, we develop and incorporate several key architectural components, including an X-ray feature volume, group cross-attention, efficient self-attention, and view-wise feature aggregation, that efficiently realize its core latent volume refinement concept. Extensive experiments on a large-scale dataset of approximately 14,000 CT volumes demonstrate that ILV significantly outperforms existing feed-forward and optimization-based methods in both reconstruction quality and speed. These results show that ILV enables fast and accurate sparse-view CBCT reconstruction suitable for clinical use.
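As background for the "classical iterative reconstruction principles" mentioned above, the sketch below shows a Landweber-style iteration, in which an estimate is repeatedly corrected using the mismatch between measured and re-projected data. It is an illustration of the classical idea only, not ILV's learned latent-volume update. The matrix `A`, the toy problem sizes, and the function name `landweber` are assumptions made for this example.

```python
import numpy as np

def landweber(A, b, n_iters=200, step=None):
    """Classical iterative update: x <- x + step * A^T (b - A x).

    A is a (hypothetical) linear projection operator, b the measurements.
    """
    if step is None:
        # Any step below 2 / sigma_max(A)^2 converges; use 1 / sigma_max^2.
        step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        residual = b - A @ x              # mismatch in measurement space
        x = x + step * (A.T @ residual)   # back-project the mismatch
    return x

# Toy, well-determined system standing in for a real CT projector.
rng = np.random.default_rng(0)
A = rng.normal(size=(40, 16))
x_true = rng.normal(size=16)
b = A @ x_true
x_hat = landweber(A, b, n_iters=2000)
```

In a real sparse-view setting the system is underdetermined, which is why a plain iteration of this kind is not enough on its own and a prior is needed; here the toy system is deliberately well-determined so that the iteration recovers `x_true`.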

Pipeline Architecture

Overview of the proposed ILV. Given multi-view X-ray images, ILV reconstructs a 3D CT volume or synthesizes novel-view projections. The overall network consists of four stages: (1) Multi-view X-ray image encoding, (2) Latent volume update, (3) Gaussian volume decoding, and (4) CT volume refinement. (For more details, please refer to our paper.)

Quantitative Comparison

Comparison with feed-forward methods

Type         Method        Time      6-View           8-View           10-View
                                     PSNR↑   SSIM↑    PSNR↑   SSIM↑    PSNR↑   SSIM↑
Traditional  FDK           0.23s     12.79   0.122    14.58   0.145    14.75   0.166
             ASD-POCS      1m 32s    22.48   0.661    23.57   0.695    24.37   0.721
             SART          2m 48s    23.21   0.689    24.26   0.712    25.06   0.733
2D FF        FreeSeed      4.5s      28.81   0.793    29.61   0.833    30.34   0.837
3D FF        DIF-Net       3.0s      24.18   0.720    24.59   0.734    24.65   0.745
             DIF-Gaussian  3.0s      26.56   0.810    27.46   0.829    27.88   0.837
             ILV (Ours)    0.59s     33.45   0.922    33.25   0.919    33.84   0.924

Comparison with optimization-based (NeRF/Gaussian) methods

Method        Time       6-View           10-View          24-View
                         PSNR↑   SSIM↑    PSNR↑   SSIM↑    PSNR↑   SSIM↑
IntraTomo     9m 26s     24.49   0.722    26.30   0.772    28.62   0.837
NAF           5m 04s     23.74   0.678    26.17   0.741    31.34   0.876
SAX-NeRF      1h 35m     24.58   0.754    26.78   0.794    33.14   0.919
R²-Gaussian   17m 37s    24.54   0.773    27.26   0.823    33.29   0.931
ILV (Ours)    0.76s      33.57   0.923    33.95   0.925    35.93   0.941
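
For readers unfamiliar with the PSNR columns in the tables above, the following is a minimal sketch of how PSNR (in dB) is typically computed between a reference volume and a reconstruction. The intensity normalization used on this page is not stated, so the `data_range=1.0` default (volumes scaled to [0, 1]) and the function name `psnr` are assumptions.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical volumes
    return 10.0 * np.log10(data_range ** 2 / mse)

# Toy volumes: a uniform error of 0.1 gives MSE = 0.01.
vol = np.zeros((4, 4, 4))
est = vol + 0.1
value = psnr(vol, est)
```

Because the scale is logarithmic, a gap of 10 dB between two methods corresponds to a tenfold difference in mean squared error.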

Qualitative Comparison

Visual comparison of CT reconstruction across different views. Compared to existing methods, ILV recovers significantly cleaner structural details and more consistent results under both the 10-view and 24-view sparse settings.

X-ray Novel View Synthesis

Results for X-ray novel view synthesis. ILV recovers sharp object boundaries and consistent internal structures.