Introduction
Diffusion Transformers (DiTs) offer state-of-the-art fidelity in image and video synthesis, but their iterative sampling process remains a major bottleneck due to the high cost of transformer forward passes at each timestep. To mitigate this, feature caching has emerged as a training-free acceleration technique that reuses or forecasts hidden representations. However, existing methods often apply a uniform caching strategy across all feature dimensions, ignoring their heterogeneous dynamic behaviors. We therefore adopt a new perspective: we model hidden feature evolution as a mixture of ODEs across dimensions and introduce HyCa, a hybrid ODE-solver-inspired caching framework that assigns each dimension its own caching strategy. HyCa achieves near-lossless acceleration across diverse domains and models without retraining, including a 5.56× speedup on FLUX and HunyuanVideo and a 6.24× speedup on Qwen-Image and Qwen-Image-Edit.
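The core idea above can be illustrated with a minimal sketch. The function names, the change-magnitude heuristic, and the two example strategies (zero-order reuse vs. first-order linear extrapolation, mirroring different ODE solvers) are illustrative assumptions, not the paper's actual algorithm:

```python
import numpy as np

def assign_solvers(feature_history, threshold=0.1):
    # Hypothetical dimension-wise assignment: slow-changing feature
    # dimensions reuse the cached value (zero-order hold), while
    # fast-changing dimensions are forecast by linear extrapolation
    # (a first-order update). feature_history: (timesteps, dims).
    deltas = np.abs(np.diff(feature_history, axis=0)).mean(axis=0)
    return np.where(deltas < threshold, "reuse", "extrapolate")

def cached_step(prev, curr, strategies):
    # Predict the next hidden state per dimension without a full
    # transformer forward pass, using each dimension's own strategy.
    reuse = curr                   # zero-order: hold the cached feature
    extrap = curr + (curr - prev)  # first-order: linear forecast
    return np.where(strategies == "extrapolate", extrap, reuse)
```

The point is only that different dimensions get different update rules; HyCa's actual strategy set and selection criterion are described in the paper.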
Samples
Explore our generated samples across different domains below.
Image Cases
Image Editing Cases
Video Cases
Contact Us
Feel free to contact us with any questions or for collaboration.
BibTeX
@misc{zheng2025letfeaturesdecidesolvers,
title={Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers},
author={Shikang Zheng and Guantao Chen and Qinming Zhou and Yuqi Lin and Lixuan He and Chang Zou and Peiliang Cai and Jiacheng Liu and Linfeng Zhang},
year={2025},
eprint={2510.04188},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2510.04188},
}