Block Diffusion for Flash Speculative Decoding
AI & ML interests
Efficient AI
Papers
DFlash: Block Diffusion for Flash Speculative Decoding
ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
Paper • 2511.10645 • Published • 7
z-lab/Qwen3.5-4B-PARO
Image-Text-to-Text • 1B • Updated • 574 • 10
z-lab/Qwen3.5-9B-PARO
Image-Text-to-Text • 3B • Updated • 591 • 23
z-lab/Qwen3.5-2B-PARO
Image-Text-to-Text • 1B • Updated • 143 • 1
models 27
z-lab/Qwen3.5-9B-DFlash
Text Generation • 3B • Updated • 215 • 2
z-lab/Qwen3.5-4B-DFlash
Text Generation • 1B • Updated • 104 • 2
z-lab/Meta-Llama-3-8B-Instruct-SparseLoRA
Updated • 22
z-lab/Llama-2-13b-hf-SparseLoRA
Updated • 16
z-lab/Llama-2-7b-hf-SparseLoRA
Updated • 16
z-lab/Qwen3.5-35B-A3B-DFlash
Text Generation • 1B • Updated • 498 • 4
z-lab/Qwen3-14B-PARO
Text Generation • 2B • Updated • 179 • 2
z-lab/Qwen3-8B-PARO
Text Generation • 1B • Updated • 986 • 1
z-lab/Qwen3-4B-PARO
Text Generation • 0.9B • Updated • 464 • 1
z-lab/Qwen3-1.7B-PARO
Text Generation • 0.5B • Updated • 176 • 1
datasets 0
None public yet