Interesting article: using Claude Code to help open models write CUDA kernels (for example) by turning Claude Code traces into Skills. They made a library out of it!
We prepared the 2025 version of the HF AI Timeline Grid, highlighting open vs API-based model releases, and allowing you to browse and filter by access, modality, and release type!
1️⃣ Q1: Learning to Reason. DeepSeek not only releases a top-notch reasoning model, but also shows how to train one and compete with closed frontier models. OpenAI debuts Deep Research.
Significant milestones: DeepSeek R1 & R1-Zero, Qwen 2.5 VL, OpenAI Deep Research, Gemini 2.5 Pro (experimental)
2️⃣ Q2: Multimodality and Coding. More LLMs embrace multimodality by default, and there's a surge in coding agents. Strong vision, audio, and generative models emerge.
Significant milestones: Llama 4, Qwen 3, Imagen 4, OpenAI Codex, Google Jules, Claude 4
3️⃣ Q3: "Gold" rush, OpenAI opens up, the community goes bananas. Flagship models get gold in math olympiads and hard benchmarks. OpenAI releases strong open-source models, and Google releases the much-anticipated nano-banana for image generation and editing. Agentic workflows become commonplace.
Significant milestones: Gemini and OpenAI IMO Gold, gpt-oss, Gemini 2.5 Flash Image, Grok 4, Claude Sonnet 4.5
4️⃣ Q4: Mistral returns, leaderboard hill-climbing. Mistral is back with updated model families. All labs release impressive models to wrap up the year!
Significant milestones: Claude Opus 4.5, DeepSeek Math V2, FLUX 2, GPT 5.1, Kimi K2 Thinking, Nano Banana Pro, GLM 4.7, Gemini 3, Mistral 3, MiniMax M2.1 🤯
NVIDIA is on a roll lately. Nemotron 3 Nano is my new fav local model, but here's the real flex: they published the entire evaluation setup. Configs, prompts, logs, all of it. This is how you do open models 🔥
We're thrilled to announce that the Qwen3-VL family of vision-language models is now available on Azure AI Foundry, thanks to our collaboration with Microsoft.
We bring open-source innovation to enterprise-grade AI infrastructure, making it easier than ever for enterprises to securely deploy and scale the latest and greatest models from Hugging Face within Azure.
Highlights:
- Deploy Qwen3-VL instantly via managed endpoints
- Built-in governance, telemetry, and lifecycle management
- True multimodal reasoning: vision, language, and code understanding
- State-of-the-art performance, outperforming closed-source models like Gemini 2.5 Pro and GPT-5
- Available in both *Instruct* and *Thinking* modes, across 24 model sizes
Get started today: search for Qwen3-VL in the Hugging Face Collection on Azure AI Foundry.
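Managed endpoints on Azure AI Foundry typically expose an OpenAI-compatible chat completions API, so calling a deployed vision-language model usually means POSTing a multimodal message body. A minimal sketch of building such a request, assuming that API shape (the model name, image URL, and token limit below are illustrative placeholders, not values from this announcement):

```python
# Hedged sketch: assumes the endpoint accepts OpenAI-style chat
# completions with mixed text + image content parts.
def build_vision_request(prompt: str, image_url: str,
                         model: str = "Qwen3-VL-Instruct") -> dict:
    """Build a multimodal chat request body: one text part, one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 256,
    }

payload = build_vision_request("Describe this image.",
                               "https://example.com/chart.png")
# The payload would then be POSTed to the managed endpoint's
# /chat/completions route with the endpoint's auth header attached.
```

The exact route and auth header depend on how the endpoint is provisioned; check the deployment's own documentation page in Foundry.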
Okay this is insane... WebGPU-accelerated semantic video tracking, powered by DINOv3 and Transformers.js! 🤯 Demo (+ source code): webml-community/DINOv3-video-tracking
This will revolutionize AI-powered video editors... which can now run 100% locally in your browser, no server inference required (costs $0)!
How does it work? 🤔
1️⃣ Generate and cache image features for each frame
2️⃣ Create a list of embeddings for selected patch(es)
3️⃣ Compute cosine similarity between each patch and the selected patch(es)
4️⃣ Highlight those whose score is above some threshold
... et voilà! 🥳
You can also make selections across frames to improve temporal consistency! This is super useful if the object changes its appearance slightly throughout the video.
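The steps above can be sketched in a few lines of NumPy. This is a minimal illustration of the similarity-thresholding idea, not the actual Transformers.js implementation; in the real demo the patch embeddings come from DINOv3, and selections gathered across several frames simply add more rows to the `selected` matrix:

```python
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize embeddings along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def track_patches(frame_feats: np.ndarray, selected: np.ndarray,
                  threshold: float = 0.8) -> np.ndarray:
    """
    frame_feats: (num_patches, dim) patch embeddings for one frame.
    selected: (k, dim) embeddings of user-selected patches (possibly
        collected across multiple frames for temporal consistency).
    Returns a boolean mask over patches whose best cosine similarity
    to any selected patch meets the threshold.
    """
    # Cosine similarity = dot product of unit vectors: (num_patches, k)
    sims = normalize(frame_feats) @ normalize(selected).T
    return sims.max(axis=1) >= threshold

# Toy example: 4 patches with 3-dim embeddings, one selected patch.
feats = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.9, 0.1, 0.0],
                  [0.0, 0.0, 1.0]])
sel = np.array([[1.0, 0.0, 0.0]])
mask = track_patches(feats, sel, threshold=0.8)
# Patches 0 and 2 point (almost) the same way as the selection,
# so only those two are highlighted.
```

Caching the per-frame features (step 1️⃣) is what makes scrubbing through the video cheap: only the similarity pass reruns when the selection changes.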
We now have the newest OpenAI models available on the Dell Enterprise Hub!
We built the Dell Enterprise Hub to provide access to the latest and greatest models from the Hugging Face community to our on-prem customers. We're happy to give secure access to this amazing contribution from OpenAI on the day of its launch!
Introducing Voxtral WebGPU: state-of-the-art audio transcription directly in your browser! 🤯
🗣️ Transcribe videos, meeting notes, songs, and more
🔒 Runs on-device, meaning no data is sent to a server
🌍 Multilingual (8 languages)
🤗 Completely free (forever) & open source
That's right, we're running Mistral's new Voxtral-Mini-3B model 100% locally in-browser on WebGPU, powered by Transformers.js and ONNX Runtime Web! 🔥
You can now find it in the Hugging Face Collection in Azure ML or Azure AI Foundry, along with 10k other Hugging Face models 🤗 Qwen/Qwen3-235B-A22B-Instruct-2507-FP8
New in Azure Model Catalog: NVIDIA Parakeet TDT 0.6B V2
We're excited to welcome Parakeet TDT 0.6B V2, a state-of-the-art English speech-to-text model, to the Azure AI Foundry Model Catalog.
What is it?
A powerful ASR model built on the FastConformer-TDT architecture, offering:
- Word-level timestamps
- Automatic punctuation & capitalization
- Strong performance on noisy, real-world audio
It runs with NeMo, NVIDIA's open-source toolkit for building and deploying speech and language models.
Want to give it a try? You can test it with your own audio (up to 3 hours) on Hugging Face Spaces before deploying. If it fits your needs, deploy easily from the Hugging Face Hub or Azure ML Studio with secure, scalable infrastructure!
Learn more by following this guide written by @alvarobartt
In case you missed it, Hugging Face expanded its collaboration with Azure a few weeks ago with a curated catalog of 10,000 models, accessible from Azure AI Foundry and Azure ML!
@alvarobartt has been cooking these last few days to prepare the one and only documentation you need to deploy Hugging Face models on Azure. It comes with an FAQ, great guides, and examples on how to deploy VLMs, LLMs, smolagents, and more to come very soon.
We need your feedback: come help us and let us know what else you want to see, which models we should add to the collection, which model tasks we should prioritize, and what else we should build a tutorial for. You're just an issue away on our GitHub repo!