More OSS than ever with the latest pruna 0.3.2 release. It extends existing algorithm families, such as compilers, kernels, and pruners, and adds new ones, including decoders, distillers, enhancers, and recoverers. It's more than a collection of algorithms, though: you can easily combine them to get the biggest efficiency win.
This first unit of the course sets you up with all the fundamentals to become a pro in agents.
- What's an AI Agent?
- What are LLMs?
- Messages and Special Tokens
- Understanding AI Agents through the Thought-Action-Observation Cycle
- Thought, Internal Reasoning and the Re-Act Approach
- Actions, Enabling the Agent to Engage with Its Environment
- Observe, Integrating Feedback to Reflect and Adapt
Introducing the Synthetic Data Generator, a user-friendly application that takes a no-code approach to creating custom datasets with Large Language Models (LLMs). The best part: a simple step-by-step process makes dataset creation a non-technical breeze, so anyone can create datasets and models in minutes.
🚀 We will be generating a preference dataset for DPO/ORPO and cleaning it with AI feedback during our upcoming meetup!
In this session, we'll walk you through the essentials of building a distilabel pipeline by exploring two key use cases: cleaning an existing dataset and generating a preference dataset for DPO/ORPO. You’ll also learn how to make the most of AI feedback, integrating Argilla to gather human feedback and improve the overall data quality.
This session is perfect for you
- if you’re getting started with distilabel or synthetic data
- if you want to learn how to use LLM inference endpoints for **free**
- if you want to discover new functionalities
- if you want to provide us with new feedback
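For readers new to preference data, here is a minimal sketch of what "generating a preference dataset and cleaning it with AI feedback" can boil down to. It is plain Python, not the distilabel API: the helper names (`to_preference_pair`, `build_preference_dataset`), the `prompt`/`chosen`/`rejected` field names, and the example ratings are all assumptions made for illustration. The idea is that each prompt has several responses scored by an AI judge, the best and worst become the chosen and rejected pair, and prompts without a clear preference are dropped.

```python
from typing import Optional


def to_preference_pair(prompt: str, rated: list[tuple[str, float]]) -> Optional[dict]:
    """Turn (response, rating) tuples into one preference record, or None if unusable.

    Hypothetical helper: the ratings stand in for AI feedback scores.
    """
    if len(rated) < 2:
        return None  # a pair needs at least two responses
    # Highest rating first, lowest rating last.
    ranked = sorted(rated, key=lambda item: item[1], reverse=True)
    (best, best_score), (worst, worst_score) = ranked[0], ranked[-1]
    if best_score == worst_score:
        return None  # a tie carries no preference signal, so it is cleaned out
    return {"prompt": prompt, "chosen": best, "rejected": worst}


def build_preference_dataset(rows: list[dict]) -> list[dict]:
    """Keep only the rows that yield a clear chosen/rejected pair."""
    pairs = (to_preference_pair(row["prompt"], row["responses"]) for row in rows)
    return [pair for pair in pairs if pair is not None]


# Made-up example rows; the scores are illustrative, not real model output.
rows = [
    {
        "prompt": "Explain DPO in one sentence.",
        "responses": [
            ("DPO tunes a model directly on preference pairs.", 4.5),
            ("It is a thing.", 1.0),
        ],
    },
    {
        "prompt": "What is ORPO?",
        "responses": [
            ("An alignment method.", 3.0),
            ("A training method.", 3.0),  # tie with the first response
        ],
    },
]

dataset = build_preference_dataset(rows)
print(dataset)
```

In this sketch the second prompt is filtered out because both responses received the same rating. A real pipeline would typically apply a stricter threshold, for example a minimum gap between the two scores.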
distilabel 1.3.0 is out! This release contains many core improvements and new tasks that helped us build argilla/magpie-ultra-v0.1!
Distributed pipeline execution with Ray, new Magpie tasks, reward models, components for dataset diversity based on sentence embeddings, Argilla 2.0 compatibility and many more features!
🔌 You can now integrate with the Hugging Face Hub and get started in under five minutes.
🪂 A single Dataset class is now designed to handle multiple tasks.
🔧 It’s 100 times simpler to configure your dataset now with the new SDK!
📖 The documentation has been revamped to be cleaner and more user-friendly.
🍌 A new feature automates splitting annotation tasks among a team.
✍️ The layout has been made more flexible to accommodate many use cases.
Data is essential for training good AI systems. We believe that the amazing community built around open machine learning can also work on developing amazing datasets together.
To explore how this can be done, Argilla and Hugging Face are thrilled to announce a collaborative project where we’re asking Hugging Face community members to build a dataset consisting of LLM prompts collectively.
What are we doing? Using an instance of Argilla — a powerful open-source data collaboration tool — hosted on the Hugging Face Hub, we are collecting ratings of prompts based on their quality.
How Can You Contribute? It’s super simple to start contributing:
1. Sign up if you don’t have a Hugging Face account