Sara Han Díaz

sdiazlor

AI & ML interests

Data curation and generation, RLHF, RAG, Prompt Engineering

Recent Activity

posted an update 1 day ago

More OSS than ever with the latest pruna 0.3.2 release. It extends existing algorithm families, such as compilers, kernels, and pruners, and adds new ones, including decoders, distillers, enhancers, and recoverers. It is more than a collection of algorithms: you can easily combine them to get the biggest efficiency win. Read the full blog here: https://huggingface.co/blog/PrunaAI/pruna-0-3-2-open-source-optimization-algorithms

upvoted an article 1 day ago

KV Caching Explained: Optimizing Transformer Inference Efficiency

published an article 1 day ago

Pruna 0.3.2: More OSS Algos, More Ways to Optimize

reacted to burtenshaw's post with 🔥 about 1 year ago

The Hugging Face agents course is finally out!

👉 agents-course

This first unit of the course sets you up with all the fundamentals to become a pro in agents.

- What's an AI Agent?
- What are LLMs?
- Messages and Special Tokens
- Understanding AI Agents through the Thought-Action-Observation Cycle
- Thought, Internal Reasoning and the Re-Act Approach
- Actions, Enabling the Agent to Engage with Its Environment
- Observe, Integrating Feedback to Reflect and Adapt
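
The Thought-Action-Observation cycle listed above can be sketched in a few lines. This is a minimal illustration, not the course's implementation: `scripted_model` is a hypothetical stand-in for an LLM, and the `TOOLS` registry and its `add` tool are invented for the example.

```python
from typing import Callable

# Hypothetical tool registry; the tool name and behavior are illustrative only.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
}


def scripted_model(history: list[str]) -> dict:
    """Stand-in for an LLM: picks the next step from the history so far."""
    observations = [h for h in history if h.startswith("Observation:")]
    if not observations:
        # No feedback yet, so reason about what to do and request an action.
        return {"thought": "I need to compute the sum.", "action": "add", "input": "2+3"}
    # Feedback is available, so reflect on it and finish.
    answer = observations[-1].removeprefix("Observation: ")
    return {"thought": "I have the result.", "final": answer}


def run_agent(question: str, max_steps: int = 5) -> tuple[str, list[str]]:
    """Loop Thought -> Action -> Observation until the model gives a final answer."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        step = scripted_model(history)
        history.append(f"Thought: {step['thought']}")
        if "final" in step:
            return step["final"], history
        # Act on the environment, then feed the result back as an observation.
        observation = TOOLS[step["action"]](step["input"])
        history.append(f"Action: {step['action']}({step['input']})")
        history.append(f"Observation: {observation}")
    raise RuntimeError("agent did not finish within max_steps")


answer, trace = run_agent("What is 2 + 3?")
```

In a real agent the scripted function would be replaced by a model call, and the observation would be appended to the prompt for the next turn.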

reacted to davidberenstein1957's post with 🔥 about 1 year ago

Introducing the Synthetic Data Generator, a user-friendly application that takes a no-code approach to creating custom datasets with Large Language Models (LLMs). The best part: A simple step-by-step process, making dataset creation a non-technical breeze, allowing anyone to create datasets and models in minutes and without any code.

Blog: https://huggingface.co/blog/synthetic-data-generator
Space: https://huggingface.co/spaces/argilla/synthetic-data-generator

reacted to fffiloni's post with 🔥 over 1 year ago

DimensionX is out for you to try and duplicate 🤗
→ fffiloni/DimensionX

Discuss Paper: DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion (2411.04928)

Examples by the amazing William Lamkin @phanes

reacted to davidberenstein1957's post with 🔥 over 1 year ago

🚀 We will be generating a preference dataset for DPO/ORPO and cleaning it with AI feedback during our upcoming meetup!

In this session, we'll walk you through the essentials of building a distilabel pipeline by exploring two key use cases: cleaning an existing dataset and generating a preference dataset for DPO/ORPO. You’ll also learn how to make the most of AI feedback, integrating Argilla to gather human feedback and improve the overall data quality.
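
For readers new to preference data, here is a rough sketch of what a DPO/ORPO-style record can look like. It is plain Python and does not use the distilabel API; the `prompt`/`chosen`/`rejected` field names follow a common convention, and the input layout with numeric scores (for example from AI feedback) is an assumption made for illustration.

```python
def build_preference_pairs(rated_examples):
    """Turn scored responses into prompt/chosen/rejected records.

    Each input item is assumed to look like:
        {"prompt": str, "responses": [(text, score), ...]}
    Prompts whose best and worst scores tie carry no preference signal
    and are skipped.
    """
    pairs = []
    for example in rated_examples:
        ranked = sorted(example["responses"], key=lambda r: r[1], reverse=True)
        if len(ranked) < 2 or ranked[0][1] == ranked[-1][1]:
            continue  # nothing to prefer
        pairs.append(
            {
                "prompt": example["prompt"],
                "chosen": ranked[0][0],     # highest-scored response
                "rejected": ranked[-1][0],  # lowest-scored response
            }
        )
    return pairs


# Toy data; the scores are made up for the example.
rated = [
    {"prompt": "Define RLHF.", "responses": [("A vague answer.", 2), ("A precise answer.", 5)]},
    {"prompt": "Define RAG.", "responses": [("Answer A.", 3), ("Answer B.", 3)]},
]
pairs = build_preference_pairs(rated)
```

Dropping tied examples is one simple cleaning rule; a real pipeline might instead filter on a minimum score gap.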

This session is perfect for you:
- if you’re getting started with distilabel or synthetic data
- if you want to learn how to use LLM inference endpoints for **free**
- if you want to discover new functionalities
- if you want to provide us with new feedback

Sign up here: https://lu.ma/dt0c7jru

reacted to gabrielmbmb's post with 🚀❤️ over 1 year ago

distilabel 1.3.0 is out! This release contains many core improvements and new tasks that helped us build argilla/magpie-ultra-v0.1!

Distributed pipeline execution with Ray, new Magpie tasks, reward models, components for dataset diversity based on sentence embeddings, Argilla 2.0 compatibility and many more features!

Check the new release in GitHub: https://github.com/argilla-io/distilabel

reacted to Ameeeee's post with 🤗🔥 over 1 year ago

❤️‍🔥 Just released version 2.0 of Argilla!

This small revolution includes:

🔌 You can now integrate with the Hugging Face Hub and get started in under five minutes.
🪂 A single Dataset class is now designed to handle multiple tasks.
🔧 It’s 100 times simpler to configure your dataset now with the new SDK!
📖 The documentation has been revamped to be cleaner and more user-friendly.
🍌 A new feature automates splitting annotation tasks among a team.
✍️ The layout has been made more flexible to accommodate many use cases.

Check out the release highlights for more details: https://github.com/argilla-io/argilla/releases/tag/v2.0.0

reacted to dvilasuero's post with 🤯❤️🤗 about 2 years ago

🤗 Data is better together!

Data is essential for training good AI systems. We believe that the amazing community built around open machine learning can also work on developing amazing datasets together.

To explore how this can be done, Argilla and Hugging Face are thrilled to announce a collaborative project where we’re asking Hugging Face community members to build a dataset consisting of LLM prompts collectively.

What are we doing?
Using an instance of Argilla — a powerful open-source data collaboration tool — hosted on the Hugging Face Hub, we are collecting ratings of prompts based on their quality.

How can you contribute?
It’s super simple to start contributing:

1. Sign up if you don’t have a Hugging Face account

2. Go to this Argilla Space and sign in: https://huggingface.co/spaces/DIBT/prompt-collective

3. Read the guidelines and start rating prompts!

You can also join the #data-is-better-together channel in the Hugging Face Discord.

Finally, to track the community progress we'll be updating this Gradio dashboard:

https://huggingface.co/spaces/DIBT/prompt-collective-dashboard
