Model Quality: Hugging Face Is All You Need

· 734 words · 4 minute read

Building successful AI models typically requires an iterative process of ship, test, and improve. A practical approach is to set up a feedback loop with human testers. As for implementation: if you already run your organization on Hugging Face (if not, you should!), look no further — you have everything at your disposal to make it happen.

Here’s a short and practical case study showing how we revamped our quality feedback loop for the Finegrain Eraser on top of Hugging Face — and we believe the approach is broadly applicable.

Just to be clear: we never collect data from our regular users. What follows is strictly an internal process to improve model quality.

Our Needs 🔗

At the core, we need:

  • A simple web app that lets human testers play with the Eraser model — pick an image, brush over an object to erase, and carefully inspect the result.
  • A way to report any issues — record the inputs/outputs and describe what went wrong from a quality perspective.

We also want the option to restrict access to selected testers.

Hugging Face to the Rescue 🔗

Web App 🔗

The web app part is a no-brainer: it can be built with Gradio and easily deployed as a Space. We’re already heavy users, with numerous public demos showcasing our models.
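
To make this concrete, here's a minimal sketch of what such a Space can look like, with a hypothetical erase function standing in for the actual Eraser model (the component choices below are illustrative, not our production app):

import gradio as gr
from PIL import Image


def erase(image: Image.Image, mask: Image.Image | None) -> Image.Image:
    # Placeholder: call the actual Eraser model (or its API) here.
    return image


def run_eraser(editor_value: dict) -> Image.Image:
    # gr.ImageEditor returns the original image plus the brushed layers
    background = editor_value["background"]
    mask = editor_value["layers"][0] if editor_value["layers"] else None
    return erase(background, mask)


with gr.Blocks() as demo:
    editor = gr.ImageEditor(type="pil", label="Brush over the object to erase")
    result = gr.Image(type="pil", label="Result")
    gr.Button("Erase").click(run_eraser, inputs=editor, outputs=result)

demo.launch()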

Data Collection 🔗

A few months ago, we stumbled upon a demo Space called How to persist data from a Space to a Dataset by Lucain Pouget at Hugging Face. It highlights a very useful capability of the Hugging Face Hub called Scheduled uploads, powered by the CommitScheduler:

The idea is to run a background job that regularly pushes a local folder to the Hub. Let’s assume you have a Gradio Space that takes as input some text and generates two translations of it. Then, the user can select their preferred translation. For each run, you want to save the input, output, and user preference to analyze the results. […]

That is exactly what we need! Even better — it can be implemented in just a few lines of code and the data is saved in a dataset repository.

Here’s a simplified version, adapted from the official demo, for illustration:

import json
from datetime import datetime
from pathlib import Path
from uuid import uuid4
from PIL import Image
from huggingface_hub import CommitScheduler


IMAGE_DATASET_DIR = Path("data") / f"{uuid4()}"
IMAGE_DATASET_DIR.mkdir(parents=True, exist_ok=True)
IMAGE_JSONL_PATH = IMAGE_DATASET_DIR / "metadata.jsonl"

# 1. Instantiate the scheduler for regular uploads in the background
scheduler = CommitScheduler(
    repo_id="myorg/some-dataset",
    repo_type="dataset",
    folder_path=IMAGE_DATASET_DIR,
    path_in_repo=IMAGE_DATASET_DIR.name,
    every=0.5,  # in minutes: commit every 30 seconds (tune as you see fit)
    private=True,  # keep it private (optional)
)

# ...

# 2. Write data to the local folder which is monitored by the scheduler
# and seamlessly pushed to the hub (dataset) via a new commit
def save_image(image: Image.Image, info: str) -> None:
    image_path = IMAGE_DATASET_DIR / f"{uuid4()}.jpg"
    
    with scheduler.lock:
        image.save(image_path, quality=95)
        with IMAGE_JSONL_PATH.open("a") as f:
            json.dump(
                {
                    "info": info,
                    "file_name": image_path.name,
                    "datetime": datetime.now().isoformat(),
                },
                f,
            )
            f.write("\n")
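
On the Gradio side, a "Report an issue" button can then call save_image to capture the input image, the Eraser output, and the tester's description. A minimal sketch, assuming the components from the web app example above (names and labels are placeholders):

def report_issue(editor_value: dict, result: Image.Image, description: str) -> str:
    # Save the original image, the Eraser output, and the tester's feedback;
    # the CommitScheduler pushes them to the private dataset in the background.
    save_image(editor_value["background"], info=f"input: {description}")
    save_image(result, info=f"output: {description}")
    return "Report saved, thank you!"

# Inside the gr.Blocks() context of the web app sketch, something like:
#   description = gr.Textbox(label="What went wrong?")
#   status = gr.Markdown()
#   gr.Button("Report an issue").click(
#       report_issue, inputs=[editor, result, description], outputs=status
#   )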

Access Control 🔗

The last piece was access control. We decided to keep both the Space and the dataset private, while still onboarding external human testers who aren’t part of Finegrain. The challenge: ensuring they only get access to what they need — in this case, the private Space for Eraser quality assessment.

This is where Resource groups (available in the Team & Enterprise plans) come into play.

We created:

  • A core-team resource group associated with all our private repos (models, datasets, Spaces), granting access to the Finegrain team.
  • A testers resource group for the new Eraser Quality Space, where we invited selected external testers alongside the Finegrain team.

Resource Groups
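
Note that resource group membership itself is managed from the organization settings on the Hub, not from code. The repos a group covers can, however, be created private from the start, for instance with huggingface_hub (the repo names below are placeholders):

from huggingface_hub import create_repo

# Private Space hosting the Eraser quality app (Gradio SDK)
create_repo("myorg/eraser-quality", repo_type="space", space_sdk="gradio", private=True, exist_ok=True)

# Private dataset receiving the reports (the CommitScheduler above also creates it if missing)
create_repo("myorg/some-dataset", repo_type="dataset", private=True, exist_ok=True)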

Special shout-out to the Hugging Face support team for their responsiveness, and for sharing this extra tip to tighten security:

Pro-tip: For extra security, if you don’t want anyone to create repositories in the default “Everyone” scope, you can set their default role at the org level to read only. Then, grant them write, contributor, or admin access inside the specific resource group you want them to use.

Wrap-Up 🔗

tl;dr: Hugging Face provides everything you need to set up a pro-grade feedback loop. It’s quick to build, fun to use, and powerful enough to make quality evaluation part of your regular workflow.

And here’s a sneak peek at how our app looks:

Preview

Hope this gives you ideas for your own setup — and if you end up building something similar, share it with us 🤗!