Catalog · 2026-04-22 · 6 min read

The SKU-level report that tells you what to re-shoot.

Visual search is only as good as the photos it's searching against. A catalog full of phone-snapped shelves, yellow warehouse lighting, and 480-pixel thumbnails will make even a state-of-the-art model look bad. The photo-quality audit is the report that ends the "is it the model or the data?" debate.

What the audit measures

Every indexed image runs through a four-signal quality check. None of these signals is new (photo-QA tooling has used them for decades), but most visual-search systems throw them away. We keep them per product so we can tell you exactly which products need work.

Signal	What it measures	What a low score means
Resolution	Shorter side in pixels	Upscaled, compressed, or thumbnail-grade. CLIP has trouble with edges below ~512 px on the shorter side.
Blur	Laplacian variance of the greyscale image	Camera shake, autofocus miss, or aggressive JPEG re-saves.
Exposure	Pixel-value histogram: how much of it piles up at the `0` or `255` end	Harsh shadows clipping to black, or blown highlights clipping to white. Both destroy colour information.
Contrast	Standard deviation of luminance	Foggy, low-dynamic-range images where the product blends into the background.

Each signal scores 0–1; we combine them with a weighted mean into the product's quality_score (also 0–1), stored on every indexed product.
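To make the four signals concrete, here is a minimal NumPy sketch of one way to compute and combine them. It is an illustration, not the production scorer: the post doesn't say how each raw measurement is normalised to 0–1 or what the weights are, so the saturation points (800 px, a Laplacian variance of 100, a luminance standard deviation of 64), the clipping bands (≤5 and ≥250), the equal weights, and all function names are hypothetical.

```python
import numpy as np

# Hypothetical equal weights; the real weighting is not published.
WEIGHTS = {"resolution": 0.25, "blur": 0.25, "exposure": 0.25, "contrast": 0.25}


def to_grey(rgb: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 RGB array to float luminance (ITU-R BT.601 weights)."""
    rgb = rgb.astype(np.float64)
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]


def laplacian_variance(grey: np.ndarray) -> float:
    """Variance of a 4-neighbour Laplacian over interior pixels; low means blurry."""
    lap = (grey[:-2, 1:-1] + grey[2:, 1:-1] + grey[1:-1, :-2] + grey[1:-1, 2:]
           - 4.0 * grey[1:-1, 1:-1])
    return float(lap.var())


def score_signals(rgb: np.ndarray) -> dict[str, float]:
    """Map each raw measurement to 0-1 (all saturation points are assumptions)."""
    h, w = rgb.shape[:2]
    grey = to_grey(rgb)
    # Resolution: shorter side in pixels, full marks at 800 px (the upload guidance).
    resolution = min(min(h, w) / 800.0, 1.0)
    # Blur: Laplacian variance of the greyscale image, full marks at an assumed 100.
    blur = min(laplacian_variance(grey) / 100.0, 1.0)
    # Exposure: penalise the share of pixels piled up at either end of the histogram.
    clipped = float(np.mean((grey <= 5) | (grey >= 250)))
    exposure = 1.0 - clipped
    # Contrast: standard deviation of luminance, full marks at an assumed 64.
    contrast = min(float(grey.std()) / 64.0, 1.0)
    return {"resolution": resolution, "blur": blur,
            "exposure": exposure, "contrast": contrast}


def quality_score(signals: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted mean of the 0-1 sub-scores, so the result is also 0-1."""
    total = sum(weights.values())
    return sum(weights[k] * signals[k] for k in weights) / total
```

On a flat mid-grey 400×400 image this gives resolution 0.5, blur and contrast near 0, and exposure 1.0, for a quality_score of about 0.375.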

The audit API

Request the catalog-wide report, with the worst N products surfaced:

GET /v1/products/quality-audit?worst_n=20

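The response has three parts: catalog-wide summary statistics (total, average, and tier distribution), the most common problems, and the worst N products: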

{
  "total_products": 1284,
  "average_score": 0.88,
  "distribution": {
    "excellent": 841,   // score >= 0.85
    "good":      312,
    "fair":       97,
    "poor":       34    // < 0.55 — re-shoot candidates
  },
  "common_problems": [
    {"label": "low_resolution", "count": 58},
    {"label": "poor_exposure",  "count": 41},
    {"label": "blurry",         "count": 22}
  ],
  "worst": [
    {
      "product_id": "SKU-4821",
      "name": "Marla Tote",
      "quality_score": 0.31,
      "worst_component": "poor_exposure",
      "fix_hint": "Use diffused lighting — a softbox or window at 45°. Avoid direct overhead lights that cast harsh shadows."
    },
    ...
  ]
}
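A minimal Python sketch of calling the endpoint and turning the `worst` array into a shoot list, using only the standard library. The host name and the Bearer-token header are placeholders (this post doesn't document them; check the OpenAPI reference), and the helper names are hypothetical. The 0.55 cutoff is the poor boundary from the response above.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder host; the real base URL and auth scheme are not given here.
API_BASE = "https://api.example.com"


def fetch_quality_audit(api_key: str, worst_n: int = 20, base: str = API_BASE) -> dict:
    """GET /v1/products/quality-audit?worst_n=N and decode the JSON body."""
    query = urllib.parse.urlencode({"worst_n": worst_n})
    req = urllib.request.Request(
        f"{base}/v1/products/quality-audit?{query}",
        # Assumed Bearer-token auth; adjust to whatever the API actually expects.
        headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def reshoot_list(report: dict, max_score: float = 0.55) -> list[str]:
    """Format worst-N products scoring below the 'poor' cutoff as shoot-list lines."""
    lines = []
    for item in report.get("worst", []):
        if item["quality_score"] < max_score:
            lines.append(
                f'{item["product_id"]} ({item["name"]}, {item["quality_score"]:.2f}): '
                f'{item["worst_component"]} -> {item["fix_hint"]}'
            )
    return lines
```

Raising `worst_n` widens the pool; the cutoff keeps the list limited to re-shoot candidates.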

Fix hints are the point

A score on its own is useless to a catalog team. What they need is an answer to "what do I do differently next time?" The audit's fix_hint field is deliberately prescriptive. We map each product's worst_component to a concrete action:

low_resolution → "Upload at least 800×800. Anything under 512 on the shorter side makes edges fuzzy to the model."
blurry → "Re-shoot with a tripod or burst mode. Enable Pro / Expert mode to lock shutter above 1/250s."
poor_exposure → "Use diffused lighting — a softbox or window at 45°. Avoid direct overhead lights that cast harsh shadows."
low_contrast → "Shoot against a plain, contrasting backdrop. Add +15% contrast in post if the product is pale."

These are the same suggestions a photo producer would make in a studio. The audit does the triage so your team spends time re-shooting the bottom 5%, not eyeballing the full catalog.

Detail

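The worst_component is the sub-signal that pulled the overall score down the most, which is not necessarily the lowest sub-score. The weighted mean penalises each signal's shortfall from a perfect 1.0 in proportion to that signal's weight, so a middling signal with a heavy weight can cost more than a weaker one with a light weight. A product with 0.3 resolution and 0.4 exposure, with the two weighted comparably, flags resolution as the actionable problem because its larger shortfall costs the overall score more.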
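A short sketch of that selection, assuming "pulled the overall score down the most" means the largest weighted shortfall, weight × (1 − sub-score). The shortfall terms sum to 1 − quality_score, so each one is that signal's share of the lost score. The weights used below are hypothetical (the real ones aren't published), as are the signal-to-label mapping and the function names; the hint text is quoted from the list above.

```python
# Signal name -> audit label (mapping assumed), and label -> fix hint (quoted text).
LABELS = {"resolution": "low_resolution", "blur": "blurry",
          "exposure": "poor_exposure", "contrast": "low_contrast"}
FIX_HINTS = {
    "low_resolution": "Upload at least 800×800. Anything under 512 on the shorter side makes edges fuzzy to the model.",
    "blurry": "Re-shoot with a tripod or burst mode. Enable Pro / Expert mode to lock shutter above 1/250s.",
    "poor_exposure": "Use diffused lighting — a softbox or window at 45°. Avoid direct overhead lights that cast harsh shadows.",
    "low_contrast": "Shoot against a plain, contrasting backdrop. Add +15% contrast in post if the product is pale.",
}


def worst_component(signals: dict[str, float], weights: dict[str, float]) -> str:
    """Label of the signal with the largest weighted shortfall from a perfect 1.0."""
    total = sum(weights.values())
    # Each term is how much that signal subtracts from a perfect quality_score of 1.
    shortfall = {k: weights[k] * (1.0 - signals[k]) / total for k in weights}
    return LABELS[max(shortfall, key=shortfall.get)]


def fix_hint(signals: dict[str, float], weights: dict[str, float]) -> tuple[str, str]:
    """Return (worst_component label, prescriptive fix hint) for one product."""
    label = worst_component(signals, weights)
    return label, FIX_HINTS[label]
```

With equal weights, the 0.3 resolution / 0.4 exposure product flags low_resolution. Give exposure half of the total weight (splitting the rest 0.1 / 0.2 / 0.2 across resolution, blur, and contrast) and the same sub-scores flag poor_exposure instead, even though exposure is not the lowest sub-score.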

The portal view

The same data renders at /portal/quality as:

Four summary cards: average score, excellent count, poor count, products audited.
A tiered distribution bar (Excellent / Good / Fair / Poor) showing the catalog at a glance.
A "Common problems" chip list. If 40% of your flagged products are poor_exposure, that's one lighting-rig problem, not dozens of individual product problems.
A ranked card grid of the worst-N with a thumbnail, the worst component badge, and the fix hint.
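The chip list's "lighting rig, not individual products" reasoning can be reproduced from the common_problems array. This sketch assumes each count is a number of flags, so a product with two problems counts twice and the shares are shares of flags rather than of products; the 0.4 threshold mirrors the 40% example above and, like the function names, is hypothetical rather than the portal's actual logic.

```python
def problem_shares(report: dict) -> dict[str, float]:
    """Share of all problem flags carried by each label in common_problems."""
    problems = report.get("common_problems", [])
    total = sum(p["count"] for p in problems)
    if total == 0:
        return {}
    return {p["label"]: p["count"] / total for p in problems}


def systemic_problems(report: dict, threshold: float = 0.4) -> list[str]:
    """Labels common enough to suggest a setup issue, most common first."""
    shares = problem_shares(report)
    return sorted((label for label, share in shares.items() if share >= threshold),
                  key=lambda label: -shares[label])
```

With the counts from the example response (58, 41, 22), only low_resolution clears the threshold, at 58 of 121 flags (about 48%).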

Real-world numbers from a merchant we ran this against: 394 products, an average score of 0.88, 347 rated excellent, and the bottom 3 all flagged poor_exposure with concrete re-shoot guidance. Re-shooting those three products moved their similarity scores on direct-match queries by 12–18 percentage points.

Where to use the score

Beyond the audit report itself, the per-product quality_score is available in metadata._match_breakdown on every search result. Two patterns we've seen work:

Quality as a tie-breaker. When two products have nearly identical CLIP similarity, prefer the one with the better photo. This is a small, non-disruptive nudge — most visual-search systems already effectively do this because blurry photos embed poorly.
Suppress the truly unusable. A product with quality_score < 0.3 on the front image should probably be hidden from visual search results entirely — render it only in text / browse paths until someone re-shoots. A merchandising bury rule filtered on a quality threshold handles this.
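Both patterns can be applied client-side to a page of results. This is a sketch under assumptions: the result shape (a `similarity` field, with quality_score nested directly under `metadata._match_breakdown`), the 0.02 nudge, and the function names are hypothetical; the 0.3 floor comes from the text. It implements the tie-breaker as a bounded additive nudge, one of several reasonable ways to do it, and filters results instead of configuring a bury rule, whose syntax isn't shown here.

```python
def quality_of(result: dict) -> float:
    """Read quality_score from the (assumed) metadata._match_breakdown location."""
    return result["metadata"]["_match_breakdown"]["quality_score"]


def rerank(results: list[dict], nudge: float = 0.02, floor: float = 0.3) -> list[dict]:
    """Drop unusable photos, then break near-ties in similarity by photo quality."""
    # Suppress: anything under the floor is left for text / browse paths only.
    usable = [r for r in results if quality_of(r) >= floor]
    # Tie-breaker: quality is 0-1, so each result moves by at most `nudge`.
    return sorted(usable,
                  key=lambda r: r["similarity"] + nudge * quality_of(r),
                  reverse=True)
```

Because quality_score lies between 0 and 1, the nudge shifts a result by at most 0.02, so it can only reorder results whose similarities are within 0.02 of each other.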

The one measurement we deliberately left out

We don't score composition — whether the product is centred, cropped cleanly, shot from the most flattering angle. Those are subjective judgements that vary by category (fashion wants lifestyle shots; electronics want clinical flats), and we don't trust a model to stand in for an art director. Resolution, blur, exposure, and contrast are objective. Composition isn't, so we leave it to the human.

Back to the docs

That's the last post in this wave. The docs hub links back into the OpenAPI reference, the Postman collection, and the Python SDK. If there's a feature you want a deeper post on, email [email protected] — we'll prioritise by what comes up most.