Most advice about ecommerce video is opinion dressed up as fact. "Keep it under 15 seconds." "Go cinematic." "Always add captions." "Pay the influencer." None of it is tested at scale, on real catalogs, against a real engagement signal.
So we tested it. Whatmore analyzed 595 ecommerce videos across 10 product categories and 10 brands, tagged each one on eight content dimensions, and measured how every choice tracked against engagement. This report is the follow-up to our 2026 shoppable video benchmarks — that report told you where the bar sits; this one tells you which choices move it.
The headline is simple: a few "rules" survive contact with the data, several popular ones do not, and the most important answers — polished or raw, who stands in front of the camera — flip depending on what you sell. Here is the anatomy of a video that works.
How we ran the study
We started with 631 videos from 10 ecommerce brands and dropped 36 for insufficient reach — too few plays to read a reliable engagement signal — leaving 595 videos in the final sample. A human-and-model tagging pass labelled every video on eight dimensions:
- Production style — professional, casual-UGC, aesthetic-edited, cinematic.
- Subject — who is on camera (founder, influencer, model, multiple people, hands-only, no person).
- Setting — home, outdoor, store, studio, mixed.
- Content type — the format category of the video.
- Action — what happens on screen (unboxing, reviewing, explaining, demonstrating, styling, trying-on, applying).
- Duration feel — quick-clip, short, medium, long. We tag the feel rather than raw seconds, because a 20-second clip can feel long and a 40-second one can feel short.
- Text overlays — present or absent.
- Voiceover — spoken voice or on-camera speech present, or silent.
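One way to picture the tagging output is as a small record per video. The sketch below is an assumption about shape only, not the study's actual schema: the class name `VideoTags`, the field names and the `ALLOWED` lookup are hypothetical, and the allowed values are copied from the list above. Content type is left as free text because the report does not enumerate its values.

```python
from dataclasses import dataclass

# Allowed values copied from the dimension list in the text.
# Field names are hypothetical; the study's real schema is not published.
ALLOWED = {
    "production_style": {"professional", "casual-UGC", "aesthetic-edited", "cinematic"},
    "subject": {"founder", "influencer", "model", "multiple people", "hands-only", "no person"},
    "setting": {"home", "outdoor", "store", "studio", "mixed"},
    "action": {"unboxing", "reviewing", "explaining", "demonstrating",
               "styling", "trying-on", "applying"},
    "duration_feel": {"quick-clip", "short", "medium", "long"},
}


@dataclass(frozen=True)
class VideoTags:
    production_style: str
    subject: str
    setting: str
    content_type: str   # free text: the report does not list its categories
    action: str
    duration_feel: str
    text_overlays: bool  # present or absent
    voiceover: bool      # spoken voice or on-camera speech present

    def __post_init__(self):
        # Reject any tag value that is not in the published list.
        for field_name, allowed in ALLOWED.items():
            value = getattr(self, field_name)
            if value not in allowed:
                raise ValueError(f"{field_name}={value!r} is not one of {sorted(allowed)}")


# Illustrative record; "product review" is a made-up content type.
example = VideoTags(
    production_style="casual-UGC", subject="influencer", setting="home",
    content_type="product review", action="reviewing", duration_feel="medium",
    text_overlays=True, voiceover=True,
)
```

Validating against a fixed vocabulary keeps a mixed human-and-model tagging pass from drifting into near-duplicate labels such as "UGC" and "casual-UGC".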
The engagement metric
For each video we calculated an engagement rate: (likes + comments + shares) ÷ plays. Then we normalized every video against its own brand's median by dividing the video's rate by that median. So a score of 1.00 means "exactly the median video for that brand", 1.20 means "20% above that brand's median", and so on. This strips out account-size bias, because a video is only ever compared with other videos from the same account. Across the whole sample, the median raw engagement rate was 1.53%.
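The metric is simple enough to sketch in a few lines. This is a minimal illustration of the calculation as described, not the study's pipeline: the function names, the dictionary keys and the sample numbers are all made up for the example.

```python
from collections import defaultdict
from statistics import median


def engagement_rate(likes, comments, shares, plays):
    """(likes + comments + shares) / plays, as defined in the text."""
    return (likes + comments + shares) / plays


def brand_relative_scores(videos):
    """Divide each video's engagement rate by its own brand's median rate."""
    rates = [
        engagement_rate(v["likes"], v["comments"], v["shares"], v["plays"])
        for v in videos
    ]
    rates_by_brand = defaultdict(list)
    for v, rate in zip(videos, rates):
        rates_by_brand[v["brand"]].append(rate)
    # Note: a brand whose median rate is 0 would need special handling.
    brand_median = {brand: median(r) for brand, r in rates_by_brand.items()}
    return [rate / brand_median[v["brand"]] for v, rate in zip(videos, rates)]


# Made-up numbers for illustration only; not data from the study.
videos = [
    {"brand": "A", "likes": 80, "comments": 10, "shares": 10, "plays": 10_000},   # 1.0%
    {"brand": "A", "likes": 100, "comments": 10, "shares": 10, "plays": 10_000},  # 1.2%
    {"brand": "A", "likes": 60, "comments": 10, "shares": 10, "plays": 10_000},   # 0.8%
    {"brand": "B", "likes": 400, "comments": 50, "shares": 50, "plays": 10_000},  # 5.0%
    {"brand": "B", "likes": 500, "comments": 50, "shares": 50, "plays": 10_000},  # 6.0%
    {"brand": "B", "likes": 300, "comments": 50, "shares": 50, "plays": 10_000},  # 4.0%
]
scores = brand_relative_scores(videos)
```

In this toy data brand B's raw rates are five times brand A's, yet both brands produce the same relative scores (1.0, 1.2, 0.8). That is the sense in which the normalization removes account-level differences.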
What this does and does not measure
This study measures Instagram organic engagement, not sales, revenue or conversion. A video that earns likes, comments and shares is a video people responded to — that is genuinely useful signal, and it is a strong shortlist for what to test on a product page. But it is not proof of purchase. Treat every finding here as a hypothesis to test against your own conversion data, not a revenue promise.
Sample sizes — where to trust the data
Categories were not evenly sampled. Read the well-sampled ones with confidence; treat the thin ones as directional at best.
| Category | Videos | Confidence |
|---|---|---|
| Jewellery | 100 | Well-sampled |
| Home decor | 99 | Well-sampled |
| Accessories | 96 | Well-sampled |
| Fitness | 96 | Well-sampled |
| Shapewear | 80 | Well-sampled |
| Skincare | 59 | Solid |
| Apparel | 32 | Usable, smaller |
| Hair | 18 | Thin — directional only |
| Food | 15 | Thin — directional only |
| Petcare | 3 | Excluded |
Petcare (3 videos) was excluded from all analysis. Hair and food are reported only in passing.
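If you want to apply the same kind of labelling to your own catalog, a small lookup is enough. The cut-offs below are hypothetical: they were picked only because they reproduce the labels in the table above, and the report does not publish the thresholds it actually used.

```python
# Hypothetical cut-offs chosen to match the table's labels.
# The report does not state its real thresholds.
THRESHOLDS = [
    (80, "Well-sampled"),
    (50, "Solid"),
    (30, "Usable, smaller"),
    (10, "Thin, directional only"),
]


def confidence_label(n_videos):
    """Return a confidence label for a category with n_videos videos."""
    for minimum, label in THRESHOLDS:
        if n_videos >= minimum:
            return label
    return "Excluded"


# Video counts copied from the table in the text.
category_counts = {
    "Jewellery": 100, "Home decor": 99, "Accessories": 96, "Fitness": 96,
    "Shapewear": 80, "Skincare": 59, "Apparel": 32, "Hair": 18,
    "Food": 15, "Petcare": 3,
}
labels = {category: confidence_label(n) for category, n in category_counts.items()}
```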
A handful of cuts below have small n — long-duration videos (n=9), unboxing actions (n=8), cinematic production (n=12). We will flag those as we go and we will not build a rule on them. The strongest claims in this report rest on hundreds of videos.
Part 1 — The one rule that never breaks: talk
Most of the findings in this report come with a "but it depends." This one does not. Across 313 videos with voice and 271 silent ones, the spoken videos out-engaged the silent ones: 1.09x against 0.90x, a roughly 21% relative gap. What makes it the spine of the report is what happened when we cut it by category.
We checked voiceover separately inside every category with enough videos to test it. Voice won every time. Not "usually." Every time.
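A per-category check of this kind can be sketched as a simple group-by. Several things here are assumptions rather than the study's method: the minimum group size, the use of the median to summarise each group (the report does not say whether its figures are means or medians), and the function and variable names. The sample rows are made up.

```python
from collections import defaultdict
from statistics import median

MIN_GROUP_SIZE = 5  # hypothetical; the report does not state its cut-off


def voice_gap_by_category(rows, min_group_size=MIN_GROUP_SIZE):
    """rows: iterable of (category, has_voice, brand_relative_score)."""
    groups = defaultdict(lambda: {True: [], False: []})
    for category, has_voice, score in rows:
        groups[category][has_voice].append(score)

    result = {}
    for category, split in groups.items():
        # Skip categories with too few videos on either side to compare.
        if min(len(split[True]), len(split[False])) < min_group_size:
            continue
        voiced, silent = median(split[True]), median(split[False])
        result[category] = {
            "voiced": voiced,
            "silent": silent,
            # Assumes the silent median is non-zero.
            "relative_gap": voiced / silent - 1,
        }
    return result


# Made-up rows for illustration only; not data from the study.
rows = [
    ("jewellery", True, 1.2), ("jewellery", True, 1.0), ("jewellery", True, 1.1),
    ("jewellery", False, 1.0), ("jewellery", False, 0.9), ("jewellery", False, 1.1),
    ("food", True, 1.4), ("food", False, 0.8),  # too thin, gets skipped
]
gaps = voice_gap_by_category(rows, min_group_size=3)

# The same arithmetic on the report's overall figures (1.09x vs 0.90x):
overall_gap = 1.09 / 0.90 - 1  # about 0.21, the "roughly 21%" relative gap
```

The relative gap is a ratio of two brand-relative scores, so it is read as "voiced videos scored about 21% higher than silent ones", not as a 21-point difference in engagement rate.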
[Chart: Voiceover wins in every category tested. Brand-relative engagement (1.00 = brand median), with voice vs. silent, by category.]
The gap is largest in apparel and shapewear; even where it is narrowest, in jewellery, voice still wins.
The size of the gap varies. In jewellery the spread is modest — 1.11 with voice against 0.97 without — because a lot of jewellery content works as quiet, aesthetic imagery. But in apparel (1.30 vs 0.52) and shapewear (1.01 vs 0.07) the silent video nearly falls off a cliff. These are categories where buyers have real questions — fit, fabric, feel, sizing — and a video that answers nothing out loud leaves them unanswered.
The shapewear figure of 0.07 for silent video deserves a caveat: it is an extreme value pulled from a sub-slice of an 80-video category, so do not read it as "silent shapewear video gets 7% of median forever." Read the direction, which is unambiguous: in shapewear, a silent video is close to dead weight.
"Voice" here is deliberately broad. It does not have to be a scripted voiceover in a studio. A founder talking to camera counts. A creator narrating as they unbox counts. A quick spoken "okay, here's why I actually like this" over B-roll counts. The mechanism is not production value — it is that a human voice gives the viewer a reason to stay, a point being made, a person to react to. If your videos are mostly silent, that is the first thing to fix.
Example video
What "add a voice" looks like. Underneat, "Brief Bodysuit" — a casual, influencer-led video carried by a real voiceover. 815K plays · 3.49× this brand's median engagement. Watch on Instagram ↗
Part 2 — Four myths the data kills
Myth 1 — "Shorter is always better"
The reflex in social video is to cut everything to the bone. Our data says the bone is too short. We tagged duration by feel — quick-clip, short, medium, long — and the quick-clip was the worst-performing format in the study.
[Chart: Duration feel vs. engagement. Brand-relative engagement (1.00 = brand median), with sample sizes.]
Quick-clips (0.76x) were the weakest duration in the sample. Long videos top the chart but rest on only 9 videos, so they are directional, not a rule.
Quick-clip came in at 0.76x against a short-video baseline of 1.00x and medium video at 1.17x. The lesson is not "make long videos." It is: give the video enough room to actually say something. Short is fine. Too-short is not.
Myth 2 — "Go cinematic"
"Make it cinematic" is the advice that sells camera rigs. The data says it sells the wrong thing. Across four production styles, cinematic finished last at 0.64x — worse than every other style and worse than the brand median by a wide margin.
[Chart: Production style vs. engagement. Brand-relative engagement (1.00 = brand median).]
Professional production leads at 1.14x; cinematic trails badly at 0.64x (n=12: small, but the direction is striking).
The winner was professional at 1.14x — clean, well-lit, competent, but not a film. Casual-UGC (1.00x) and aesthetic-edited (0.98x) sit right at the median. The pattern: viewers reward video that looks credible and clear, and they quietly tune out video that looks like it is performing for an award. Aim for clean and professional. Skip the colour grade.
Myth 3 — "Always add text overlays"
Roughly 80% of the videos in our sample used text overlays — 470 with, 114 without. It is close to a default. And it did almost nothing: videos with overlays scored 1.00x and videos without scored 0.99x. That is not a small effect — it is no effect.
| Text overlays | Videos | Engagement (brand-relative) |
|---|---|---|
| With overlays | 470 | 1.00x |
| Without overlays | 114 | 0.99x |
A near-universal practice with no measurable engagement lift.
This is not an argument against captions. Overlays help accessibility, they help silent autoplay in feeds, and they cost almost nothing to add — all good reasons to keep them. The point is narrower: text overlays are not an engagement lever. If a video is under-performing, adding more on-screen text will not save it. Spend the effort on a clearer point and a voice.
Myth 4 — "Pay the influencer"
The default solution to "we need video" is "hire a creator." Useful — influencer videos engaged at 1.04x, slightly above median. But they were not the top of the table. The founder on camera was, at 1.23x.
| Who's on camera | Videos | Engagement (brand-relative) |
|---|---|---|
| Founder | 29 | 1.23x |
| Influencer | 127 | 1.04x |
| Model | 156 | 1.02x |
| No person | 35 | 0.97x |
| Multiple people | 147 | 0.97x |
| Hands-only | 87 | 0.93x |
Founder leads, but on a small sample (n=29).
Honesty first: the founder figure rests on 29 videos. That is enough to be interesting, not enough to be a law. But the direction is plausible and worth acting on. A founder brings something a hired creator cannot fake: genuine conviction, product knowledge, and a face the audience associates with the brand itself. It is also the cheapest video you can make — no booking, no brief, no licensing. If your founder will get on camera, that is not a fallback. On this data it is a front-runner. Notice too that hands-only (0.93x) sits at the bottom — the faceless product-in-hands clip is comfortable to make and weak to watch.
Example video
The founder, on camera. Artociti's founder explains a 3D relief wall mural himself — the product knowledge and conviction a hired creator can't fake. 34.5K plays · 3.13× brand-median engagement. Watch on Instagram ↗
Part 3 — The category playbooks
This is where averages stop being useful. The whole-sample numbers in Parts 1 and 2 are real, but they hide the most actionable finding in the study: your category has its own physics. A choice that wins for a jewellery brand can lose for a fitness brand.
Jewellery (100 videos)
Jewellery rewards craft over personality. Aesthetic-edited production is the right style — close-ups, light, sparkle, considered cuts. The best subject is the influencer (1.16x), who lends aspiration and a sense of how the piece is worn. Keep it short, and lead with an explaining action (1.10x) — the story of the piece, the materials, the styling logic. And per Part 1, give it a voice: even in this most-visual category, voiceover still won (1.11x vs 0.97x).
Home decor (99 videos)
Home decor is the founder's category. Founder on camera is the single strongest subject result in the entire study at 1.36x. Pair it with professional production (1.18x), a medium duration (1.38x) — decor needs room to show scale, context and a room coming together — and an explaining action (1.20x). The picture is coherent: a credible founder, taking enough time, walking you through the why. Resist the urge to cut it short.
Example video
The home-decor playbook in one video. Artociti — professional production, the founder on camera, explaining, with room to show scale. 36.9K plays · 1.63× brand-median engagement. Watch on Instagram ↗
Accessories (96 videos)
Accessories reward professional production, a model on camera, and a demonstrating action (1.18x). The category lives or dies on use-in-context: how the bag opens, how the strap sits, how the piece functions in a real outfit. Show it working on a person, cleanly shot. Less story than jewellery, more function.
Example video
The accessories playbook in one video. Laglits — clean professional production, shown on a model, in use. 212K plays · 1.52× brand-median engagement. Watch on Instagram ↗
Fitness (96 videos)
Fitness flips to raw. Casual-UGC production wins: gym-real beats studio-staged. The best subject is the influencer (1.18x), and the action is demonstrating, so show the product in motion, in use. Keep it short, and heed the warning: quick-clips collapse to 0.45x in fitness. A fitness clip that is too fast to show a real rep or a real movement has nothing to offer.
Example video
The fitness playbook in one video. Zumba Wear — casual UGC, an influencer, demonstrating the product in motion. 13.5K plays · 1.75× brand-median engagement. Watch on Instagram ↗
Shapewear (80 videos)
Shapewear is the clearest "polish is wrong" case in the dataset. Casual-UGC wins; aesthetic-edited collapses to 0.67x. Over-produced shapewear video reads as untrustworthy — buyers want to see the real thing on a real body. The best subject is the influencer, the best duration is medium (1.29x), and the highest-performing action is reviewing (1.27x) — honest, talked-through, first-person. And recall: silent shapewear video nearly flatlines. Voice is non-negotiable here.
Example video
The shapewear playbook in one video. Underneat — casual UGC, an influencer, on a real body, voiced throughout. 493K plays · 2.45× brand-median engagement. Watch on Instagram ↗
Skincare (59 videos)
Skincare is the exception to the "put a person on camera" instinct. The best subject is no-person (1.10x) — texture shots, the product itself, the swatch, the absorb. Use aesthetic-edited production and an applying action (1.31x): the satisfying close-up of product meeting skin. Skincare is also the one category where ultra-short clips are fine — a quick, beautiful texture moment works here even though it fails almost everywhere else.
Apparel (32 videos)
Smaller sample, but a clear shape. Professional production (1.22x), models or multiple people on camera, and a trying-on action (1.36x) — apparel buyers want movement, drape and fit. Keep it short, and note the now-familiar warning: quick-clips collapse to 0.36x in apparel. A clothing clip too fast to show how a garment moves answers none of the buyer's questions.
Category playbooks at a glance
| Category | Production | On camera | Duration | Best action |
|---|---|---|---|---|
| Jewellery | Aesthetic-edited | Influencer (1.16x) | Short | Explaining (1.10x) |
| Home decor | Professional (1.18x) | Founder (1.36x) | Medium (1.38x) | Explaining (1.20x) |
| Accessories | Professional | Model | Short | Demonstrating (1.18x) |
| Fitness | Casual-UGC | Influencer (1.18x) | Short (quick-clip 0.45x) | Demonstrating |
| Shapewear | Casual-UGC (edited 0.67x) | Influencer | Medium (1.29x) | Reviewing (1.27x) |
| Skincare | Aesthetic-edited | No-person (1.10x) | Short / ultra-short OK | Applying (1.31x) |
| Apparel | Professional (1.22x) | Models / multiple | Short (quick-clip 0.36x) | Trying-on (1.36x) |
Look down the "Production" and "On camera" columns and the headline of this section becomes obvious. Polish wins for apparel, accessories and home decor; raw wins for fitness and shapewear. The founder wins for home decor; no person wins for skincare; an influencer wins for jewellery, fitness and shapewear. There is no master answer, only your answer. Anyone selling you a single universal video formula is, by this data, selling you the average. The average is not your category.
Across the whole sample, rather than by category, unboxing (1.31x, n=8) and reviewing (1.13x) topped the action list, while trying-on (0.93x) and styling (0.96x) trailed on average. Setting mattered little: home and outdoor tied at 1.07x, store came in at 1.02x and studio at 0.98x. Only mixed settings (0.89x) clearly under-performed, because a busy, location-hopping video reads as unfocused. Pick a place and stay there.
The anatomy of a video that works — the checklist
- Give it a voice. Voiceover or on-camera speech. It won in every category we tested (1.09x vs 0.90x overall). This is the one rule that never broke.
- Give it room to make a point. Short is fine; quick-clip is the worst format in the study (0.76x). Don't cut below the length where the point can land.
- Make it clean, not cinematic. Professional, well-lit, credible. Cinematic was the lowest-engagement style (0.64x), though on only 12 videos.
- Don't rely on text overlays to do the work. Keep them for accessibility, but they moved engagement by nothing (1.00x vs 0.99x).
- Cast and produce for your category. Polished for apparel, accessories and home decor; raw for fitness and shapewear. Founder for home decor; no-person for skincare; influencer for jewellery, fitness and shapewear. The founder out-engaged the influencer overall (1.23x vs 1.04x, founder n=29), so if yours will go on camera, test it first.
- Pick a setting and stay in it. Home and outdoor tied highest (1.07x); only mixed settings clearly under-performed (0.89x).
- Lead with the action your category rewards. Explaining for jewellery and home decor, demonstrating for accessories and fitness, reviewing for shapewear, applying for skincare, trying-on for apparel.
One last reminder: every number here is an engagement number, drawn from Instagram organic performance — not sales, not conversion, not revenue. Use this report to build a smarter shortlist of videos to test. Then watch what those videos do on your own product pages and in your own checkout data. Engagement tells you a video earned attention. Only your store can tell you it earned a sale. If you are turning these videos into shoppable experiences, our ultimate guide to shoppable video and the shoppable video platform are where the engagement-to-sales question gets answered.