Veo3 Prompt Guide: Write Better AI Video Prompts (30+ Templates Included)

ASMRVideos.io

I spent three weeks deep in Reddit threads, Discord servers, and Google's own documentation trying to figure out what actually works with Veo3. Most guides out there repeat the same generic advice. This one doesn't.

  • Structure your prompts around six elements: Subject, Context, Action, Style, Camera, Audio (declare the style first)
  • Dialogue breaks if it's longer than 8 seconds
  • Use colons for speech ("Character says: Hello") or you'll get burned-in subtitles
  • Camera instructions work better as separate sentences
  • 30+ ASMR prompts below—steal them

Why Veo3 is weird

Here's what nobody tells you upfront: Veo3 generates video and audio at the same time. That sounds minor. It's not.

With Midjourney or DALL-E, you're describing a frozen moment. With Veo3, you're writing a compressed film script. What happens first? What comes next? What sounds should the viewer hear? The model needs all of this.

What Veo3 does well:

  • Human movement that doesn't look uncanny
  • Audio that matches what's on screen
  • Dialogue (sort of—more on that later)
  • Camera work

Where it falls apart:

  • Anything over 8 seconds
  • Two people talking at once
  • Specific text or logos
  • Quick cuts between scenes

Knowing these limits upfront saves hours of frustration.

Six things every good prompt needs

You don't need all six in every prompt. But understanding what each does gives you actual control.

1. Subject

Who or what is in the frame. Vague descriptions produce vague results.

Weak: "A woman"

Better: "A woman in her late 30s with silver-streaked black hair, wearing a cream linen blouse, wire-rimmed glasses"

Here's something useful: if you use the exact same character description across multiple generations, Veo3 produces similar-looking people. Great for multi-clip projects.

2. Context

The environment. Lighting matters more than you'd think.

Weak: "In a room"

Better: "In a minimalist ceramic studio, late afternoon sun through floor-to-ceiling windows, dust visible in the light beams"

3. Action

What's happening. Veo3 can handle complex sequences if you choreograph them clearly.

Weak: "Making pottery"

Better: "Slowly centers a ball of grey clay on the wheel, wets her hands in a ceramic bowl, then begins shaping the clay upward into a tall vase"

That word "then" matters. It creates sequence. Veo3 seems to weight earlier words more heavily than later ones, so put the most important action first.

4. Style

The visual treatment. It is listed fourth here, but it belongs at the very beginning of your prompt.

Styles that work:

  • Cinematic, photorealistic
  • 35mm film grain, vintage
  • Anime
  • Stop-motion claymation
  • Documentary footage
  • Selfie video

Example: "Cinematic, shot on ARRI Alexa, shallow depth of field"

5. Camera

This is where amateur prompts become professional ones.

I figured something out after dozens of failed attempts: write camera instructions as their own sentences. Burying them in action descriptions usually fails.

Doesn't work: "The camera follows the woman while she walks through the forest"

Works: "A woman walks through the forest. The camera tracks alongside her at eye level, moving slowly through the trees."

Camera vocabulary Veo3 understands:

  • Eye level, high angle, low angle, bird's eye, worm's eye
  • Tracking shot, dolly shot, steadicam
  • Pan left/right, tilt up/down
  • Zoom in/out (use sparingly)
  • Static shot, locked off
  • Close-up, extreme close-up, medium shot, wide shot

For ASMR specifically, extreme close-ups are everything. Macro shots of textures and small movements define the genre.

6. Audio

Veo3 generates audio with every video. Skip this part and the model guesses—usually wrong.

Format it like this:

Visual: [your visual description]

Audio: Soft whispered narration, subtle room tone, pencil scratching on paper, no music

Things to specify:

  • Dialogue (keep it under 8 seconds)
  • Ambient sounds
  • Sound effects timed to actions
  • Music style, or explicitly "no music"
  • What you don't want (prevents weird additions)

ASMR example:

Audio: Close-mic'd sounds—paper crinkling, soft breathing, finger taps on wood, whispered narration saying: "Let me show you how this works", no background music, quiet room
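
One way to keep audio lines consistent is to build them from two lists: the sounds you want and the things you want excluded. The sketch below is only an illustration of the format described above; the function name build_audio_line and its parameters are made up for this example and are not part of any Veo3 tooling.

```python
def build_audio_line(sounds, exclude=("music",)):
    """Join wanted sounds and explicit exclusions into one 'Audio:' line.

    Hypothetical helper: it only formats text, it does not call Veo3.
    """
    wanted = [s.strip() for s in sounds if s.strip()]
    if not wanted:
        # An empty audio section leaves the model to guess.
        raise ValueError("list at least one sound")
    # Each exclusion becomes an explicit "no ..." item at the end.
    unwanted = [f"no {e.strip()}" for e in exclude if e.strip()]
    return "Audio: " + ", ".join(wanted + unwanted) + "."


print(build_audio_line(
    ["Soft whispered narration", "subtle room tone", "pencil scratching on paper"]
))
```

With the default exclusion this prints the same audio line used in the format example above, ending in "no music." Pass exclude=("voice", "music") for clips that should have no speech at all.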

Dialogue is tricky

Getting characters to talk naturally took me longer to figure out than anything else.

Keep it under 8 seconds. Longer dialogue forces the character to speak unnaturally fast. If you need 15 seconds of speech, split it into two clips.

Use colons, not quotation marks.

Wrong: She says "Welcome to my channel"
Right: She says: Welcome to my channel

Quotation marks trigger burned-in subtitles. Colons don't. I have no idea why.

Add "(no subtitles)" anyway. Even with correct formatting, subtitles sometimes appear. Tacking "(no subtitles)" at the end helps, though not always.

Identify speakers. With multiple characters:

The woman in blue says: Nice to meet you
The man in grey responds: The pleasure is mine

Spell unusual names phonetically.

Wrong: She says: I'm Shridhar
Right: She says: I'm Shree-dar
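
The dialogue rules above are mechanical enough to script. This is a rough sketch, not an official formatter: format_dialogue and fits_in_clip are hypothetical names, and the speaking rate of 2.5 words per second is an assumption for estimating duration, not a figure from Veo3. Phonetic respelling of names still has to be done by hand.

```python
import re

MAX_SECONDS = 8          # dialogue limit described in this guide
WORDS_PER_SECOND = 2.5   # ASSUMED speaking rate, tune it for whispering


def format_dialogue(speaker, line, no_subtitles=True):
    """Return 'Speaker says: line' with quotation marks stripped."""
    # Remove straight and curly double quotes, which reportedly trigger subtitles.
    spoken = re.sub(r'["\u201c\u201d]', "", line).strip()
    text = f"{speaker} says: {spoken}"
    if no_subtitles:
        text += " (no subtitles)"
    return text


def fits_in_clip(line):
    """Estimate whether a line can be spoken within one clip."""
    return len(line.split()) / WORDS_PER_SECOND <= MAX_SECONDS


print(format_dialogue("She", '"Welcome to my channel"'))
# She says: Welcome to my channel (no subtitles)
```

If fits_in_clip returns False, split the line across two clips rather than letting the character rush through it.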

30+ ASMR prompts you can copy

These all follow the structure above. Modify for your needs.

Glass cutting (the viral one)

Extreme macro close-up, cinematic lighting.

A pristine glass strawberry on smooth marble. A sharp silver knife slowly slices through, revealing the interior structure. The strawberry keeps its red color with crystalline transparency.

Camera: Static extreme close-up, shallow depth of field on the cutting edge.

Audio: Sharp cutting sounds, delicate glass clinks, subtle room ambiance, no music, no voice.

Whispering narration

Warm bedroom, soft golden hour light, cozy.

A young woman with long brown hair sits close to camera in an oversized sweater. Gentle eye contact, soft whisper.

She says: Hi there. I'm so glad you're here tonight. Let's take a deep breath together and just relax.

Camera: Medium close-up, eye level, static.

Audio: Close-mic'd whisper, soft breathing, quiet room tone, no background music.

Tapping and scratching

Minimalist desk, neutral background, soft diffused light.

Female hands with natural nails tap on a wooden jewelry box, then scratch the textured lid. Slow, deliberate movements.

Camera: Overhead close-up of hands and object, static.

Audio: Crisp tapping, scratching texture, binaural quality, no voice, no music.

Keyboard typing

Home office, desk lamp, evening.

Close view of hands typing on a mechanical keyboard with blue switches. Steady, satisfying rhythm.

Camera: Low angle close-up showing keys and fingers.

Audio: Mechanical key clicks, smooth clacking rhythm, subtle room ambiance, no voice, no music.

Sizzling steak

Professional kitchen, warm overhead light, visible steam.

Cast iron skillet on gas flame. Raw seasoned steak placed onto the hot surface.

Camera: Medium close-up, steam rising into frame, static.

Audio: Intense sizzling, meat crackling on hot metal, subtle kitchen ambiance, no voice, no music.

Rain on window

Cozy bedroom, overcast daylight through rain-covered window.

Rain droplets stream down the glass. Blurred garden beyond. Warm reading nook with cushions inside.

Camera: Close-up of rain on glass, rack focus to interior.

Audio: Steady rainfall, distant thunder, indoor room tone, no voice, no music.

Hair brushing

Soft-lit vanity, mirror visible, warm bedroom.

Woman with long dark hair sits facing away. Another pair of hands brushes with a wooden paddle brush. Slow rhythmic strokes.

Camera: Medium shot from behind, capturing brush movement.

Audio: Soft bristles through hair, gentle whispered conversation, cozy room ambiance.

Unboxing

Clean white desk, natural daylight, product photography look.

Hands remove a luxury box from tissue paper. The box opens slowly. Deliberate, unhurried movements.

Camera: Overhead bird's eye view, hands entering from bottom.

Audio: Paper crinkling, cardboard textures, soft handling sounds, no voice, no music.

Book pages

Library atmosphere, warm lamp, dark wood.

Close view of an old leather-bound book. Fingers slowly turn a yellowed page, revealing aged text.

Camera: Extreme close-up, page filling frame, shallow depth of field.

Audio: Paper turning, subtle page crackle, quiet library ambiance, no voice.

Kinetic sand

Black background, dramatic top lighting.

Purple kinetic sand cut with a thin wire. It holds shape briefly, then slowly collapses. A hand presses it flat and repeats.

Camera: Close-up, black background isolation.

Audio: Soft sand compression, satisfying cutting texture, no voice, no music.

Soap cutting

White surface, bright even lighting, minimal.

Large block of colored soap on a cutting board. Sharp knife slices thin layers. Each slice curls.

Camera: Close-up side angle, knife and soap interaction.

Audio: Clean cutting sounds, soap texture, no voice, no music.

Candle

Dark room, single candle lit, intimate.

Close view of a candle flame. A match strikes and lights a second candle. Wax drips slowly.

Camera: Macro close-up, flame flickering, dark background.

Audio: Match strike, flame crackle, wax dripping, soft breathing.

Makeup application

Vanity setup, ring light, beauty aesthetic.

Makeup artist applies foundation with a damp sponge on a model's face. Gentle patting, careful blending.

Camera: Close-up of face and hands.

Audio: Soft sponge patting, gentle breathing, whispered instruction, no music.

Writing

Desk, warm lamp, journaling mood.

Hands write in a notebook with a fountain pen. Ink flows onto cream paper. Beautiful handwriting emerges.

Camera: Overhead close-up, pen and paper in focus.

Audio: Pen scratching, ink flow sounds, no voice, no music.

Crystal tapping

Dark velvet background, spot lighting.

Crystals and gemstones on display. Fingernails tap gently on each, creating different tones.

Camera: Close-up tracking across crystals as they're tapped.

Audio: Crystalline tapping tones, no voice.

Slime

Bright pastel workspace, cheerful lighting.

Hands stretch a ball of glossy slime. Stretching, folding, poking.

Camera: Close-up of slime manipulation, hands in frame.

Audio: Sticky slime sounds, stretching textures, bubble pops, no voice, no music.

Haircut

Barbershop, vintage aesthetic, warm light.

Barber's hands with scissors trim around a client's ear. Careful movements. Hair falls.

Camera: Close-up on scissor work and ear.

Audio: Scissor snips, hair cutting textures, quiet barbershop, soft jazz in background.

Gift wrapping

Holiday atmosphere, wrapping station, festive lighting.

Hands fold paper around a gift box. Tape is pulled from a dispenser and pressed along the seam. Ribbon cut and tied into a bow.

Camera: Overhead view of wrapping.

Audio: Paper crinkling, tape pulling, scissors cutting, no voice.

Coffee making

Morning kitchen, natural light, cozy.

Coffee beans pour into a manual grinder. Hands turn the handle slowly. Ground coffee transferred to pour-over filter.

Camera: Close-up following the sequence.

Audio: Beans rattling, grinding sounds, water pouring, coffee dripping, no voice.

Forest stream

Forest, dappled sunlight through leaves.

Woodland stream flows over smooth stones. Leaves rustle. A bird lands briefly on a branch.

Camera: Slow pan across the scene, documentary style.

Audio: Running water, birdsong, wind through leaves, no voice, no music.

Ear cleaning

Clinical but cozy, soft lighting.

Close view of ASMR ear cleaning tools—soft brushes, fluffy picks. Tools brush against a simulated ear.

Camera: Extreme close-up, binaural perspective.

Audio: Soft brushing, gentle scraping, whispered explanation, binaural recording.

Art supplies

Art studio, natural light.

Hands organize colored pencils by shade in a wooden box. One pencil selected and tested on textured paper.

Camera: Overhead close-up of hands and supplies.

Audio: Pencil clinking, wood textures, paper testing sounds, no voice, no music.

Fabric folding

Minimalist retail space, soft lighting.

Hands fold a cashmere sweater using a folding board. Precise movements.

Camera: Medium close-up, folding technique visible.

Audio: Soft fabric sounds, folding textures, no voice.

Ice cracking

Dark background, dramatic lighting.

Clear ice sphere on metal surface. Warm whiskey poured over it. Ice cracks with fracture patterns.

Camera: Close-up on ice sphere, side lighting.

Audio: Ice cracking, liquid pouring, glass settling, no voice.

Medical roleplay

Doctor's office, clinical lighting.

Person in white coat performs gentle examination—checking reflexes, looking at eyes with penlight.

Doctor says: Just relax and follow the light with your eyes.

Camera: Personal perspective, viewer is patient.

Audio: Soft spoken voice, medical equipment sounds, quiet clinical atmosphere.

Sleep story

Cozy nighttime setting, bedside lamp.

Narrator in a comfortable chair, holding a book, speaking softly to camera with a gentle smile.

She says: Close your eyes. Imagine you're walking through a quiet forest path. The air is cool and fresh.

Camera: Medium close-up, warm and inviting.

Audio: Soft whispered narration, gentle room tone, distant night sounds.

Mukbang

Clean table, food photography lighting.

Person sits before crispy fried chicken. They pick up a piece and bite.

Camera: Medium shot showing person and food.

Audio: Crispy biting sounds, chewing textures, crunching, no speaking.

Scalp massage

Spa setting, soft ambient light.

Hands perform gentle scalp massage on a client with closed eyes. Slow circular motions through hair.

Camera: Close-up on hands and scalp.

Audio: Soft hair sounds, gentle friction, relaxation ambiance.

Water play

Bathroom, soft lighting.

Hands under running water in a decorative sink bowl. Fingers play with the stream, creating splashes.

Camera: Close-up on hands and water.

Audio: Running water, splash variations, no voice, no music.

Trigger words

Simple background, soft lighting.

Young woman looks at camera, close to microphone, speaking trigger words slowly with mouth sounds between.

She says: Relax... tingles... sleepy... gentle... peaceful...

Camera: Close-up face shot, eye level.

Audio: Close-mic'd whisper, mouth sounds, binaural, intimate atmosphere.

Seven mistakes I see constantly

1. Prompts that are way too long

500-word prompts with every possible detail. The problem? Veo3 weights earlier words more heavily, so details buried in the middle or at the end tend to get ignored. Keep it to 3-6 sentences.

2. Vague audio

"Natural sounds" tells the model nothing. "Sizzling pan, clinking utensils, running water, no background music" actually works.

3. Running the same broken prompt repeatedly

Veo3 produces nearly identical outputs from identical prompts. If it didn't work the first time, change something meaningful—camera angle, lighting, action sequence.

4. No camera direction at all

Beautiful scene description, zero camera guidance. Add it as its own sentence: "The camera remains static at eye level, close-up framing."

5. Writing for 30-second videos

Veo3 maxes out at 8 seconds. Design for short, self-contained moments. Chain clips together using consistent character descriptions.

6. Mixing visual styles

"Cinematic realistic with anime characters" confuses the model. Pick one style and declare it early.

7. Getting random audio you didn't ask for

Laugh tracks. News music. Random effects. The model fills gaps with guesses. Fix this by describing what you want AND what you don't: "Quiet room ambiance, no music, no audience sounds."

The faster option

Writing prompts from scratch works. But if you're specifically making ASMR content, there's a shortcut.

ASMRVideos.io has a Veo3 interface built for ASMR. It handles the formatting rules automatically—audio separation, the "(no subtitles)" workaround, presets for common triggers.

The Veo3 tool does text-to-video and image-to-video. The prompt library has examples ready to use.

Worth it if:

  • You're focused on ASMR specifically
  • You want template-based workflows
  • Prompt engineering isn't where you want to spend time

Not worth it if:

  • You need full creative control
  • Your content doesn't fit ASMR categories
  • You enjoy the prompting process itself

FAQ

How long should prompts be?

3-6 sentences for visuals, plus audio as a separate section. 100-200 words total. Longer doesn't mean better.
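
If you keep prompts in a file, a quick length check can catch the bloated ones before you spend a generation on them. The sketch below uses the 3-6 sentence and 200-word limits from the answer above; the function names are hypothetical, and the sentence splitter is deliberately naive (it splits after ".", "!" or "?" followed by whitespace, so abbreviations will be miscounted).

```python
import re


def prompt_stats(visual, audio=""):
    """Count sentences in the visual part and words across both parts."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", visual.strip()) if s]
    words = len((visual + " " + audio).split())
    return {"sentences": len(sentences), "words": words}


def length_warnings(visual, audio=""):
    """Return human-readable warnings, or an empty list if the prompt is fine."""
    stats = prompt_stats(visual, audio)
    warnings = []
    if not 3 <= stats["sentences"] <= 6:
        warnings.append(f"visual has {stats['sentences']} sentences, aim for 3-6")
    if stats["words"] > 200:
        warnings.append(f"prompt has {stats['words']} words, aim for 200 or fewer")
    return warnings


visual = ("A woman walks through the forest. "
          "The camera tracks alongside her at eye level. "
          "Soft light filters through the trees.")
print(length_warnings(visual, "Audio: Footsteps on leaves, birdsong, no music."))
```

The check only warns on the upper word limit, since short prompts like the templates in this guide work fine.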

Why does my character look different each time?

Veo3 interprets descriptions probabilistically. Keep character descriptions identical—word for word—across prompts. Make a template and copy it exactly.

Can I get vertical video for TikTok?

Not directly, at the time of writing. Veo3 only outputs 16:9 horizontal. Convert to 9:16 using tools like Luma's Reframe Video after generation.

How do I get rid of subtitles?

Use colons ("She says: Hello") instead of quotation marks. Add "(no subtitles)" at the end. If they still appear, regenerate.

What resolution is the output?

1280×720 by default. Some people report audio problems at 1080p, so 720p is safer. Upscale with Topaz afterward if needed.

How do I make longer videos?

Generate 8-second clips separately with consistent character descriptions. Stitch them in editing software. Some users have figured out how to chain up to 148 seconds using the extension API, but that requires technical setup.
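
The manual version of this (separate clips, identical character description) can be prepared with a small script. This is a sketch under assumptions: split_script and clip_prompts are hypothetical helpers, the 20-word budget per clip assumes roughly 2.5 spoken words per second over 8 seconds, and nothing here touches the extension API.

```python
import re

# Reused word for word in every clip so the character stays consistent.
CHARACTER = ("A woman in her late 30s with silver-streaked black hair, "
             "wearing a cream linen blouse, wire-rimmed glasses")


def split_script(script, max_words=20):
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words is kept whole in its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    chunks, current = [], []
    for sentence in sentences:
        if not sentence:
            continue
        candidate = current + [sentence]
        if current and len(" ".join(candidate).split()) > max_words:
            chunks.append(" ".join(current))
            current = [sentence]
        else:
            current = candidate
    if current:
        chunks.append(" ".join(current))
    return chunks


def clip_prompts(character, speaker, script, max_words=20):
    """Build one prompt per chunk, each starting with the same description."""
    return [f"{character}. {speaker} says: {chunk} (no subtitles)"
            for chunk in split_script(script, max_words)]


script = ("Close your eyes. Imagine you're walking through a quiet forest path. "
          "The air is cool and fresh.")
for prompt in clip_prompts(CHARACTER, "She", script, max_words=10):
    print(prompt)
```

Each printed prompt becomes its own generation. Add the same style, context, camera, and audio text to every clip before stitching the results together in your editor.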

Why am I getting random sound effects?

The model fills audio gaps with guesses. Describe what you want and what you don't: "Quiet room tone, no music, no audience sounds, no background conversation."

Bottom line

Good Veo3 prompts come down to structure and specificity.

  1. Style first
  2. Detailed subject
  3. Clear context
  4. Action with sequence ("then")
  5. Camera as its own sentence
  6. Comprehensive audio
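
The ordering above can be captured in a tiny template so you never forget a piece. This is a minimal sketch, assuming the blank-line layout used by the templates in this guide; the VeoPrompt class and its field names are invented for illustration and are not part of any Veo3 API.

```python
from dataclasses import dataclass


def _sentence(text):
    """Trim text and make sure it ends with sentence punctuation."""
    text = text.strip()
    return text if text.endswith((".", "!", "?")) else text + "."


@dataclass
class VeoPrompt:
    style: str
    subject: str
    context: str
    action: str
    camera: str = ""
    audio: str = ""

    def render(self):
        # Style goes first, then subject, context, and action.
        visual = " ".join(
            _sentence(part)
            for part in (self.style, self.subject, self.context, self.action)
            if part.strip()
        )
        parts = [visual]
        # Camera and audio each get their own labeled sentence.
        if self.camera.strip():
            parts.append("Camera: " + _sentence(self.camera))
        if self.audio.strip():
            parts.append("Audio: " + _sentence(self.audio))
        return "\n\n".join(parts)


prompt = VeoPrompt(
    style="Cinematic, shot on ARRI Alexa, shallow depth of field",
    subject="A woman in her late 30s with silver-streaked black hair",
    context="In a minimalist ceramic studio, late afternoon sun through the windows",
    action="She centers a ball of grey clay on the wheel, then shapes it upward",
    camera="Static close-up at eye level",
    audio="Wet clay sounds, wheel hum, quiet room tone, no music",
)
print(prompt.render())
```

Leaving camera or audio empty simply omits that section, which makes it easy to see at a glance which prompts are missing one.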

The 30+ templates above should get you started. Modify them, see what works, build your own library.

If you're doing ASMR specifically, ASMRVideos.io handles most of this for you.