Extending video duration

Midjourney’s video generation introduces a fresh way of creating animated content—one that’s fundamentally different from timeline-based video editing tools. Instead of manipulating frames, timelines, or keyframes, each video in Midjourney is built sequentially from a single prompt. Once the animation is generated, it cannot be revised or edited at a specific timestamp. You can only extend it forward—one block at a time.
Because of this structure, designing the flow of a video in Midjourney requires planning ahead. You’re not editing a timeline; you’re constructing a visual sequence step by step. This section will show how to extend your video using the two available methods—automatic and manual—while maintaining control over pacing, tone, and visual coherence.
Manual and automatic extension options
Midjourney supports two official methods to extend a video: Extend Auto and Extend Manual. Both options appear after the initial 5-second video is generated and can be used up to four times, bringing the maximum total video duration to 21 seconds.
Extend Auto
This option adds another 4 seconds to your video using the original prompt, without requiring any user input. The system continues the animation using the same seed, motion path, and stylistic settings. This is useful for scenes where continuity is more important than variation—such as ambient loops or smooth camera glides.
Extend Manual
This option also adds 4 seconds, but allows you to edit the prompt before generating the next segment. It gives you more control to gradually evolve the scene: introduce new actions, adjust the camera angle, shift lighting, or build narrative progression. The underlying style and motion settings remain consistent unless explicitly changed.
Regardless of which option is used, the final duration of a single video cannot exceed 21 seconds. Once the fourth extension is added, the system disables further continuation for that video.
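The duration limit follows directly from the numbers above. As a minimal sketch (the constant and function names are illustrative, not part of Midjourney), the arithmetic can be checked like this:

```python
# Values taken from this section; the names are illustrative assumptions.
INITIAL_SECONDS = 5     # length of the first generated clip
EXTENSION_SECONDS = 4   # length added by each Extend Auto / Extend Manual step
MAX_EXTENSIONS = 4      # number of extensions allowed per video


def total_duration(extensions: int) -> int:
    """Return the total clip length in seconds after a number of extensions."""
    if not 0 <= extensions <= MAX_EXTENSIONS:
        raise ValueError(f"extensions must be between 0 and {MAX_EXTENSIONS}")
    return INITIAL_SECONDS + EXTENSION_SECONDS * extensions


# Print the running total after each possible number of extensions.
for count in range(MAX_EXTENSIONS + 1):
    print(count, "extension(s):", total_duration(count), "seconds")
```

With four extensions the total is 5 + 4 × 4 = 21 seconds, which matches the stated cap.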
Technique: Writing prompt sequences with intent
When using Extend Manual, each new prompt becomes the next chapter in a short visual story. Unlike traditional editing platforms, Midjourney doesn’t allow for real-time adjustments or timeline editing. Once a segment is generated, it cannot be revised, only extended forward. That makes prompt sequencing not just important but fundamental.
A strong prompt sequence works like a well-composed shot list: each prompt should set up motion, transition smoothly from the previous segment, and lead into the next. Below are four key strategies to help you write smarter, more deliberate prompt chains.
1. Design the full arc before you begin
Midjourney’s video tool encourages linear, irreversible progression. You cannot go back and rework the middle of a clip. For that reason, plan your entire sequence before writing a single prompt.
Ask yourself:
- What is the subject doing across the full video?
- How will the environment evolve?
- Where is the camera going—staying still, tracking, panning, zooming?
Sketch out the emotional or narrative arc in rough beats:
Start → Motion development → Peak moment → Resolution or pause
This structure helps each prompt feel purposeful rather than improvised.
2. Use prompts to suggest transitions, not resets
Each new prompt should pick up from where the last left off—not restart the scene. To do this, use linking phrases that signal progression:
- “the character now begins to...”
- “camera slowly shifts as...”
- “fog thickens while the subject turns...”
- “sunlight starts to pierce through...”
These signals tell the model: continue from the previous moment, don’t start a new one. They reduce visual drift and improve frame-to-frame alignment.
3. Control complexity: introduce one visual shift at a time
Every time you change the scene—adding motion, shifting light, altering the background—you increase the chance of breaking visual continuity. Instead of overwhelming the model with multiple changes, introduce complexity gradually.
For example:
- Prompt 1: “camera push-in, light wind, character still”
- Prompt 2: “character raises hand, camera tracks slightly left”
- Prompt 3: “background begins to glow, wind increases”
This sequencing mirrors real cinematography, where motion builds over time. It also helps the AI hold onto important visual anchors.
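If you draft your prompt chain in a text file before generating anything, a rough check can flag prompts that try to change too much at once. The sketch below is only a heuristic and rests on an assumption: it treats each comma-separated clause as one visual element, and the threshold of three clauses is arbitrary. The function names are hypothetical, and nothing here talks to Midjourney.

```python
def count_clauses(prompt: str) -> int:
    """Count non-empty comma-separated clauses in a prompt."""
    return len([clause for clause in prompt.split(",") if clause.strip()])


def overloaded_prompts(prompts: list[str], max_clauses: int = 3) -> list[int]:
    """Return 1-based positions of prompts with more clauses than allowed."""
    return [
        position
        for position, prompt in enumerate(prompts, start=1)
        if count_clauses(prompt) > max_clauses
    ]


# The three example prompts from this section.
chain = [
    "camera push-in, light wind, character still",
    "character raises hand, camera tracks slightly left",
    "background begins to glow, wind increases",
]

print(overloaded_prompts(chain))  # no prompt exceeds three clauses
```

A clause count cannot judge whether a change is subtle or drastic, so treat a flagged prompt as a reminder to reread it, not as an error.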
4. Use stillness as a bridge between segments
Because each prompt generates a four-second block, how you end a segment matters. If the previous clip ends in rapid movement, the next segment may struggle to find a coherent starting point.
To improve transitions:
- End each segment in a natural pause—a look, a breath, a flicker of light
- Begin the next segment with a subtle continuation—like “the wind carries a few leaves as the camera lingers”
Stillness provides visual stability. It creates clear hand-off points between segments that help the AI preserve subject structure and scene integrity.
This prompt-writing approach helps transform Midjourney from a generator of isolated moments into a platform for coherent visual storytelling. By treating each 4-second extension as part of a longer, intentional arc—and by carefully shaping the motion from one prompt to the next—you build animations that feel deliberate, composed, and emotionally consistent.
Case study 1: Extending a quiet cinematic moment
This example continues from a previously generated 5-second clip featuring a kitten standing alone in a rainy street.

We used the following prompt to generate the first video.
adorable animated kitten on a rainy city street at night, wearing a hoodie, soft camera push-in, gentle raindrops falling, warm bokeh lights, eyes glistening with curiosity --motion low --raw
Once the video is generated, Midjourney presents four variations, similar to how it handles still image generation. To select the best candidate for extension, use the Scrub feature:
- On Mac, hold Command and move your mouse left or right to preview specific frames.
- On Windows, use Ctrl instead of Command.

After choosing the version you’d like to continue, right-click it and select “Manual: Low Motion”. Then enter your next prompt:

Prompt 2:
same kitten slowly begins to walk forward, reflections shimmer beneath its paws, distant neon sign flickers, camera lowers slightly to follow the motion --motion low --raw
This second prompt introduces gentle forward motion while preserving the atmosphere, lighting, and composition of the original scene.
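Both prompts end with the same flags, so when you draft several segments it can help to assemble the text consistently. The following is a small sketch that only builds the string you would paste into Midjourney; build_prompt is a hypothetical helper, and the accepted motion values are limited to the two that appear in this section (low and high).

```python
def build_prompt(description: str, motion: str = "low", raw: bool = True) -> str:
    """Join a scene description with the flags used in this section."""
    if motion not in ("low", "high"):
        raise ValueError("motion must be 'low' or 'high'")
    parts = [description.strip(), f"--motion {motion}"]
    if raw:
        parts.append("--raw")
    return " ".join(parts)


# Rebuild the second prompt from this case study.
second_prompt = build_prompt(
    "same kitten slowly begins to walk forward, reflections shimmer beneath "
    "its paws, distant neon sign flickers, camera lowers slightly to follow "
    "the motion"
)
print(second_prompt)
```

Keeping the flags in one place makes it harder to forget them on a later extension, where a mismatch could change the feel of the motion.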
You can view the generated video here.
Case study 2: Building a narrative arc from a still image
This case study demonstrates how to build a complete cinematic arc from a single still image—using sequential prompts to shape light, mood, and motion. The video begins with quiet reflection and evolves into a high-energy turning point before settling into a peaceful resolution.
Image for Video Generation:

Scene 1: Stillness before the shift
Start by uploading the image into the Starting Frame section of Midjourney’s video UI.
Prompt:
two children stand silently at the water’s edge, reflected perfectly in the still tide pool, soft blue sky and gentle clouds above, sunlight filtering calmly through the surrounding canopy, camera slowly pushes in from behind --motion low --raw
This first prompt sets a serene tone. The result should feel calm, ambient, and observational, with soft environmental movement.
Scene 2: The light begins to shift
From the four generated versions, select the one that visually connects well with your vision for the next scene.

Prompt:
a warm golden light begins to cast from the left, clouds start to deepen with amber and rose tones, one child steps forward and causes ripples across the mirrored surface, camera tilts slightly upward to catch the shifting sky --motion low --raw
This segment introduces subtle changes—motion through the child’s movement, and atmosphere through evolving light.
Scene 3: A sudden gust and dramatic buildup
As the energy builds, the AI may introduce unexpected visual ideas. Be ready to adjust the prompt to preserve continuity from Scene 2 while adding tension.

Prompt:
the wind rises suddenly, trees sway, clouds gather in rich violet and gold, the water darkens slightly, both children look up sharply as a low hum fills the air, camera circles quickly to face them from the front --motion high --raw
This is the emotional and visual peak of the sequence. It should feel dramatic and cinematic.
Scene 4: Movement toward the unknown
This is a natural point for action and expansion. Adjust the prompt based on the outcome of Scene 3 to ensure smooth progression.

Prompt:
the children begin walking into the sea, waves rising around their knees, sky bursting with deep orange and electric pink streaks, water flickering with light, camera tracks with them just above the surface --motion high --raw
Here, movement, color, and composition all intensify—pulling the viewer deeper into the scene’s emotional arc.
Scene 5: Return to calm, transformed setting
This final scene should provide closure while echoing the opening’s stillness—now viewed through the lens of transformation.

Prompt:
camera lifts upward into a wide aerial view, the children now distant figures wading into silvery mist, sky fades to pale lavender with scattered stars emerging, reflections shimmer faintly across the dark water --motion low --raw
This last moment leaves the viewer with a sense of calm, reflection, and scale as the day transitions to twilight.
You can view the generated video here.
To turn your animation into a complete visual short, consider adding background music.
Tools like Suno can generate a cinematic ambient track to match your video’s arc. A well-timed cue (soft piano during stillness, swelling synths at the dramatic peak, and an ambient fade at the end) can dramatically enhance emotional resonance.
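To line up music cues with the visuals, it helps to know where each scene falls on the clock. The sketch below lays out the five scenes of Case study 2 on a timeline. It assumes the first segment runs 5 seconds and each extension adds 4 seconds, as described earlier in this section; build_timeline is a hypothetical helper, not a Midjourney feature.

```python
# Scene titles and motion settings from Case study 2.
SCENES = [
    ("Stillness before the shift", "low"),
    ("The light begins to shift", "low"),
    ("A sudden gust and dramatic buildup", "high"),
    ("Movement toward the unknown", "high"),
    ("Return to calm, transformed setting", "low"),
]


def build_timeline(scenes, initial_seconds=5, extension_seconds=4):
    """Return (start, end, title, motion) tuples for consecutive segments."""
    timeline = []
    start = 0
    for index, (title, motion) in enumerate(scenes):
        # Only the first segment uses the initial clip length.
        length = initial_seconds if index == 0 else extension_seconds
        timeline.append((start, start + length, title, motion))
        start += length
    return timeline


for start, end, title, motion in build_timeline(SCENES):
    print(f"{start:>2}s-{end:>2}s  motion {motion:<4}  {title}")
```

Under these assumptions the high-motion scenes occupy seconds 9 to 17, which is where a swelling cue would sit, and the closing calm begins at second 17.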
These case studies highlight how Midjourney’s video extension tools—especially Extend Manual—can be used not just to lengthen a clip, but to craft complete visual sequences. Whether you're adding subtle movement to a quiet moment or orchestrating a dramatic shift in mood, success lies in how thoughtfully you shape your prompts.
With consistent structure, mindful pacing, and a narrative through line, even short-form AI-generated videos can feel cinematic, intentional, and emotionally rich.