
Flux Motion
Deforum was the go-to animation tool for Stable Diffusion — powerful, flexible, but tightly coupled to A1111. When AnimateDiff arrived, the community shifted to ComfyUI's node-based workflows. Deforum's complexity became a barrier. This project brings Deforum's power to FLUX with an abstracted UI.
The Thesis
Deforum's motion techniques (camera transforms, prompt morphing, temporal feedback) are still powerful; the tooling just hasn't kept up. This project is three things:
- Research: adapting Deforum workflows to FLUX's 16/128-channel latent architecture.
- Product: an abstracted generator UI that makes FLUX animation accessible without mastering legacy tooling.
- Platform: deployed for public use, with a generation library to view, compare, and iterate on outputs.
Classic Deforum assumes 4-channel SD latents; FLUX.1 uses 16 channels, and FLUX.2/Klein uses 128-dimensional tokens. The maths doesn't transfer, and that is where the research lives. The UI abstracts that complexity; the library enables systematic research, discovery, and output capture.
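A minimal sketch of that mismatch in PyTorch. The shapes assume a 1024×1024 frame and the 8× VAE downsample implied by the 16×16-pixels-per-token figure given under Research Decisions below; the patchify step is illustrative, not the project's actual code:

```python
import torch

# Latents for a 1024x1024 frame, assuming an 8x spatial VAE downsample.
sd_latent = torch.randn(1, 4, 128, 128)      # SD1.5/SDXL: 4 channels
flux1_latent = torch.randn(1, 16, 128, 128)  # FLUX.1: 16 channels

# FLUX.2/Klein: 32-channel VAE output, 2x2 patchified into a token sequence.
flux2_vae_out = torch.randn(1, 32, 128, 128)
tokens = flux2_vae_out.unfold(2, 2, 2).unfold(3, 2, 2)  # (1, 32, 64, 64, 2, 2)
tokens = tokens.permute(0, 2, 3, 1, 4, 5).reshape(1, 64 * 64, 128)
print(tokens.shape)  # torch.Size([1, 4096, 128]): 128 dims per token

# Deforum's camera maths is a 2D warp over a (C, H, W) grid. It applies
# directly to sd_latent and flux1_latent, but the flat token sequence must
# be un-patchified back to a spatial grid before any such warp makes sense.
```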
The Challenge
FLUX.2 is fundamentally different from the SD models Deforum was built on: a different latent space, different inference patterns, a different architecture. Making it work means rebuilding core assumptions, not just swapping models.
Research Decisions
- FLUX-Native Stack: animation pipeline built directly on FLUX's latent space. FLUX.1 uses 16 channels; FLUX.2/Klein produces a 32-channel VAE output, 2×2 patchified to 128 dimensions per token, each token covering 16×16 pixels.
- Distilled Model Tradeoffs: Klein 4B is a 4-step distilled model. It enables fast iteration but lacks the self-correction depth of 50-step models, requiring explicit anti-collapse techniques.
- Anti-Drift Corrections: LAB color coherence, pre-sharpening, and blue noise dithering. Parameters are tuned per model rather than adapted at runtime.
- Recursive Collapse Problem: without enough denoising steps to self-correct, distilled models rapidly amplify feedback errors; frames drift to abstract forms by frames 30-40.
- Latent Reservoir: periodic injection of fresh latent entropy prevents collapse while maintaining temporal coherence (see the sketch after this list).
- Hybrid V2V Mode: input video guides structure while the AI adds style, balancing motion preservation with creative generation. Tuned blend ratios prevent hallucination while maintaining temporal coherence.
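A minimal sketch of the latent reservoir idea; the injection period and blend ratio are illustrative, not the project's tuned values:

```python
import torch

def inject_reservoir(latent: torch.Tensor, frame_idx: int,
                     every: int = 8, amount: float = 0.15,
                     generator: torch.Generator | None = None) -> torch.Tensor:
    """Periodically blend fresh Gaussian noise into the feedback latent.

    Distilled 4-step models lack the denoising depth to absorb accumulated
    feedback error, so without an entropy source the sequence collapses.
    `every` and `amount` are illustrative; the project tunes them per model.
    """
    if frame_idx == 0 or frame_idx % every != 0:
        return latent
    fresh = torch.randn(latent.shape, generator=generator,
                        device=latent.device, dtype=latent.dtype)
    # Variance-preserving blend keeps the latent on roughly the same scale,
    # so the refresh reads as texture renewal rather than a visible cut.
    return (1.0 - amount) ** 0.5 * latent + amount ** 0.5 * fresh
```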
Deployment
Deployed via Cloudflare Workers edge routing with multi-provider GPU backend (Freepik API, RunPod for custom pipelines). Automated init scripts handle Tailscale networking, model downloads, and GPU warmup. Remote development via Claude Code over Tailscale to GPU instances.
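A hypothetical sketch of the routing and failover decision, written here in Python for illustration; the production logic runs in a Cloudflare Worker, and the endpoints and names below are placeholders:

```python
import httpx

# Placeholder endpoints; the real gateway and providers differ.
BACKENDS = {
    "freepik": "https://example.com/freepik-proxy",  # standard FLUX models
    "runpod": "https://example.com/runpod-proxy",    # custom pipelines
}
CUSTOM_PIPELINES = {"deforum", "ltx", "controlnet"}

def generate(pipeline: str, payload: dict) -> bytes:
    """Route custom pipelines to RunPod Serverless and standard FLUX
    generation to the Freepik API, falling back if the primary fails."""
    primary = "runpod" if pipeline in CUSTOM_PIPELINES else "freepik"
    order = [primary] + [b for b in BACKENDS if b != primary]
    for backend in order:
        try:
            resp = httpx.post(BACKENDS[backend], json=payload, timeout=120)
            resp.raise_for_status()
            return resp.content
        except httpx.HTTPError:
            continue  # try the next provider
    raise RuntimeError("all generation backends failed")
```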
Conceptual Flow
Deforum-style feedback loop adapted for FLUX.2's rectified flow architecture. Classic noise injection replaced with edit-mode refinement — Klein is an editor, not traditional img2img. Pre-sharpening, LAB color matching, and blue noise dithering compensate for the different denoising behavior. Tested on FLUX.1 Dev, Klein 4B (distilled, 4-step), and Klein 9B base — the distilled model enabled fast iteration cycles.
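The pixel-space side of those corrections, sketched with skimage under illustrative values; plain Gaussian noise stands in here for the project's blue noise pattern:

```python
import numpy as np
from skimage import color, filters

def precondition_frame(frame: np.ndarray, reference: np.ndarray,
                       sharpen: float = 0.5, dither: float = 1.5) -> np.ndarray:
    """Corrections applied to frame N-1 before it is re-encoded.

    frame, reference: uint8 RGB arrays; the reference is typically frame 0.
    All values are illustrative; the project tunes them per model.
    """
    # 1) LAB color coherence: match per-channel mean/std to the reference
    #    so hue and brightness drift cannot compound through the loop.
    lab, ref = color.rgb2lab(frame), color.rgb2lab(reference)
    for c in range(3):
        lab[..., c] = (lab[..., c] - lab[..., c].mean()) \
            / (lab[..., c].std() + 1e-6) * ref[..., c].std() + ref[..., c].mean()
    out = color.lab2rgb(lab) * 255.0

    # 2) Pre-sharpening (unsharp mask) to counter the softening each
    #    encode -> refine -> decode round trip introduces.
    blurred = filters.gaussian(out / 255.0, sigma=1.0, channel_axis=-1) * 255.0
    out = out + sharpen * (out - blurred)

    # 3) Dithering breaks up banding before 8-bit quantization; Gaussian
    #    noise here is a stand-in for the blue noise the project uses.
    out = out + np.random.normal(0.0, dither, out.shape)
    return np.clip(out, 0, 255).astype(np.uint8)
```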
Animation Pipeline
Input (frame N-1) → Encode (VAE → latent) → Transform (channel motion) → Denoise (FLUX sampling) → Decode (latent → frame N)
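A minimal sketch of the Transform step, assuming classic Deforum-style camera motion as a per-channel 2D affine warp; function names and motion values are illustrative:

```python
import math
import torch
import torch.nn.functional as F

def warp_latent(latent: torch.Tensor, dx: float = 0.0, dy: float = 0.0,
                zoom: float = 1.0, angle_deg: float = 0.0) -> torch.Tensor:
    """Deforum-style 2D camera motion on a (B, C, H, W) latent.

    dx/dy translate in latent pixels; zoom > 1 zooms in. The warp treats
    every channel identically, which is why the technique carries over from
    4-channel SD latents to FLUX.1's 16 channels unchanged.
    """
    b, _, h, w = latent.shape
    c = math.cos(math.radians(angle_deg)) / zoom
    s = math.sin(math.radians(angle_deg)) / zoom
    # affine_grid maps output coords back to input coords in [-1, 1] space,
    # so this matrix is the inverse of the desired camera motion.
    theta = torch.tensor([[c, -s, -2.0 * dx / w],
                          [s, c, -2.0 * dy / h]],
                         dtype=latent.dtype, device=latent.device)
    grid = F.affine_grid(theta.expand(b, -1, -1), latent.shape,
                         align_corners=False)
    # Reflection padding hides the border the motion would otherwise reveal.
    return F.grid_sample(latent, grid, padding_mode="reflection",
                         align_corners=False)

# Per-frame use (flux_refine is a placeholder for the edit-mode FLUX call):
#   latent = warp_latent(latent, dx=1.5, zoom=1.02)
#   frame = flux_refine(prompt, latents=latent, strength=0.3)
```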
Animation Research
This is ongoing research — multiple approaches to FLUX animation, none fully solved. Each reveals different tradeoffs in the latent space:
- Fixed Seed: a consistent seed across frames maintains visual coherence. The same latent starting point produces stable aesthetics while allowing controlled variation.
- Img2Img Feedback: traditional img2img feedback using the diffusers pipeline. Achieves smooth temporal transitions but accumulates noise over time as the model repeatedly processes its own output.
- Hybrid V2V: input video guides structure while the AI adds style. Balances motion preservation with creative generation; tuned blend ratios prevent hallucination while maintaining temporal coherence.
- Keyframed Strength: a keyframed strength ramp (0.15 → 0.4 → 0.25) with a mid-sequence prompt transition. The first half stays subtle, jumps to strong style at 50%, and eases back in the final quarter, enabling controlled aesthetic shifts without hard cuts (see the sketch below).
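A minimal sketch of that schedule; the breakpoints mirror the ramp described above, while the prompt strings are placeholders:

```python
def schedule(frame_idx: int, total_frames: int) -> tuple[float, str]:
    """Denoise strength and prompt for a frame: subtle first half, hard
    jump to the strong style at 50%, linear ease back over the final
    quarter (0.15 -> 0.4 -> 0.25)."""
    t = frame_idx / max(total_frames - 1, 1)
    if t < 0.5:
        strength = 0.15                               # subtle first half
    elif t < 0.75:
        strength = 0.40                               # strong style
    else:
        strength = 0.40 - (t - 0.75) / 0.25 * 0.15    # ease back to 0.25
    prompt = "subtle style prompt" if t < 0.5 else "strong style prompt"
    return strength, prompt
```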
System Architecture
| Layer | Role | Stack |
| --- | --- | --- |
| Animation | FLUX-native motion pipeline | Python, PyTorch, diffusers |
| Edge | Request routing, API gateway | Cloudflare Workers |
| Frontend | Generation UI, gallery | Next.js, React |
| Fast Inference | Standard FLUX models | Freepik API |
| Custom Pipelines | Deforum, LTX, ControlNets | RunPod Serverless |
| Storage | Asset persistence, CDN | Cloudflare R2 |
How It Fits Together
1. A motion-aware animation engine that operates in FLUX's latent space.
2. A deployment platform that makes generation provider-agnostic.
3. An edge layer that handles routing, failover, and storage automatically.
4. Research and production in the same system: new models get benchmarked here.
Stack
Python • PyTorch • diffusers • ComfyUI • Next.js 15 • Cloudflare Workers • RunPod • Freepik • Tailscale