Background Element Extraction
Brief
ID: SS-RP-2026-009
Product: Visual Designer
Capability: 3D Rendering
Status: Intake
Created: 2026-03-25
Intended Outcome
Any element visible in a background photo — whether it was there originally or was added by AI generation — can be promoted into a discrete, movable, z-orderable design element.
Today, when a professional uses AI to generate a pergola, arbor, fountain, or any structural element, it gets baked into the background image. That means it's permanently behind every plant, furniture item, and design object on the canvas. There's no way to move a plant behind the pergola, because the pergola doesn't exist as an independent object — it's just pixels in the background.
This is the single biggest limitation of AI-generated content in the Visual Designer. The professional can generate beautiful hardscape, but the moment they need something to sit behind it, they're stuck.
With Background Element Extraction, the professional selects an area of the canvas, says "extract this," and the element lifts out of the background into a real design object that participates in z-ordering like everything else. The background heals behind it. The extracted element can be moved, scaled, rotated, layered in front of or behind other objects, and toggled on/off.
This also unlocks a second use case: extracting elements from the original property photo. A mature tree, an existing stone wall, a decorative fence — anything the homeowner wants to keep can be extracted and repositioned within the new design, or kept in place but given proper z-ordering relative to new elements.
Why This Is Urgent
AI image generation is already shipping in Visual Designer via Gemini inpainting. Every generated hardscape element hits this z-ordering wall. The more powerful generation becomes, the more frustrating this limitation gets. Extraction unblocks the full value of every AI feature — current and planned.
User Flow
- Professional has a background photo with an element they want to extract (AI-generated pergola, existing tree, etc.)
- They select the area containing the element (lasso, rectangle, or brush selection)
- They trigger "Extract Element"
- The system:
  - Generates a version of the background with the element removed (AI inpainting)
  - Computes the difference between original and cleaned backgrounds to isolate the element
  - Cleans the extracted element (removes background bleed, artifacts, edge noise)
  - Strips the solid-color background to produce a transparent cutout (since Gemini doesn't support alpha channels)
  - Replaces the background image with the cleaned version
  - Places the extracted element as a new design object at the same position, now z-orderable
- The element now behaves like any other design object — movable, scalable, rotatable, layerable
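To keep the extracted element pixel-aligned with its original location, the new design object can be anchored at the bounding box of the cutout's opaque pixels. A minimal sketch in plain Python over a 2D alpha list (`alpha_bbox` is a hypothetical helper, not an existing API):

```python
def alpha_bbox(alpha):
    """Bounding box (left, top, right, bottom) of non-transparent pixels.

    `alpha` is a 2D list of alpha values (0 = fully transparent); the box
    is inclusive of left/top and exclusive of right/bottom.
    """
    rows = [y for y, row in enumerate(alpha) if any(a > 0 for a in row)]
    cols = [x for x in range(len(alpha[0])) if any(row[x] > 0 for row in alpha)]
    if not rows or not cols:
        return None  # nothing opaque: the extraction produced an empty cutout
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)

# A 4x4 cutout whose opaque pixels occupy a 2x2 square starting at (1, 1):
alpha = [
    [0, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 255, 255, 0],
    [0, 0, 0, 0],
]
print(alpha_bbox(alpha))  # (1, 1, 3, 3)
```

Placing the design object at `(left, top)` in canvas coordinates reproduces the element's original position exactly, which is what the Positional Accuracy metric below measures.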
Pipeline
The extraction pipeline has up to four steps. Research will determine which are necessary and whether steps can be combined.
Step 1: Remove from Background (AI)
AI generates a new version of the background image with the selected element removed. The area is filled with contextually appropriate content (sky continuation, wall surface, ground, etc.) — the same inpainting capability already shipping for hardscape design.
Input: Background image + selection mask
Output: Cleaned background image (element removed, area healed)
Step 2: Deterministic Diff
Pixel-by-pixel comparison of the original and cleaned backgrounds produces a raw delta — the pixels that changed. This isolates the element but will include artifacts: minor color shifts in surrounding areas, anti-aliasing bleed, compression noise, lighting adjustments the AI made to surrounding pixels.
Input: Original background + cleaned background
Output: Raw delta image (element + artifacts)
Key challenge: The diff must handle minor distortions — AI doesn't just remove the element, it subtly adjusts surrounding pixels for coherence. A naive diff will pick up these adjustments as noise. The algorithm needs a threshold/tolerance to distinguish "this is part of the element" from "this is a minor surrounding adjustment."
Possible approaches:
- Threshold-based: Ignore pixel differences below a certain magnitude
- Mask-guided: Use the original selection mask to constrain the diff area, with a small feather/bleed margin
- Connected-component: Only keep diff regions that are connected to the selection area
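The threshold-based and mask-guided approaches compose naturally: only keep pixels that changed by more than a tolerance and fall inside the selection. A sketch on grayscale images represented as 2D lists (the threshold value and the `masked_diff` name are placeholders to be tuned against real Gemini outputs):

```python
def masked_diff(original, cleaned, selection, threshold=12):
    """Raw delta: pixels that changed by more than `threshold` AND lie
    inside the selection mask; everything else becomes 0.

    All image inputs are 2D lists of 0-255 grayscale values; `selection`
    is a 2D list of booleans (ideally feathered a few pixels outward).
    """
    h, w = len(original), len(original[0])
    delta = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            changed = abs(original[y][x] - cleaned[y][x]) > threshold
            if changed and selection[y][x]:
                delta[y][x] = original[y][x]  # element pixel: keep original value
    return delta

original = [
    [10, 10, 10],
    [10, 200, 10],   # the "element": one bright pixel
    [10, 10, 10],
]
cleaned = [
    [10, 10, 12],    # AI nudged a surrounding pixel by 2: below threshold
    [10, 11, 10],    # element healed away
    [10, 10, 10],
]
selection = [[False, False, False],
             [False, True, False],
             [False, False, False]]
print(masked_diff(original, cleaned, selection))
```

The threshold absorbs the AI's small coherence adjustments; the mask absorbs larger ones outside the selection. A connected-component pass over the surviving pixels would be the natural next refinement for stray noise inside the mask.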
Step 3: AI Cleanup (Optional)
If the deterministic diff produces artifacts — halos, edge bleed, partial background remnants — a secondary AI pass cleans the extracted element. This could be:
- A Gemini call with the raw delta and a prompt to "clean this cutout, remove any background artifacts"
- A specialized cleanup model tuned for cutout refinement
Input: Raw delta image + original selection mask
Output: Clean element image on solid background
Step 4: Background Removal (Deterministic)
Since Gemini currently outputs images on a solid background (no alpha channel support), the final step strips that solid color to produce a true transparent cutout. This is a solved problem — deterministic color-keying or flood-fill from edges.
Input: Clean element on solid background
Output: Element with transparent background (PNG with alpha)
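A minimal color-keying sketch, with pixels as RGB tuples; the key color and tolerance are assumptions. The brief's flood-fill-from-edges alternative would replace the per-pixel test with a fill seeded at the image border, which also protects element-interior pixels that happen to match the key:

```python
def key_out(image, key=(255, 0, 255), tolerance=10):
    """Turn pixels close to `key` fully transparent.

    `image` is a 2D list of (r, g, b) tuples; returns a 2D list of
    (r, g, b, a) tuples with alpha 0 wherever the pixel matches the key.
    """
    def matches(p):
        return all(abs(c - k) <= tolerance for c, k in zip(p, key))

    return [[(p[0], p[1], p[2], 0 if matches(p) else 255) for p in row]
            for row in image]

# A 1x3 strip: exact key color, near-key (within tolerance), element pixel.
row = [(255, 0, 255), (250, 5, 250), (30, 120, 40)]
out = key_out([row])[0]
print([p[3] for p in out])  # alpha channel: [0, 0, 255]
```

In production this would run on real decoded image buffers (e.g. via Pillow or a GPU shader) rather than nested lists, but the keying logic is the same.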
Pipeline Optimization
These steps may collapse depending on what works:
- Steps 2+3 merge: If AI cleanup is always needed, skip the deterministic diff entirely and ask AI to "generate just this element as a cutout on a solid background" directly.
- Steps 3+4 merge: If the AI cleanup step outputs on a consistent solid color, background removal becomes trivial.
- Steps 1+2+3 combined: The ideal case — a single AI call that simultaneously removes the element from the background AND produces the extracted element as a separate output. This depends on whether Gemini (or a future model) can produce two coordinated images in one call.
- Step 1 integrated with generation: When generating content (e.g., "add a pergola"), the system could automatically extract the generated element in the same workflow — producing both the updated background and the element as a design object in one operation.
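Because the step boundaries are still in flux, the orchestration can keep each step injectable so variants (merged steps, skipped steps) are cheap to prototype. A sketch of that shape; all function names here are hypothetical, not an existing API:

```python
def extract_element(background, selection_mask,
                    inpaint, diff, cleanup, strip_key_color):
    """Run the four-step extraction pipeline.

    `inpaint`, `diff`, `cleanup`, and `strip_key_color` are passed in so
    individual steps can be swapped, merged, or replaced with identity
    functions while researching the minimal viable pipeline.
    """
    # Step 1: AI heals the selected area, removing the element.
    cleaned_background = inpaint(background, selection_mask)
    # Step 2: deterministic diff isolates the changed pixels.
    raw_delta = diff(background, cleaned_background, selection_mask)
    # Step 3 (optional): AI pass removes halos and background bleed.
    element_on_solid = cleanup(raw_delta, selection_mask)
    # Step 4: key out the solid backdrop to get a transparent cutout.
    element = strip_key_color(element_on_solid)
    return cleaned_background, element
```

Collapsing steps 2+3 then just means passing a combined function for `diff` and an identity function for `cleanup`; the calling code and the placement logic downstream stay unchanged.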
Integration with AI Generation
The most powerful version of this feature integrates extraction directly into the AI generation step. When a user generates a pergola via inpainting:
- AI generates the scene with the pergola (as it does today)
- System automatically extracts the pergola as a design object
- Background is stored without the pergola
- Pergola is placed as a z-orderable element at the correct position
The user never sees the pergola "trapped" in the background. It's born as a proper design object. This transforms AI generation from "paint onto the background" to "create a new element in the design."
Success Metrics
| Metric | What It Measures |
|--------|------------------|
| Extraction Fidelity | Does the extracted element look clean? No halos, no background bleed, no missing edges. |
| Background Healing Quality | Does the background look natural after the element is removed? No obvious patches or smearing. |
| Positional Accuracy | Is the extracted element placed exactly where it was in the background? No shift or misalignment. |
| Edge Quality | Are element edges crisp and natural? No jagged outlines or hard color boundaries. |
| Pipeline Latency | Total time from "extract" to usable element. Must feel responsive (target: under 10 seconds). |
| Cost per Extraction | AI credits consumed. Must be economical for frequent use (target: 10-20 credits). |
| Integration Seamlessness | When combined with generation, does the user even notice the extraction happened? |
Risks and Mitigations
| Risk | Impact | Mitigation |
|------|--------|------------|
| AI removal quality varies — complex backgrounds (foliage, patterns) are harder to heal | High | Allow user to retry with adjusted selection; manual touch-up tools as fallback |
| Deterministic diff picks up too much noise | Medium | Mask-guided diffing constrained to selection area; threshold tuning; AI cleanup as safety net |
| Extracted element has visible edge artifacts | Medium | Feathered edges on extraction; AI cleanup pass; user can manually adjust |
| Gemini latency makes pipeline feel slow | Medium | Async with progress indicator; pre-generate extraction during idle time |
| Solid-color background removal fails on elements that contain the key color | Low | Use uncommon key color; offer manual background selection; edge-detection fallback |
Relationship to Other Ideas
- Design Layers (SS-RP-2026-008): Shared research on image differencing and AI cleanup. Extraction could feed into Design Layers by automatically promoting generated elements out of background layers into feature layers.
- Generative AI Chat Interface (SS-RP-2026-003): Chat commands like "move the pergola behind the plants" would trigger extraction automatically.
- Hardscape Design (shipped): Every hardscape generation would benefit from automatic extraction — the generated element becomes z-orderable immediately.
Open Questions
- Automatic vs. manual trigger — Should extraction be offered automatically after every AI generation, or only when the user explicitly requests it?
- Element type classification — Should extracted elements be categorized (plant, hardscape, structure) to join the right layer group, or stay as generic "user objects"?
- Re-integration — If a user extracts an element and later wants to "flatten" it back into the background, should that be supported?
- Multi-element extraction — Should the user be able to select and extract multiple elements in one operation (e.g., pergola + the two pillars)?
Design Decisions (Pending)
These will be resolved during research:
- Which pipeline steps are necessary? — The minimal viable pipeline may be as few as 2 steps or as many as 4. Research and prototyping will determine which steps produce acceptable quality.
- Should extraction integrate with generation? — If technically feasible, this is clearly the better UX. But it depends on model capabilities and latency budget.
- Diff algorithm specifics — Threshold-based, mask-guided, or connected-component. Needs prototyping with real Gemini outputs to determine which handles AI distortion artifacts best.