Generative AI / Visual Designer · Active Research

Generative AI Chat Interface

SS-RP-2026-003 · Tags: ai, chat interface, draft
Created 2026-03-09 · Updated 2026-03-14
Brief

Generative AI Chat Interface

Status: Draft | ClickUp: 868hu2bq6

Overview

Natural language chat interface enabling users to interact with design tools, get plant recommendations, ask questions about their designs, and receive platform guidance through conversational AI.

Strategic Fit

Foundational piece of the AI product vision. The chat interface is the surface through which professionals interact with their "Jarvis" within SimplyScapes, and the protocol through which external personal AI systems could interact with SimplyScapes as an intelligent service.

TODO: Product Manager to expand with user stories and acceptance criteria.

Research Report

Generative AI Chat Interface: API Design, Interaction Model & Pricing

ID: SS-RR-2026-003 | Date: 2026-03-09 | Status: Draft | ClickUp: Generative AI Chat Interface | Plan: SS-RP-2026-003 | Domain: Generative AI / Visual Designer

TL;DR

SimplyScapes' v1 AI inpainting works but is single-turn: prompt in, image out. The v2 chat interface transforms this into a conversational design partner that can ask follow-up questions, present choices from the 2,500+ object library with thumbnails and pricing, accept user markup/annotations for spatial guidance, and answer landscaping knowledge questions — all through a two-phase intent routing architecture that uses cheap text classification (Gemini 2.5 Flash, a fraction of a cent per request) before expensive image generation ($0.039/image). No competitor in the landscape vertical offers this combination. The patent landscape is favorable (overall Low-Moderate risk, no landscape-specific AI patents exist), and the $995 gap between free AI inspiration tools and human design services represents a massive underserved market. A credit-based pricing model (10 credits = 1 image generation, free tier 50 credits/mo) aligns with the 126% YoY growth in credit-based SaaS pricing. The full architecture — intent routing, credit system, conversation persistence, object library integration, and markup-guided generation — is buildable in three 4-6 week phases using Gemini, Hasura, and the existing Next.js stack.


Part I: The Idea

1. What We're Exploring

SimplyScapes has a working v1 of AI-powered image generation in its Visual Designer — users type a prompt, Gemini generates or modifies the design background, and the result is displayed with edit history and revert. The interaction is single-turn: prompt in, image out.

The v2 evolution introduces a conversational AI layer. Instead of always generating immediately, the AI can ask follow-up questions ("Which style of fountain?"), present choices from the SimplyScapes object library (2,500+ plants, hardscape objects, materials), accept user markup/annotations to guide placement, and answer knowledge questions — all within a chat interface that already exists in the designer UI.

The core challenge is threefold: (1) Intent routing — how does the AI decide whether to generate, clarify, or answer? (2) Pricing — how do you meter and monetize a conversational AI experience with fine-grained credits? (3) API design — what does a general-purpose endpoint look like that handles all these interaction types while staying compatible with Gemini's API?

2. Why It Matters

This is SimplyScapes' transition from "AI as a feature" to "AI as the interaction layer." The current mode dropdown (Edit, Plant Selection, Ask, Finalize, Erase) already signals this ambition — the UI is ahead of the backend.

The business case is direct: AI-assisted design is the primary differentiator against desktop-bound competitors (ProLandscape, DynaSCAPE, Idea Spectrum) and the key retention driver for professionals who need to produce designs quickly in the field. Getting the interaction model right means landscapers spend less time crafting prompts and more time reviewing options — which is exactly the workflow that converts free users to paid subscribers.

The credit system is the monetization mechanism. Done well, it creates predictable revenue from AI usage while keeping the free tier generous enough to demonstrate value. Done poorly, it creates friction that drives users to competitors or to using Gemini directly.

The API design has long-term implications beyond the chat interface. A well-abstracted endpoint becomes the foundation for AI-powered proposals, plant recommendations, maintenance schedules, and any future AI capability — all through the same contract.


Part II: Research Findings

3. Intent Routing & Function Calling

The core architectural question: how does the system decide whether a user message should trigger image generation, a follow-up question, or a text answer?

3.1 Gemini's Function Calling Mechanism

Gemini's function calling API provides the foundation for intent routing. You declare tools (functions) with JSON schemas describing their names, parameters, and descriptions. When a user message arrives, Gemini analyzes it against the declared tools and either calls a function or responds with natural language.

Four calling modes control behavior:

| Mode | Behavior | Use Case |
|------|----------|----------|
| AUTO (default) | Model decides between function call or natural language | General-purpose routing — recommended for SimplyScapes |
| ANY | Model must call a function from the declared set | Forced classification — useful when you always want structured output |
| NONE | Function calls disabled | Ask mode when you only want text answers |
| VALIDATED (preview) | Ensures either a valid function call or natural language | Stricter validation for production |

For SimplyScapes, the recommended approach is a two-phase architecture:

  1. Phase 1 — Intent Classification (function calling, text-only model). Send the user's text prompt + mode context to a cheap text model (Gemini 2.5 Flash or 3 Flash) with tool declarations. The model decides which tool to call: generate_image, ask_followup, search_objects, search_plants, answer_question, or request_markup. This call is inexpensive — text-only input/output at $0.30/$2.50 per million tokens on Gemini 2.5 Flash.

  2. Phase 2 — Execution. Based on the tool call, the backend executes the appropriate action. If the tool is generate_image, a separate call to the image generation model (Gemini 2.5 Flash Image or 3.1 Flash Image) produces the image at $0.039-$0.067 per image. If the tool is ask_followup or search_objects, the backend queries the SimplyScapes database directly — no LLM call needed.

Why two phases instead of one? Function calling and native image generation are separate capabilities in Gemini's architecture. You cannot declare tools and also request responseModalities: ["Image"] in the same call. The image generation models (Nano Banana series) handle image output; the standard models handle function calling. Separating intent classification from image generation also means you only pay for image generation when it's actually needed — follow-up questions and text answers skip the expensive image call entirely.
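To make the two phases concrete, here is a minimal TypeScript sketch using the @google/genai SDK. TOOL_DECLARATIONS stands in for the declarations in Section 5.3; error handling, credit checks, and conversation history are omitted. Treat it as a shape, not an implementation.

import { GoogleGenAI, FunctionCallingConfigMode } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Phase 1: intent classification on the cheap text model.
async function classifyIntent(userText: string, allowedTools: string[]) {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: userText,
    config: {
      temperature: 0.1, // low temperature for deterministic routing
      tools: [{ functionDeclarations: TOOL_DECLARATIONS }], // see Section 5.3
      toolConfig: {
        functionCallingConfig: {
          mode: FunctionCallingConfigMode.ANY,
          allowedFunctionNames: allowedTools, // per-mode tool set (Section 5.4)
        },
      },
    },
  });
  return response.functionCalls?.[0]; // { name, args } or undefined
}

// Phase 2: only pay for the image model when Phase 1 asked for it.
async function executeToolCall(call: { name: string; args: Record<string, unknown> }) {
  if (call.name === "generate_image") {
    return ai.models.generateContent({
      model: "gemini-2.5-flash-image",
      contents: String(call.args.prompt),
      config: { responseModalities: ["TEXT", "IMAGE"] }, // no tools in this call
    });
  }
  // ask_followup, search_objects, etc. resolve against the database — no LLM call
}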

3.2 Mode as Hint vs Hard Constraint

The existing UI has a mode dropdown: Edit, Plant Selection, Ask, Finalize, Erase. The question is whether this mode should fully determine behavior or serve as a hint.

Recommendation: Mode as a strong hint that shapes the tool set, not a hard constraint.

  • When the user selects Edit, the system prompt emphasizes visual modification and the generate_image tool is prioritized, but ask_followup remains available for ambiguous requests.
  • When the user selects Ask, the system prompt emphasizes knowledge and answer_question is prioritized, but the AI can still suggest visual changes if relevant.
  • When the user selects Erase, the tool set is restricted to generate_image (with erase-specific parameters) and request_markup — the AI can ask the user to mark what to erase but shouldn't answer unrelated questions.

This approach uses Gemini's allowed_function_names parameter within ANY mode to restrict the tool set per mode, while still allowing the AI latitude to clarify ambiguous requests.

3.3 Latency and Cost Implications

Based on published benchmarks and pricing:

  • Intent classification call (Gemini 2.5 Flash, ~200 tokens in plus a short structured function call out): ~$0.00008 per call. At 1,000 daily interactions, ~$0.08/day.
  • Image generation call (Gemini 2.5 Flash Image, 1024px): $0.039 per image.
  • Text answer call (Gemini 2.5 Flash, ~500 tokens out): ~$0.0013 per response.

The intent classification adds one extra LLM round trip (~200-400ms based on Gemini Flash benchmarks). For synchronous workflows, this is acceptable — the user sees a brief loading state before the follow-up question appears or image generation starts. The cost is negligible relative to image generation.

Parallel function calling is supported — Gemini can call multiple tools in one response (e.g., search_objects and ask_followup simultaneously to present choices with a question). This is useful for the Plant Selection mode where the AI might search plants and formulate a question in one turn.

3.4 Migration Alert: Gemini 2.0 Flash Deprecation

Critical finding: Gemini 2.0 Flash and 2.0 Flash-Lite are being deprecated and will shut down June 1, 2026. If SimplyScapes' current v1 uses gemini-2.0-flash-exp for inpainting, migration to Gemini 2.5 Flash Image (or newer) must be part of the v2 work. The 2.5 Flash Image model costs $0.039/image at 1024px and supports the same text+image input/image output workflow.

3.5 Key Decisions for SimplyScapes

  1. Two-phase routing is the recommended architecture. Cheap text model for intent classification → expensive image model only when needed.
  2. Use ANY mode with allowed_function_names per UI mode to constrain the tool set while preserving flexibility.
  3. Keep tool declarations to 6-8 total — well within Gemini's recommended ceiling of 10-20 for optimal accuracy.
  4. Low temperature (0-0.2) for intent classification ensures deterministic routing decisions.
  5. Migrate off Gemini 2.0 Flash before June 2026 — target Gemini 2.5 Flash Image or evaluate 3.1 Flash Image (faster, higher resolution, slightly more expensive).

4. Rich Interaction Patterns

Moving from single-turn prompting to a conversational interface requires defining how each mode behaves, when follow-ups occur, how the object library integrates, and how markup/annotation works as input.

4.1 Interaction Model Patterns from Industry Leaders

ChatGPT's image editing model sets the benchmark for multi-turn conversational image editing. GPT Image 1.5 (late 2025) introduced the ability to make precise edits while keeping existing details intact — users describe changes in natural language, and the system modifies the image iteratively without regenerating from scratch. The key pattern: edits are additive and conversational. Users say "move the tree to the left" or "make the sky warmer" and the model maintains context across turns. This is exactly the workflow SimplyScapes needs.

Canva Magic Studio takes a different approach — unified but modal. Users choose between "Design", "Image", "Doc", "Code", or "Video clip" up front (similar to SimplyScapes' mode dropdown), then interact within that mode. Canva's AI assistant supports natural language and voice commands, and can be invoked via @mention in comments. The key insight: Canva keeps the interface simple by limiting AI actions per context rather than exposing all capabilities at once. This validates SimplyScapes' mode dropdown approach.

Figma AI demonstrates in-canvas AI assistance where actions happen directly on the design surface rather than in a separate chat panel. Figma's "Code to Canvas" feature (Anthropic partnership, Feb 2026) converts AI-generated code into editable Figma frames — real design objects, not flat images. The transferable pattern: AI outputs should be native design objects when possible, not just rendered images. For SimplyScapes, this means when the AI suggests a plant, the result should ideally be a placed design object, not just a generated image with the plant painted in.

4.2 End-to-End Flows for Each Mode

Based on the two-phase routing architecture (Section 3) and industry patterns:

Edit Mode (with follow-up capability):

User: "Add a water feature near the patio" [+ design image]
  ↓
Phase 1 (intent classification):
  Gemini analyzes prompt → detects ambiguity ("water feature" is broad)
  → calls ask_followup(question, options[]) with search_objects(category="water_features")
  ↓
Backend: Queries design_object table WHERE type = 'water_feature'
  Returns: [{name: "Tiered Fountain", thumbnail: url, id: uuid}, ...]
  ↓
Response: {type: "choices", question: "Which style?", options: [...]}
  ↓
User: Selects "Tiered Fountain"
  ↓
Phase 1 again: Gemini sees selection → calls request_markup(instruction)
  ↓
Response: {type: "markup_request", instruction: "Draw where you'd like the fountain placed"}
  ↓
User: Draws circle on design image [markup mode]
  ↓
Client sends: parts[text + selection + image + markup]
  ↓
Phase 2 (image generation):
  Gemini Image model receives: original image + markup overlay + prompt
  "Add a Tiered Fountain at the marked location"
  → Generates new design image
  ↓
Response: {type: "image", data: base64, creditsUsed: 10}

Edit Mode (direct generation — no ambiguity):

User: "Make the sky more dramatic with sunset colors" [+ design image]
  ↓
Phase 1: Gemini analyzes → clear intent, no ambiguity
  → calls generate_image(prompt, style_guidance)
  ↓
Phase 2: Image generation model produces result
  ↓
Response: {type: "image", creditsUsed: 10}

Plant Selection Mode:

User: "I need shade trees for zone 9" [+ design image]
  ↓
Phase 1: Gemini → calls search_plants(query="shade trees", filters={zone: 9})
  ↓
Backend: Queries plant table WHERE zone_min <= 9 AND zone_max >= 9
  AND sun IN ('full shade', 'partial shade') AND habit = 'tree'
  Returns top matches with images and metadata
  ↓
Response: {type: "choices", question: "Here are shade trees for zone 9:",
  options: [{name: "Crape Myrtle", thumbnail: url, ...}, ...]}
  ↓
User: Selects "Crape Myrtle"
  ↓
Phase 2: Image generation with plant placement
  ↓
Response: {type: "image", creditsUsed: 10}

Ask Mode:

User: "What ground cover works well under these oaks?"
  ↓
Phase 1: Gemini → calls answer_question(response_text)
  ↓
Response: {type: "text", text: "For shade under oaks, consider...",
  creditsUsed: 2}

Finalize Mode:

User: "Polish this design — enhance lighting and add depth" [+ design image]
  ↓
Phase 1: Gemini → calls generate_image(prompt, style_guidance="finalize")
  ↓
Phase 2: Image generation with enhancement prompt
  ↓
Response: {type: "image", creditsUsed: 10}

Erase Mode:

User: "Remove the old hedge along the fence" [+ design image]
  ↓
Phase 1: Gemini → needs to know what to erase
  → calls request_markup(instruction="Mark the hedge you want removed")
  ↓
User: Draws over the hedge [markup mode]
  ↓
Client sends: parts[text + image + markup]
  ↓
Phase 2: Image generation with mask-based inpainting
  ↓
Response: {type: "image", creditsUsed: 10}

4.3 Object Library Integration

The SimplyScapes platform has rich structured data that most AI image generators lack:

  • design_object table: Hardscape items (fountains, benches, pavers, pergolas) with images, transparent cutouts, types, and tags
  • plant table: 2,500+ plants with taxonomy, physical attributes, zone data, seasonal characteristics, and multiple image types
  • design_material table: Texture fills (grass, mulch, stone) with type categorization

How choices are presented:

When the AI calls search_objects or search_plants, the backend queries the appropriate table and returns structured results. The chat UI renders these as selectable cards with:

  • Thumbnail image (from plant_image or design_object transparent cutout)
  • Name and key attributes (height, width, zone, water needs for plants)
  • Category badge (e.g., "Water Feature", "Shade Tree")

Filtering strategy:

The AI's tool call includes parameters that map to database queries:

  • search_objects(category, query, limit) → SELECT * FROM design_object WHERE type_id = ? AND (name ILIKE ? OR tags @> ?)
  • search_plants(query, filters, limit) → SELECT * FROM plant WHERE zone_min <= ? AND zone_max >= ? AND sun = ? AND ...

This grounds the AI's suggestions in real inventory — users can only select items that actually exist in the SimplyScapes library, preventing hallucination.
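As a sketch of the backend side, the handler below maps a search_plants tool call onto a parameterized Postgres query. It assumes node-postgres and illustrative column names (name, thumbnail_url, zone_min, zone_max, sun); the real plant schema may differ.

import { Pool } from "pg";

const pool = new Pool(); // connects via standard PG* environment variables

interface SearchPlantsArgs {
  query: string;
  zone?: number;
  sun?: string;
  limit?: number;
}

// Translate the model's search_plants arguments into SQL filters.
async function searchPlants(args: SearchPlantsArgs) {
  const params: unknown[] = [`%${args.query}%`];
  const where = ["name ILIKE $1"];

  if (args.zone !== undefined) {
    params.push(args.zone);
    where.push(`zone_min <= $${params.length} AND zone_max >= $${params.length}`);
  }
  if (args.sun !== undefined) {
    params.push(args.sun);
    where.push(`sun = $${params.length}`);
  }
  params.push(Math.min(args.limit ?? 5, 10)); // default 5, cap 10 per the tool schema

  const { rows } = await pool.query(
    `SELECT id, name, thumbnail_url, zone_min, zone_max, sun
       FROM plant
      WHERE ${where.join(" AND ")}
      LIMIT $${params.length}`,
    params
  );
  return rows; // rendered by the chat UI as selectable cards
}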

4.4 Markup/Annotation as Input

User markup is a distinct input type that guides spatial placement in image generation. The implementation:

Markup mode activation: When the AI calls request_markup, the client transitions to a lightweight drawing mode. The user draws directly on the design image — freehand circles, arrows, or highlighted regions.

Markup capture: The drawing is captured as a separate image layer (PNG with transparency). Only the user's strokes are captured, not the underlying design.

Markup as a part: The markup image is sent as {type: "markup", mimeType: "image/png", data: base64} alongside the original design image. The two images are composited or sent as separate inputs to the image generation model.

How Gemini uses markup: The image generation model receives the original image and a prompt that references the markup: "Add a Tiered Fountain at the location marked by the user's drawing." The markup provides spatial guidance that text alone cannot convey — this is a significant UX advantage over pure text prompting.

Existing precedent: Mask-based inpainting is well-established — white pixels indicate the area to modify, black pixels indicate areas to preserve. SimplyScapes' markup follows this pattern but is more expressive: instead of binary mask, the user's drawing conveys intent (circled area = "here", arrow = "direction", scribble = "remove this"). The system prompt guides Gemini to interpret these cues.
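A client-side sketch of markup capture, assuming the strokes live on a transparent overlay canvas (markupCanvas is a hypothetical element name):

// Capture only the user's strokes and attach them as a separate markup part.
function buildMarkupParts(
  markupCanvas: HTMLCanvasElement,
  designImageBase64: string,
  prompt: string
) {
  // toDataURL returns "data:image/png;base64,<payload>"; keep only the payload
  const markupBase64 = markupCanvas.toDataURL("image/png").split(",")[1];
  return [
    { type: "text", content: prompt },
    { type: "image", mimeType: "image/png", data: designImageBase64 },
    { type: "markup", mimeType: "image/png", data: markupBase64 },
  ];
}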

4.5 Multi-Turn Context

Within a design session, conversations should maintain context across turns:

  • Turn 1: "Add some Mediterranean landscaping elements"
  • Turn 2: (after viewing choices) "The lavender, and also add a gravel path"
  • Turn 3: "Actually, make the path curved instead of straight"

The system needs to track: (1) what was discussed, (2) what was selected, (3) what was generated. This is covered in detail in Section 8 (Conversation Persistence & History).

Recommended context window: Keep the last 5-8 turns of conversation history. Beyond that, the context becomes expensive to replay and the user likely started a new train of thought. This is consistent with ChatGPT's approach where long conversations gradually lose coherence on early context.


5. System Prompt Prototyping

Draft system instructions and tool declarations for each mode. These are starting points for implementation — they'll need iteration based on testing with real design images.

5.1 Shared Preamble (All Modes)

All modes receive this shared context at the start of the system instruction:

You are an AI design assistant for SimplyScapes, a web-based landscape
design platform. You help landscaping professionals and homeowners
create beautiful outdoor spaces.

You are working inside the Visual Designer, where users overlay plants,
hardscape objects, and materials onto property photos.

IMPORTANT RULES:
- Only suggest plants and objects that exist in the SimplyScapes library.
  Use the search_plants and search_objects tools to find real items.
- Never hallucinate plant names, object types, or materials.
- When unsure about what the user wants, ask a clarifying question
  using the ask_followup tool rather than guessing.
- Keep responses concise — users are often on mobile devices in the field.
- Reference the design image when describing placement or changes.

5.2 Mode-Specific System Instructions

Edit Mode:

MODE: EDIT — Visual design modification

You help users modify their landscape design. Users describe changes
they want — adding elements, changing colors, adjusting the scene,
modifying backgrounds, or enhancing areas.

WHEN TO GENERATE vs CLARIFY:
- If the request is specific enough to act on (e.g., "make the sky
  bluer", "add more contrast"), use generate_image immediately.
- If the request is ambiguous or has multiple possible interpretations
  (e.g., "add a water feature", "make it look nicer"), use ask_followup
  with choices from the object library via search_objects.
- If the user needs to show you WHERE to make a change, use
  request_markup to ask them to draw on the image.

GUARDRAILS:
- Do not remove elements unless explicitly asked.
- Preserve existing plants and objects when modifying backgrounds.
- Maintain the overall style and lighting of the original photo.

Plant Selection Mode:

MODE: PLANT SELECTION — Find and place plants

You help users find the right plants for their landscape design.
Use the search_plants tool to find plants that match their criteria.

WORKFLOW:
1. Understand what the user needs (shade/sun, size, type, zone, style)
2. Search the plant library using search_plants with appropriate filters
3. Present results using ask_followup with plant options
4. When the user selects a plant, use generate_image to place it

ALWAYS search the library before suggesting plants. Never recommend
a plant by name without first confirming it exists in the
SimplyScapes plant database via search_plants.

If the user's zone, sun, or soil conditions aren't clear from
context, ask — incorrect plant recommendations are worse than
asking a question.

Ask Mode:

MODE: ASK — Knowledge and advice

You answer questions about landscaping, plants, design principles,
materials, and outdoor spaces. You are knowledgeable but not a
replacement for a licensed landscape architect.

BEHAVIOR:
- Provide helpful, accurate answers about landscaping topics.
- When referencing specific plants, use search_plants to verify
  they exist in the SimplyScapes library and include relevant
  attributes (zone range, water needs, size).
- If a question would be better served by a visual change,
  suggest the user switch to Edit or Plant Selection mode.
- Keep answers concise (2-3 paragraphs max).

Do NOT generate images in this mode. Use answer_question for
all responses.

Finalize Mode:

MODE: FINALIZE — Polish and enhance

You enhance the overall quality of the design image for
presentation purposes — improving lighting, adding depth,
enhancing colors, and making the scene look professionally
rendered.

BEHAVIOR:
- Apply photorealistic enhancement to the entire image.
- Maintain all existing design elements (plants, objects,
  structures) exactly as placed.
- Focus on: lighting consistency, shadow depth, color
  vibrancy, atmospheric quality, and professional polish.
- If the user provides specific enhancement requests,
  prioritize those.

GUARDRAILS:
- Never add or remove design elements.
- Never change plant species or object types.
- Never alter the fundamental composition or layout.

Use generate_image with style_guidance="finalize" for all
requests in this mode.

Erase Mode:

MODE: ERASE — Remove elements from the design

You help users remove unwanted elements from their landscape
design — old plants, structures, objects, or background elements.

WORKFLOW:
1. If the user describes WHAT to remove but not WHERE, use
   request_markup to ask them to mark the area.
2. If the user provides both description and markup, use
   generate_image with the mask to remove the element and
   fill with appropriate background.

GUARDRAILS:
- Only remove what the user explicitly asks to remove.
- Fill erased areas with contextually appropriate background
  (match surrounding grass, fence, sky, etc.).
- Never add new elements during an erase operation.

Do NOT use ask_followup in this mode unless the user's
request is truly incomprehensible. Erase should feel fast
and direct.

5.3 Tool Declarations (JSON Schema)

These are the Gemini function calling tool declarations, shared across modes with per-mode filtering via allowed_function_names:

{
  "tools": [{
    "function_declarations": [
      {
        "name": "generate_image",
        "description": "Generate or modify the design image based on the user's request. Call this when you have enough information to produce a visual result.",
        "parameters": {
          "type": "object",
          "properties": {
            "prompt": {
              "type": "string",
              "description": "Detailed prompt describing the desired image modification. Be specific about what to add, change, or enhance."
            },
            "style_guidance": {
              "type": "string",
              "enum": ["edit", "finalize", "erase", "plant_placement"],
              "description": "The type of image operation to perform."
            },
            "placement_hints": {
              "type": "string",
              "description": "Optional spatial guidance for element placement (e.g., 'left side of patio', 'along the fence line'). Use when the user has described a location in text."
            }
          },
          "required": ["prompt", "style_guidance"]
        }
      },
      {
        "name": "ask_followup",
        "description": "Ask the user a clarifying question, optionally presenting choices from the object or plant library. Use when the request is ambiguous or multiple options exist.",
        "parameters": {
          "type": "object",
          "properties": {
            "question": {
              "type": "string",
              "description": "The question to ask the user. Keep it concise and actionable."
            },
            "options": {
              "type": "array",
              "description": "Optional list of choices to present. Each option references a SimplyScapes library item.",
              "items": {
                "type": "object",
                "properties": {
                  "id": { "type": "string", "description": "The SimplyScapes object or plant ID" },
                  "name": { "type": "string", "description": "Display name" },
                  "type": { "type": "string", "enum": ["design_object", "plant", "design_material"] }
                },
                "required": ["id", "name", "type"]
              }
            }
          },
          "required": ["question"]
        }
      },
      {
        "name": "search_objects",
        "description": "Search the SimplyScapes design object library for hardscape items like fountains, benches, pavers, pergolas, fire pits, etc.",
        "parameters": {
          "type": "object",
          "properties": {
            "category": { "type": "string", "description": "Object category to filter by (e.g., 'water_feature', 'seating', 'paving', 'lighting', 'structure')" },
            "query": { "type": "string", "description": "Free-text search query" },
            "limit": { "type": "integer", "description": "Max results to return (default 5, max 10)" }
          },
          "required": ["query"]
        }
      },
      {
        "name": "search_plants",
        "description": "Search the SimplyScapes plant library. Supports filtering by USDA zone, sun exposure, water needs, plant type, and size.",
        "parameters": {
          "type": "object",
          "properties": {
            "query": { "type": "string", "description": "Free-text search (common name, genus, species)" },
            "zone": { "type": "integer", "description": "USDA hardiness zone (1-13)" },
            "sun": { "type": "string", "enum": ["full_sun", "partial_sun", "partial_shade", "full_shade"] },
            "water": { "type": "string", "enum": ["low", "moderate", "high"] },
            "plant_type": { "type": "string", "description": "e.g., 'tree', 'shrub', 'perennial', 'annual', 'ground_cover', 'grass'" },
            "limit": { "type": "integer", "description": "Max results (default 5, max 10)" }
          },
          "required": ["query"]
        }
      },
      {
        "name": "answer_question",
        "description": "Provide a text-only answer to a knowledge question about landscaping, plants, design, or materials. No image generation.",
        "parameters": {
          "type": "object",
          "properties": {
            "response_text": {
              "type": "string",
              "description": "The answer text. Keep concise (2-3 paragraphs max)."
            }
          },
          "required": ["response_text"]
        }
      },
      {
        "name": "request_markup",
        "description": "Ask the user to draw/annotate on the design image to indicate a location or area. The client will enter markup mode.",
        "parameters": {
          "type": "object",
          "properties": {
            "instruction": {
              "type": "string",
              "description": "Tell the user what to mark. Be specific: 'Circle the area where you want the fountain' or 'Draw over the hedge you want removed'."
            }
          },
          "required": ["instruction"]
        }
      }
    ]
  }]
}

5.4 Tool Availability Per Mode

| Tool | Edit | Plant Selection | Ask | Finalize | Erase |
|------|------|-----------------|-----|----------|-------|
| generate_image | Yes | Yes | No | Yes | Yes |
| ask_followup | Yes | Yes | No | No | No |
| search_objects | Yes | No | Yes | No | No |
| search_plants | No | Yes | Yes | No | No |
| answer_question | No | No | Yes | No | No |
| request_markup | Yes | No | No | No | Yes |

This matrix is enforced via allowed_function_names in the Gemini API call. For example, Edit mode sends:

"tool_config": {
  "function_calling_config": {
    "mode": "ANY",
    "allowed_function_names": ["generate_image", "ask_followup", "search_objects", "request_markup"]
  }
}

5.5 Prompt Engineering Notes

Based on Gemini best practices research:

  1. Keep system instructions concise. Gemini 3+ models reason naturally and perform worse with over-explained prompts. The mode-specific instructions above are intentionally short.
  2. Place instructions before input. Gemini performs better when format and rules come first, followed by the user's content.
  3. Use low temperature (0-0.2) for intent classification. This ensures deterministic tool selection. The image generation model can use higher temperature for creative output.
  4. Limit to 6 tools total. Within the recommended ceiling of 10-20, but keeping it tight improves selection accuracy.
  5. Enum constraints on parameters. style_guidance, sun, water, plant_type use enums to prevent invalid values and improve parameter accuracy.

6. Credit System Architecture

The credit system must balance three concerns: (1) covering actual Gemini API costs with healthy margin, (2) creating predictable pricing that users understand, and (3) providing enough granularity to differentiate cheap operations (text answers) from expensive ones (image generation).

6.1 Competitive Pricing Landscape

The adjacent market analysis reveals clear pricing patterns across seven AI-powered creative platforms:

| Product | Model | Free Tier | Entry Paid | Credit Unit | Overage |
|---------|-------|-----------|------------|-------------|---------|
| Canva AI | Bundled subscription | 50 total (lifetime) | $15/mo → 500/mo | Per generation | Hard wall |
| Figma AI | Per-seat credits | 500/mo (Starter) | $5/editor/mo → 3,000/mo | Variable by action (30-100+) | Enforcement Mar 2026 |
| Adobe Firefly | Tiered credits | Via free CC | $9.99/mo → 2,000/mo | 1/image; 20-100/sec video | On-demand purchase |
| Midjourney | GPU time | None | $10/mo → 3.3 GPU hrs | GPU minutes | Relax Mode (unlimited, lower priority) |
| Runway ML | Credits + unlimited | 125 one-time | $12/mo → 625/mo | $0.01/credit | Buy more or Relax Mode |
| ChatGPT | Rate-limited | 2-3 images/day | $20/mo → 50/3hrs | Images per window | Wait for reset |
| LeanScaper | Credits (landscaping) | 250/mo | $300/mo → 3,000/mo | Per AI action | $150/1,000 credits top-up |

Key patterns:

  • Credits are dominant — 5 of 7 products use credit systems. Time-window rate limiting (ChatGPT) is the exception.
  • Variable cost by complexity — Figma charges 30-100+ credits depending on action complexity. Adobe charges 1 credit for images, 20-100/sec for video. This is best practice.
  • "Unlimited at lower priority" reduces anxiety — Midjourney and Runway both offer unlimited generation at relaxed priority. This is powerful for creative exploration.
  • Free tiers are tight but present — Canva (50 lifetime), ChatGPT (2-3/day), Runway (125 one-time). Enough to evaluate, not enough to work.
  • Team credit pooling is an enterprise upsell — Adobe and Figma both offer or are introducing shared pools.

In the vertical market, LeanScaper is the only landscape competitor with a credit system — $300-$1,500/mo with 3,000-18,000 credits and $150/1,000 top-ups. Their credits cover business operations (marketing, financials, SOPs), not design generation.

6.2 Cost Foundation: Gemini API Pricing

The credit system must be anchored to actual costs. Current Gemini pricing (March 2026):

| Model | Use Case | Cost | Notes |
|-------|----------|------|-------|
| Gemini 2.5 Flash (text) | Intent classification | $0.30 input / $2.50 output per 1M tokens | ~$0.00008 per classification call |
| Gemini 2.5 Flash Image | Image generation | $0.039 per image (1024px) | Primary image model |
| Gemini 3.1 Flash Image | Image generation | $0.067 per image (1024px) | Faster, higher res, more expensive |
| Gemini 2.5 Flash (text) | Text answer | ~$0.0013 per response (500 tokens) | Ask mode answers |

Per-interaction cost breakdown:

| Operation | Gemini Calls | Estimated Cost | Credit Cost | Margin |
|-----------|-------------|---------------|-------------|--------|
| Image generation (edit/finalize/erase) | 1 text + 1 image | ~$0.040 | 10 credits | ~10x at $0.40/10 credits |
| Image gen with follow-up first | 2 text + 1 image | ~$0.040 | 10 credits (gen only) | Same — follow-up is free |
| Text answer (ask mode) | 1 text | ~$0.001 | 2 credits | ~80x at $0.08/2 credits |
| Follow-up question | 1 text | ~$0.00008 | 0 credits | Subsidized |
| Object/plant search | 0 (DB query) | ~$0.00 | 0 credits | Pure DB cost |

The margin is intentional: it covers infrastructure overhead, rate limiting, conversation state storage, and credit system management, and provides headroom for Gemini price increases. At the recommended credit price of ~$0.04/credit, image generation stays profitable even if Gemini costs rise severalfold, and text answers have far greater headroom.

6.3 Credit Cost Table

| Operation | Credits | Rationale |
|-----------|---------|-----------|
| Image generation (edit, finalize, erase) | 10 | Full Gemini image model call — the expensive operation |
| Image generation with markup | 10 | Same cost — markup is additional input, not additional inference |
| Plant placement generation | 10 | Image generation with plant overlay |
| Text response (ask mode) | 2 | Text-only LLM call, significantly cheaper |
| Follow-up / clarification question | 0 | Encourages exploration; negligible LLM cost for intent classification |
| Object library search | 0 | Local database query, no LLM cost |
| Plant library search | 0 | Local database query, no LLM cost |
| Markup request | 0 | No LLM cost — client-side annotation mode |

Design principles:

  • Free actions encourage engagement. Making follow-ups and searches free removes friction from the conversational flow. Users shouldn't hesitate to clarify or browse.
  • 10-credit generations align with the user's mental model. "10 credits = 1 image" is easy to understand and remember.
  • Text answers are cheap but not free. 2 credits acknowledges LLM cost while keeping Ask mode accessible. At 2 credits, a user with 100 credits gets 10 images or 50 text answers.
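A small sketch of how the cost table might live in code, so the deduction logic and the usage.creditsUsed field in responses cannot drift apart (names are illustrative):

type AIOperation =
  | "image_generation"
  | "text_answer"
  | "followup"
  | "library_search"
  | "markup_request";

// Single source of truth for the credit cost table above.
const CREDIT_COST: Record<AIOperation, number> = {
  image_generation: 10, // edit, finalize, erase, plant placement
  text_answer: 2,
  followup: 0,
  library_search: 0,
  markup_request: 0,
};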

6.4 Tier Structure

Based on competitive analysis and the cost model:

| Tier | Monthly Credits | Image Generations | Monthly Price | Per-Credit Cost | Target User |
|------|----------------|-------------------|---------------|-----------------|-------------|
| Free | 50 | ~5 images | $0 | N/A | Homeowner evaluation |
| Starter (subscriber) | 200 | ~20 images | Included with subscription | Bundled | Active subscriber |
| Pro (subscriber) | 500 | ~50 images | Included with subscription | Bundled | Professional subscriber |
| Credit Pack (add-on) | 500 | ~50 images | $19.99 | $0.04/credit | Top-up for any tier |
| Credit Pack (add-on) | 1,500 | ~150 images | $49.99 | $0.033/credit | Volume top-up |

Design decisions:

  1. Free tier: 50 credits/month (not lifetime). Canva's 50-lifetime cap is stingy. Monthly reset encourages repeat visits. 50 credits covers five images, or four images plus five text answers: enough to experience the feature, not enough for production work.

  2. Bundled with subscription. AI credits are included in the SimplyScapes subscription tiers (Starter, Pro), not sold separately as a base product. This follows Canva and Adobe's model of bundling AI into existing plans.

  3. Credit packs as add-ons. Subscribers who need more credits can purchase packs. This follows Adobe's on-demand purchase model and LeanScaper's $150/1,000 top-up pattern.

  4. Workspace-scoped, not per-user. Credits belong to the workspace. All team members draw from the same pool. This is simpler than per-user allocation and matches SimplyScapes' workspace-first architecture.

  5. No "unlimited" tier initially. The Midjourney/Runway "relaxed unlimited" pattern is attractive but adds infrastructure complexity (priority queuing). Consider for v2.

6.5 Workspace Credit Management

Who manages credits:

  • Workspace admins can view credit balance, purchase credit packs, and see usage history
  • All workspace members can use credits (no per-user limits initially)
  • Credit balance is displayed in the AI tab header (already implemented per PR #1166)

Credit deduction flow:

1. User sends request → API receives
2. API checks workspace credit balance
3. If insufficient credits → return {type: "error", code: "insufficient_credits"}
4. Pre-deduct credits (optimistic deduction)
5. Call Gemini API
6. If Gemini fails → refund credits, return error
7. If Gemini succeeds → confirm deduction, return response

Race condition handling: Use an atomic conditional update through Hasura (e.g., UPDATE workspace_ai_credits SET balance = balance - 10 WHERE workspace_id = $1 AND balance >= 10 RETURNING balance). The balance >= 10 condition prevents negative balances without explicit locking.

Credit exhaustion mid-conversation: When credits run out during a conversation, the current turn completes (credits were pre-deducted), but the next generation request returns an insufficient_credits error with a prompt to purchase more. Follow-ups and searches remain free, so the conversation can continue — the user just can't generate images.
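A sketch of the atomic deduction as a Hasura mutation, assuming the workspace_ai_credits table defined in Section 6.7 is tracked by Hasura; the endpoint and secret environment variable names are illustrative:

// The where clause makes "check balance" and "deduct" one atomic statement,
// so concurrent requests cannot drive the balance negative.
const DEDUCT_CREDITS = `
  mutation DeductCredits($workspaceId: uuid!, $cost: Int!, $delta: Int!) {
    update_workspace_ai_credits(
      where: { workspace_id: { _eq: $workspaceId }, balance: { _gte: $cost } }
      _inc: { balance: $delta }
    ) {
      affected_rows
      returning { balance }
    }
  }
`;

async function deductCredits(workspaceId: string, cost: number): Promise<boolean> {
  const res = await fetch(process.env.HASURA_GRAPHQL_URL!, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-hasura-admin-secret": process.env.HASURA_ADMIN_SECRET!,
    },
    body: JSON.stringify({
      query: DEDUCT_CREDITS,
      variables: { workspaceId, cost, delta: -cost },
    }),
  });
  const { data } = await res.json();
  // affected_rows === 0 means the balance was below `cost` (insufficient credits)
  return data.update_workspace_ai_credits.affected_rows > 0;
}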

6.6 Whitelabel Credit Model

For whitelabel instances (the instance system):

  • Each instance has independent credit configuration. The whitelabel partner decides their own credit allocation per tier.
  • SimplyScapes bills the partner at wholesale. Partners purchase credits in bulk and resell at their own markup.
  • Per-instance enable/disable. Already a planned task — the API checks instance.ai_enabled before processing requests.
  • Instance-scoped credit pools. Credits are tracked per workspace within the instance. The partner's admin dashboard shows aggregate usage.

6.7 Technical Implementation: Stripe + Chargebee

SimplyScapes already has dual billing (Stripe + Chargebee). For credits:

Stripe is the recommended path for credit packs. Stripe's usage-based billing now supports credit-based pricing natively — customers prepay for credits that are spent in real time. Key capabilities:

  • Meters for tracking credit consumption events
  • Billing credits that can be granted, spent, and tracked
  • Recurring credit grants (monthly allocation with subscription)
  • Proration and rollover policy control
  • Up to 100M usage events/month included with Stripe Billing

Implementation approach:

  1. Subscription tiers (Chargebee, existing): Include monthly credit allocation as a plan feature
  2. Credit pack purchases (Stripe): One-time checkout session for credit packs
  3. Credit tracking (Hasura): workspace_ai_credits table tracking balance, transactions, and usage history
  4. Real-time deduction (API): Atomic credit deduction on each AI operation
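For step 2, a sketch of the Stripe Checkout call for a one-time credit pack purchase; the price ID, URLs, and the webhook that actually grants the credits are assumptions, not existing configuration:

import Stripe from "stripe";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

// Create a one-time Checkout Session for a credit pack. A webhook handler
// (not shown) listens for checkout.session.completed and records the grant
// as an ai_credit_transaction row.
async function createCreditPackCheckout(workspaceId: string) {
  return stripe.checkout.sessions.create({
    mode: "payment",
    line_items: [{ price: "price_creditpack_500", quantity: 1 }], // hypothetical price ID
    metadata: { workspaceId, credits: "500" },
    success_url: "https://app.example.com/settings/credits?status=success",
    cancel_url: "https://app.example.com/settings/credits",
  });
}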

Hasura schema sketch:

-- Credit balance per workspace
CREATE TABLE workspace_ai_credits (
  workspace_id UUID PRIMARY KEY REFERENCES workspace(id),
  balance INTEGER NOT NULL DEFAULT 0,
  monthly_allocation INTEGER NOT NULL DEFAULT 0,
  last_reset_at TIMESTAMPTZ,
  updated_at TIMESTAMPTZ DEFAULT now()
);

-- Credit transaction log
CREATE TABLE ai_credit_transaction (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  workspace_id UUID NOT NULL REFERENCES workspace(id),
  amount INTEGER NOT NULL, -- positive = credit, negative = debit
  type TEXT NOT NULL, -- 'monthly_grant', 'pack_purchase', 'generation', 'text_answer', 'refund'
  reference_id TEXT, -- Stripe payment ID, AI request ID, etc.
  created_at TIMESTAMPTZ DEFAULT now(),
  created_by UUID REFERENCES "user"(id)
);

7. API Endpoint Design

The API endpoint is the contract between the designer client and the backend AI system. It must handle multi-part payloads (text, images, masks, markup, selections), route to the correct Gemini model based on intent classification, manage conversation state, enforce credit limits, and return polymorphic responses (images, text, choices, errors).

7.1 Endpoint Specification

POST /api/v1/ai/generate
Content-Type: application/json
Authorization: Bearer <Firebase JWT>

Request:

type AIMode = "edit" | "plant_selection" | "ask" | "finalize" | "erase";

type Part =
  | { type: "text"; content: string }
  | { type: "image"; mimeType: string; data: string }       // Design image (base64)
  | { type: "mask"; mimeType: string; data: string }         // Binary mask for erase
  | { type: "markup"; mimeType: string; data: string }       // User annotation overlay
  | { type: "selection"; objectId: string; objectType: string; label?: string };

interface AIGenerateRequest {
  mode: AIMode;
  parts: Part[];
  designId: string;
  conversationId?: string;    // Omit for new conversation
  userId: string;             // From Firebase JWT (validated server-side)
  workspaceId: string;        // From JWT claims
  instanceId?: string;        // Whitelabel instance
}

Response:

interface AIGenerateResponse {
  id: string;                 // Unique response ID
  conversationId: string;     // Created on first turn, returned for subsequent turns
  type: "image" | "text" | "choices" | "markup_request" | "error";

  // Present when type = "image"
  image?: {
    data: string;             // Base64 encoded
    mimeType: string;
    promptUsed: string;       // The actual prompt sent to Gemini (for debugging/transparency)
  };

  // Present when type = "text"
  text?: string;

  // Present when type = "choices"
  choices?: {
    question: string;
    options: Array<{
      id: string;
      name: string;
      thumbnail?: string;     // URL to SimplyScapes CDN
      type: "design_object" | "plant" | "design_material";
      description?: string;
      metadata?: Record<string, unknown>;  // Zone, height, sun, etc.
    }>;
  };

  // Present when type = "markup_request"
  markupRequest?: {
    instruction: string;      // "Circle where you want the fountain placed"
  };

  // Present when type = "error"
  error?: {
    code: "insufficient_credits" | "rate_limited" | "content_filtered"
        | "generation_failed" | "instance_disabled" | "invalid_request";
    message: string;
    retryable: boolean;
  };

  // Always present
  usage: {
    creditsUsed: number;
    creditsRemaining: number;
    model: string;            // e.g. "gemini-2.5-flash-image"
  };
}

Changes from brainstorm draft: Added markup_request response type (separate from choices), added retryable to errors, added metadata to choice options for plant attributes, added promptUsed for transparency.
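On the client, the polymorphic type field suggests a simple dispatch; the render helpers here are hypothetical placeholders for designer UI code:

function handleAIResponse(res: AIGenerateResponse) {
  switch (res.type) {
    case "image":
      renderImage(res.image!.data, res.image!.mimeType);
      break;
    case "text":
      appendChatMessage(res.text!);
      break;
    case "choices":
      renderChoiceCards(res.choices!.question, res.choices!.options);
      break;
    case "markup_request":
      enterMarkupMode(res.markupRequest!.instruction);
      break;
    case "error":
      showError(res.error!.message, { retryable: res.error!.retryable });
      break;
  }
  updateCreditBadge(res.usage.creditsRemaining); // usage is always present
}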

7.2 Request Processing Pipeline

Client Request
  │
  ├─ 1. Auth: Validate Firebase JWT, extract userId + workspaceId
  ├─ 2. Instance Check: If instanceId, verify ai_enabled = true
  ├─ 3. Credit Check: Verify workspace has sufficient credits
  ├─ 4. Rate Limit: Check per-workspace rate limit (e.g., 30 req/min)
  ├─ 5. Conversation Load: If conversationId, load last N turns
  │
  ├─ 6. Phase 1 — Intent Classification
  │     ├─ Build system prompt (shared preamble + mode-specific)
  │     ├─ Include tool declarations (filtered by mode)
  │     ├─ Include conversation history (if any)
  │     ├─ Send text parts to Gemini 2.5 Flash (text model)
  │     └─ Receive: function call (tool name + args) OR text response
  │
  ├─ 7. Tool Execution
  │     ├─ generate_image → Phase 2 (image generation)
  │     ├─ ask_followup → Build choices response from args
  │     ├─ search_objects → Query design_object table → format as choices
  │     ├─ search_plants → Query plant table → format as choices
  │     ├─ answer_question → Extract text → build text response
  │     └─ request_markup → Build markup_request response
  │
  ├─ 8. Phase 2 — Image Generation (only if generate_image called)
  │     ├─ Build image generation prompt from tool args
  │     ├─ Include: original design image + mask/markup (if provided)
  │     ├─ Call Gemini 2.5 Flash Image (or 3.1 Flash Image)
  │     ├─ Set responseModalities: ["TEXT", "IMAGE"]
  │     └─ Extract generated image from response
  │
  ├─ 9. Credit Deduction: Deduct credits based on operation type
  ├─ 10. Conversation Save: Store turn in conversation history
  └─ 11. Response: Return AIGenerateResponse

7.3 Gemini Translation Layer

The translation layer converts SimplyScapes' request format to Gemini's API format. This is the abstraction boundary that enables future provider swapping.

// Provider interface — Gemini-first, but swappable
interface AIProvider {
  classifyIntent(
    systemPrompt: string,
    tools: ToolDeclaration[],
    parts: GeminiPart[],
    history: ConversationTurn[],
    config: { temperature: number; toolConfig: ToolConfig }
  ): Promise<IntentResult>;

  generateImage(
    prompt: string,
    referenceImage: Buffer,
    config: { model: string; size: number },
    mask?: Buffer // optional parameter last; a required param can't follow an optional one
  ): Promise<GeneratedImage>;
}

// Gemini-specific implementation
class GeminiProvider implements AIProvider {
  async classifyIntent(
    systemPrompt: string,
    tools: ToolDeclaration[],
    parts: GeminiPart[],
    history: ConversationTurn[],
    config: { temperature: number; toolConfig: ToolConfig }
  ): Promise<IntentResult> {
    const response = await genAI.models.generateContent({
      model: "gemini-2.5-flash",
      contents: this.buildContents(parts, history),
      config: {
        systemInstruction: systemPrompt,
        temperature: config.temperature, // 0-0.2 for deterministic routing
        tools: [{ functionDeclarations: tools }],
        toolConfig: { functionCallingConfig: config.toolConfig },
      },
    });
    return this.parseIntentResult(response);
  }

  async generateImage(
    prompt: string,
    referenceImage: Buffer,
    config: { model: string; size: number },
    mask?: Buffer
  ): Promise<GeneratedImage> {
    const response = await genAI.models.generateContent({
      model: config.model, // e.g. "gemini-2.5-flash-image"
      contents: this.buildImageContents(prompt, referenceImage, mask),
      config: {
        responseModalities: ["TEXT", "IMAGE"],
      },
    });
    return this.extractImage(response);
  }
}

Key abstraction decisions:

  • classifyIntent and generateImage are separate methods — reflecting the fundamental Gemini constraint that function calling and image generation can't be in the same API call.
  • The provider interface is minimal — only the methods SimplyScapes actually needs. Not a full LLM abstraction.
  • Parts translation (SimplyScapes format → Gemini format) happens inside the provider, not in the API route.

7.4 App Router vs Pages Router

The current endpoint is at src/pages/api/ai-inpanting.ts (Pages Router). The recommendation:

Migrate to App Router Route Handler at src/app/api/v1/ai/generate/route.ts.

Reasons:

  • App Router is Next.js' future — Pages Router is in maintenance mode
  • Route Handlers support streaming responses natively via ReadableStream (for future text streaming)
  • Server Actions pattern is cleaner for auth validation
  • Colocated with the rest of the v2 AI infrastructure
  • The existing Pages Router endpoint continues working during migration — no breaking change

Migration path:

  1. Build the new endpoint at src/app/api/v1/ai/generate/route.ts
  2. Keep src/pages/api/ai-inpanting.ts running for v1 compatibility
  3. Designer client switches to the new endpoint when v2 chat UI ships
  4. Deprecate the old endpoint after v2 is stable

7.5 Validation and Rate Limiting

Request validation:

  • mode must be one of the 5 valid modes
  • parts must contain at least one text part
  • Image parts must be valid base64 with accepted MIME types (image/png, image/jpeg, image/webp)
  • designId must reference an existing design in the user's workspace
  • userId is validated against the JWT (not trusted from the request body)

Rate limiting:

  • Per-workspace: 30 requests/minute (prevents abuse, allows burst usage during design sessions)
  • Per-user within workspace: No limit initially (workspace credits are the natural throttle)
  • Global: Gemini API rate limits are the backstop

Implementation: Use a simple Redis-based or in-memory rate limiter. Vercel's edge middleware can handle this, or use a lightweight library like @upstash/ratelimit.
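A sketch with @upstash/ratelimit, keyed by workspace as proposed above (the prefix and helper name are illustrative):

import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

// 30 requests per minute per workspace, sliding window.
const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(), // UPSTASH_REDIS_REST_URL / UPSTASH_REDIS_REST_TOKEN
  limiter: Ratelimit.slidingWindow(30, "1 m"),
  prefix: "ai-generate",
});

export async function checkRateLimit(workspaceId: string) {
  const { success, reset } = await ratelimit.limit(workspaceId);
  if (!success) {
    return { code: "rate_limited" as const, retryAfterMs: reset - Date.now() };
  }
  return null; // within limits
}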

7.6 Error Handling Matrix

| Error | Code | HTTP Status | Retryable | User-Facing Message |
|-------|------|-------------|-----------|---------------------|
| Workspace has no credits | insufficient_credits | 402 | No | "You're out of AI credits. Purchase more in Settings." |
| Rate limit exceeded | rate_limited | 429 | Yes | "Slow down — try again in a moment." |
| Gemini content filter | content_filtered | 200 | No | "The AI couldn't process that request. Try rephrasing." |
| Gemini API error/timeout | generation_failed | 200 | Yes | "Generation failed. Trying again..." |
| AI disabled for instance | instance_disabled | 403 | No | "AI features are not available." |
| Invalid request format | invalid_request | 400 | No | "Something went wrong. Please try again." |

Note: Content filter and generation failure return HTTP 200 with an error response body (not 4xx/5xx). This is intentional — the API processed the request successfully; the AI model declined or failed. The client distinguishes via response.type === "error".

7.7 Synchronous-First, Streaming-Ready

The v2 endpoint is synchronous — the client sends a request and waits for the complete response. This is simpler to implement and sufficient for image generation (which returns a single image, not a token stream).

Streaming upgrade path for text responses: When Ask mode text responses become long enough to benefit from streaming, the endpoint can:

  1. Accept an Accept: text/event-stream header
  2. Return SSE-formatted chunks for text responses
  3. Continue returning JSON for image/choices/error responses

This can be implemented using Next.js Route Handler's native ReadableStream support or the Vercel AI SDK's streamText utility. The API contract remains backward-compatible — clients that don't send the streaming header get the synchronous response.
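A minimal Route Handler sketch of that branch; generateAnswerChunks and handleSynchronously are hypothetical stand-ins for the Ask-mode pipeline:

// src/app/api/v1/ai/generate/route.ts (streaming branch only)
export async function POST(req: Request) {
  if (req.headers.get("accept") === "text/event-stream") {
    const encoder = new TextEncoder();
    const stream = new ReadableStream({
      async start(controller) {
        // Stream Ask-mode text as SSE events.
        for await (const chunk of generateAnswerChunks(req)) {
          controller.enqueue(
            encoder.encode(`data: ${JSON.stringify({ text: chunk })}\n\n`)
          );
        }
        controller.enqueue(encoder.encode("data: [DONE]\n\n"));
        controller.close();
      },
    });
    return new Response(stream, {
      headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" },
    });
  }
  // Clients without the streaming header get the synchronous JSON response.
  return Response.json(await handleSynchronously(req));
}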


8. Conversation Persistence & History

Conversation state enables multi-turn interactions — "add a water feature" → (selects fountain) → "actually, put it closer to the patio" — without losing context between turns.

8.1 Storage Strategy: Server-Side (Hasura)

Recommendation: Store conversations in Hasura, not client-side.

Reasons:

  • Conversations persist across sessions — user can close the designer, come back, and continue
  • Analytics and usage tracking are server-side concerns
  • Multi-device support — start a conversation on desktop, continue on mobile
  • Audit trail for credit usage
  • Foundation for future "workspace AI memory" capabilities

Alternative considered: Client-side state. Simpler to implement but conversations vanish on page reload, can't be analyzed, and don't support multi-device. Rejected for production.

8.2 Hasura Schema

-- Conversation container
CREATE TABLE ai_conversation (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  design_3d_id UUID NOT NULL REFERENCES design_3d(id) ON DELETE CASCADE,
  workspace_id UUID NOT NULL REFERENCES workspace(id),
  user_id UUID NOT NULL REFERENCES "user"(id),
  mode TEXT NOT NULL,                    -- Initial mode (may change within conversation)
  created_at TIMESTAMPTZ DEFAULT now(),
  updated_at TIMESTAMPTZ DEFAULT now()
);

-- Individual turns within a conversation
CREATE TABLE ai_conversation_turn (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  conversation_id UUID NOT NULL REFERENCES ai_conversation(id) ON DELETE CASCADE,
  turn_number INTEGER NOT NULL,
  role TEXT NOT NULL,                    -- 'user' | 'model' | 'tool_result'
  parts JSONB NOT NULL,                 -- Array of parts (text, image refs, selections)
  tool_calls JSONB,                     -- Function calls made by the model
  response_type TEXT,                   -- 'image' | 'text' | 'choices' | 'markup_request'
  credits_used INTEGER DEFAULT 0,
  model_used TEXT,                      -- 'gemini-2.5-flash', 'gemini-2.5-flash-image'
  created_at TIMESTAMPTZ DEFAULT now()
);

-- Indexes
CREATE INDEX idx_conv_design ON ai_conversation(design_3d_id);
CREATE INDEX idx_conv_workspace ON ai_conversation(workspace_id);
CREATE INDEX idx_turn_conv ON ai_conversation_turn(conversation_id, turn_number);

Relationship to existing ai_inpainting table: The new schema replaces the ai_inpainting table for v2 conversations. The existing table continues to serve v1 requests. Migration is not required — v1 and v2 coexist.

8.3 Context Replay Strategy

When a request includes a conversationId, the API loads previous turns and replays them to Gemini for context:

Sliding window: Last 5-8 turns. This balances context quality with token cost:

  • 5 turns covers most follow-up flows (clarify → select → markup → generate → refine)
  • 8 turns handles longer design sessions
  • Beyond 8 turns, early context is usually irrelevant (the user has moved on)

Token budget: At ~500 tokens per turn average (text + tool calls), 8 turns = ~4,000 tokens of history. This is well within Gemini 2.5 Flash's 1M token context window and adds ~$0.0012 to the intent classification cost — negligible.

What's included in replayed turns:

  • User text prompts
  • Model tool calls and text responses
  • Tool results (summarized — not full image data)
  • User selections (which plant/object was chosen)

What's NOT replayed:

  • Full image data from previous turns (too expensive to re-encode)
  • Credit usage details
  • Technical metadata
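A sketch of the replay window builder, using a simplified turn shape (the real rows live in ai_conversation_turn, Section 8.2); mapping tool_result turns onto the model role is a simplification:

interface ReplayTurn {
  role: "user" | "model" | "tool_result";
  parts: Array<{ type: string; content?: string; data?: string }>;
}

// Keep the last N turns and strip heavy base64 payloads before replay.
function buildReplayHistory(turns: ReplayTurn[], maxTurns = 8) {
  return turns.slice(-maxTurns).map((turn) => ({
    role: turn.role === "user" ? "user" : "model",
    parts: turn.parts.filter((p) => p.type !== "image" && p.type !== "markup"),
  }));
}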

8.4 Analytics Capture

Every conversation turn creates analytics events:

| Event | Fields | Purpose |
|-------|--------|---------|
| ai.request | mode, hasImage, hasMarkup, workspaceId | Usage patterns by mode |
| ai.intent | toolCalled, mode, classificationTime | Intent routing accuracy |
| ai.generation | model, generateTime, imageSize | Performance monitoring |
| ai.credits | creditsUsed, creditsRemaining, operation | Credit consumption tracking |
| ai.error | errorCode, model, retried | Error rate monitoring |
| ai.conversation | turnCount, totalCredits, duration | Session analysis |

These events power the admin dashboard for credit usage, the workspace usage history view, and internal product analytics.

8.5 Future: Workspace AI Memory

The conversation persistence schema is designed to support future "AI memory" capabilities:

  • Cross-conversation learning: Aggregate plant preferences, style tendencies, and frequent requests across all conversations in a workspace
  • Design context: When starting a new conversation on an existing design, the AI knows what was previously generated and discussed
  • pgvector embeddings: Conversation summaries can be embedded and stored for semantic retrieval — "what did we discuss about the backyard last month?"

These are not v2 features, but the schema design doesn't block them.


Part III: Market Landscape

9. Market Overview

How the market currently handles AI in landscape design:

The landscape design market is split into two distinct camps: products deeply invested in AI (PRO Landscape+, Yardzen, LeanScaper) and products with essentially no AI capabilities (iScape, Planter). There is no middle ground — companies either committed to AI as a core strategy or haven't started. None offer what SimplyScapes is building: a conversational AI design partner.

PRO Landscape+ has the most AI features (5 tools including an outdoor living designer and AI eraser) but is Windows-only, desktop-only, and professional-only. Yardzen offers free AI-generated inspiration images but charges $995-$1,995 for human-designed plans with no self-service middle tier. LeanScaper has built a chat-based AI — but for business operations, not design. iScape has 4 million downloads but relies on AR, not AI.

In adjacent markets, the creative AI tools (Canva, Figma, Midjourney, Adobe Firefly, Runway, ChatGPT) have established mature interaction patterns for AI-assisted creation — but none have domain knowledge about plants, hardiness zones, companion planting, or landscape design principles.

Market maturity: Emerging — AI adoption is bimodal with no established middle ground.
Customer satisfaction: Underserved — the gap between free AI inspiration and $995+ human design is the largest unmet need.

10. Vertical Market Analysis

Full analysis available in supporting/vertical-competitor-analysis.md. Key findings per competitor:

LeanScaper

Approach: Chat-based AI for landscape business operations (not design). Specialized agents (CFO, CMO, SOP) handle financial analysis, marketing, and process documentation. Voice interaction for fieldwork.
Strengths: Validates conversational AI for the landscape vertical. Professionals engage with chat-based AI for complex tasks. Credit-based pricing works.
Gaps: No design visualization, no image generation, no spatial design. Entirely non-overlapping with SimplyScapes' design focus.
Takeaway: LeanScaper proves the interaction model; SimplyScapes brings it to design.

iScape

Approach: Photo-based landscape design via AR (augmented reality). Drag-and-drop placement of plants and objects on user's yard photo. 4M downloads.
Strengths: Photo-based design on your own yard is the baseline expectation. Professional proposal generation bridges design to sales.
Gaps: No AI features. No chat interface. Users must manually browse catalogs and place every element. App Store reviews criticize the absence of AI.
Takeaway: The "design on your own photo" paradigm is validated. SimplyScapes leapfrogs by adding conversational AI to this paradigm.

PRO Landscape+ (Drafix)

Approach: Most AI-forward professional tool with 5 AI features. AI-to-CAD pipeline (concept → scaled drawing → material takeoff → proposal). Windows desktop only.
Strengths: AI outputs feed directly into CAD for actionable professional output. 1,000+ real manufacturer paver patterns. Ask Wayne chatbot for software help.
Gaps: Windows-only desktop. Professional-only ($900/yr). No natural language design requests. Ask Wayne is limited to software help, not design guidance.
Takeaway: PRO Landscape+ is the quality bar for professional AI output, but their desktop-only, Windows-only positioning leaves the entire mobile-first and homeowner market open.

Yardzen

Approach: Hybrid AI + human design service. Free YardAI generates instant concepts from photos (16 style choices). Paid packages ($995-$1,995) add human designers. Trained on 50,000 real designs.
Strengths: Free AI tool is a powerful acquisition channel. 28% conversion increase after leaning into human touchpoints over AI-only.
Gaps: YardAI generates inspiration only — no interactive refinement, no actionable plans. $995 gap between free AI and human design.
Takeaway: The "inspiration → professional design" gap is the core opportunity. SimplyScapes fills it with conversational AI that produces actionable output at $50-200 price points.

Planter

Approach: Focused vegetable garden planner with rule-based companion planting intelligence. Grid-based drag-and-drop. No AI (beyond icon generation).
Strengths: Demonstrates that plant intelligence (companion planting, spacing, zone awareness) creates genuine user value even without AI.
Gaps: No landscape design. No AI. No chat interface.
Takeaway: The horticultural intelligence (zone data, companion planting, spacing) should be embedded in SimplyScapes' AI as foundational knowledge — then AI-powered generation layers on top.

Vertical market patterns:

  1. AI is used for generation, not conversation. No competitor offers a conversational design partner.
  2. Photo-based design is the baseline expectation. Every design competitor starts from the user's yard photo.
  3. The funnel gap is universal. Free AI inspiration → $995+ human design, with nothing in between.
  4. Human-AI hybrid outperforms AI-only. Yardzen's 28% conversion increase validates that consumers want AI assistance + human validation.
  5. Credit-based pricing is emerging. LeanScaper's credit model validates consumption-based pricing in the landscape vertical.

11. Adjacent Market Patterns

Full analysis available in supporting/adjacent-market-analysis.md. Key transferable patterns:

Canva AI — Contextual AI placement

Pattern: AI tools are embedded at the point of need, not siloed. Progressive disclosure via tiered credit limits (50 free / 500 Pro). Multi-backend abstraction (DALL-E + Imagen behind one interface).
Adaptation: Embed AI generation directly into the landscape canvas — select empty area → "AI Fill" contextually. Route landscaping prompts to best model based on task.
What doesn't transfer: Template-based paradigm breaks for site-specific landscape designs requiring 3D spatial reasoning.

Figma AI — Bidirectional canvas integration

Pattern: AI output is fully editable structured objects (not flat images). Variable credit cost by action complexity (30-100+ credits). Non-prompt AI tools (erase/isolate without text input).
Adaptation: When AI suggests a plant, the result should ideally be a placed design object, not just a painted image. Offer direct-manipulation AI (select area → "fill with seasonal color").
What doesn't transfer: Figma's component library assumes identical reusable elements; landscape elements are variable.

Midjourney — Conversational prompt translation

Pattern: Users speak naturally; AI translates to optimized generation prompts. 2x2 grid for comparing variations. Voice-driven ideation. Preference learning from image ratings.
Adaptation: Translate "I want a cozy backyard" into specific plant species, materials, and layout. Show 4 landscape variations for comparison. Voice-driven on-site design sessions.
What doesn't transfer: Generates artistic images, not actionable plans. GPU-time billing doesn't map to design tasks.

ChatGPT Image Generation — Multi-turn conversational editing

Pattern: Users edit images through natural language dialogue ("move the tree left", "make the path wider"). Context-aware across multiple turns. No modal switch between chat and generation.
Adaptation: Build iterative refinement where each turn modifies the existing design rather than regenerating from scratch. Maintain full conversation context for coherent changes.
What doesn't transfer: Generates flat raster images with no structured data (no plant names, no dimensions, no bill of materials).

Vercel AI SDK — Provider-agnostic infrastructure

Pattern: Unified interface across OpenAI, Anthropic, Google. Type-safe tool calling with Zod schemas. Agent loop for multi-step operations. Streaming-first. Next.js-optimized.
Adaptation: Use the provider abstraction to start with Gemini and swap/add providers later. Define landscape-specific tools with Zod schemas for validated AI suggestions. Consider for the SDK layer.
What doesn't transfer: Infrastructure only — provides building blocks but not domain logic.

Cross-industry insights not yet applied to landscaping:

  1. Conversational-to-structured-output. No product generates structured spatial designs through conversation — every tool produces either flat images or structured UI objects, never structured landscape plans.
  2. Passive preference learning. Midjourney's image-rating personalization hasn't been applied — show 30 landscape photos at onboarding, learn aesthetic preferences.
  3. Tiered quality/cost generation. Quick sketches, detailed plans, and photorealistic presentations should be different products at different price points.
  4. Enterprise credit pooling. Landscape companies with 3-15 designers need shared pools, not per-user limits.
  5. Voice-driven on-site design. Walk the property, dictate observations, get real-time concepts.

12. Patent & IP Findings

Overall patent risk: Low-Moderate
Freedom to operate: Clear (with caution areas noted)

Full analysis available in supporting/patent-landscape.md.

12.1 Patents by Feature Area

| Feature Area | Risk | Key Concern | Mitigation |
|-------------|------|-------------|------------|
| Conversational Image Editing | High | Adobe has 5+ granted US patents (2012-2024) including US11972757B2 | Differentiate via landscape domain + LLM function calling (not canonical intention mapping) |
| AI Landscape Design Generation | High | Home Outside US12518067B2 covers AI landscape design with scoring | Use generative AI (not database-comparison scoring); architecturally distinct |
| Text-Guided Diffusion Editing | Moderate | Google has 22+ patents on specific diffusion techniques | Use third-party model APIs (providers indemnify against technique-level claims) |
| Credit-Based Pricing | Low | Standard business practice, no AI generation pricing patents | Well-established SaaS prior art |
| Intent Routing (Multimodal) | Low | General assistant patents exist but not design-tool-scoped | Design-domain-specific classification is distinct |
| Markup-Guided Generation | Low | No blocking patents in landscape context | OpenAI's annotation patent covers analysis, not generation guidance |
| Follow-Up Clarification | Low | Closest patent (WO2024158398A1) has ceased | Consider defensive publication |

12.2 Key Patents Found

| # | Patent | Title | Assignee | Filed | Risk | Notes |
|---|--------|-------|----------|-------|------|-------|
| 1 | US11972757B2 | Conversational Image Editing & Enhancement | Adobe | 2018/2023 | High | Aesthetic scoring + canonical intention mapping in conversational editing |
| 2 | US12518067B2 | System for Generating a Landscape Design | Home Outside | 2019 | High | AI landscape design with calculator + scoring engine — most direct competitive threat |
| 3 | US10579737B2 | NL Image Editing Annotation Framework | Adobe | 2018 | High | NL-to-editing-command translation |
| 4 | US11257491B2 | Voice Interaction for Image Editing | Adobe | 2018 | High | Voice-driven editing commands |
| 5 | US20230230198A1 | Interactive Image Creation via NL Feedback (TiGAN) | Adobe | 2022 | Moderate | Multi-round conversational editing |
| 6 | US20240037822A1 | Prompt-to-Prompt Editing with Cross-Attention | Google | 2022 | Moderate | Diffusion-based cross-attention manipulation |
| 7 | US11983806B1 | Image Generation (inpainting/outpainting) | OpenAI | 2023 | Moderate | Core diffusion-based editing mechanics |
| 8 | US12039431B1 | Multimodal ML Model Interaction (annotations) | OpenAI | 2023 | Moderate | GUI annotation-based region analysis |
| 9 | EP4553759A2 | Multi-round Conversational Image Editing | Baidu | 2024 | Moderate | CN/EP jurisdiction — monitor for US continuation |
| 10 | US9412366B2 | NL Image Spatial and Tonal Localization | Adobe | 2012 | Moderate | Foundational NL image editing (2012) |

12.3 Competitor IP Activity

Adobe: Very strong portfolio — 5+ granted US patents spanning 2012-2024 on conversational image editing. US11972757B2 (granted 2024) is the broadest, covering conversational editing with aesthetic scoring and canonical intention mapping. Strategy is aggressive and expanding. Primary patent concern.

Home Outside, Inc.: Single granted patent (US12518067B2, exp. 2041) specifically covering AI landscape design generation with calculator engine, scoring engine, and database comparison. Most direct competitive threat in the vertical, but describes a fundamentally different architecture (database-lookup scoring) than generative AI conversation.

Google/Alphabet: 22+ patents covering specific diffusion-based editing techniques (cross-attention manipulation, null-text inversion, hint-driven editing). Implementation-specific rather than application-layer — manageable through third-party API usage where providers handle licensing.

OpenAI: Three relevant patents covering inpainting mechanics, hierarchical text-to-image generation, and visual annotation interaction. Limited but growing portfolio.

Baidu / ByteDance: Recent filings covering multi-round conversational image editing (EP4553759A2, WO2025209146A1). Primarily CN/EP jurisdictions — monitor for US continuation filings.

Canva: No AI-specific patents found. Relies on third-party models and licensing.

LeanScaper / Stability AI / Midjourney: No relevant patents found.

12.4 Defensive Publications Found

TDCommons search returned 22 defensive publications (2022-2026) related to AI creative tools:

| # | Source | Title | Published | Overlap |
|---|--------|-------|-----------|---------|
| 1 | TDCommons | AI-driven Special Effects Generation Framework | 09/2025 | Partial — AI generation pipeline |
| 2 | TDCommons | Assistive Interaction Mechanisms for AI-Powered Art | 09/2025 | Partial — accessible AI art interaction |
| 3 | TDCommons | Conversational Agent for Physical Fulfillment of GenAI | 12/2025 | Partial — conversational agent → physical actions |
| 4 | TDCommons | Location/Context-Specific Generative Multimedia | 08/2025 | Partial — property-specific generation |
| 5 | TDCommons | AI-Based Creative Companion for Content Creation | 01/2022 | Low — general AI creative assistant prior art |

The defensive publication landscape is growing for AI creative tools but remains thin for landscape-design-specific AI. This is an opportunity for SimplyScapes to file.

12.5 Freedom-to-Operate Assessment

Favorable factors:

  1. The full combination (conversational editing + domain object library + markup guidance + proactive clarification + credit billing + landscape-specific) is not claimed by any single patent.
  2. SimplyScapes operates in landscape design (plant placement, hardscape, spatial layout) — fundamentally different from Adobe's photographic enhancement domain (exposure, contrast, color balance, aesthetic scores).
  3. Credit/token billing, markup-guided generation, and intent routing for design tools remain largely unencumbered.
  4. Key competitors Canva, LeanScaper, Midjourney have no relevant patents.

High-concern areas:

  1. Adobe's conversational editing patents (US11972757B2 especially) — use LLM-based function calling for intent routing, NOT rule-based canonical intention mapping. Focus on landscape-specific intents rather than general photo editing. Avoid implementing aesthetic attribute scoring systems.
  2. Home Outside's landscape design patent (US12518067B2) — use generative AI (diffusion models) for visual generation, NOT database-lookup design recommendation. Ensure the workflow is conversational/iterative, not score-and-improve. Do not implement a scoring engine comparing against online landscape databases.

Medium-concern areas:

  3. Google's diffusion technique patents — use third-party model APIs where providers handle patent licensing. Avoid implementing specific patented techniques in custom code.
  4. Baidu's multi-round conversational editing — primarily CN/EP jurisdiction. Monitor for US continuations.

Recommendations:

  1. File defensive publications on TDCommons for: (a) property-context-aware landscape generation, (b) climate zone and plant hardiness integration in AI design, (c) multi-turn landscape refinement with spatial memory, (d) credit metering for domain-specific generation complexity, (e) object library integration with conversational editing.
  2. Design around Adobe: use LLM function calling (not canonical intention mapping), focus on landscape domain (not photo editing), avoid aesthetic scoring systems.
  3. Design around Home Outside: use generative AI conversation (not database-comparison scoring engines).
  4. Establish quarterly patent watch for Adobe, Google, Baidu, ByteDance, and Home Outside in CPC G06T11/00, G06F30/13.
  5. Flag US12518067B2 (Home Outside) and US11972757B2 (Adobe) for IP counsel review.

Note: Patent findings are NOT legal advice. Flag concerns for qualified counsel when warranted.

13. Academic & Open Source

Full analysis available in supporting/academic-open-source-scan.md.

13.1 Key Papers

| # | Type | Reference | Date | Relevance |
|---|------|-----------|------|-----------|
| 1 | Paper | DialogGen: Multi-modal Dialogue System for Multi-turn T2I | 2024 | Core architecture for MLLM-orchestrated multi-turn generation |
| 2 | Paper | Talk2Image: Multi-Agent System for Multi-Turn Image Editing | 2025 | Multi-agent decomposition prevents intention drift |
| 3 | Paper | TDRI: Two-Phase Dialogue Refinement for Interactive Generation | 2025 | Clarify-then-generate pattern for ambiguous requests |
| 4 | Paper | Proactive Agents for Multi-Turn T2I Under Uncertainty | 2024 | Proactive clarifying questions improve satisfaction |
| 5 | Paper | SmartEdit: Complex Instruction-based Image Editing (CVPR 2024) | 2024 | Joint MLLM+diffusion handles multi-step editing instructions |
| 6 | Paper | BrushEdit: All-in-One Inpainting and Editing | 2024 | Free-form, multi-turn interactive editing with masks+text |
| 7 | Paper | RouteLLM: Learning to Route LLMs with Preference Data | 2024 | 2x cost reduction routing between strong/weak models |
| 8 | Paper | ToolACE: Winning the Points of LLM Function Calling (ICLR 2025) | 2024 | 8B models match GPT-4 on function calling with right training |
| 9 | Benchmark | BFCL v3: Multi-turn Function Calling Benchmark | 2025 | JSON output reduces accuracy 27% vs. natural language reasoning |
| 10 | Paper | Training-Free Sketch-Guided Diffusion | 2024 | Add sketch conditioning without retraining models |
| 11 | Paper | AIdeation: Human-AI Collaborative Ideation (CHI 2025) | 2025 | Support both divergent and convergent design thinking |
| 12 | Paper | Effects of Generative AI on Design Fixation (CHI 2024) | 2024 | Caution: AI generators increase fixation, reduce variety |
| 13 | Paper | CVPR 2024 Instruction-guided Editing (Winning Solution) | 2024 | Pipeline: classify → identify region → mask → inpaint |
| 14 | Survey | Multi-modal Intent Recognition Survey (EMNLP 2025) | 2025 | Single-modal data insufficient for complex intent classification |
| 15 | Paper | Improving LLM Function Calling via Structured Templates (EMNLP 2025) | 2025 | Reasoning chain before tool calls improves accuracy |

13.2 Open Source Projects

| # | Project | Description | Relevance |
|---|---------|-------------|-----------|
| 1 | RouteLLM | Dynamic routing between strong/weak LLMs | Cost optimization for intent routing |
| 2 | BrushNet | Plug-and-play inpainting for diffusion models | Mask-based editing without retraining |
| 3 | Gorilla | LLM for API/function calling | Retriever Aware Training for changing tool sets |
| 4 | Open Pencil | AI-native design editor (87 tools) | Reference for AI tool architecture in design |
| 5 | Open Canvas | LangChain collaborative writing/coding agents | Content + reflection agent architecture |
| 6 | Vercel AI SDK 6 | Provider-agnostic AI toolkit for Next.js | Direct integration path — tool calling, streaming, agents |
| 7 | LangGraph | Stateful agent graphs with human-in-the-loop | State machine pattern for chat flow orchestration |

13.3 Key Architectural Patterns from Literature

1. Hybrid Intent Router (RouteLLM + EMNLP 2025 survey). Lightweight classifier as first stage → route to specialized agents. 2x cost reduction without quality loss. Directly maps to SimplyScapes' two-phase architecture.

2. Multi-Agent Decomposition (Talk2Image). Separate intention parser, task decomposer, specialized executors, and evaluator agents. Prevents intention drift in multi-turn conversations — critical for iterative design editing.

3. Clarify-Before-Generate (TDRI + Proactive Agents). When intent is ambiguous, ask targeted clarifying questions rather than guessing. Documented improvements in user satisfaction. Essential for landscape design where "make it look better" needs disambiguation.

4. Pipeline Editing (CVPR 2024 winning solution). Classify edit type → identify target region → generate mask → inpaint. More reliable than end-to-end approaches. Maps to SimplyScapes' mode-based architecture.

5. Structured Reasoning Before Tool Calls (EMNLP 2025). Generate a reasoning chain before structured output; JSON output during reasoning reduces accuracy by 27%. Use natural language intermediate reasoning, then emit structured tool calls (a sketch follows this list).

6. Anti-Fixation Design (CHI 2024). AI generators increase design fixation — users produce fewer ideas with less variety. Counter by proactively offering alternatives and variations, not just refining the first generated image.

7. Training-Free Sketch Conditioning (arXiv 2024). Add sketch/markup guidance to existing diffusion models without retraining. Latent optimization at each denoising step ensures adherence to spatial structure. Directly enables markup-guided generation.
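A minimal sketch of pattern 5 in TypeScript with Zod, assuming a hypothetical complete(prompt) wrapper around the text model. The tool names mirror the six declarations used elsewhere in this report; everything else is illustrative.

```typescript
import { z } from "zod";

// Illustrative tool-call schema for intent routing (validated, not trusted).
const ToolCall = z.object({
  tool: z.enum([
    "generate_image", "search_objects", "answer_question",
    "ask_followup", "request_markup", "erase_region",
  ]),
  args: z.record(z.string()),
});

// Reason in free-form natural language first (no JSON constraint, per the
// BFCL v3 finding), then emit the structured call in a second step.
export async function routeWithReasoning(
  userMessage: string,
  complete: (prompt: string) => Promise<string>, // injected model wrapper
) {
  const reasoning = await complete(
    `Think step by step about what the user wants. Do not output JSON.\nUser: ${userMessage}`,
  );
  const raw = await complete(
    `Given this reasoning:\n${reasoning}\n` +
    `Output ONLY a JSON object of the form {"tool": string, "args": object}.`,
  );
  return ToolCall.parse(JSON.parse(raw)); // throws on malformed tool calls
}
```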

13.4 Pricing Model Research

Academic and industry research confirms credit-based pricing is the dominant emerging model:

  • Credit models grew 126% YoY (35 to 79 companies in PricingSaaS 500 Index)
  • Seat-based pricing dropped from 21% to 15%
  • Top blocker: Customer anxiety about unpredictable credit burn rates
  • Mitigation: Display credit cost before each action, provide usage dashboards
  • Evolution path: Cost-plus credits → value-aligned credits → outcome-based pricing (McKinsey)

Part IV: Synthesis

14. Opportunity Map

Validated Patterns (safe to build on)

These approaches are used by multiple products with no patent barriers. They represent table-stakes expectations:

  1. Credit-based AI billing. Used by Canva, Figma, Midjourney, Adobe, Runway, ChatGPT. No IP barriers. Standard SaaS practice. SimplyScapes' 10-credit-per-image model aligns with market norms.

  2. Mode-based AI interfaces. Canva, Figma, and Adobe all organize AI capabilities into distinct modes/tools (generate, edit, erase, expand). The mode dropdown pattern is validated.

  3. Mask-based inpainting. Every design tool with AI editing uses user-defined masks to constrain generation. Well-established technique with open-source implementations (BrushNet, BrushEdit).

  4. Conversation history for context. ChatGPT, Midjourney, and Canva all maintain multi-turn context. Sliding window approaches are standard.

  5. Provider abstraction. Vercel AI SDK, LangChain, and every production AI app abstract the model provider. The AIProvider interface pattern is universal.

Differentiation Opportunities (where to innovate)

These combine market gaps, transferable patterns, and patent-free zones into genuine competitive advantages:

  1. Conversational design partner with domain object library

    • Market gap: No landscape AI product offers conversational flow that routes between generation, clarification, and structured object selection from a curated catalog.
    • Inspiration: ChatGPT's multi-turn editing + Figma's structured output + Canva's contextual placement.
    • Patent risk: Low. The combination of conversational editing with domain-specific object library is unclaimed.
    • Why it's different: Every competitor generates flat images. SimplyScapes can generate images AND place structured design objects (plants with metadata, hardscape with pricing) — bridging the gap between inspiration and actionable design.
  2. Two-phase intent routing with function calling

    • Market gap: No landscape tool classifies user intent before routing to specialized agents. All competitors are single-path (prompt → image).
    • Inspiration: RouteLLM (2x cost reduction), EMNLP 2025 structured reasoning, Gemini function calling.
    • Patent risk: Low. Intent routing patents are scoped to general voice assistants, not design tools.
    • Why it's different: A cheap text model decides what the user wants (~$0.001 vs. $0.039 per request, roughly 40x cheaper than always running image generation). The mode dropdown provides a hint but doesn't constrain — the AI can override when intent doesn't match mode.
  3. Proactive clarification with visual choices

    • Market gap: No product in the vertical proactively asks follow-up questions with thumbnail-rich choices from a product catalog.
    • Inspiration: Proactive Agents paper (2024), TDRI two-phase dialogue (2025), ChatGPT's conversational refinement.
    • Patent risk: Low. Closest patent (WO2024158398A1) has ceased.
    • Why it's different: Instead of guessing what "a water feature" means, the system asks "which style?" and presents 6 fountains from the object library with thumbnails and prices. The user picks, and the selection (including metadata) feeds back to Gemini for contextually accurate generation.
  4. Markup-as-generation-guidance

    • Market gap: No landscape tool lets users draw on the canvas to guide AI placement. Competitors use only text prompts or predefined masks.
    • Inspiration: Training-free sketch-guided diffusion (2024), CVPR 2024 pipeline approach.
    • Patent risk: Low-Moderate. OpenAI's annotation patent covers analysis, not generation guidance. Differentiate clearly.
    • Why it's different: Users draw rough shapes and arrows on the design, and these spatial cues guide where and what Gemini generates. Landscapers think spatially — this matches their workflow.
  5. $995 gap between free AI and human design

    • Market gap: Yardzen's free YardAI produces inspiration images only; its human-designed plans start at $995. iScape's free tier has no AI at all. Nothing exists in between.
    • Inspiration: Canva's AI-assisted DIY model (user + AI together for $15/mo).
    • Patent risk: None — this is market positioning.
    • Why it's different: SimplyScapes can offer AI-assisted design at $29-79/mo that produces structured, actionable landscape plans — not just pretty pictures. The object library, plant database, and structured output bridge the gap between "inspiration" and "professional design."

Caution Zones (promising but constrained)

  1. Adobe's conversational editing portfolio — Adobe holds 5+ granted US patents covering conversational image editing (US11972757B2 is broadest — aesthetic scoring, canonical intention mapping, iterative suggestion). Do not implement: rule-based canonical intention mapping, aesthetic attribute scoring systems, or NL-to-editing-command translation. Alternative: Use LLM function calling for intent routing (architecturally distinct from Adobe's approach). Focus on landscape-specific intents (plant placement, hardscape, seasonal) rather than photo editing (exposure, contrast, color).

  2. Home Outside's landscape design patent — US12518067B2 (granted, exp. 2041) covers AI landscape design with calculator engine, scoring engine, and database comparison. Do not implement: a scoring engine that retrieves landscape data from online databases and compares against existing designs. Alternative: Use generative AI (diffusion models via Gemini API) for visual generation and conversational refinement. The approach is architecturally distinct — conversation-driven generation vs. database-comparison scoring.

  3. Autonomous multi-step editing — The Talk2Image multi-agent pattern (parser → generator → evaluator) is academically validated but production-complex. Multi-agent systems add latency and failure modes. Alternative: Start with the two-phase router (classify → execute) and only decompose into more agents if single-model quality degrades on complex requests.

  4. Sketch-to-design generation — Training-free sketch conditioning is research-validated but depends on Stable Diffusion / FLUX architectures. Gemini's image generation pipeline may not support external conditioning inputs. Alternative: Treat markup as metadata that enriches the text prompt ("generate a fountain in the upper-right quadrant, approximately 3 feet wide") rather than as a spatial conditioning signal.
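A sketch of that alternative: folding markup into prompt text rather than a conditioning signal. The Markup shape, its coordinate convention, and the quadrant mapping are illustrative assumptions, not the designer's actual annotation format.

```typescript
// Hypothetical markup annotation captured from the designer canvas.
interface Markup {
  kind: "circle" | "arrow" | "region";
  label?: string; // optional user text, e.g. "fountain here"
  x: number;      // normalized 0..1, left to right
  y: number;      // normalized 0..1, top to bottom
}

// Map normalized coordinates to a coarse spatial phrase.
function quadrant(x: number, y: number): string {
  return `${y < 0.5 ? "upper" : "lower"}-${x < 0.5 ? "left" : "right"} quadrant`;
}

// Enrich the text prompt with spatial hints instead of a conditioning signal.
export function markupToPrompt(base: string, marks: Markup[]): string {
  if (marks.length === 0) return base;
  const hints = marks.map((m) => {
    const what = m.label ?? (m.kind === "arrow" ? "a directional cue" : "a marked area");
    return `${what} in the ${quadrant(m.x, m.y)}`;
  });
  return `${base}. Spatial guidance: ${hints.join("; ")}.`;
}
```

For a circle labeled "fountain here" at (0.8, 0.2), this yields "... Spatial guidance: fountain here in the upper-right quadrant.", the same framing as the example above.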

15. The Technical Landscape

The architecture that emerges from research is a two-phase, provider-abstracted pipeline with these key components:

User Input (text + image + mask + markup + selection)
  │
  ├─ Phase 1: Intent Classification (Gemini 2.5 Flash — text only)
  │    ├─ Function calling with 6 tool declarations
  │    ├─ Mode hint from dropdown informs but doesn't constrain
  │    ├─ Result: generate_image | search_objects | answer_question |
  │    │          ask_followup | request_markup | erase_region
  │    └─ Cost: ~$0.001 per classification
  │
  ├─ Phase 2: Execution (model depends on intent)
  │    ├─ generate_image → Gemini 2.5 Flash Image ($0.039/image)
  │    ├─ search_objects → Hasura query (free)
  │    ├─ answer_question → Gemini 2.5 Flash text ($0.001)
  │    ├─ ask_followup → Return choices JSON (free)
  │    ├─ request_markup → Return instruction (free)
  │    └─ erase_region → Gemini 2.5 Flash Image ($0.039)
  │
  ├─ Credit Deduction (atomic Hasura operation)
  │    ├─ Pre-deduct before generation
  │    ├─ Refund on failure
  │    └─ 10 credits (image) / 2 credits (text) / 0 (follow-up)
  │
  └─ Conversation Persistence (Hasura)
       ├─ ai_conversation + ai_conversation_turn tables
       ├─ 5-8 turn sliding window for context replay
       └─ Analytics events for usage tracking
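A compressed TypeScript sketch of this control flow. The classify, execute, and credit functions are injected as dependencies because all of them are hypothetical wrappers (around Gemini function calling, the executors above, and Hasura respectively); this shows the orchestration shape, not a production implementation.

```typescript
type Intent =
  | "generate_image" | "search_objects" | "answer_question"
  | "ask_followup" | "request_markup" | "erase_region";

// Credit costs per the pipeline above: 10 (image), 2 (text), 0 (free paths).
const CREDIT_COST: Record<Intent, number> = {
  generate_image: 10,
  erase_region: 10,
  answer_question: 2,
  search_objects: 0,
  ask_followup: 0,
  request_markup: 0,
};

export async function handleTurn(
  input: { text: string; modeHint?: string },
  deps: {
    classify: (text: string, modeHint?: string) => Promise<Intent>; // Phase 1, ~$0.001
    execute: (intent: Intent, text: string) => Promise<unknown>;    // Phase 2
    deductCredits: (amount: number) => Promise<void>;               // throws if insufficient
    refundCredits: (amount: number) => Promise<void>;
  },
) {
  const intent = await deps.classify(input.text, input.modeHint);
  const cost = CREDIT_COST[intent];
  if (cost > 0) await deps.deductCredits(cost);    // pre-deduct before generation
  try {
    return await deps.execute(intent, input.text); // expensive path only when needed
  } catch (err) {
    if (cost > 0) await deps.refundCredits(cost);  // refund on failure
    throw err;
  }
}
```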

Key technical decisions validated by research:

  1. Synchronous-first, streaming-ready. Image generation is inherently synchronous (Gemini returns the full image). Text responses could stream, but the added complexity isn't justified for short answers. Build the API synchronously, with a stream boolean parameter reserved for future use.

  2. App Router migration. New endpoint at src/app/api/v1/ai/generate/route.ts. Keep the existing Pages Router endpoint (src/pages/api/ai-gen.ts) for v1 compatibility. Both can coexist in Next.js.

  3. Gemini-first, provider-agnostic. The AIProvider interface abstracts Gemini specifics. Start with Gemini (best price/quality for image generation as of March 2026). The interface allows swapping in other providers or routing between them based on task.

  4. Function calling for intent, not for generation. Gemini's function calling and image generation capabilities cannot be combined in a single API call. The two-phase architecture turns this limitation into a strength: cheap intent classification, expensive generation only when needed.

  5. Hasura for state, not for logic. Conversation persistence, credit management, and analytics all go through Hasura. Business logic (intent routing, prompt construction, provider selection) stays in the Next.js API layer.
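For decision 5, the atomic credit step can be a single conditional update through Hasura. One sketch, assuming an illustrative workspace_credit table with a balance column (not the actual schema):

```typescript
// Single conditional UPDATE: the _gte guard makes check-and-decrement atomic.
// Table and column names here are illustrative assumptions.
const DEDUCT_MUTATION = /* GraphQL */ `
  mutation DeductCredits($workspaceId: uuid!, $delta: Int!, $min: Int!) {
    update_workspace_credit(
      where: { workspace_id: { _eq: $workspaceId }, balance: { _gte: $min } }
      _inc: { balance: $delta }
    ) {
      affected_rows
      returning { balance }
    }
  }
`;

// `gql` is any GraphQL client function (e.g. a fetch wrapper against Hasura).
export async function deductCredits(
  gql: (query: string, vars: Record<string, unknown>) => Promise<any>,
  workspaceId: string,
  amount: number,
): Promise<number> {
  const res = await gql(DEDUCT_MUTATION, { workspaceId, delta: -amount, min: amount });
  if (res.update_workspace_credit.affected_rows === 0) {
    throw new Error("INSUFFICIENT_CREDITS"); // no row matched the balance guard
  }
  return res.update_workspace_credit.returning[0].balance;
}
```

Because the guard and the decrement execute as one UPDATE statement, there is no read-then-write race; an affected_rows of zero cleanly signals an insufficient balance and maps to the refund-on-failure flow above.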

Migration requirement: Gemini 2.0 Flash shuts down June 1, 2026. Migrate to Gemini 2.5 Flash before then. The 2.5 Flash model costs more per input token ($0.30 vs. $0.15/1M) but supports both text and image generation with a single model family.

16. Open Questions

  • [ ] Gemini 2.5 Flash Image quality vs. 3.1 Flash Image. The 3.1 model costs 72% more ($0.067 vs. $0.039/image). Is the quality difference worth it for landscape-specific generation? Needs A/B testing with real design prompts.
  • [ ] Context window for image history. Gemini's context window accepts multiple images, but each 1024px image costs ~258 input tokens. How many reference images can be included before quality degrades or costs spike? Needs empirical testing.
  • [ ] Object library search quality. The search_objects function call depends on Hasura's text search or pg_trgm similarity. Is this sufficient for natural language queries like "something with purple flowers that grows in shade"? May need vector search (pgvector) or Gemini-mediated search.
  • [ ] Markup interpretation fidelity. When users draw on the canvas, how reliably can Gemini interpret spatial annotations in a reference image? Early testing needed with annotated design images.
  • [ ] Whitelabel credit isolation. The instance_id field in the API supports multi-tenant credit pools, but the Hasura permission model needs validation. Can workspace admins see credit usage across instances?
  • [ ] Rate limiting granularity. 30 req/min per workspace is proposed, but image generation requests are expensive and slow (~3-5s). Should image-generation requests have a separate, lower limit (e.g., 10/min)?
  • [ ] Conversation cleanup policy. How long should conversations persist? Storage is cheap, but old conversation context could confuse analytics. Consider auto-archiving after 30 days of inactivity.
  • [ ] Anti-fixation UX. CHI 2024 research shows AI generators increase design fixation. How should the UI counteract this — offer unprompted variations? Show "try something different" affordances?

17. Opportunity Assessment

Novelty: High. No product in the landscape vertical offers conversational AI design with structured object library integration, intent-based routing, proactive clarification, and markup-guided generation. The closest analog is ChatGPT's image editing, but it produces flat images with no domain knowledge, structured data, or object placement. The combination of capabilities is novel and defensible.

Feasibility: High. All core components are buildable with current technology:

  • Gemini 2.5 Flash handles both intent classification (function calling) and image generation
  • The two-phase architecture is well-documented in academic literature (RouteLLM, EMNLP 2025)
  • Credit billing is standard SaaS (Stripe native credits or Chargebee)
  • Hasura handles conversation persistence and credit state
  • The existing v1 inpainting pipeline proves the core image generation loop works

The main technical risk is Gemini's image generation quality for landscape-specific scenes. Early testing with real design prompts is the fastest way to validate.

Impact: Very High. This transforms SimplyScapes from "a design tool with an AI feature" to "an AI-powered design partner." The $995 gap between free AI inspiration (Yardzen's YardAI) and human design services is a massive, underserved market. A conversational AI design partner at $29-79/mo could capture landscape professionals who want AI assistance but need structured, actionable output — not just pretty pictures.

Timeline:

  • Phase 1 (4-6 weeks): Two-phase intent routing, basic credit system, conversation persistence. Extends v1 from single-turn to multi-turn with intent classification.
  • Phase 2 (4-6 weeks): Object library integration with search_objects and ask_followup response types. Proactive clarification with visual choices.
  • Phase 3 (4-6 weeks): Markup-guided generation, provider abstraction layer, advanced credit management (workspace pools, credit packs, Stripe integration).
  • Ongoing: System prompt refinement, model quality testing, analytics-driven optimization.

Recommended Next Steps

  1. Run /ss-product spec to write the technical specification for Phase 1 implementation (two-phase routing + credit system + conversation persistence).
  2. Run /ss-legal disclosure to generate a defensive publication covering: (a) conversational landscape design with domain object libraries, (b) intent routing in design-specific multimodal interfaces, (c) proactive clarification in generative design tools, (d) markup-guided generation for spatial design.
  3. Flag for IP counsel review: Home Outside US12518067B2 (AI landscape design generation) and Adobe US11972757B2 (conversational image editing). Both are granted and active — architectural differentiation strategy should be validated.
  4. A/B test Gemini 2.5 Flash Image vs. 3.1 Flash Image with 20-30 real landscape design prompts to validate quality/cost tradeoffs.
  5. Prototype system prompts using the templates in Section 5 with Gemini API Playground to validate intent routing accuracy before building the full pipeline.
  6. Migrate from Gemini 2.0 Flash to 2.5 Flash before the June 1, 2026 deprecation deadline.
  7. Establish quarterly patent watch for Adobe, Google, Baidu, ByteDance, and Home Outside in CPC G06T11/00, G06F30/13.

Sources

| # | Type | Reference | URL |
|---|------|-----------|-----|
| 1 | API Docs | Gemini 2.5 Flash API — Function Calling | https://ai.google.dev/gemini-api/docs/function-calling |
| 2 | API Docs | Gemini 2.5 Flash API — Image Generation | https://ai.google.dev/gemini-api/docs/image-generation |
| 3 | API Docs | Gemini API Pricing | https://ai.google.dev/gemini-api/docs/pricing |
| 4 | Paper | RouteLLM: Learning to Route LLMs with Preference Data (2024) | https://arxiv.org/abs/2406.18665 |
| 5 | Paper | DialogGen: Multi-modal Dialogue for Multi-turn T2I (2024) | https://arxiv.org/abs/2403.08857 |
| 6 | Paper | Talk2Image: Multi-Agent Multi-Turn Image Editing (2025) | https://arxiv.org/abs/2508.06916 |
| 7 | Paper | TDRI: Two-Phase Dialogue Refinement for Interactive Generation (2025) | https://arxiv.org/abs/2503.17669 |
| 8 | Paper | Proactive Agents for Multi-Turn T2I Under Uncertainty (2024) | https://arxiv.org/abs/2412.06771 |
| 9 | Paper | SmartEdit: Complex Instruction-based Image Editing (CVPR 2024) | https://openaccess.thecvf.com/CVPR2024 |
| 10 | Paper | BrushEdit: All-in-One Image Inpainting and Editing (2024) | https://arxiv.org/abs/2412.10316 |
| 11 | Paper | ToolACE: Winning the Points of LLM Function Calling (ICLR 2025) | https://arxiv.org/abs/2409.00920 |
| 12 | Benchmark | BFCL v3: Berkeley Function-Calling Leaderboard | https://openreview.net/forum?id=2GmDdhBdDk |
| 13 | Paper | Training-Free Sketch-Guided Diffusion (2024) | https://arxiv.org/abs/2409.00313 |
| 14 | Paper | AIdeation: Human-AI Collaborative Ideation (CHI 2025) | https://arxiv.org/abs/2502.14747 |
| 15 | Paper | Effects of GenAI on Design Fixation (CHI 2024) | https://dl.acm.org/doi/10.1145/3613904.3642919 |
| 16 | Paper | Multi-modal Intent Recognition Survey (EMNLP 2025) | https://aclanthology.org/2025.findings-emnlp.823 |
| 17 | Paper | Improving LLM Function Calling via Structured Templates (EMNLP 2025) | https://arxiv.org/abs/2509.18076 |
| 18 | Paper | CVPR 2024 Instruction-guided Editing Winning Solution | https://arxiv.org/abs/2407.13139 |
| 19 | Industry | 2025 State of SaaS Pricing (Growth Unhinged) | https://www.growthunhinged.com/p/2025-state-of-saas-pricing-changes |
| 20 | Industry | Rise of AI Credits (Metronome) | https://metronome.com/blog/the-rise-of-ai-credits |
| 21 | Industry | AI Pricing 2025 Field Report (Metronome) | https://metronome.com/blog/ai-pricing-in-practice-2025-field-report |
| 22 | Industry | Evolving AI SaaS Monetization (McKinsey) | https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/upgrading-software-business-models-to-thrive-in-the-ai-era |
| 23 | SDK | Vercel AI SDK 6 — Tool Calling | https://ai-sdk.dev/docs/ai-sdk-core/tools-and-tool-calling |
| 24 | Patent | US11972757B2 — Adobe Conversational Image Editing | https://patents.google.com/patent/US11972757B2 |
| 25 | Patent | US12518067B2 — Home Outside AI Landscape Design | https://patents.google.com/patent/US12518067B2 |
| 26 | Patent | US10579737B2 — Adobe NL Image Editing Framework | https://patents.google.com/patent/US10579737B2 |
| 27 | Patent | US20230230198A1 — Adobe TiGAN Interactive Editing | https://patents.google.com/patent/US20230230198A1 |
| 28 | Patent | US11983806B1 — OpenAI Inpainting/Outpainting | https://patents.google.com/patent/US11983806B1 |
| 29 | Patent | US12039431B1 — OpenAI Visual Annotation Interaction | https://patents.google.com/patent/US12039431B1 |
| 30 | Patent | US20250111139A1 — Adobe Design Document from Text | https://patents.google.com/patent/US20250111139A1 |
| 31 | Patent | WO2024158398A1 — AI Drawing with Clarifying Questions | https://patents.google.com/patent/WO2024158398A1 |
| 32 | Defensive | TDCommons — 22 publications on AI creative tools (2022-2026) | https://www.tdcommons.org/ |
| 33 | Product | LeanScaper — Landscape Business Platform | https://www.leanscaper.com |
| 34 | Product | iScape — Landscape Design App | https://www.iscapeit.com |
| 35 | Product | PRO Landscape — Professional Design Software | https://www.prolandscape.com |
| 36 | Product | Yardzen — Online Landscape Design | https://yardzen.com |
| 37 | Product | Planter — Garden Planning App | https://planter.garden |
| 38 | Open Source | RouteLLM (GitHub) | https://github.com/lm-sys/RouteLLM |
| 39 | Open Source | BrushNet (GitHub) | https://github.com/TencentARC/BrushNet |
| 40 | Open Source | Gorilla (GitHub) | https://github.com/ShishirPatil/gorilla |
| 41 | Open Source | Open Pencil (GitHub) | https://github.com/open-pencil/open-pencil |
| 42 | Open Source | LangGraph (GitHub) | https://github.com/langchain-ai/langgraph |
| 43 | Report | WIPO GenAI Patent Landscape Report 2024 | https://www.wipo.int/web-publications/patent-landscape-report-generative-artificial-intelligence-genai |


Vertical Competitor Analysis: AI Features in Landscape Design

Date: 2026-03-09 | Scope: AI capabilities, interaction models, and pricing across five landscape design vertical competitors | Purpose: Inform SimplyScapes generative AI chat interface design decisions


Competitor Analysis


LeanScaper

How they approach it: LeanScaper is the first AI-powered platform purpose-built for the landscape industry, but it targets business operations rather than design visualization. The platform centers on "Lana," a conversational AI assistant trained specifically for landscaping businesses. Lana coordinates a suite of specialized agents (CFO Agent, CMO Agent, SOP Agent, Strategy Agent, Assessment Agent) that handle financial analysis, marketing campaigns, process documentation, and business planning. Users interact through natural language chat prompts like "Build me a marketing strategy for my ideal customer profile" or "Build my 2026 growth plan for my hardscape division." The mobile field app includes voice interaction for hands-free use during fieldwork, a Huddle Agent that records and processes crew meetings into structured notes and action items, and a Field Request Agent that automatically categorizes and routes requests to Kanban boards.

What works well:

  • Chat-based conversational interface is natural and approachable for non-technical landscape contractors
  • Domain-specific agents avoid the "blank prompt" problem by channeling users toward defined business functions
  • Voice interaction from the field demonstrates practical understanding of how landscapers actually work
  • Credit-based pricing aligns cost with value delivered rather than seat-based licensing
  • Unlimited seats per plan removes adoption friction within organizations

Limitations / gaps:

  • No landscape design visualization capabilities whatsoever -- this is purely a business operations platform
  • No image generation, AR, or spatial design features
  • The platform does not help homeowners or designers create or visualize landscape concepts
  • Credit system may create anxiety about consumption, causing users to self-limit usage
  • Relatively new to market (credit-based billing launched March 2026)

Technical approach: LLM-powered conversational AI with specialized agent routing. Knowledge base ("LeanDocs & Files") feeds context to agents. Voice-to-text for field use. No computer vision, image generation, or spatial computing components. Architecture appears to be a multi-agent orchestration layer on top of general-purpose LLMs, fine-tuned with landscape industry domain knowledge.

Pricing model:

  • Free: $0/mo, 250 credits/mo, all features, unlimited users
  • Core: $300/mo, 3,000 credits/mo (1-3 crew operations)
  • Premium: $750/mo, 8,250 credits/mo (4-8 crews) -- most popular
  • Max: $1,500/mo, 18,000 credits/mo (12+ crews)
  • Top-up: $150 per 1,000 additional credits
  • All plans include identical features; credits are the only differentiator

Key takeaway for SimplyScapes: LeanScaper validates the chat-based AI interaction model for the landscape vertical -- professionals are willing to engage with conversational AI for complex business tasks. Their agent-routing architecture (directing users to specialized agents by domain) is a pattern worth studying. However, the complete absence of design features means they occupy a non-overlapping space. SimplyScapes has an opportunity to bring the same conversational approachability to the design side of landscaping, which LeanScaper does not address at all.


iScape

How they approach it: iScape is the most-downloaded landscape design app (~4 million downloads, 4.6-star rating) and positions itself as the tool that lets homeowners and professionals design directly on photos of their own yard. The core technology is augmented reality (AR), not AI -- users place trees, shrubs, patios, and landscape elements using their smartphone camera in real-time 3D, or work in a 2D mode overlaying objects on photos. The interaction model is drag-and-drop: users browse a product catalog, select elements, and position them on their yard image. Professionals use the Pro tier to generate PDF proposals with pricing and business branding. iScape has announced an AI-powered design feature in beta that would help visualize outdoor designs using photos, but as of early 2026, AI is not a shipping core feature. App Store reviewers have specifically criticized the absence of AI capabilities that competing apps offer.

What works well:

  • Photo-based design on your own yard is immediately understandable and builds trust in outcomes
  • AR visualization removes ambiguity about scale, placement, and fit
  • Massive product database (thousands of plants, hardscapes, products) gives specificity
  • Professional proposal generation (PDF with pricing, branding) bridges design to sales
  • Large existing user base provides distribution advantage for new feature adoption

Limitations / gaps:

  • No shipping AI features -- AR is the visualization technology, not generative AI
  • Users must manually browse catalogs and place every element; no automated design generation
  • No chat interface or natural language interaction
  • App Store reviews report crashes, slow loading, limited object manipulation (no tilt/rotate)
  • Premium pricing ($29.99/mo or $299.99/yr) is high relative to the feature set for homeowners
  • No intelligence around plant selection, climate zone recommendations, or companion planting

Technical approach: ARKit/ARCore-based spatial overlay on camera feed or static photos. Object library with pre-modeled 3D assets. Traditional UI (drag-and-drop, toolbars) rather than AI-driven workflows. The announced AI beta is uncharacterized technically but likely involves image generation given the "visualize designs using photos" description.

Pricing model:

  • Free: Limited database access, basic 2D/3D design (iOS only)
  • Plus: $14.99/mo (intermediate tier)
  • Pro: $29.99/mo or $299.99/yr -- full database, proposals, custom uploads (most popular)
  • Enterprise: Custom pricing, multi-user licenses, premium support, integrations
  • No AI features are currently gated behind any tier

Key takeaway for SimplyScapes: iScape proves that photo-based design on your own yard is the interaction model homeowners want -- they do not want to design in abstract. However, their lack of AI creates a large opening. Users still must do all the creative work manually. SimplyScapes can leapfrog by combining the "design on your own photo" paradigm with generative AI that proposes complete designs, handles plant selection intelligently, and engages through conversation rather than catalog browsing. The gap between iScape's manual drag-and-drop model and a conversational AI design partner is the core opportunity.


PRO Landscape+ (Drafix Software)

How they approach it: PRO Landscape+ is the most AI-forward professional landscape design tool on the market as of early 2026. Drafix Software has aggressively integrated AI into a comprehensive desktop platform that combines photo imaging, CAD, 3D rendering, and proposal generation. Their AI suite includes five distinct tools: (1) AI Outdoor Living Designer -- upload a customer photo, define the project area, and generate complete outdoor living concepts including paver selections from 1,000+ manufacturer patterns, pergolas, outdoor kitchens, and fire features; (2) AI Paver Tool -- apply realistic paver layouts to photos with perspective adjustment and multi-pattern preview; (3) AI Eraser -- instantly remove existing landscaping from photos; (4) AI Cutout Tool -- precise object extraction for compositing; (5) Ask Wayne -- an AI help assistant (chatbot) for navigating the software. The interaction model is button-driven within a traditional desktop application: users click tools, define areas, and let AI generate or modify within bounded parameters. The key differentiator is that AI outputs feed directly into CAD for scaled drawings and material takeoffs, making AI-generated concepts immediately actionable for construction.

What works well:

  • AI is deeply integrated into an established professional workflow, not bolted on
  • AI-to-CAD pipeline (concept to scaled drawing to material takeoff to proposal) is uniquely complete
  • Manufacturer-specific product libraries (1,000+ real paver patterns) ground AI outputs in purchasable reality
  • The eraser and cutout tools address real pain points in photo-based design workflows
  • Ask Wayne chatbot lowers the learning curve for a feature-rich desktop application

Limitations / gaps:

  • Windows-only desktop application -- no mobile, no web, no cross-platform access
  • Professional-only pricing and complexity; not accessible to homeowners
  • AI interaction is tool-based (click buttons, define areas), not conversational or generative in the open-ended sense
  • No natural language design requests (cannot say "design me a Mediterranean patio")
  • Single-user license per computer; no collaborative or cloud-based workflows
  • Ask Wayne appears limited to software help rather than design guidance

Technical approach: Desktop Windows application with integrated AI modules for image manipulation (eraser, cutout), generative concept creation (outdoor living designer), and pattern application (paver tool). Likely uses a combination of image segmentation models, inpainting/outpainting for the eraser, and template-based generation with perspective warping for paver layouts. The outdoor living designer appears to use constrained generation within defined project areas rather than unconstrained image generation. Ask Wayne is likely an LLM-based support chatbot scoped to software documentation.

Pricing model:

  • Value: $900/yr ($75/mo equivalent, billed annually, saves $180)
  • Flexibility: $90/mo (month-to-month)
  • Both plans include all features, updates, support, training, and mobile companion app
  • Single user, single computer license
  • AI features are included in all subscriptions -- not separately gated

Key takeaway for SimplyScapes: PRO Landscape+ demonstrates the highest current bar for AI in landscape design and proves that professionals will adopt AI tools when they fit into existing workflows. Their AI-to-CAD-to-proposal pipeline is the gold standard for professional output. However, their desktop-only, Windows-only, professional-only positioning leaves the entire homeowner and mobile-first market unaddressed. SimplyScapes can bring comparable AI intelligence to a conversational, mobile-first, cloud-based experience that serves both homeowners and professionals. The Ask Wayne chatbot is notably limited to software help -- a genuine design-focused chat AI would be significantly more valuable.


Yardzen

How they approach it: Yardzen operates as a hybrid AI + human design service rather than a software product. Their model has two layers: (1) YardAI, a free web tool where homeowners upload a yard photo, choose from 16 design aesthetics (Cottage, Contemporary, Mediterranean, Rustic, Southwestern, Traditional, Modern Boho, etc.), and receive instant AI-generated landscape concepts; and (2) paid design packages ($995-$1,995) where professional landscape designers create custom CAD plans, photorealistic renderings, plant/materials lists, and lighting plans with human revision cycles. YardAI is explicitly positioned as an "inspiration tool" and "starting point," not a replacement for human design. It is trained on 50,000+ real landscape designs created by Yardzen's professional designers. After AI-only competitors emerged in 2024-2025, Yardzen deliberately leaned into human touchpoints -- adding dedicated project managers, one-on-one kickoff calls, and cost advisors. This counter-positioning resulted in a 28% increase in website conversion and 94% of customers choosing project-manager-guided packages over cheaper options.

What works well:

  • Free AI tool (YardAI) serves as top-of-funnel acquisition -- users experience instant value before paying
  • Training on 50,000 real professional designs produces more architecturally aware results than generic AI
  • 16 named design styles give users a vocabulary for expressing preferences without design expertise
  • Integrated product specifications (Belgard, PebbleTec, Crate & Barrel) ground AI suggestions in purchasable products
  • Human-AI hybrid model builds trust: AI inspires, humans refine and finalize
  • Deliberate counter-positioning against AI-only competitors proved that consumers value human expertise

Limitations / gaps:

  • YardAI generates inspiration images, not actionable plans -- no measurements, no plant lists, no materials takeoff
  • AI-generated images can include fictional elements and unrealistic trees
  • No interactive refinement of AI concepts (cannot say "move the tree left" or "make it more rustic")
  • Paid design packages are expensive ($995-$1,995) and slow (multi-week turnaround with revision cycles)
  • 7% revenue share fee on contractor projects through their partner program
  • No self-service design tools -- gap between free AI inspiration and expensive human service

Technical approach: Image-to-image generative AI (likely diffusion-based) trained on proprietary dataset of 50,000+ professional landscape designs. The model conditions on uploaded photos and selected style tokens. Spatial understanding (architecture, elevation, layout) is achieved through training data curation rather than explicit 3D reasoning. Product integration appears to be post-generation tagging/recommendation rather than generation-time conditioning. YardAI runs as a standalone web app (ai.yardzen.com) separate from the core service platform.

Pricing model:

  • YardAI: Free ($0), unlimited use, web-based
  • Essential package: $995 (or 6x $165.83) -- 2D CAD plans, plant/materials list, no revisions
  • Classic package: $1,395 (or 6x $232.50) -- adds photorealistic renders, furniture selections, 1 revision, project manager
  • Signature package: $1,995 (or 6x $332.50) -- adds lighting plan, nighttime renders, 2 revisions, cost advisor
  • Installment plans available (6-month split)

Key takeaway for SimplyScapes: Yardzen's two-layer model (free AI inspiration to paid human design) is the most strategically instructive competitor. Their YardAI proves that free AI-generated landscape concepts are a powerful acquisition tool, but their deliberate pivot toward human touchpoints reveals that consumers do not trust AI alone for high-stakes design decisions. SimplyScapes should study the "gap" in Yardzen's model: there is no middle tier between free AI inspiration images and $995+ human design packages. A conversational AI chat interface that offers iterative refinement, actionable plant lists, and budget-aware recommendations could occupy this profitable middle ground.


Planter

How they approach it: Planter is a focused garden planning app for vegetable gardens and raised beds, operating on a square-foot gardening model. The interaction model is purely visual drag-and-drop: users create a grid representing their garden bed, then drag plant icons onto squares. The app provides companion planting guidance (which plants work well together and which conflict), automatic spacing calculations (e.g., tomatoes occupy 4 squares, shallots occupy 1), planting calendars based on hardiness zone, and a database of 80+ base plants with 1,000+ varieties. There is no generative AI -- the "intelligence" is rule-based: when you place a plant, the app flags compatible and incompatible neighbors and suggests optimal timing. The only AI feature is a minor one: AI-generated icons for custom plant varieties (added September 2025). The app is praised for simplicity and focus, earning strong reviews from vegetable gardeners.

What works well:

  • Extreme simplicity -- plan a garden in minutes with no learning curve
  • Companion planting rules are immediately useful and educational
  • Zone-based planting calendar personalizes timing recommendations
  • Affordable pricing ($24.99/yr or $99.99 lifetime) makes it accessible
  • Narrow focus on vegetable gardening means it does one thing very well
  • Cross-platform (iOS, Android, web) with sync

Limitations / gaps:

  • No AI beyond icon generation -- all planning intelligence is rule-based
  • No landscape design capability (only vegetable/herb gardens in raised beds)
  • No photo-based visualization of how the garden will look
  • No chat or conversational interface
  • No generative design suggestions ("design me a salsa garden" is not possible)
  • Limited to square-foot gardening paradigm; does not handle ornamental or full-yard design
  • No climate-aware plant recommendations beyond basic hardiness zone

Technical approach: Grid-based spatial planner with a plant database storing companion relationships, spacing requirements, and zone-specific planting windows. Rule engine evaluates plant adjacencies and flags conflicts. No machine learning, computer vision, or generative models. The AI icon generation (for custom plants) likely uses an image generation API but is peripheral to core functionality.
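To make the rule-engine pattern concrete, here is a minimal TypeScript sketch of a companion-planting check. The plant relationships are illustrative placeholders, not Planter's actual data:

```typescript
// Minimal rule-based companion check; relationship data is hypothetical.
type PlantId = "tomato" | "basil" | "shallot" | "fennel";

const COMPANIONS: Record<PlantId, PlantId[]> = {
  tomato: ["basil"],
  basil: ["tomato"],
  shallot: [],
  fennel: [],
};

const ANTAGONISTS: Record<PlantId, PlantId[]> = {
  tomato: ["fennel"],
  basil: [],
  shallot: ["fennel"],
  fennel: ["tomato", "shallot"],
};

// Classify each neighbor of a newly placed plant.
function evaluatePlacement(placed: PlantId, neighbors: PlantId[]) {
  return neighbors.map((neighbor) => ({
    neighbor,
    verdict: ANTAGONISTS[placed].includes(neighbor)
      ? "conflict"
      : COMPANIONS[placed].includes(neighbor)
      ? "companion"
      : "neutral",
  }));
}

console.log(evaluatePlacement("tomato", ["basil", "fennel"]));
// [{ neighbor: "basil", verdict: "companion" }, { neighbor: "fennel", verdict: "conflict" }]
```

The same lookup structure extends naturally to spacing and zone rules, which is why this baseline horticultural knowledge is cheap to embed beneath an AI layer.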

Pricing model:

  • Free: 1 garden, calendar, custom plants
  • Premium: $24.99/yr -- unlimited gardens, no ads, notes, custom backgrounds, web access
  • Lifetime: $99.99 one-time -- all Premium features permanently
  • No AI-specific pricing tier

Key takeaway for SimplyScapes: Planter demonstrates that even without AI, a well-designed companion planting engine with clear visual feedback creates genuine user delight. Their rule-based intelligence (companion/combative plant relationships, zone-based timing, spacing logic) represents baseline domain knowledge that any AI-powered landscape tool should embed. SimplyScapes should incorporate this kind of horticultural intelligence into its AI chat interface -- when a user asks for plant recommendations, the system should understand companion planting, spacing, zone compatibility, and seasonal timing as foundational knowledge, then layer generative design capabilities on top.


Vertical Market Patterns

  1. AI adoption is bimodal. Competitors are either deeply invested in AI (PRO Landscape+, Yardzen, LeanScaper) or have essentially no AI features (iScape, Planter). There is no gradual middle ground -- companies either committed to AI as a core strategy or have not yet started.

  2. Photo-based design is the baseline expectation. Every design-focused competitor (iScape, PRO Landscape+, Yardzen) uses the customer's own yard photo as the starting canvas. Abstract design tools with no connection to the user's actual space are not competitive.

  3. AI is used for generation, not conversation. None of the competitors offer a true conversational AI interface for design. PRO Landscape+ uses button-driven AI tools. Yardzen uses style-selector-to-generation. LeanScaper has chat, but for business ops, not design. The conversational design partner is an unoccupied niche.

  4. Professional and consumer markets remain separate. PRO Landscape+ serves professionals exclusively (Windows desktop, $900/yr). iScape bridges both but leans consumer. Yardzen is consumer-only with human designers. No single product serves both audiences with AI-powered design.

  5. The funnel gap is consistent. Yardzen's free AI inspiration has no self-service upgrade path below $995. iScape's free tier has no AI at all. There is a consistent gap between "free/cheap exploration" and "professional output" that no competitor fills with AI.

  6. Manufacturer product integration is emerging. Both PRO Landscape+ (1,000+ paver patterns) and Yardzen (Belgard, PebbleTec, Crate & Barrel) are integrating real purchasable products into AI outputs. This grounds designs in commercial reality and opens monetization through product partnerships.

  7. Human-AI hybrid models outperform AI-only. Yardzen's deliberate pivot toward human touchpoints (project managers, kickoff calls) after AI-only competitors emerged -- resulting in 28% higher conversion -- signals that consumers want AI assistance but human validation for high-stakes outdoor renovation decisions.

  8. Credit-based and usage-based pricing is emerging. LeanScaper's credit model (where different actions cost different amounts) represents a new pricing paradigm in the vertical, moving away from flat subscriptions toward value-aligned consumption pricing.


Vertical Market Gaps

  1. No conversational AI design partner exists. No competitor offers a chat-based interface where homeowners can describe what they want in natural language and iteratively refine a landscape design through conversation. This is the single largest gap in the market.

  2. No AI-powered middle tier between inspiration and professional design. Yardzen's model exposes a $995 gap between free AI inspiration images and human-designed plans. A product that generates actionable (not just inspirational) landscape plans through AI at $50-200 would address massive unmet demand.

  3. No mobile-first AI design tool. PRO Landscape+ is Windows-only desktop. YardAI is a basic web tool. iScape is mobile but has no AI. No competitor combines mobile-first design with generative AI capabilities.

  4. No AI-driven plant intelligence. Despite landscape design being fundamentally about plants, no competitor uses AI for intelligent plant selection that considers climate zone, soil type, sun exposure, water requirements, companion planting, seasonal interest, maintenance level, and budget simultaneously. Planter has rule-based companion planting; everyone else treats plants as catalog items.

  5. No iterative AI refinement. YardAI generates a concept but cannot refine it ("make it more drought-tolerant," "swap the maple for something smaller"). PRO Landscape+ AI generates within constraints but does not accept natural language feedback. The ability to conversationally iterate on AI-generated designs is absent from every competitor.

  6. No budget-aware AI design. No competitor's AI considers budget constraints when generating designs. Cost information exists (Yardzen has a cost advisor, PRO Landscape+ does material takeoffs) but is not integrated into the generative process. An AI that designs within a stated budget would be differentiated.

  7. No cross-platform cloud collaboration. PRO Landscape+ is single-user, single-machine. iScape is single-device. Yardzen collaboration happens through the service team. No competitor offers real-time collaborative design editing with AI assistance across devices.

  8. No AI that learns from the user's property. Despite every competitor starting from a user's photo, none build a persistent model of the property (existing plants, soil conditions, sun patterns, irrigation) that improves recommendations over time. Each design session starts from scratch.

  9. No integration between business operations AI and design AI. LeanScaper does business ops. PRO Landscape+ does design. No product bridges both -- an AI that helps design the landscape AND generates the proposal, schedules the crew, and estimates the job would be uniquely comprehensive.

  10. No AI-generated maintenance plans. Every competitor focuses on the design moment. None use AI to generate ongoing maintenance schedules, seasonal care instructions, or long-term landscape evolution plans based on the design they helped create.

Adjacent Market Analysis

Adjacent Market Analysis: AI Interaction Patterns, Credit Systems & Design Integration

Date: 2026-03-09 | Type: Supporting Research — Competitive Intelligence | Parent: generative-ai-chat-interface


Purpose

This analysis examines seven adjacent-market products that have built AI-powered creative tools with interaction patterns, credit systems, and design integration approaches relevant to SimplyScapes. Each product offers transferable lessons for building a generative AI chat interface for landscape design — even though none of them operate in the landscaping vertical.


Product Analyses


Canva AI (Magic Studio) — Democratized AI design with credit-gated features

Their solution: Canva Magic Studio bundles over 25 AI-powered tools directly into the design canvas, including Magic Design (prompt-to-layout), Magic Media (text-to-image using DALL-E and Imagen backends), Magic Write (AI copywriting), Magic Animate (one-click animation), and Magic Eraser. Rather than presenting AI as a separate mode, Canva distributes AI tools throughout the editing interface — users encounter them contextually right where they are working. The platform supports multimodal input (text prompts, image uploads, voice dictation) and outputs across text, image, and video. Users can select from modalities ("Design," "Image," "Doc," "Code," "Video clip") and fine-tune results by providing feedback, uploading reference images, or trying suggested prompts.

The pattern worth studying:

  1. Contextual AI placement: AI tools are embedded at the point of need rather than siloed in a separate "AI mode." Users do not leave their workflow to access AI features.
  2. Progressive disclosure of complexity: Free users get 50 AI image generations total; Pro users get 500/month. The system introduces AI capabilities gradually — users see what is possible, hit a soft limit, and are nudged toward paid tiers.
  3. Multi-backend abstraction: Canva routes to OpenAI DALL-E, Google Imagen, or its own models behind a single "Magic Media" interface. Users never choose a model — they describe what they want and the platform picks the best backend.
  4. Real-time usage tracking: In March 2026, Canva introduced a real-time tracker that lets users monitor their AI usage allowance in the app settings.
  5. Suggested prompts and inspiration: When users open the image generation tool, they see suggested prompt ideas to reduce the blank-canvas problem.

How it could adapt to landscaping:

  • Embed AI generation directly into the landscape design canvas — e.g., a user selects an empty yard area and an "AI Fill" option appears contextually.
  • Use the multi-backend abstraction pattern to route landscaping prompts to the best model (e.g., a fine-tuned garden model for plant placement, a general model for hardscape rendering).
  • Offer suggested prompts tuned to landscaping scenarios: "Modern xeriscaped front yard," "Shade garden under mature oaks," "Low-maintenance backyard with play area."
  • Implement real-time credit tracking so landscaping professionals can monitor usage across their team.

What doesn't transfer:

  • Canva's design paradigm is 2D/flat templates. Landscaping requires 3D spatial reasoning, plant growth simulation, and terrain awareness that template-based generation cannot handle.
  • The "everything is a template" mental model breaks down when users need site-specific designs tied to actual property dimensions.
  • Canva's AI is generic — it has no domain knowledge about hardiness zones, drainage, sun exposure, or plant compatibility.

Credit/pricing model:

  • Free: 50 total AI generations (lifetime cap)
  • Pro ($15/month): 500 AI generations per month
  • Teams ($10/user/month, 3-user minimum): 500 AI generations per user per month
  • Business (~$20/user/month): Same AI allowance plus Leonardo.ai integration and IP indemnification
  • Credits are per-user, not pooled (except enterprise custom deals)
  • No per-generation pricing — AI is bundled into the subscription tier
  • Overage: users hit a wall and must upgrade; no pay-as-you-go option for extra credits

Figma AI — In-context AI assistance tightly coupled to the design canvas

Their solution: Figma has layered AI capabilities directly into the design tool rather than offering a separate AI product. Key features include: Figma Make (prompt-to-prototype generation that creates responsive layouts, components, and interactive prototypes from natural language), First Draft (AI-generated starting points for common UI patterns), AI image editing tools (erase object, isolate object, expand image — all operating directly on canvas elements without text prompts), and Code-to-Canvas (announced February 2026 with Anthropic, enabling developers to push Claude-generated UI code directly into Figma as editable design layers). Critically, Figma Make is not just a generator — users can start with a prompt, refine visually, adjust generated code, or re-prompt to explore new directions, all within a single workspace.

The pattern worth studying:

  1. Bidirectional AI-canvas integration: AI does not just generate static outputs — it produces fully editable, structured design objects that integrate into the user's existing design system (colors, typography, components from their libraries).
  2. Granular credit pricing by complexity: Simple AI actions (image edits) cost fewer credits than complex ones (full prototype generation at 100+ credits). This aligns cost with value delivered.
  3. Non-prompt AI tools: The erase/isolate/expand image tools require no text prompts — they work through direct manipulation (select an area, click "erase"). This is important for users who find prompting difficult.
  4. Design system awareness: Figma Make can import and respect existing design libraries, so AI output is brand-consistent rather than generic.
  5. Iterative refinement loop: Users prompt, see a result, refine visually, re-prompt if needed — the AI output is a starting point, not a final product.

How it could adapt to landscaping:

  • Build a landscape design canvas where AI-generated plans are fully editable objects (draggable plants, resizable hardscape elements, adjustable zones) rather than flat images. A sketch of such a structured plan object follows this list.
  • Offer direct-manipulation AI tools for landscape editing: select a garden bed area and click "fill with shade-tolerant perennials" without needing to write a prompt.
  • Let users import their design preferences (favorite plant palettes, preferred styles, material choices) so AI output respects their established design language.
  • Implement the iterative refinement loop: AI generates a base landscape plan, the user adjusts plant placement, AI fills in gaps or suggests complementary plantings.
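To illustrate the editable-objects pattern from the first bullet above, here is a hedged TypeScript sketch of a structured plan element. The interfaces and field names are assumptions for illustration, not an existing SimplyScapes schema:

```typescript
// Hypothetical structured plan: AI output as editable objects, not a flat image.
interface PlanElement {
  id: string;
  kind: "plant" | "hardscape" | "zone";
  species?: string;                     // plants only
  position: { x: number; y: number };   // feet from the site origin
  footprintRadiusFt: number;            // mature spread, drives spacing checks
  source: "user" | "ai";                // provenance, useful in refinement loops
}

interface LandscapePlan {
  widthFt: number;
  depthFt: number;
  elements: PlanElement[];
}

// With structured elements, "swap the maple for something smaller" becomes a
// targeted object edit instead of a full image regeneration.
function replaceSpecies(plan: LandscapePlan, from: string, to: string): LandscapePlan {
  return {
    ...plan,
    elements: plan.elements.map((e) =>
      e.kind === "plant" && e.species === from ? { ...e, species: to, source: "ai" } : e
    ),
  };
}
```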

What doesn't transfer:

  • Figma's output is screen-based UI — it has no concept of physical space, elevation changes, or real-world constraints.
  • The Code-to-Canvas pattern (converting code to visual design) does not have a direct landscaping equivalent, though the concept of "specification to visual" could apply.
  • Figma's component library model assumes reusable, identical UI elements; landscaping elements are more variable (each tree grows differently, each site has unique conditions).

Credit/pricing model:

  • Starter: 150 credits/day, 500/month maximum
  • Professional ($5/editor/month): 3,000 credits/month per full seat
  • Organization ($5/editor/month): 3,500 credits/month per full seat
  • Enterprise ($5/editor/month): 4,250 credits/month per full seat
  • Dev/Viewer seats: 500 credits/month on all paid plans
  • Credit costs by action: simple image edits cost a few credits; Figma Make generations cost 30-100+ credits depending on complexity
  • New (March 2026): AI credit subscriptions (shared team pool at better rates) and pay-as-you-go billing (Q2 2026)
  • Credits reset monthly; enforcement of limits begins March 18, 2026

Midjourney — Conversational image generation with voice-driven iteration

Their solution: Midjourney is an AI image generation platform that originated on Discord and has since built a full web interface at midjourney.com. Its core innovation is the conversational, iterative generation workflow. In Conversational Mode, users describe ideas in natural language to an AI that writes optimized prompts on their behalf. Users can activate voice input (click microphone, speak, stop) for a hands-free experience. The platform recently introduced Draft Mode for rapid idea exploration and a unified web editor combining inpainting (brush-based region replacement), outpainting (canvas extension), smart select, layers, and a "Suggest Prompt" feature that reverse-engineers prompts from existing images. V7 added personalization — users rate ~200 images and the model adapts toward their aesthetic preferences.

The pattern worth studying:

  1. Conversational-to-prompt translation: Users speak naturally ("a peaceful Japanese garden with a koi pond") and the AI translates this into an optimized generation prompt. This removes the "prompt engineering" barrier entirely. A sketch of this translation step follows this list.
  2. Grid-based choice presentation: Midjourney generates 4 variations simultaneously (a 2x2 grid), letting users compare options at a glance before committing to upscale (U1-U4) or create variations (V1-V4).
  3. Voice-driven creative workflow: The Draft Mode + voice input combination allows rapid ideation — talk, see results, tweak, re-roll — without typing.
  4. Personalization through preference learning: By rating ~200 images, users train a personal style model. This is a passive way to capture design preferences without requiring users to articulate them.
  5. Unified editor for post-generation refinement: The web editor brings inpainting, outpainting, layers, and re-prompting into a single view, so users can refine AI output without leaving the platform.
  6. Suggest Prompt (reverse engineering): Users can upload an image and get an AI-generated prompt that would recreate it — useful for understanding and iterating on reference images.
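As a concrete illustration of pattern 1, here is a hedged sketch of conversational-to-prompt translation using the Vercel AI SDK (covered later in this analysis) with Gemini. The model ID and system prompt are assumptions, not a shipped configuration:

```typescript
import { generateText } from "ai";
import { google } from "@ai-sdk/google";

// Translate a casual description into an optimized image generation prompt.
async function toGenerationPrompt(userUtterance: string, hardinessZone: number) {
  const { text } = await generateText({
    model: google("gemini-2.5-flash"),
    system:
      "Rewrite the user's casual description as a detailed landscape image " +
      "prompt. Name specific plants suited to the given USDA hardiness zone.",
    prompt: `Zone ${hardinessZone}. User said: "${userUtterance}"`,
  });
  return text; // fed to the image generation step
}

// "I want a low-maintenance front yard with some color" might come back as a
// prompt naming coneflower, catmint, and dwarf fountain grass for zone 6.
```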

How it could adapt to landscaping:

  • Implement conversational prompt translation for landscape design: "I want a low-maintenance front yard with some color" becomes an optimized prompt with specific plant suggestions for the user's zone.
  • Show 4 landscape plan variations side-by-side for comparison — different styles, plant palettes, or layout approaches for the same yard.
  • Offer voice-driven design sessions where landscapers describe their vision while looking at the property, and the AI generates concepts in real time.
  • Build preference learning into the onboarding: show users 50-100 landscape photos, have them rate preferences, and use this to bias all future AI output toward their aesthetic.
  • Allow "Suggest Prompt" from existing landscape photos: upload a photo of a yard the user admires, and the AI describes the design elements to use as a starting point.

What doesn't transfer:

  • Midjourney generates artistic images, not actionable plans. A beautiful rendering of a garden is not the same as a plantable design with species names, spacing, and installation instructions.
  • The Discord-first heritage created a community-oriented workflow (public galleries, shared channels) that does not fit a B2B landscaping tool where designs are client-confidential.
  • GPU time billing works for art generation but is hard to map to landscaping design tasks that have variable complexity (a 500 sq ft patio vs. a 5-acre estate).

Credit/pricing model:

  • Basic ($10/month, $96/year): ~3.3 GPU hours/month of Fast generation
  • Standard ($30/month, $288/year): ~15 GPU hours/month Fast + unlimited Relax Mode
  • Pro ($60/month, $576/year): ~30 GPU hours/month Fast + unlimited Relax + Stealth Mode (private gallery)
  • Mega ($120/month, $1,152/year): ~60 GPU hours/month Fast + unlimited Relax + Stealth Mode
  • Billing is by GPU time, not per image — a standard 4-image grid uses ~1 minute of Fast time
  • Relax Mode (Standard+ plans) offers unlimited generations at lower priority, with no Fast time consumption
  • No free tier or trial as of 2025
  • Annual billing gives ~20% discount

Adobe Firefly — Enterprise-grade AI generation with IP-safe credit pooling

Their solution: Adobe Firefly is Adobe's generative AI engine, integrated across Creative Cloud applications (Photoshop, Illustrator, Premiere Pro, After Effects) and available as a standalone web app. Its key differentiator is IP indemnification — Firefly models are trained exclusively on Adobe Stock, openly licensed content, and public domain material, making outputs commercially safe. The credit system is tiered by action complexity: standard image generation costs 1 credit, while premium video generation costs 20-100 credits per second. Enterprise customers can access Firefly Foundry for brand-specific model training on their own content, guidelines, and IP. Recent additions include Firefly Boards (collaborative AI generation workspace) and deeper video integration with Premiere Pro.

The pattern worth studying:

  1. Tiered credit cost by output type: Standard image generation (1 credit) vs. premium video (20-100 credits/second) creates a natural value hierarchy where more expensive outputs cost more.
  2. IP indemnification as a feature: Adobe guarantees commercial safety of AI outputs and provides legal protection. This is a premium feature that justifies higher pricing.
  3. Deep integration into existing professional workflows: Firefly is not a standalone tool — it lives inside Photoshop's Generative Fill, Illustrator's vector generation, and Premiere's video editing. Professionals never leave their primary tool.
  4. Enterprise credit pooling: While individual plans do not pool credits, enterprise customers can purchase shared credit pools, allowing teams to allocate AI resources based on project needs.
  5. Brand-specific model training (Firefly Foundry): Enterprise customers can train custom models on their own content, ensuring AI output matches brand guidelines.
  6. Unlimited standard generation promotions: Adobe periodically offers unlimited standard image generation to subscribers, using it as a growth lever and upsell mechanism.

How it could adapt to landscaping:

  • Implement tiered credit costs: a quick "style preview" rendering might cost 1 credit, while a full 3D walkthrough video might cost 20+ credits.
  • Offer IP indemnification for AI-generated landscape designs — guarantee that generated designs do not infringe on existing landscape architecture IP.
  • Integrate AI generation directly into whatever design tool landscapers already use, rather than requiring them to switch to a separate AI tool.
  • Enable landscape companies to train custom models on their portfolio of completed projects, so AI output matches their established design style.
  • Offer enterprise credit pools so a landscape company can allocate AI credits across their design team based on project needs.

What doesn't transfer:

  • Adobe's ecosystem lock-in strategy (Firefly works best inside Creative Cloud) may not apply — SimplyScapes is building a new platform, not extending an existing one.
  • The IP indemnification model requires training exclusively on licensed content, which is expensive and may limit output diversity for landscaping use cases.
  • Adobe's pricing assumes professional creative workers who already pay $55-80/month for Creative Cloud. Landscaping professionals have different software budgets.

Credit/pricing model:

  • Firefly Standard ($9.99/month): 2,000 generative credits/month
  • Firefly Pro ($29.99/month): 7,000 generative credits/month
  • Firefly Premium ($199.99/month): 50,000 generative credits/month
  • Creative Cloud All Apps ($59.99/month): Includes 4,000 credits/month (recently increased from 3,000)
  • Credit costs by action: 1 credit per standard image generation; 20 credits/second for 1080p video; 100 credits/second for high-quality video; 5 credits/second for translation
  • Enterprise (ETLA): 4,000-8,000 credits/user/month; optional credit pool add-on
  • On-demand purchase: Available for extra credits beyond monthly allocation
  • Credits are per-user, not pooled on individual and team plans; enterprise can optionally pool

Runway ML — Creative AI with transparent per-second API pricing

Their solution: Runway ML is a browser-based creative AI platform focused on video and image generation, offering models like Gen-4 (text/image-to-video), Gen-3 Alpha, and integrations with third-party models like Google Veo 3. The platform provides both a consumer-facing web interface and a developer API for embedding AI generation into custom applications. The API uses a simple, transparent pricing model: credits are purchased at $0.01/credit, and each model has a per-second credit cost (e.g., Gen-4 Video at 12 credits/second = $0.12/second). The web interface provides tools for text-to-video, image-to-video, video editing, and image manipulation, with Gen-4 achieving strong character consistency across scenes using reference images.

The pattern worth studying:

  1. Transparent per-unit API pricing: $0.01/credit with published per-second costs per model. Developers can calculate exact costs before building. No hidden fees or complex tiering. A cost-estimation sketch follows this list.
  2. Tiered model quality/speed/cost: Gen-4 Turbo (5 credits/sec, fast, cheaper) vs. Gen-4 standard (12 credits/sec, higher quality) vs. Gen-4 Aleph (15 credits/sec, highest quality). Users choose their quality-cost tradeoff.
  3. "Unlimited" with quality tradeoff: The Unlimited plan ($76/month) offers unlimited generations at "relaxed rate" (lower priority queue). This lets heavy users generate without anxiety while managing infrastructure costs.
  4. Dual interface strategy: A polished web UI for creative professionals and a developer API for integration into custom workflows. Same underlying models, different access patterns.
  5. Reference image consistency: Gen-4 uses reference images to maintain character/scene consistency across multiple generations — critical for any design workflow that requires coherent output across iterations.
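Runway's published rates make cost estimation trivial, which is the point of pattern 1. A minimal TypeScript sketch using the per-second figures from the pricing model below:

```typescript
// Cost estimation from published per-second rates (figures per the pricing below).
const USD_PER_CREDIT = 0.01;

const CREDITS_PER_SECOND: Record<string, number> = {
  "gen4-video": 12, // $0.12/sec
  "gen4-turbo": 5,  // $0.05/sec
  "veo-3": 40,      // $0.40/sec
};

function estimateCostUsd(model: string, seconds: number): number {
  return CREDITS_PER_SECOND[model] * seconds * USD_PER_CREDIT;
}

console.log(estimateCostUsd("gen4-video", 10)); // 1.2 -- a 10-second clip costs $1.20
```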

How it could adapt to landscaping:

  • Offer transparent per-generation pricing for the API, so landscape software companies can embed SimplyScapes AI generation at predictable costs.
  • Provide tiered quality levels: "Quick Preview" (fast, low credit cost, low resolution) vs. "Client Presentation" (slower, higher cost, photorealistic rendering).
  • Offer a "Relax Mode" equivalent for unlimited low-priority generation, useful for exploration and brainstorming phases of landscape design.
  • Build both a web UI (for landscape designers using SimplyScapes directly) and an API (for integration into existing landscape design software).
  • Use reference image consistency to maintain design coherence across multiple views of the same landscape (front view, side view, aerial view).

What doesn't transfer:

  • Runway is optimized for video generation, which has a very different cost structure from landscape plan generation.
  • The per-second billing model assumes time-based output (video). Landscape design output is area-based or element-based, requiring a different unit of measurement.
  • Runway's creative user base (filmmakers, video editors) has different expectations and workflows than landscape professionals.

Credit/pricing model:

  • Free: 125 credits (one-time, not recurring)
  • Standard ($12/month): 625 credits/month
  • Pro ($28/month): 2,250 credits/month
  • Unlimited ($76/month): 2,250 credits/month + unlimited Relax Mode (lower priority)
  • Enterprise: Custom pricing
  • API: $0.01 per credit; Gen-4 Video = 12 credits/sec ($0.12/sec); Gen-4 Turbo = 5 credits/sec ($0.05/sec); Veo 3 = 40 credits/sec ($0.40/sec)
  • Credits do not roll over; monthly reset
  • No per-user model — credits are per-account/organization

ChatGPT (Image Generation) — Multi-turn conversational image editing

Their solution: OpenAI replaced DALL-E 3 with GPT-4o's native image generation capabilities in ChatGPT (rolled out from March 2025), representing an architectural shift from a separate image generation model to a unified model that can both converse and generate/edit images natively. The key breakthrough is multi-turn image editing: unlike DALL-E 3's regeneration-only approach, GPT-4o enables iterative refinement through conversation. Users can request specific changes ("move the tree to the left," "make the flowers more vibrant," "change the pathway material to flagstone") and the model modifies the existing image without starting from scratch. GPT-4o also resolved longstanding issues with text rendering in images and hand anatomy. Free users get 2-3 images per day; Plus subscribers ($20/month) get 50 images per 3-hour window.

The pattern worth studying:

  1. Conversational image refinement: Users edit images through natural language dialogue rather than masking tools or parameter sliders. "Make the garden path wider" is more intuitive than drawing a selection and adjusting width.
  2. Context-aware editing: The model understands the full conversation history, so it can make coherent changes across multiple editing rounds without losing the overall design intent.
  3. Unified text + image model: There is no modal switch between "chat mode" and "image mode" — the same conversation can include questions, explanations, and image generation/editing seamlessly.
  4. Low barrier to entry: Free tier access (2-3 images/day) lets users experience the capability before paying. The limit is tight enough to drive upgrades but generous enough for evaluation.
  5. Iterative refinement without re-prompting from scratch: Users build on previous generations rather than starting over, which preserves design intent and reduces wasted credits.
  6. Rolling window rate limits: The 50-images-per-3-hours model prevents abuse while allowing burst usage for active design sessions.

How it could adapt to landscaping:

  • Build a conversational landscape design interface where users describe changes in natural language: "add a water feature near the patio," "replace the lawn with native grasses," "show me this design in fall colors."
  • Maintain conversation context across a design session so the AI remembers the full design history and makes coherent incremental changes.
  • Eliminate the modal split between "asking questions about plants" and "generating a design" — the same conversation should handle both seamlessly.
  • Offer a free tier with limited daily generations to let homeowners experience AI landscape design before committing to a paid plan.
  • Implement rolling window limits (rather than hard monthly caps) to support the bursty nature of design sessions. A minimal limiter sketch follows this list.
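The rolling-window mechanic itself is simple. Below is a minimal in-memory TypeScript sketch; a production system would back this with Redis or a database, and the parameters here simply mirror ChatGPT Plus's 50-per-3-hours limit:

```typescript
// Minimal rolling-window rate limiter (in-memory, single-process).
class RollingWindowLimiter {
  private timestamps: number[] = [];

  constructor(private limit: number, private windowMs: number) {}

  tryConsume(now = Date.now()): boolean {
    // Drop events that have aged out of the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}

// 50 generations per rolling 3-hour window.
const limiter = new RollingWindowLimiter(50, 3 * 60 * 60 * 1000);
if (!limiter.tryConsume()) {
  // Surface "try again in a while" instead of a hard monthly wall.
}
```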

What doesn't transfer:

  • ChatGPT generates images as flat raster outputs with no structured data (no plant species names, no dimensions, no bill of materials). Landscape design needs structured output alongside the visual.
  • The model has no domain-specific knowledge of plant compatibility, hardiness zones, sun requirements, soil types, or local regulations.
  • Multi-turn editing works well for single images but does not scale to multi-view landscape plans (plan view, elevation, 3D walkthrough) that need to stay synchronized.
  • ChatGPT's rate limiting model (per-user, per-time-window) does not account for the collaborative nature of landscape design where a designer and client may iterate together.

Credit/pricing model:

  • Free: 2-3 images per day (24-hour rolling window)
  • Plus ($20/month): 50 images per 3-hour rolling window
  • Pro ($200/month): Higher limits, priority access
  • API (GPT-4o image generation): Per-token pricing; image output tokens are priced at the standard GPT-4o output rate
  • No explicit "credit" system — rate limits are time-based rather than credit-based
  • Free tier is a true free tier (no trial expiration), providing ongoing low-volume access
  • Image generation is bundled with chat — no separate image pricing

Vercel AI SDK — Provider-agnostic infrastructure for building AI interfaces

Their solution: The Vercel AI SDK is a free, open-source TypeScript toolkit for building AI-powered applications, primarily targeting Next.js. It is not an AI product itself but rather the infrastructure layer for building AI products. Key abstractions include: Provider abstraction (a unified interface across OpenAI, Anthropic, Google, Mistral, and self-hosted models — switch providers by changing one line of code), Streaming (Server-Sent Events streaming with React hooks that reduce boilerplate from 200-300 lines to 10-20 lines), Tool/Function calling (type-safe tool definitions with Zod schemas, automatic conversation management, and multi-step agent loops), Structured output (type-safe schema enforcement for AI responses), and the new Agent abstraction (AI SDK 6, announced 2025, introducing a reusable Agent interface with tools, instructions, and type-safe UI streaming). The SDK is free and works with any hosting provider — it is not locked to Vercel.

The pattern worth studying:

  1. Provider abstraction eliminates vendor lock-in: A unified interface means the application can switch between OpenAI, Anthropic, Google, or a custom model without rewriting UI code. This is critical for a startup that may need to change providers as models improve or costs change.
  2. Streaming-first architecture: streamText for user-facing features (tokens appear as generated) vs. generateText for background tasks. This distinction is important for landscape design where some operations are interactive (chat) and others are background (rendering).
  3. Type-safe tool calling: Tools are defined with Zod schemas, ensuring the AI can only call functions with valid parameters. For landscape design, this means AI-suggested plant placements would be validated against the schema before being applied to the canvas.
  4. Agent loop with automatic conversation management: The SDK handles the full tool execution loop — LLM decides to call a tool, SDK executes it, result is appended to conversation, new generation triggered — until a text response is produced. This pattern enables complex multi-step design operations.
  5. Speech integration (AI SDK 5+): Unified speech generation and transcription interface, enabling voice-driven design workflows.
  6. Framework-agnostic but Next.js-optimized: Works with any React framework but provides optimized patterns for Next.js App Router, which is the SimplyScapes stack.

How it could adapt to landscaping:

  • Use the provider abstraction to route different landscape AI tasks to different models: plant identification to a vision model, design generation to an image model, conversation to a language model — all through a single SDK interface.
  • Implement streaming for interactive design features (show the landscape plan rendering progressively) and background generation for complex operations (full 3D rendering).
  • Define landscape-specific tools with Zod schemas, e.g. addPlant({ species: z.string(), position: z.object({ x: z.number(), y: z.number() }), zone: z.number() }) — the AI can only suggest valid plant placements. A full tool-definition sketch follows this list.
  • Build an agent loop for multi-step landscape design: user says "design a low-maintenance backyard," agent calls plant database tool, checks zone compatibility tool, generates layout tool, renders preview tool — all automatically orchestrated.
  • Leverage speech integration for on-site design: landscapers dictate observations while walking a property, and the AI processes them into design inputs.
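Here is a hedged sketch of the Zod-schema tool pattern applied to a landscape placement action. Option names follow AI SDK 4.x conventions (they vary slightly across SDK versions), and the tool body is a placeholder rather than an existing SimplyScapes function:

```typescript
import { z } from "zod";
import { generateText, tool } from "ai";
import { google } from "@ai-sdk/google";

// A landscape-specific tool: the AI can only call it with schema-valid arguments.
const addPlant = tool({
  description: "Place a plant from the object library on the design canvas",
  parameters: z.object({
    species: z.string(),
    position: z.object({ x: z.number(), y: z.number() }), // feet from site origin
    zone: z.number().int().min(1).max(13),                // USDA hardiness zone
  }),
  execute: async ({ species, position, zone }) => {
    // Placeholder: validate against the library and persist the placement here.
    return { placed: true, species, position, zone };
  },
});

const result = await generateText({
  model: google("gemini-2.5-flash"),
  tools: { addPlant },
  maxSteps: 4, // agent loop: call tools, append results, finish with text
  prompt: "Add a shade-tolerant fern near the north fence. I'm in zone 7.",
});
```

Because the schema rejects malformed arguments before execution, invalid placements never reach the canvas, which is exactly the property the list above calls for.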

What doesn't transfer:

  • The SDK is infrastructure, not a product — it provides the building blocks but not the domain-specific logic, training data, or design intelligence needed for landscaping.
  • The SDK's streaming patterns assume text-based output. Landscape design involves spatial data, images, and 3D models that require different streaming approaches.
  • Tool calling patterns work well for discrete actions but may struggle with continuous spatial operations (dragging a plant across the canvas in real time).

Credit/pricing model:

  • AI SDK: Free, open-source (MIT license), no usage costs
  • Vercel AI Gateway (optional): Pay-as-you-go, zero markup on token costs; free tier included with Vercel account
  • Model costs: Pass-through to the model provider (OpenAI, Anthropic, etc.) at their published rates
  • Vercel hosting (if used): Standard Vercel pricing applies to the application, not the SDK
  • The SDK itself has no credit system — the credit/pricing model is determined by the application builder (i.e., SimplyScapes would define its own credit model on top of the SDK)
  • This is the only product in this analysis that is infrastructure rather than a consumer/prosumer product

Cross-Industry Insights

Patterns That Have Not Been Applied to Landscaping

  1. Conversational-to-structured-output pipeline. Every product studied either generates flat images (Midjourney, ChatGPT, Firefly) or structured UI objects (Figma). No product in any market generates structured spatial designs through conversational input. The opportunity is to build a system where natural language ("I want a cozy fire pit area with native plantings") produces a structured landscape plan with named plants, dimensions, materials, and costs — not just a pretty picture. A sketch of this pipeline follows this list.

  2. Preference learning without explicit configuration. Midjourney's image-rating personalization and Canva's adaptive design suggestions represent passive preference capture. In landscaping, this could mean showing homeowners 20-30 landscape photos during onboarding, learning their aesthetic, and biasing all future suggestions — modern vs. traditional, formal vs. naturalistic, colorful vs. green.

  3. Multi-turn editing with spatial awareness. ChatGPT's conversational image editing is powerful but spatially unaware. A landscaping-specific version could understand "move the pergola 3 feet closer to the house" as a precise spatial operation rather than an image manipulation.

  4. Tiered quality/speed/cost generation. Runway's model tiering (Turbo vs. standard vs. Aleph) has not been applied to landscape design. Quick concept sketches, detailed plan views, and photorealistic client presentations are different products with different cost structures — they should be priced differently.

  5. Direct-manipulation AI (non-prompt). Figma's erase/isolate/expand tools prove that AI does not require prompts. In landscaping, selecting a bare area and clicking "fill with seasonal color" is more intuitive than writing a prompt describing what should go there.

  6. Enterprise credit pooling for design teams. Adobe's credit pool model and Figma's upcoming shared pool subscriptions address a real need: landscape design companies have 3-15 designers with variable AI usage. Per-user limits waste credits on light users and constrain heavy users.

  7. Reference image consistency across views. Runway Gen-4's reference image system maintains character consistency across scenes. This pattern, applied to landscaping, would ensure that a design looks consistent whether shown as a plan view, a front elevation, or a 3D walkthrough.

  8. Voice-driven on-site design. Midjourney's voice input + Draft Mode enables rapid ideation. For landscapers, this could enable on-site design sessions: walk the property, dictate observations and preferences into the app, and get real-time concept generation.
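To make insight 1 concrete, here is a hedged sketch of a conversational-to-structured-output step using the Vercel AI SDK's generateObject. The schema fields (plants, materials, cost) are illustrative assumptions, not a final plan format:

```typescript
import { z } from "zod";
import { generateObject } from "ai";
import { google } from "@ai-sdk/google";

// Natural language in, structured plan out (schema is illustrative).
const PlanSchema = z.object({
  summary: z.string(),
  plants: z.array(
    z.object({
      species: z.string(),
      commonName: z.string(),
      quantity: z.number().int(),
      spacingFt: z.number(),
    })
  ),
  materials: z.array(z.object({ item: z.string(), quantity: z.string() })),
  estimatedCostUsd: z.number(),
});

const { object: plan } = await generateObject({
  model: google("gemini-2.5-flash"),
  schema: PlanSchema,
  prompt:
    "Design a cozy fire pit area with native plantings for a 20x15 ft corner " +
    "of a zone 6 backyard. Stay under $3,000.",
});
// plan.plants now carries named species, quantities, and spacing: actionable
// output, not just a pretty picture.
```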

Architectural Insight

The Vercel AI SDK provides the exact infrastructure layer needed to implement most of these patterns in a Next.js application. Its provider abstraction means SimplyScapes can start with one model provider and switch or multi-route as the landscape AI model market develops. The tool-calling abstraction maps naturally to landscape design operations (add plant, check zone, calculate cost, generate rendering). The streaming architecture supports both conversational chat and progressive rendering.
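A minimal sketch of that provider abstraction, assuming Gemini model IDs; swapping or multi-routing providers is a one-line change:

```typescript
import { generateText } from "ai";
import { google } from "@ai-sdk/google";
// import { anthropic } from "@ai-sdk/anthropic"; // provider swap is one line

// Route each task class to the model that fits it; model IDs are illustrative.
const conversationModel = google("gemini-2.5-flash"); // cheap chat and routing
const reasoningModel = google("gemini-2.5-pro");      // heavier design reasoning
                                                      // (selected per task at runtime)
const { text } = await generateText({
  model: conversationModel,
  prompt: "Which groundcovers tolerate deep shade in zone 8?",
});
```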


Pricing Model Comparison

| Product | Model Type | Free Tier | Entry Paid | Mid Tier | Top Tier | Unit of Measurement | Overage Handling | Team/Pool Support |
|---------|-----------|-----------|------------|----------|----------|---------------------|------------------|-------------------|
| Canva AI | Bundled subscription | 50 total generations | $15/mo (500/mo) | $10/user/mo Teams (500/mo) | ~$20/user/mo Business | Per generation | Hard wall; must upgrade | Per-user only (no pooling) |
| Figma AI | Credits per seat | 500/mo (Starter) | $5/editor/mo (3,000/mo) | $5/editor/mo Org (3,500/mo) | $5/editor/mo Enterprise (4,250/mo) | Credits (variable cost per action) | Enforcement begins Mar 2026 | Shared pool subscription (Mar 2026) |
| Midjourney | GPU time subscription | None | $10/mo (3.3 GPU hrs) | $30/mo (15 GPU hrs + Relax) | $120/mo (60 GPU hrs + Relax) | GPU minutes | Relax Mode (unlimited, lower priority) | Per-account only |
| Adobe Firefly | Credits per plan | Via free Creative Cloud | $9.99/mo (2,000 credits) | $29.99/mo (7,000 credits) | $199.99/mo (50,000 credits) | Credits (1 per image; 20-100/sec video) | On-demand credit purchase | Enterprise credit pool (optional add-on) |
| Runway ML | Credits + unlimited tier | 125 credits (one-time) | $12/mo (625 credits) | $28/mo (2,250 credits) | $76/mo (unlimited Relax) | Credits ($0.01/credit; per-second for video) | Buy more or use Relax Mode | Per-organization |
| ChatGPT | Rate-limited subscription | 2-3 images/day | $20/mo (50/3hrs) | — | $200/mo Pro (higher limits) | Images per time window | Wait for window reset | Per-user only |
| Vercel AI SDK | Open-source (free) | Full SDK, no limits | N/A (pass-through to providers) | N/A | N/A | N/A (infrastructure layer) | N/A | N/A |

Key Pricing Takeaways for SimplyScapes

  1. Credits are the dominant model — 5 of 7 products use some form of credit system (Canva, Figma, Adobe, Runway, and even Midjourney's GPU time is effectively credits). Time-window rate limiting (ChatGPT) is the exception, not the rule.

  2. Variable credit cost by action complexity is best practice — Figma (30-100+ credits per action) and Adobe (1 credit/image, 20-100 credits/sec video) both price AI actions by complexity. A landscape design system should similarly charge differently for a quick concept sketch vs. a full photorealistic rendering. A sketch of such a credit schedule follows this list.

  3. "Unlimited at lower priority" reduces anxiety — Both Midjourney (Relax Mode) and Runway (Unlimited plan) offer unlimited generation at reduced priority. This is a powerful pattern for creative work where users need to experiment freely without watching a credit counter.

  4. Free tiers drive adoption but must be tight — Canva (50 lifetime), ChatGPT (2-3/day), and Runway (125 one-time) all offer free access that is sufficient for evaluation but insufficient for real work. The ideal free tier for landscaping AI would let a homeowner generate 2-3 concept designs before requiring a paid plan.

  5. Team credit pooling is an enterprise upsell — Adobe and Figma both offer or are introducing shared credit pools at premium rates. For landscape companies with variable designer utilization, this is a high-value feature worth charging for.

  6. Bundling AI into existing subscriptions works for platforms with existing user bases — Canva and Adobe bundle AI credits into their existing plans. SimplyScapes, as a newer platform, may benefit more from a standalone AI credit model that can be priced and communicated clearly.
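A hedged sketch of takeaway 2: a per-action credit schedule in TypeScript. The actions, costs, and starting balance are illustrative, not a proposed price list:

```typescript
// Hypothetical per-action credit schedule, in the spirit of Figma's and
// Adobe's complexity-based pricing.
const CREDIT_COST = {
  conceptSketch: 1,          // quick, low-res ideation
  plantListGeneration: 2,
  photorealisticRender: 10,
  walkthroughVideoPerSec: 20,
} as const;

type Action = keyof typeof CREDIT_COST;

function debit(balance: number, action: Action, units = 1): number {
  const cost = CREDIT_COST[action] * units;
  if (cost > balance) {
    throw new Error(`Insufficient credits: need ${cost}, have ${balance}`);
  }
  return balance - cost;
}

let balance = 50;                                 // example starting balance
balance = debit(balance, "conceptSketch");        // 49 left
balance = debit(balance, "photorealisticRender"); // 39 left
```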

Patent Landscape

Patent Landscape & Freedom-to-Operate Assessment

Date: 2026-03-09 (Updated) | Scope: AI-assisted landscape design, conversational image editing, credit-based pricing, intent routing, markup-guided generation, multi-turn dialogue for iterative image refinement | Databases Searched: Google Patents, TDCommons | Time Range: 2012-2026 | CPC Codes: G06T 11/00, G06F 3/04845, G06N 3/08, G06F 30/13, G10L 15/1815, G06F 17/2785 | Search Date: 2026-03-09


1. Overall Risk Assessment: LOW-MODERATE

The combined feature set -- AI landscape design + conversational editing with object library + markup-guided generation + proactive clarification + credit billing -- represents a novel combination not directly claimed by any single patent or combination of patents. However, specific component technologies carry patent risk that must be managed through architectural differentiation and domain specificity.

Key Risk Summary:

  • Adobe holds the strongest portfolio in conversational image editing (5+ granted US patents)
  • Google/Alphabet has expanding coverage of diffusion-based text-guided editing techniques
  • Home Outside, Inc. holds a recently granted patent specifically for AI landscape design generation
  • Credit/token billing and markup-guided generation in the landscape context remain largely unencumbered
  • No single patent covers the full SimplyScapes architecture (conversational + generative AI + landscape-specific + credit metering)

2. Patents by Topic Area

2.1 Conversational Image Editing (PRIMARY CONCERN)

| # | Patent | Title | Assignee | Filed | Status | Risk |
|---|--------|-------|----------|-------|--------|------|
| 1 | US11972757B2 (US20230148406A1) | Digital Media Environment for Conversational Image Editing and Enhancement | Adobe Inc. | 2023-01-03 (Priority 2018-08-22) | Granted, Active (exp. 2038) | HIGH |
| 2 | US10579737B2 | Natural Language Image Editing Annotation Framework | Adobe Inc. | 2018-03-06 | Granted, Active (exp. 2038) | HIGH |
| 3 | US11257491B2 | Voice Interaction for Image Editing | Adobe Inc. | 2018-11-29 | Granted, Active (exp. 2039) | HIGH |
| 4 | US9412366B2 | Natural Language Image Spatial and Tonal Localization | Adobe Inc. | 2012-11-21 | Granted, Active (exp. 2034) | MODERATE |
| 5 | US9436382B2 | Natural Language Image Editing | Adobe Inc. | 2012-11-21 | Granted, Active | MODERATE |
| 6 | US20230230198A1 | Interactive Image Creation via NL Feedback (TiGAN) | Adobe Inc. | 2022 | Published | MODERATE |
| 7 | EP4553759A2 | Image Editing Method, Apparatus, and Storage Medium (multi-round conversational) | Beijing Baidu Netcom | 2024-12-27 (Priority 2024-01-05) | Published | MODERATE |
| 8 | WO2025209146A1 | Image Editing Method and Apparatus | ByteDance | 2025-03-13 (Priority 2024-04-01) | Published | LOW |
| 9 | CN120655750A | Multi-round Image Modification Processing Method | Shenzhen Kukai Software | 2025-05-22 | Published | LOW |
| 10 | US20200327884A1 | Customizable Speech Recognition System (creative applications) | Adobe Inc. | 2019-04-12 | Published | LOW |

Analysis: Adobe dominates this space with granted patents spanning from 2012 to 2024. The most concerning patent is US11972757B2, which describes a conversational digital image editing system with aesthetic scoring, intent mapping to canonical intentions, and iterative suggestion of editing operations. SimplyScapes should design its intent-routing architecture to differ from Adobe's "canonical intention mapping" approach.

Key Differentiator for SimplyScapes: Adobe's patents focus on photographic enhancement (exposure, contrast, color balance, aesthetic scores). SimplyScapes operates in landscape design (plant placement, hardscape, spatial layout, seasonal visualization) -- a fundamentally different editing domain. The object library concept (selecting from a curated catalog of landscape items and placing them via conversation) is a distinct, unclaimed combination.

2.2 Text-Guided Image Editing with Diffusion Models

| # | Patent | Title | Assignee | Filed | Status | Risk |
|---|--------|-------|----------|-------|--------|------|
| 1 | US20240037822A1 | Prompt-to-Prompt Image Editing with Cross-Attention Control | Google LLC | 2023-07-31 (Priority 2022-08-01) | Published | MODERATE |
| 2 | JP7691548B2 | Text-Based Real-Life Image Editing Using Diffusion Models | Google LLC | 2024-04-18 (Priority 2023-04-18) | Granted | MODERATE |
| 3 | CN119422137A | Hint-Driven Image Editing Using Machine Learning | Google LLC | 2024-05-09 (Priority 2023-05-09) | Published | MODERATE |
| 4 | WO2024107884A1 | Null-Text Inversion for Editing Real Images Using Guided Diffusion Models | Google LLC | 2023-11-15 (Priority 2022-11-16) | Published | MODERATE |
| 5 | EP4487299B1 | Fine-tuning Diffusion-Based Generative Neural Networks | Google LLC | 2024-03-13 (Priority 2023-03-17) | Granted | LOW |
| 6 | US20240412458A1 | Diffusion-Guided Three-Dimensional Reconstruction | Google LLC | 2024-06-12 | Published | LOW |
| 7 | US11983806B1 | Image Generation (inpainting/outpainting) | OpenAI | 2023 | Granted | MODERATE |
| 8 | US11922550B1 | Hierarchical Text-Conditional Image Generation | OpenAI | 2023 | Granted | LOW |
| 9 | US20220270310A1 | Web-Based Real-Time Image Editing with Neural Networks | Adobe Inc. | 2021 | Published | LOW |

Analysis: Google has 22+ patents covering diffusion-based image editing techniques. These are primarily implementation-specific (covering particular algorithms like cross-attention manipulation, null-text inversion) rather than broad application-layer claims. The risk is manageable if SimplyScapes uses third-party model APIs rather than implementing patented techniques directly.

Mitigation: Use third-party AI model APIs (OpenAI, Stability AI, etc.) for the generative layer. Model providers typically indemnify users against patent claims related to internal model techniques.

2.3 AI-Assisted Landscape / Design Generation

| # | Patent | Title | Assignee | Filed | Status | Risk |
|---|--------|-------|----------|-------|--------|------|
| 1 | US12518067B2 | System and Method for Generating a Landscape Design | Home Outside, Inc. | 2023-07-05 (Priority 2019-10-24) | Granted, Active (exp. 2041) | HIGH |
| 2 | US20250225479A1 | Systems and Methods for 3D Model Visualization of Landscape Design | State Farm Mutual Auto Insurance | 2025-03-25 (Priority 2020-04-27) | Published | LOW |
| 3 | CN118333571B | Intelligent Management System for Landscaping Engineering Projects | Liaocheng Zhengyuan | 2024-05-10 | Granted | LOW |
| 4 | CN118940364B | Automatic Building Design Scene Generation Based on AI | Shenzhen Kuboo Architecture | 2024-07-17 | Granted | LOW |
| 5 | US20210173968A1 | AI Systems for Interior Design | Realsee (Beijing) | 2021 | Published | LOW |

Analysis: The Home Outside patent (US12518067B2) is the most direct competitive concern. It describes a landscape design generation system with:

  • A calculator engine, landscape design engine, and scoring engine
  • Retrieving landscape data from online databases and comparing to existing designs
  • Calculating landscape scores and generating improvements
  • Displaying improved designs as 3D images

Key Differentiator for SimplyScapes: Home Outside's patent describes a score-based comparison system that retrieves data from online databases. SimplyScapes uses a conversational AI approach with generative diffusion models for visual generation, real-time iterative refinement, and domain-specific object libraries -- architecturally distinct from the patented system.

2.4 Intent Classification / Routing in Multimodal AI

| # | Patent | Title | Assignee | Filed | Status | Risk |
|---|--------|-------|----------|-------|--------|------|
| 1 | US9542949B2 | Multimodal Intent Satisfaction | Microsoft | Pre-2020 | Granted | LOW |
| 2 | US12124508B2 | Multimodal Intent Discovery | Adobe Inc. | 2022 | Granted | LOW |
| 3 | US9570070B2 | Multi-Modal Device Interactions in Voice Services | Amazon | Pre-2020 | Granted | LOW |
| 4 | US11347801B2 | Multi-Modal Interaction with Automated Assistants | Google | Pre-2020 | Granted | LOW |
| 5 | US10810274B2 | Optimizing Dialogue Policy Decisions for Digital Assistants | Apple Inc. | 2017-08-15 | Granted | LOW |
| 6 | US10482874B2 | Hierarchical Belief States for Digital Assistants | Apple Inc. | 2017-08-15 | Granted | LOW |
| 7 | US12197857B2 | Digital Assistant Handling of Personal Requests | Apple Inc. | 2021-07-15 | Granted | LOW |

Analysis: Existing intent classification patents are scoped to general-purpose voice/text assistants (Siri, Alexa, Google Assistant), not design-tool-specific routing between generation, editing, and clarification. SimplyScapes' intent routing between landscape-specific actions (add plant, modify hardscape, change season, refine area) is sufficiently differentiated.

2.5 Credit/Token-Based Pricing for AI Services

| # | Patent | Title | Assignee | Filed | Status | Risk |
|---|--------|-------|----------|-------|--------|------|
| 1 | US9197642B1 | Token-Based Billing for Rendering | Otoy | 2010 | Granted | LOW |

Analysis: No patents claim credit-based pricing specifically for AI image generation. The credit/token billing search returned 3,779 results across all generative AI billing topics, but none specifically cover the combination of credit metering for AI design generation services. Credit pricing is standard SaaS business practice with extensive prior art. SimplyScapes can implement credit/token-based billing without significant patent risk.

2.6 Markup/Annotation-Guided Generation

| # | Patent | Title | Assignee | Filed | Status | Risk |
|---|--------|-------|----------|-------|--------|------|
| 1 | US10825219B2 | Segmentation Guided Image Generation | Northeastern University | 2019 | Granted | LOW |
| 2 | US12039431B1 | Multimodal ML Model Interaction (annotations) | OpenAI | 2023 | Granted | MODERATE |
| 3 | WO2024158398A1 | AI Drawing Generation with Clarifying Questions | Individual | 2023 | Ceased | LOW |

Analysis: The search for markup-guided image generation patents returned mostly irrelevant results (medical imaging, video annotation, XAML troubleshooting). Using visual markup (user-drawn annotations, region selection) to guide AI generation does not have specific blocking patents in the landscape design context. OpenAI's US12039431B1 covers analysis of marked regions, not generation guided by canvas markup.

2.7 Multi-Turn Dialogue for Iterative Image Refinement

| # | Patent | Title | Assignee | Filed | Status | Risk |
|---|--------|-------|----------|-------|--------|------|
| 1 | US12142298B1 | Creating Digital Stories Based on Memory Graphs and Multi-Turn Dialogs | Meta Platforms, Inc. | 2023-01-09 (Priority 2022-02-14) | Granted | LOW |
| 2 | JP6835398B2 | Multi-Turn Canned Dialog | Apple Inc. | 2018-10-29 | Granted | LOW |
| 3 | EP4553759A2 | Image Editing Method (multi-round conversational) | Baidu | 2024-12-27 | Published | MODERATE |

Analysis: Multi-turn dialogue patents primarily cover general digital assistant functionality (Apple) or story generation (Meta). The Baidu patent on multi-round conversational image editing is the most relevant but primarily covers Chinese/EP jurisdictions. No active US patent broadly claims multi-turn conversational image editing with iterative refinement -- though Adobe's US11972757B2 partially covers this through its iterative suggestion mechanism.


3. Competitor IP Activity

Adobe Inc.

Portfolio Strength: VERY STRONG Adobe has the most comprehensive patent portfolio in NL/conversational image editing, with 5+ granted US patents spanning 2012-2024:

  • Natural language spatial/tonal localization (foundational, 2012)
  • Natural language image editing (2012)
  • NL annotation framework for image editing (2018)
  • Voice interaction for image editing (2018)
  • Full conversational image editing environment (2018/2023, granted 2024)
  • Customizable speech recognition for creative apps (2019)
  • Interactive image creation with NL feedback (2022)
  • Web-based real-time neural network editing (2021)

Adobe's Firefly product line commercially implements many of these patents. Their aggressive filing pattern suggests continued expansion.

Google/Alphabet

Portfolio Strength: STRONG (Technical Layer) Google has 22+ patents focused on underlying ML techniques:

  • Prompt-to-prompt editing via cross-attention (2022)
  • Null-text inversion for real image editing (2022)
  • Text-based diffusion image editing (2023)
  • Hint-driven image editing (2023)
  • Diffusion model fine-tuning techniques (2023)
  • 3D reconstruction and generation (2023)

Google's patents cover specific algorithms rather than application-layer products. Less likely to create broad blocking positions for vertical applications.

OpenAI

Portfolio Strength: LIMITED but GROWING

  • Image generation including inpainting/outpainting (US11983806B1)
  • Multimodal ML model interaction with annotations (US12039431B1)
  • Hierarchical text-conditional image generation (US11922550B1)

OpenAI's DALL-E/ChatGPT products are significant commercially but their patent portfolio in this specific domain remains limited.

Canva

Portfolio Strength: NOT FOUND No relevant patents found. Canva's "Magic" AI features likely rely on third-party APIs and licensing.

Autodesk

Portfolio Strength: NOT FOUND (in AI image editing) Extensive traditional CAD/BIM portfolio but no filings in generative AI image editing or conversational design.

Home Outside, Inc.

Portfolio Strength: NICHE but DIRECT Single granted patent (US12518067B2) specifically covering AI landscape design generation with scoring. Most direct competitive threat in the landscape vertical, but describes a fundamentally different technical approach (database-comparison scoring) than generative AI chat interfaces.

Chinese Tech (Baidu, ByteDance)

Portfolio Strength: GROWING Both have recent filings covering multi-round conversational image editing. Primarily cover Chinese and European jurisdictions with limited direct US enforcement risk. Worth monitoring.

Stability AI / Midjourney

Portfolio Strength: NONE FOUND No patent applications identified. Primarily engaged in copyright litigation rather than patent prosecution.


4. Defensive Publication Findings (TDCommons)

Search of Technical Disclosure Commons for "AI image editing conversational" returned 22 defensive publications (2022-2026). Most relevant:

| # | Title | Authors | Date | Relevance |
|---|-------|---------|------|-----------|
| 1 | AI-driven Special Effects Generation and Application Framework for Visual Content | Colvin Pitts | 09/2025 | AI generation pipeline; establishes prior art for effect-based AI generation |
| 2 | Assistive and Accessible User Interaction Mechanisms for AI-Powered Art | Alex Olwal, Shaun Kane | 09/2025 | Accessible interaction patterns for AI art; relevant to UI design |
| 3 | Conversational Agent for Orchestrating Physical Fulfillment of Generative AI Outputs | Lance Nanek | 12/2025 | Conversational agent bridging AI outputs to physical actions; relevant to design-to-execution |
| 4 | Generating Personalized Multimedia Content by Prompting Models with Location and Context Specific Inputs | Shiblee Hasan, Joseph Johnson Jr | 08/2025 | Location-specific generation; relevant to property-specific landscape generation |
| 5 | Artificial Intelligence Based Creative Companion for Content Creation | Anonymous | 01/2022 | AI companion for creative workflows; prior art for AI creative assistants |
| 6 | Policy-compliant Generative AI Deployment Using a Multimodal Critic | Xingyu Federico Xu et al. | 12/2025 | Safety/compliance for generative AI; relevant to content moderation |
| 7 | Contextual Conversational Advertisements in Agents | Oded Elyada et al. | 12/2024 | Monetization of conversational AI; relevant to business model |
| 8 | Three-Dimensional Mesh Editing Using Masked Large Reconstruction Models | Anonymous | 02/2025 | 3D editing with AI; relevant to future 3D landscape features |
| 9 | A System and Method for Real-Time AI Driven Image Personalization for Digital Advertisements | Tarushi Dubey et al. | 02/2026 | Real-time AI image personalization; establishes prior art |
| 10 | Generative Video Models to Create Panning Videos Anchored on a User Image | Lenord Melvix Joseph Stephen Max | 11/2023 | AI video from user images; establishes prior art for image-to-video in design context |

Key Takeaway: The defensive publication landscape is growing for conversational AI creative tools. Several publications establish prior art that could challenge overly broad patent claims. TDCommons coverage for design-specific AI tools remains thin, representing an opportunity for SimplyScapes to file defensive publications.


5. Freedom-to-Operate Assessment

5.1 High-Concern Areas

Adobe Conversational Image Editing Patents

  • US11972757B2 broadly covers conversational image editing systems with aesthetic scoring and intent mapping
  • Mitigation: Use LLM-based intent routing (function calling) rather than rule-based canonical intention mapping. Focus on landscape-specific intents rather than general photo editing. Avoid implementing aesthetic attribute scoring systems that mirror Adobe's claims.

Home Outside Landscape Design Patent

  • US12518067B2 covers AI landscape design with calculator engine + scoring engine + database comparison
  • Mitigation: Use generative AI (diffusion models) for visual generation rather than database-lookup design recommendation. Ensure the workflow is conversational/iterative, not score-and-improve. Do not implement a scoring engine that compares against online landscape databases.

5.2 Medium-Concern Areas

Google Diffusion Editing Techniques

  • 22+ patents covering specific diffusion-based editing algorithms
  • Mitigation: Use third-party model APIs where providers handle patent licensing. Avoid implementing specific patented techniques (cross-attention manipulation, null-text inversion) in custom code.

Baidu Multi-Round Conversational Editing

  • EP4553759A2 covers multi-round conversational image editing
  • Mitigation: Primarily Chinese/EP jurisdiction. Monitor for US continuation filings.

5.3 Low-Concern Areas

  • Credit/token billing: No blocking patents; standard SaaS practice
  • Markup/annotation-guided generation: No blocking patents in landscape context
  • Intent routing for design tools: Existing patents scoped to general assistants, not vertical design tools
  • Landscape-specific AI features: Plant libraries, climate-zone awareness, seasonal visualization, property context -- all largely unpatented

5.4 Recommended Actions

  1. Architectural Differentiation: Use LLM-based function-calling for intent routing (not rule-based canonical intention mapping per Adobe's claims); see the sketch after this list

  2. Domain Specificity: Frame patent-sensitive features within landscape design domain; "landscape design editing via conversation" is more defensible than "general image editing via conversation"

  3. Model-Layer Separation: Use third-party AI model APIs for generation; model providers typically indemnify against technique-level patent claims

  4. Design Around Home Outside: Use conversational AI + generative models rather than scoring-based database comparison systems

  5. File Defensive Publications on TDCommons for:

    • Property-context-aware landscape generation from conversational AI
    • Climate zone and plant hardiness integration in generative AI design
    • Multi-turn landscape design refinement with spatial memory
    • Credit-based metering for domain-specific AI generation complexity
    • Object library integration with conversational generative editing
  6. Patent Watch Program: Monitor quarterly filings from Adobe, Google, Baidu, ByteDance, and Home Outside in CPC classifications G06T11/00, G06F30/13, G10L15/1815
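
To make action 1 concrete, the sketch below routes intent with a cheap structured-output classification pass rather than rule-based canonical intention mapping. It assumes the Vercel AI SDK's `generateObject` with the `@ai-sdk/google` provider; the three-way intent taxonomy is illustrative, not a settled design.

```typescript
// Minimal LLM-based intent router (illustrative sketch, not production code).
// Assumes the Vercel AI SDK (`ai`) with the `@ai-sdk/google` provider; the
// intent labels below are assumptions for illustration.
import { generateObject } from "ai";
import { google } from "@ai-sdk/google";
import { z } from "zod";

const intentSchema = z.object({
  intent: z.enum(["generate", "clarify", "answer"]),
  reason: z.string(), // free-text rationale, useful for debugging/audit logs
});

export async function routeIntent(userMessage: string) {
  const { object } = await generateObject({
    model: google("gemini-2.5-flash"), // cheap text-only classification pass
    schema: intentSchema,
    prompt:
      'Classify this landscape-design chat message. "generate" means ready ' +
      'to produce or modify an image, "clarify" means a follow-up question ' +
      `is needed, "answer" means a knowledge question.\n\nMessage: ${userMessage}`,
  });
  return object; // e.g. { intent: "clarify", reason: "fountain style unspecified" }
}
```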


6. Search Methodology

6.1 Google Patents Searches Executed

  1. "conversational image editing" -- 13 results
  2. "multi-turn" "image editing" -- 37 results
  3. "inpainting" "natural language" editing -- 584 results (top results sampled)
  4. "markup guided" OR "annotation guided" image generation -- 15 results
  5. "landscape design" AI generation -- 180 results (top results sampled)
  6. "text guided" "image editing" "diffusion" assignee:Adobe -- searched
  7. "natural language" "image editing" "intent" -- 1,664 results (top results sampled)
  8. "image editing" "text prompt" "diffusion" assignee:Google -- 22 results
  9. "generative fill" OR "text guided image editing" "diffusion model" -- searched
  10. "credit system" OR "token based" "AI generation" OR "generative AI" billing -- 3,779 results (top sampled)
  11. "image generation" "natural language" "user interface" "chat" assignee:Canva OR Autodesk OR OpenAI -- 0 results

6.2 TDCommons Search

  • Query: AI image editing conversational -- 22 results across Defensive Publications Series

6.3 Company-Specific Patent Searches

  • Adobe Inc. -- multiple targeted searches (strongest portfolio found)
  • Google/Alphabet -- targeted diffusion editing search (22+ patents)
  • OpenAI -- searched, limited relevant results
  • Canva -- searched, no relevant results
  • Autodesk -- searched, no relevant results in AI image editing
  • Home Outside, Inc. -- found via landscape design search
  • Baidu -- found via conversational image editing search
  • ByteDance -- found via conversational image editing search
  • Meta Platforms -- found via multi-turn dialog search
  • Apple Inc. -- found via multi-turn dialog search (digital assistant focus)

6.4 Limitations

  • Patent applications filed within the last 18 months may not yet be published
  • Google Patents may not include all international jurisdictions
  • This analysis does not constitute legal advice; formal FTO opinions should be obtained from patent counsel for specific product features
  • Some companies may file under subsidiary or holding company names not captured in assignee searches

Patent findings are NOT legal advice. Consult qualified IP counsel for freedom-to-operate opinions before making product decisions based on this analysis.

Academic & Open Source Scan (literature review)

Academic & Open Source Scan: Conversational AI Image Editing and Supporting Systems

Date: 2026-03-09
Scope: Academic papers, open source projects, and industry patterns relevant to building a conversational AI image editing interface for landscape/hardscape design.
Sources: arXiv, Google Scholar, CVPR 2024, ECCV 2024, NeurIPS 2024/2025, ICLR 2024/2025, ACL/EMNLP, GitHub, industry reports.


1. Summary of Key Findings

Instruction-Based Image Editing has matured rapidly. The field has moved from single-instruction models (InstructPix2Pix) to MLLM-guided systems (SmartEdit, BrushEdit, GenArtist) that decompose complex edits into sub-tasks, self-verify results, and support multi-turn interaction. Agent-based architectures that orchestrate multiple specialized models via tool calling are now the dominant paradigm for production-quality editing.

Tool/Function Calling in LLMs is well-benchmarked (BFCL v4, ToolACE) and production-ready across major providers. The Vercel AI SDK (v6) and LangGraph provide mature TypeScript/Python frameworks for building tool-calling agents with structured outputs. Anthropic's Model Context Protocol (MCP) is emerging as a cross-platform standard for tool integration.

Model Routing can reduce LLM costs by 85%+ (RouteLLM) or match GPT-4 quality at 2% of cost (FrugalGPT). These techniques are directly applicable to routing simple edits to lightweight models while reserving expensive models for complex reasoning.

Multi-Turn Visual Dialogue systems like DialogGen and Talk2Image demonstrate that multi-agent architectures outperform single-agent approaches for complex editing workflows, addressing "intention drift" in long conversations.

Generative UI is an emerging pattern where LLMs dynamically render custom interface components (Vercel AI SDK, Google A2UI, assistant-ui). This is a natural fit for a chat-based image editing interface that needs to present different controls depending on the editing context.

Usage-Based Billing for AI services has converged on credit-based systems. Stripe's $1B acquisition of Metronome (Dec 2025) validates the market. Open source options (Lago, FlexPrice) exist for self-hosted billing infrastructure.


2. Reference Table

| # | Type | Reference | Venue/Date | Relevance |
|---|------|-----------|------------|-----------|
| Instruction-Based Image Editing | | | | |
| 1 | Paper | InstructPix2Pix: Learning to Follow Image Editing Instructions | CVPR 2023 | Foundation model for text-instruction image editing; baseline for all subsequent work |
| 2 | Paper | SmartEdit: Complex Instruction-based Image Editing with MLLMs | CVPR 2024 | Uses MLLM for complex instruction understanding; Bidirectional Interaction Module; Reason-Edit benchmark |
| 3 | Paper | BrushNet: Plug-and-Play Image Inpainting with Dual-Branch Diffusion | ECCV 2024 | Plug-and-play inpainting architecture; decomposed dual-branch diffusion; BrushData/BrushBench |
| 4 | Paper | BrushEdit: All-In-One Image Inpainting and Editing | arXiv 2412.10316 (2024) | Agent-cooperative framework with MLLM + inpainting; free-form instruction editing; category classification + mask acquisition pipeline |
| 5 | Paper | MGIE: Guiding Instruction-based Image Editing via MLLMs | ICLR 2024 | Apple's MLLM-guided editing; derives expressive instructions from ambiguous user input; open source |
| 6 | Paper | Emu Edit: Precise Image Editing via Recognition and Generation Tasks | CVPR 2024 | Meta's multi-task editing model; 7-task benchmark; task embeddings for generalization |
| 7 | Paper | GenArtist: Multimodal LLM as Agent for Unified Image Generation and Editing | NeurIPS 2024 (Spotlight) | MLLM agent with tool library; tree-structured planning; self-correction with verification; 7%+ improvement over DALL-E 3 |
| 8 | Paper | InstructDiffusion: A Generalist Modeling Interface for Vision Tasks | CVPR 2024 | Unified framework casting vision tasks as pixel manipulation via human instructions |
| 9 | Paper | OmniGen: Unified Image Generation | arXiv 2409.11340 (2024) | Simplified architecture; no ControlNet/IP-Adapter needed; unified transformer for text+image |
| 10 | Paper | UltraEdit: Instruction-based Fine-Grained Image Editing at Scale | NeurIPS 2024 | 4M editing sample dataset; region-based editing; anchored on real images |
| 11 | Paper | AnyEdit: Mastering Unified High-Quality Image Editing | arXiv 2411.15738 (2024) | 2.5M editing pairs across 20+ edit types; task-aware routing; AnyEdit-Test benchmark |
| 12 | Paper | ImgEdit: A Unified Image Editing Dataset and Benchmark | NeurIPS 2025 | 1.2M edit pairs; multi-turn editing tasks; comprehensive benchmark |
| 13 | Paper | Step1X-Edit: Open Source Image Editing Framework | 2025 | MLLM-based instruction processing; open source; ComfyUI integration |
| Tool Calling / Function Calling | | | | |
| 14 | Paper | Gorilla: Large Language Model Connected with Massive APIs | NeurIPS 2024 | Retriever-Aware Training (RAT); APIBench; surpasses GPT-4 on API calls |
| 15 | Paper | ToolACE: Winning the Points of LLM Function Calling | ICLR 2025 | Self-evolution synthesis for tool-learning data; 26,507 API pool; SOTA on BFCL |
| 16 | Benchmark | Berkeley Function Calling Leaderboard (BFCL) v4 | Ongoing (2024-2025) | Industry-standard benchmark; AST evaluation; v3 adds multi-turn; v4 adds agentic evaluation |
| 17 | Paper | Octopus v2: On-device Language Model for Super Agent | arXiv 2024 | Functional token strategy; 99.5% accuracy; 140x faster inference than RAG; 2B parameters |
| Model Routing & Cascading | | | | |
| 18 | Paper | RouteLLM: Learning to Route LLMs with Preference Data | ICLR 2025 | Routes between strong/weak LLMs; 85% cost reduction on MT Bench; 45% on MMLU; open source framework |
| 19 | Paper | FrugalGPT: How to Use LLMs While Reducing Cost | arXiv 2305.05176 (2023) | LLM cascade approach; matches GPT-4 at 2% cost; 4% accuracy improvement at same cost |
| 20 | Paper | A Unified Approach to Routing and Cascading for LLMs | ETH Zurich 2024 | Theoretical unification of routing and cascading strategies |
| Multi-Turn Visual Dialogue | | | | |
| 21 | Paper | DialogGen: Multi-modal Interactive Dialogue System for Multi-turn T2I Generation | arXiv 2403.08857 (2024) | MLLM + T2I integration; multi-turn generation quality enhancement |
| 22 | Paper | Talk2Image: Multi-Agent System for Multi-Turn Image Generation and Editing | arXiv 2508.06916 (2025) | Multi-agent architecture; addresses intention drift and incoherent edits |
| 23 | Paper | DialogPaint: A Dialog-based Image Editing Model | arXiv 2303.10073 (2023) | Conversational image editing through natural dialogue; iterative multi-round editing |
| 24 | Paper | TDRI: Two-Phase Dialogue Refinement for Interactive Image Generation | arXiv 2503.17669 (2025) | Initial Generation Phase + Interactive Refinement Phase; handles ambiguous prompts |
| Multimodal UI/UX Design Patterns | | | | |
| 25 | Paper | MAUI: Multimodal AI-augmented UI Development Architecture | Stanford 2024 | Explicit instruction + implicit preference handling; ReactGenie and AMMA frameworks |
| 26 | Paper | A Multimodal GUI Architecture for LLM-Based Conversational Assistants | arXiv 2510.06223 (2025) | Strong cohesion between GUI and linguistic UI |
| 27 | Paper | Generative Interfaces for Language Models | arXiv 2508.19227 (2025) | Theoretical framework for LLM-generated interfaces |
| 28 | Report | Generative UI Report 2025 | Thesys 2025 | Industry survey of generative UI patterns and adoption |
| Credit/Billing Systems | | | | |
| 29 | Report | AI Pricing in Practice: 2025 Field Report | Metronome 2025 | Cost-plus credit systems (30-50% markup); customer anxiety about unpredictable costs |
| 30 | Report | Token-Based Pricing: How to Account for AI Credits | Afternoon 2024 | Accounting and revenue recognition for credit-based AI billing |
| Protocols & Standards | | | | |
| 31 | Standard | Model Context Protocol (MCP) | Anthropic Nov 2024 | Open standard for tool integration; adopted by OpenAI, Google; SDKs in TS/Python/C#/Java |


3. Detailed Analysis by Topic Area

3.1 Instruction-Based Image Editing

The evolution of instruction-based image editing follows a clear trajectory from simple to complex:

Generation 1 - Direct Instruction (2022-2023): InstructPix2Pix established the paradigm of editing images from natural language instructions. It uses paired training data generated by combining GPT-3 (for instruction generation) and Stable Diffusion (for image pairs). The limitation is reliance on the CLIP text encoder, which struggles with complex or ambiguous instructions.

Generation 2 - MLLM-Guided Editing (2024): SmartEdit (CVPR 2024) introduced the use of Multimodal Large Language Models to understand complex editing instructions. Its Bidirectional Interaction Module enables comprehensive information flow between the input image and MLLM output. The Reason-Edit benchmark specifically targets complex instruction scenarios where prior methods fail.

MGIE (ICLR 2024, Apple) takes a similar approach but focuses on deriving "expressive instructions" from ambiguous user input. For example, it converts vague instructions like "make the sky more blue" into precise operations like "increase sky saturation by 20%." This translation step is critical for production systems where users give imprecise instructions.

Emu Edit (CVPR 2024, Meta) introduced multi-task training across editing types plus computer vision tasks (segmentation, keypoint detection). Its task embedding approach enables rapid adaptation to new editing tasks with minimal examples, which is valuable for extending the system to domain-specific edits (e.g., landscape-specific operations).

Generation 3 - Agent-Based Editing (2024-2025): GenArtist (NeurIPS 2024 Spotlight) represents the current state of the art. It uses an MLLM as an agent that:

  1. Decomposes complex requests into sub-tasks
  2. Selects appropriate tools from a library of specialized models
  3. Constructs a planning tree for execution
  4. Verifies results at each step and self-corrects

This architecture achieves 7%+ improvement over DALL-E 3 on T2I-CompBench and state-of-the-art on MagicBrush. The agent pattern maps directly to a chat interface where the LLM orchestrates multiple image processing operations.

BrushEdit extends this with a specific focus on inpainting workflows. Its agent pipeline performs: editing category classification, main object identification, mask acquisition, and editing area inpainting. The dual-branch architecture (from BrushNet, ECCV 2024) handles arbitrary mask shapes without needing separate models for different mask types.

Datasets and Benchmarks: The field has produced increasingly large and diverse training datasets:

  • UltraEdit (NeurIPS 2024): 4M editing samples, region-based, anchored on real images
  • AnyEdit (2024): 2.5M pairs across 20+ edit types with task-aware routing
  • ImgEdit (NeurIPS 2025): 1.2M pairs including multi-turn editing tasks
  • BrushData/BrushBench: Segmentation-based inpainting benchmarks

Key Takeaway for SimplyScapes: The agent-based approach (GenArtist/BrushEdit pattern) is the most promising architecture. An MLLM orchestrator can decompose landscape editing instructions ("add a stone patio behind the garden beds") into sub-operations (identify garden area, determine patio placement, generate stone texture, blend into scene) using specialized tools. The translation of ambiguous instructions into precise operations (MGIE pattern) is essential for consumer-facing products.
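
A hypothetical decomposition in the GenArtist style, to make the pattern concrete. Tool names, argument shapes, and verification strings below are illustrative assumptions, not an existing API:

```typescript
// Sketch of a GenArtist-style plan for one landscape edit. Tool names and
// the plan shape are hypothetical; the point is decompose -> execute -> verify.
type Tool = "segmentRegion" | "placeObject" | "generateTexture" | "blendEdit";

interface PlanStep {
  tool: Tool;
  args: Record<string, unknown>;
  verify: string; // what the verification pass should check after this step
}

// "Add a stone patio behind the garden beds", decomposed into sub-tasks.
export const plan: PlanStep[] = [
  { tool: "segmentRegion", args: { query: "garden beds" },
    verify: "mask covers all visible garden beds" },
  { tool: "placeObject", args: { object: "stone patio", relation: "behind" },
    verify: "patio footprint does not overlap the beds" },
  { tool: "generateTexture", args: { material: "natural stone" },
    verify: "texture scale matches scene perspective" },
  { tool: "blendEdit", args: { featherPx: 12 },
    verify: "no visible seams at the patio boundary" },
];

// An MLLM orchestrator would emit `plan`, run each step via tool calling,
// and re-plan (self-correct) whenever a verification check fails.
```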


3.2 Tool Calling / Function Calling in LLMs

Benchmarking and Evaluation: The Berkeley Function Calling Leaderboard (BFCL) has become the industry standard for evaluating LLM tool-calling capabilities. Now at version 4, it has evolved through:

  • v1: AST-based evaluation metric for function call correctness
  • v2: Enterprise and community-contributed function definitions
  • v3: Multi-turn interactions (critical for conversational editing)
  • v4: Holistic agentic evaluation (complete task workflows)

ToolACE (ICLR 2025) demonstrated that purpose-built training data for tool calling can enable an 8B parameter model to rival GPT-4's function calling performance. Its self-evolution synthesis process generated 26,507 diverse APIs for training, suggesting that domain-specific tool-calling fine-tuning (e.g., for image editing APIs) is feasible.

On-Device Function Calling: Octopus v2 (Nexa AI, 2024) achieved 99.5% accuracy in function-calling tasks at 2B parameters using a "functional token" strategy that reduces context length by 95%. This approach could enable lightweight, on-device routing of simple editing operations without cloud API calls.

Production Frameworks: Three frameworks dominate the tool-calling implementation space:

  1. Vercel AI SDK (v6): Unifies structured output generation with tool calling. The Agent abstraction enables reusable agent definitions with model, instructions, and tools. Supports streaming React Server Components for generative UI. TypeScript-native.

  2. LangGraph (LangChain): Recommended for production agents as of 2025. Graph-based state machine for multi-agent workflows. Supports MCP, streamable HTTP, and OpenAPI-based tool calls. Better suited for complex stateful workflows with branching logic.

  3. Anthropic Claude Tool Use: First-class structured outputs with strict: true guaranteeing schema validation. Agent SDK supports JSON Schema, Zod, or Pydantic for output validation. MCP for cross-platform tool integration.

Model Context Protocol (MCP): Announced by Anthropic in November 2024, MCP is an open standard for connecting AI assistants to external tools and data sources. Adopted by OpenAI and Google DeepMind. SDKs available in TypeScript, Python, C#, and Java. Pre-built servers exist for GitHub, Slack, Postgres, and other enterprise systems. Security concerns were raised in April 2025 regarding prompt injection and tool permission vulnerabilities.

Key Takeaway for SimplyScapes: The Vercel AI SDK is the natural choice given the Next.js stack. Its tool-calling + structured output unification in v6 and generative UI support via React Server Components directly address the need for a chat interface that renders custom editing controls. MCP provides future-proofing for tool integration, though security hardening is needed.
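
A minimal sketch of one editing tool in this style. It assumes the AI SDK's `tool` and `generateText` helpers with v5+ field naming (`inputSchema`; earlier majors used `parameters`), and `applyInpaint` is a hypothetical hook into the existing inpainting pipeline:

```typescript
// Sketch of one editing tool wired into the Vercel AI SDK.
import { generateText, tool } from "ai";
import { google } from "@ai-sdk/google";
import { z } from "zod";

// Hypothetical bridge into the existing inpainting pipeline.
declare function applyInpaint(objectName: string, region: string): Promise<string>;

const addObject = tool({
  description: "Add a landscape object (plant or hardscape) to the design image",
  inputSchema: z.object({
    objectName: z.string().describe("e.g. 'red maple' or 'stone fountain'"),
    region: z.string().describe("spatial hint, e.g. 'left of the walkway'"),
  }),
  execute: async ({ objectName, region }) => applyInpaint(objectName, region),
});

export async function handleMessage(prompt: string) {
  // The model decides whether to invoke the tool or reply in plain text.
  return generateText({
    model: google("gemini-2.5-flash"),
    tools: { addObject },
    prompt,
  });
}
```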


3.3 Model Routing and Cascading

RouteLLM (ICLR 2025, LMSYS): Proposes router models that dynamically select between a stronger and weaker LLM per query based on estimated difficulty. Key results:

  • 85% cost reduction on MT Bench vs. GPT-4-only
  • 45% cost reduction on MMLU
  • 35% cost reduction on GSM8K

The routing approach selects a single model per query (not cascading), which keeps latency constant. The framework is open source.

FrugalGPT (Stanford, 2023): Uses an LLM cascade approach: sequentially query models from cheapest to most expensive until a reliable response is obtained. A learned scoring function decides when to accept a response. Key results:

  • Match GPT-4 quality at 2% of cost
  • Exceed GPT-4 accuracy by 4% at the same cost

Trade-off: latency increases with cascade depth.

ETH Zurich Unified Approach (2024): Provides a theoretical framework unifying routing (single model selection) and cascading (sequential model queries) under a common optimization objective, enabling hybrid strategies.

Key Takeaway for SimplyScapes: For a conversational image editing system, a hybrid routing strategy is optimal:

  • Simple operations (crop, rotate, brightness): Route to lightweight models or deterministic functions (no LLM needed)
  • Standard edits (object removal, style transfer): Route to mid-tier models (e.g., fine-tuned Stable Diffusion)
  • Complex creative edits (scene composition, multi-step transformations): Route to full MLLM agent with tool calling

This tiered approach can dramatically reduce per-edit costs while maintaining quality where it matters; a minimal dispatch sketch follows.
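
A minimal, deterministic version of that dispatch. The operation taxonomy and target names are assumptions; in production a learned RouteLLM-style classifier would handle free-form requests:

```typescript
// Sketch of tier-based dispatch from edit operation to execution target.
type EditOp =
  | "crop" | "rotate" | "brightness"   // Tier 0: deterministic
  | "removeObject" | "blurBackground"  // Tier 1: lightweight model
  | "styleTransfer" | "materialSwap"   // Tier 2: mid-tier image model
  | "sceneComposition";                // Tier 3: full MLLM agent

type Tier = 0 | 1 | 2 | 3;

const TIER: Record<EditOp, Tier> = {
  crop: 0, rotate: 0, brightness: 0,
  removeObject: 1, blurBackground: 1,
  styleTransfer: 2, materialSwap: 2,
  sceneComposition: 3,
};

const TARGET: Record<Tier, string> = {
  0: "sharp",        // server-side image library, no model call
  1: "sd-finetune",  // lightweight inpainting model
  2: "sdxl-or-flux", // mid-tier image model
  3: "mllm-agent",   // orchestrator with tool calling
};

export const route = (op: EditOp): string => TARGET[TIER[op]];
```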

3.4 Multi-Turn Visual Dialogue Systems

Core Challenge - Intention Drift: Talk2Image (2025) identifies the key problem in multi-turn image editing: single-agent systems suffer from "intention drift" where cumulative user goals become misaligned as the conversation progresses, leading to incoherent edits. Multi-agent architectures address this by separating concerns (understanding, planning, execution, verification).

DialogGen (2024): Equips MLLMs with text-to-image models to extend output modality. Focuses on maintaining generation quality across multiple conversation turns through strong multi-modal comprehension. Key innovation: the MLLM maintains conversation context while delegating generation to specialized models.

TDRI (2025): Introduces a two-phase approach:

  1. Initial Generation Phase: Interpret the user's first request and generate a baseline
  2. Interactive Refinement Phase: Iteratively refine based on follow-up instructions

This maps well to a landscape editing workflow where users start with a general request and progressively refine specific areas.

DialogPaint (2023): Early demonstration of multi-round image editing through natural dialogue. Establishes the pattern of maintaining edit history for undo/redo within the conversation flow.

Key Takeaway for SimplyScapes: Multi-turn editing requires explicit state management to prevent intention drift. The two-phase approach (generate-then-refine) from TDRI is a practical pattern. A multi-agent architecture (Talk2Image) should be considered for complex editing flows, with separate agents for understanding intent, selecting tools, executing edits, and verifying quality.
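
One way to make that state explicit, sketched below; the field names and the `summarizeWithLLM` helper are hypothetical:

```typescript
// Sketch of explicit session state to counter intention drift.
interface Turn { role: "user" | "assistant"; text: string }

interface DesignSession {
  turns: Turn[];
  editStack: string[];   // applied edit ids; enables undo/redo
  intentSummary: string; // rolling summary of cumulative user intent
}

// Hypothetical helper that restates cumulative intent from the transcript.
declare function summarizeWithLLM(turns: Turn[], prev: string): Promise<string>;

const SUMMARIZE_EVERY = 5; // refresh cadence in turns; tune empirically

export async function recordTurn(s: DesignSession, turn: Turn) {
  s.turns.push(turn);
  // Periodically restate cumulative intent so later generations condition on
  // a compact summary rather than the full (drift-prone) transcript.
  if (s.turns.length % SUMMARIZE_EVERY === 0) {
    s.intentSummary = await summarizeWithLLM(s.turns, s.intentSummary);
  }
}
```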


3.5 Multimodal LLM UI/UX Design Patterns

Generative UI: The most significant UX trend is generative UI, where the LLM dynamically renders custom interface components based on conversation context. Key implementations:

  • Vercel AI SDK: Streams React Server Components from the server, allowing the LLM to render custom UI elements (sliders, image previews, comparison views) within the chat flow.
  • Google A2UI (2025): An open project for agent-driven interfaces. Uses a flat list of components with ID references that LLMs can generate incrementally, enabling progressive rendering. Supports incremental UI updates based on conversation progression.
  • assistant-ui: Open source TypeScript/React library for AI chat with tool call rendering, human approval workflows, and safe frontend actions.

Architectural Patterns: The Stanford MAUI architecture identifies two interaction modes:

  1. Explicit direct instructions: User tells the system what to do ("remove the tree")
  2. Implicit preference inference: System learns from user behavior and feedback

Most current interfaces fall into the "chatbot on the side" category, but the trend is toward strong cohesion between GUI elements and the conversational interface.

Post-Chat UI (2025): Industry analysis identifies the shift beyond pure chat interfaces. LLMs can generate context-specific controls, buttons, layouts, and navigation tailored to the current task. This is particularly relevant for image editing where a chat-only interface is insufficient -- users need visual controls for spatial operations.

Key Takeaway for SimplyScapes: The interface should blend conversational AI with generated UI components. When a user asks to adjust brightness, the system should render a slider within the chat flow (generative UI). When they ask to select a region, it should render an interactive canvas overlay. The Vercel AI SDK's React Server Component streaming and the A2UI incremental rendering pattern are directly applicable.
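
A simplified sketch of that mapping: a component that renders tool invocations as interactive controls. The `ToolInvocation` type and both components are local illustrations, not the AI SDK's actual message shape:

```tsx
// Sketch: render tool invocations as interactive chat components.
import * as React from "react";

// Hypothetical app components, declared here for self-containment.
declare function BrightnessSlider(props: { initial: number }): React.ReactElement;
declare function RegionSelector(props: { hint: string }): React.ReactElement;

type ToolInvocation =
  | { tool: "adjustBrightness"; args: { value: number } }
  | { tool: "selectRegion"; args: { hint: string } };

export function ToolUI({ call }: { call: ToolInvocation }) {
  if (call.tool === "adjustBrightness") {
    // A slider in the chat flow beats asking for a number in prose.
    return <BrightnessSlider initial={call.args.value} />;
  }
  // Spatial input needs a canvas overlay the chat thread cannot express.
  return <RegionSelector hint={call.args.hint} />;
}
```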


3.6 Credit/Token Billing Systems for AI Services

Industry Trends: Over 61% of new B2B SaaS products are exploring usage-based pricing (OpenView Partners, 2024). Stripe's $1B acquisition of Metronome in December 2025 validates the credit-based billing infrastructure market.

Common Architecture: The dominant pattern is a cost-plus credit system:

  1. Credits are allocated (purchased or included in subscription tier)
  2. Credits are consumed based on operation complexity (input/output tokens, GPU time, model tier)
  3. Standard markup of 30-50% over raw API costs
  4. Real-time dashboards for usage visibility
  5. Usage alerts to prevent bill shock

Key Challenge - Customer Anxiety: Multiple industry reports flag unpredictable costs as the top barrier to adoption. Buyers do not understand credit burn rates. Solutions include:

  • Real-time usage dashboards
  • Spend alerts and budget caps
  • Predictable credit packages (e.g., "100 edits per month")
  • Value-based credit pricing (charge per edit, not per token)

Infrastructure Options:

  • Stripe Billing + Metronome: Enterprise-grade, now integrated. Handles 100K+ events/sec.
  • Lago (open source): Self-hosted billing with event-driven metering. $34M funding. GitHub: github.com/getlago/lago
  • FlexPrice (open source): AI-native billing with credit/top-up support. Self-hosted. GitHub: github.com/flexprice/flexprice
  • Amberflo: Real-time usage metering focused on cost allocation across AI models
  • Orb: Usage-based billing for AI companies (proprietary)

Key Takeaway for SimplyScapes: A credit-based system mapped to user-understandable units ("design edits" rather than "tokens") reduces customer anxiety. The tiered model routing strategy (Section 3.3) directly feeds into credit pricing: simple edits cost 1 credit, complex edits cost 5-10 credits. Lago or FlexPrice can provide the billing infrastructure. Stripe's native integration with Metronome simplifies payment processing.
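
A sketch of the credit mapping plus a race-safe deduction. The per-tier costs are picked from the ranges in Section 5.4, and `sql` stands in for a hypothetical Postgres client:

```typescript
// Sketch of credit mapping and atomic deduction (tagged-template SQL client
// declared here as a stand-in; costs are assumed from the Section 5.4 ranges).
declare function sql(
  strings: TemplateStringsArray, ...values: unknown[]
): Promise<{ credits: number }[]>;

const CREDIT_COST: Record<0 | 1 | 2 | 3, number> = { 0: 1, 1: 1, 2: 4, 3: 10 };

export async function chargeForEdit(userId: string, tier: 0 | 1 | 2 | 3) {
  const cost = CREDIT_COST[tier];
  // One conditional UPDATE so concurrent edits cannot drive the balance negative.
  const rows = await sql`
    UPDATE credit_balances
       SET credits = credits - ${cost}
     WHERE user_id = ${userId} AND credits >= ${cost}
 RETURNING credits`;
  if (rows.length === 0) throw new Error("INSUFFICIENT_CREDITS");
  return rows[0].credits; // remaining balance feeds the real-time UI counter
}
```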


4. Open Source Projects

4.1 Image Editing Models & Frameworks

| Project | GitHub | Stars | Description |
|---------|--------|-------|-------------|
| ComfyUI | comfyanonymous/ComfyUI | 89K+ | Node-based UI for Stable Diffusion; extensible with custom nodes; supports Flux, SDXL, SD3 |
| AUTOMATIC1111 WebUI | AUTOMATIC1111/stable-diffusion-webui | 140K+ | Most popular SD web UI; extensive extension ecosystem |
| BrushNet | TencentARC/BrushNet | - | Plug-and-play inpainting; ECCV 2024 |
| BrushEdit | TencentARC/BrushEdit | - | Agent-cooperative inpainting + editing pipeline |
| MGIE (Apple) | apple/ml-mgie | - | MLLM-guided image editing; ICLR 2024 |
| GenArtist | zhenyuw16/GenArtist | - | MLLM agent for unified generation + editing; NeurIPS 2024 Spotlight |
| InstructPix2Pix | timothybrooks/instruct-pix2pix | - | Foundation instruction-based editing model |
| InstructDiffusion | cientgu/InstructDiffusion | - | Generalist vision task interface; CVPR 2024 |
| OmniGen | VectorSpaceLab/OmniGen | - | Unified image generation without extra modules |
| UltraEdit | pkunlp-icler/UltraEdit | - | 4M editing sample dataset + models; NeurIPS 2024 |
| Step1X-Edit | stepfun-ai/Step1X-Edit | - | Open source MLLM-based image editing; 2025 |
| Qwen-Image | QwenLM/Qwen-Image | - | Foundation model for image generation + editing |

4.2 Tool Calling & Agent Frameworks

| Project | GitHub | Description |
|---------|--------|-------------|
| Gorilla | ShishirPatil/gorilla | LLM for API calls; Berkeley Function Calling Leaderboard; NeurIPS 2024 |
| ToolACE-8B | HuggingFace: Team-ACE/ToolACE-8B | SOTA function calling at 8B parameters; ICLR 2025 |
| Vercel AI SDK | vercel/ai | TypeScript SDK for AI apps; tool calling + generative UI; v6 with Agent abstraction |
| LangChain / LangGraph | langchain-ai/langchain | Python/TS agent framework; graph-based workflows; MCP support |
| MCP (Model Context Protocol) | modelcontextprotocol | Anthropic's open standard for tool integration; TS/Python/C#/Java SDKs |
| RouteLLM | lm-sys/RouteLLM | Open source LLM routing framework; ICLR 2025 |

4.3 UI Frameworks for AI Chat

| Project | GitHub | Description |
|---------|--------|-------------|
| assistant-ui | assistant-ui/assistant-ui | TypeScript/React library for AI chat; tool call rendering; human approval flows |
| Vercel AI Chatbot Template | vercel/ai-chatbot | Next.js + AI SDK reference implementation with generative UI |
| Google A2UI | Announced 2025 | Open project for agent-driven interfaces; flat component list with incremental updates |

4.4 Billing Infrastructure

| Project | GitHub | Description |
|---------|--------|-------------|
| Lago | getlago/lago | Open source metering + usage-based billing; event-driven; $34M funding |
| FlexPrice | flexprice/flexprice | Open source usage-based billing; credits + top-ups; AI-native focus |


5. Key Takeaways for SimplyScapes Implementation

5.1 Architecture Recommendation: Agent-Based Editing with Tool Calling

The research strongly supports an architecture where:

  • A conversational MLLM (e.g., Claude, GPT-4) serves as the orchestrator
  • It calls specialized image editing tools via function calling (Vercel AI SDK tool definitions)
  • A planning layer decomposes complex editing requests into sub-tasks (GenArtist pattern)
  • A verification step checks results and triggers self-correction if needed
  • State management tracks conversation history and edit history to prevent intention drift

5.2 Model Routing for Cost Optimization

Implement a tiered routing strategy:

| Tier | Operations | Model | Est. Cost |
|------|-----------|-------|-----------|
| 0 - Deterministic | Crop, rotate, resize, brightness/contrast | No LLM needed (Sharp/Canvas API) | Near zero |
| 1 - Lightweight | Simple object removal, background blur, color adjustment | Fine-tuned SD model or API | Low |
| 2 - Standard | Style transfer, material swap, object addition | Mid-tier image model (SDXL/Flux) | Medium |
| 3 - Complex | Scene composition, multi-element design, creative interpretation | Full MLLM agent + specialized tools | High |

A RouteLLM-style classifier at the conversation level determines which tier handles each request.

5.3 UI Pattern: Hybrid Chat + Generative UI

Based on the research, the optimal interface combines:

  1. Chat thread for natural language interaction and edit history
  2. Generative UI components rendered within the chat (sliders, color pickers, region selectors) via Vercel AI SDK RSC streaming
  3. Persistent canvas for image display with interactive overlays for spatial operations
  4. Before/after comparison views generated dynamically based on edit type
  5. Approval gates for destructive or high-credit operations (assistant-ui pattern)

5.4 Credit System Design

Map billing to user-understandable units:

  • Simple edits (Tier 0-1): 1 credit
  • Standard edits (Tier 2): 3-5 credits
  • Complex edits (Tier 3): 8-15 credits
  • Batch operations: Volume discount
  • Implement real-time credit counter in the UI
  • Offer credit packages aligned to common use cases (e.g., "Design Preview Pack: 50 edits")
  • Use Lago or FlexPrice for backend metering; Stripe for payment processing (a metering-event sketch follows this list)
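
A hedged sketch of reporting one consumed edit to Lago. It assumes Lago's documented `POST /api/v1/events` endpoint; the exact required fields (e.g. `external_subscription_id` vs `external_customer_id`) vary by Lago version:

```typescript
// Sketch: report one consumed edit to a self-hosted Lago instance.
export async function meterEdit(opts: {
  lagoUrl: string;
  apiKey: string;
  subscriptionId: string;
  editId: string;
  credits: number;
}) {
  const res = await fetch(`${opts.lagoUrl}/api/v1/events`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${opts.apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      event: {
        transaction_id: opts.editId, // idempotency key, one per edit
        external_subscription_id: opts.subscriptionId,
        code: "design_edit",         // billable-metric code configured in Lago
        properties: { credits: opts.credits },
      },
    }),
  });
  if (!res.ok) throw new Error(`Lago event failed: ${res.status}`);
}
```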

5.5 Multi-Turn Conversation Design

Critical patterns from the research:

  1. Maintain explicit edit state -- track all edits as a stack for undo/redo
  2. Two-phase workflow (TDRI): Generate initial design, then iterative refinement
  3. Intention summarization: Periodically summarize the user's cumulative intent to prevent drift
  4. Context windowing: For long sessions, compress early conversation turns while preserving key decisions (see the sketch after this list)
  5. Multi-agent separation: Separate understanding, planning, execution, and verification into distinct agent roles (even if running on the same model)
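
A sketch of pattern 4; the cutoff constant and summarization helper are assumptions to tune:

```typescript
// Sketch of context windowing: keep the last K turns verbatim and fold older
// turns into one summary turn that preserves key decisions.
interface Turn { role: "user" | "assistant" | "system"; text: string }

// Hypothetical helper that condenses old turns into a short summary.
declare function summarizeWithLLM(turns: Turn[]): Promise<string>;

const KEEP_RECENT = 8; // verbatim tail length; tune against the context budget

export async function windowContext(turns: Turn[]): Promise<Turn[]> {
  if (turns.length <= KEEP_RECENT) return turns;
  const summary = await summarizeWithLLM(turns.slice(0, -KEEP_RECENT));
  return [
    { role: "system", text: `Earlier decisions: ${summary}` }, // compressed head
    ...turns.slice(-KEEP_RECENT),                              // verbatim tail
  ];
}
```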

5.6 Competitive Advantage Opportunities

Based on the gap analysis across all research:

  • No existing system combines landscape-specific editing with conversational AI -- the domain specialization is the moat
  • Agent-based architectures are published but not yet productized for consumer verticals -- there is a first-mover window
  • Cost optimization via routing is well-researched but rarely implemented in consumer image editing products
  • Generative UI for image editing is nascent -- most tools still use traditional UI with optional AI features
  • Multi-turn editing with state management is an unsolved UX problem at scale -- the research identifies the problems but production solutions are scarce

6. Research Gaps and Future Directions

  1. Domain-specific editing evaluation: No benchmarks exist for landscape/hardscape editing quality. Creating a domain-specific evaluation set would enable systematic quality improvement.

  2. Cost-quality Pareto frontier: While RouteLLM and FrugalGPT optimize for general LLM tasks, no published work optimizes the cost-quality trade-off specifically for image editing pipelines with heterogeneous models.

  3. Real-time collaborative editing: All multi-turn dialogue research assumes a single user. Collaborative design review (homeowner + designer) in a shared editing session is unexplored.

  4. Deterministic + generative hybrid pipelines: Most research focuses purely on generative approaches. Combining deterministic rendering (3D landscape visualization) with generative editing (style/material transfer) is a gap.

  5. User intent disambiguation for spatial operations: How to efficiently resolve ambiguous spatial references ("the area near the fence") in landscape images is under-explored in the academic literature.