How to Build an Automated Competitive Research Pipeline with Paperclip Sub-Agents
A step-by-step tutorial for building an automated competitive intelligence pipeline using Paperclip's fan-out/fan-in sub-agent pattern. Covers parent orchestrator setup, research sub-agents, result aggregation, and a full working 5-competitor weekly example.
Published 5/13/2026
Affiliate disclosure: This article references Paperclip throughout. We may earn a commission on Paperclip signups through our links once the affiliate program launches. This is a product-owned tutorial: the implementation is real and the examples are production-ready.
Prerequisites: This tutorial assumes you have a Paperclip company set up with at least one agent. If you’re starting from scratch, read Paperclip Autonomous Company Setup first, then return here. For an overview of multi-agent coordination concepts, see Paperclip Multi-Agent Coordination.
Every SaaS product manager has a version of this problem: competitor X just changed their pricing, and you found out from a customer who mentioned it on a call two weeks after it happened. Competitor Y shipped a feature you're about to build, so you're about to invest three sprints into something the market already has. Competitor Z just posted a job listing that signals they're moving into your market segment.
This information is public. It's not hard to find, if you're looking. The problem is that "if you're looking" translates to a weekly manual sweep of 5–10 competitor websites, which takes 2–3 hours that no one can reliably spare. The research happens in bursts, goes stale, and lives in someone's personal notes.
The solution is a competitive research pipeline: a set of agents that run on a schedule, each monitoring one competitor, whose outputs are aggregated into a structured digest that lands in your team’s channel every Monday morning without anyone lifting a finger.
This tutorial builds that pipeline using Paperclip’s fan-out/fan-in sub-agent pattern.
The Problem: Why Manual Competitive Research Doesn’t Scale
Manual competitive research has three failure modes that make it systematically unreliable:
Context decay. Research done once gets stale fast. The pricing you noted in Q1 may have changed three times by Q4. If you’re not monitoring continuously, you’re making decisions on stale data.
Coverage gaps. When time is limited, you cover the two or three competitors you think matter most. The one you skip might be the one that just changed their go-to-market strategy.
Single point of failure. Competitive research lives in one person’s head or their private notes. When they leave or go on vacation, the research gap is weeks wide before anyone notices.
An automated pipeline solves all three: it runs on schedule, covers the same competitors every week regardless of who’s busy, and produces structured outputs that persist in a shared system.
The Architecture: Fan-Out → Collect → Synthesize
The pipeline uses Paperclip’s native issue-based coordination to implement the fan-out/fan-in pattern:
Parent Orchestrator Agent
│
├─ [heartbeat trigger: every Monday 08:00]
│
├─ Fan-out: create child issues
│ ├─ Child Issue: Research competitor-a.com
│ ├─ Child Issue: Research competitor-b.com
│ ├─ Child Issue: Research competitor-c.com
│ ├─ Child Issue: Research competitor-d.com
│ └─ Child Issue: Research competitor-e.com
│
├─ [child agents pick up issues independently, run in parallel]
│
└─ Fan-in: parent wakes on issue_children_completed
└─ Aggregates child outputs → posts digest to team
Each layer has a single responsibility:
- Parent orchestrator: manages the schedule, creates child issues with consistent structure, waits for completion, aggregates results, posts the digest
- Research sub-agent: takes one competitor, runs the research scope (pricing, changelog, jobs, blog), returns a structured output in its completion comment
- Aggregation step: the parent reads each child’s completion comment, extracts the structured data, diffs it against the previous run, and formats the weekly digest
The key advantage of this architecture over a single agent researching every competitor in sequence is parallelism. Five child agents running simultaneously take roughly the wall-clock time of one. A 10-competitor sweep that takes an hour single-threaded finishes in about 6 minutes in parallel.
Step 1 — Configuring the Parent Orchestrator Agent
The parent orchestrator runs on a scheduled heartbeat. Its job is to create the child issues, then wake again when they’re all done to aggregate.
Agent configuration
# Parent Orchestrator Agent — Competitive Research
role: Competitive Research Orchestrator
description: >
Runs every Monday at 08:00. Creates one child research issue per competitor.
Waits for all children to complete, then aggregates outputs into a weekly digest
posted to the #competitive-intel channel.
heartbeat:
schedule: "0 8 * * 1" # Every Monday at 08:00 UTC
wakeReasons:
- schedule # Monday trigger
- issue_children_completed # All child research issues done
budget:
monthly: 50 # USD — covers parent + child API costs
alertAt: 80 # Pause at 80%, not 100%
competitors:
- name: Competitor A
url: https://competitor-a.com
priority: high
- name: Competitor B
url: https://competitor-b.com
priority: high
- name: Competitor C
url: https://competitor-c.com
priority: medium
- name: Competitor D
url: https://competitor-d.com
priority: medium
- name: Competitor E
url: https://competitor-e.com
priority: low
Parent heartbeat logic
On a schedule wake, the parent creates the child issues whose completion gates its own:
# Parent orchestrator — schedule wake handler (pseudocode)
def on_schedule_wake(context):
competitors = context.config["competitors"]
child_issue_ids = []
for competitor in competitors:
# Create a child research issue per competitor
child = paperclip.issues.create(
title=f"Research {competitor['name']} — week of {today()}",
description=build_research_brief(competitor),
parentId=context.current_issue_id,
goalId=context.goal_id,
assigneeAgentId=RESEARCH_AGENT_ID,
priority=competitor["priority"],
labels=["competitive-research", "auto-generated"]
)
child_issue_ids.append(child["id"])
# Update parent to in_progress, note the children
paperclip.issues.update(
issue_id=context.current_issue_id,
status="in_progress",
comment=f"Fan-out complete. Created {len(child_issue_ids)} research issues. Waiting for children to complete."
)
def build_research_brief(competitor):
return f"""
## Research scope for {competitor['name']}
URL: {competitor['url']}
### Required outputs (structured, in completion comment)
1. **Pricing**: tier names, prices, feature inclusions per tier (note any changes from last week)
2. **Changelog**: last 3 changelog entries (date, feature name, one-line description)
3. **Jobs**: active job postings (role, department — signals investment areas)
4. **Positioning**: any changes to homepage headline, value prop, or ICP language
### Output format
Return a JSON object in your completion comment with keys:
`pricing`, `changelog`, `jobs`, `positioning`, `scraped_at`, `delta_notes`
`delta_notes` should call out anything that changed vs. last week's known state.
"""
On an issue_children_completed wake, the parent aggregates:
def on_children_completed_wake(context):
# Fetch all child issues and their completion comments
children = paperclip.issues.list(
parent_id=context.current_issue_id,
status="done"
)
research_outputs = []
for child in children:
completion_comment = get_last_comment(child["id"])
parsed = extract_json_from_comment(completion_comment["body"])
if parsed:
research_outputs.append({
"competitor": child["title"],
"data": parsed
})
# Generate digest
digest = generate_weekly_digest(research_outputs)
# Post digest
paperclip.issues.update(
issue_id=context.current_issue_id,
status="done",
comment=digest
)
# Optionally: post to Slack
slack.post_message(channel="#competitive-intel", text=digest)
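The extract_json_from_comment helper above does the real parsing work. A minimal sketch, assuming each child posts its output inside a fenced json block as the sub-agent in Step 2 does (the regex-based approach is illustrative):
# Helper sketch: pull the structured payload out of a completion comment
import json
import re

def extract_json_from_comment(body):
    """Extract and parse the first fenced json block in a comment body.
    Returns the parsed object, or None if nothing parseable is found."""
    match = re.search(r"```json\s*\n(.*?)\n```", body, re.DOTALL)
    if not match:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None  # Malformed output: the parent skips this child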
Step 2 — Building the Research Sub-Agent
The research sub-agent is scoped to one task: receive a competitor brief, execute the research scope, return structured output. It should be stateless between runs — all context it needs is in the issue brief.
Sub-agent configuration
# Research Sub-Agent
role: Competitive Research Analyst
description: >
Picks up individual competitor research issues. For each issue, scrapes the
competitor's pricing page, changelog, job board, and homepage. Returns structured
JSON output in the completion comment.
heartbeat:
wakeReasons:
- issue_assigned
budget:
monthly: 20 # Scoped lower than parent — this agent is cost-contained
alertAt: 90
Research execution loop
# Research sub-agent heartbeat (pseudocode)
import json  # Needed for json.dumps in the completion comment below
def on_issue_assigned_wake(context):
issue = paperclip.issues.get(context.task_id)
brief = parse_brief(issue["description"])
competitor_url = brief["url"]
# Scrape pricing
pricing_data = scrape_pricing_page(
url=f"{competitor_url}/pricing",
extract=["tier_names", "prices", "feature_lists"]
)
# Scrape changelog
changelog_data = scrape_changelog(
url=find_changelog_url(competitor_url),
limit=3 # Last 3 entries only
)
# Scrape job board
jobs_data = scrape_jobs(
url=f"{competitor_url}/careers",
extract=["role", "department", "location"]
)
# Scrape homepage positioning
positioning_data = scrape_positioning(
url=competitor_url,
extract=["headline", "subheadline", "cta_text"]
)
# Compare to last known state (if available in issue history)
prior_state = get_prior_state(issue["parentId"], competitor_url)
delta_notes = compute_delta(prior_state, {
"pricing": pricing_data,
"changelog": changelog_data,
"jobs": jobs_data,
"positioning": positioning_data
})
# Structured output
output = {
"pricing": pricing_data,
"changelog": changelog_data,
"jobs": jobs_data,
"positioning": positioning_data,
"scraped_at": utcnow(),
"delta_notes": delta_notes
}
# Complete the issue with structured output in comment
paperclip.issues.update(
issue_id=context.task_id,
status="done",
comment=f"Research complete.\n\n```json\n{json.dumps(output, indent=2)}\n```"
)
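Two helpers referenced above, parse_brief and get_prior_state, are left undefined. Minimal sketches of both, assuming the brief format from Step 1 and an issues API that can filter by label and status (the filter parameters and matching logic are illustrative):
# Helper sketches for the research sub-agent (same assumptions as above)
import re

def parse_brief(description):
    """Pull the competitor URL out of the brief built in Step 1."""
    match = re.search(r"URL:\s*(\S+)", description)
    return {"url": match.group(1) if match else None}

def get_prior_state(parent_issue_id, competitor_url):
    """Find the most recent completed research issue for the same competitor
    and reuse its structured output as the baseline for compute_delta."""
    candidates = paperclip.issues.list(
        labels=["competitive-research"],
        status="done"
    )
    for issue in candidates:  # Assumed newest-first ordering
        if issue["parentId"] == parent_issue_id:
            continue  # Skip this week's own run
        if competitor_url not in issue["description"]:
            continue  # Different competitor
        parsed = extract_json_from_comment(get_last_comment(issue["id"])["body"])
        if parsed:
            return parsed
    return None  # First run: compute_delta treats everything as new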
What “scrape” means in practice
The research sub-agent uses web fetch tools to read publicly accessible pages. For a pricing page, it fetches the HTML, passes it to the LLM with a structured extraction prompt, and gets back normalized JSON. This works well for most SaaS pricing pages; it fails on JavaScript-heavy SPAs that render content client-side (the agent sees the shell, not the rendered content).
For JS-heavy pricing pages, add a fallback: if the initial fetch returns minimal content, try an archived copy (e.g., the most recent Wayback Machine snapshot) or flag the page for manual verification in the delta_notes field.
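One way to implement that fallback, using the Wayback Machine's public availability API (the length threshold and helper name are illustrative, not part of Paperclip):
# Fallback sketch: live fetch first, archived snapshot second
import requests

def fetch_with_archive_fallback(url, min_length=2000):
    """Fetch a page; if it looks like an empty SPA shell, try the most
    recent Wayback Machine snapshot before giving up."""
    html = requests.get(url, timeout=30).text
    if len(html) >= min_length:
        return {"html": html, "source": "live"}
    # The availability API returns the closest archived snapshot, if any
    resp = requests.get(
        "https://archive.org/wayback/available",
        params={"url": url},
        timeout=30
    ).json()
    snapshot = resp.get("archived_snapshots", {}).get("closest")
    if snapshot and snapshot.get("available"):
        archived = requests.get(snapshot["url"], timeout=30).text
        return {"html": archived, "source": "wayback"}
    # Nothing usable: flag for manual verification in delta_notes
    return {"html": None, "source": "needs_manual_check"}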
Step 3 — Aggregating Results in the Parent
The aggregation step is the fan-in: the parent collects structured JSON from each child’s completion comment and synthesizes a digest.
Diff from last run
The most valuable part of the digest is not the raw data — it’s what changed. Storing the previous week’s output and diffing against it surfaces actionable intelligence immediately:
def compute_digest_diff(current_week, prior_week):
changes = []
for competitor_id, current in current_week.items():
prior = prior_week.get(competitor_id, {})
# Check pricing changes
if current["pricing"] != prior.get("pricing"):
changes.append({
"competitor": competitor_id,
"type": "pricing_change",
"before": prior.get("pricing"),
"after": current["pricing"],
"severity": "high" # Always high — pricing changes affect deals
})
# Check new changelog entries
new_entries = [
e for e in current["changelog"]
if e not in prior.get("changelog", [])
]
if new_entries:
changes.append({
"competitor": competitor_id,
"type": "new_features",
"entries": new_entries,
"severity": "medium"
})
# Check new job postings in strategic areas
strategic_depts = ["engineering", "sales", "marketing", "product"]
new_jobs = [
j for j in current["jobs"]
if j["department"].lower() in strategic_depts
and j not in prior.get("jobs", [])
]
if new_jobs:
changes.append({
"competitor": competitor_id,
"type": "hiring_signal",
"jobs": new_jobs,
"severity": "low"
})
return sorted(changes, key=lambda x: ["high","medium","low"].index(x["severity"]))
Digest format
The parent posts the aggregated digest as a markdown comment. A good digest is scannable in 2 minutes:
## Competitive Research Digest — Week of 2026-05-13
### 🔴 High Priority Changes
**Competitor A — Pricing Change**
- Starter tier: $29/mo → $39/mo (+34%)
- Pro tier unchanged at $79/mo
- New Enterprise tier added at $199/mo (SSO, custom integrations)
- Action: Review how this affects deals where we're compared on price
### 🟡 New Features
**Competitor B — 3 new changelog entries**
- 2026-05-10: Native Slack integration (we have this; parity maintained)
- 2026-05-08: CSV bulk import (we do not have this; add to backlog?)
- 2026-05-05: Mobile app v2 launch (iOS and Android)
**Competitor C — 2 new changelog entries**
- 2026-05-11: API rate limits increased 10x for paid plans
- 2026-05-09: Zapier integration added
### 🟢 Hiring Signals
**Competitor D — 4 new engineering roles**
- 2x Senior Backend Engineers (Rust)
- 1x ML Engineer (signal: investing in AI features)
- 1x Head of Platform
### No Changes This Week
Competitor E: no pricing, changelog, or hiring changes detected.
---
*Scraped: 2026-05-13 08:12 UTC | 5 competitors monitored | Next run: 2026-05-20*
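The generate_weekly_digest call from the fan-in handler is mostly templating over the sorted change list. A condensed sketch that produces the format above, assuming the change list and week are passed in explicitly and format_change is a per-change markdown renderer (both assumptions, not Paperclip APIs):
# Digest renderer sketch: severity buckets map to the sections above
def generate_weekly_digest(research_outputs, changes, week_of):
    sections = {
        "high": "### 🔴 High Priority Changes",
        "medium": "### 🟡 New Features",
        "low": "### 🟢 Hiring Signals"
    }
    lines = [f"## Competitive Research Digest — Week of {week_of}"]
    for severity in ("high", "medium", "low"):
        matching = [c for c in changes if c["severity"] == severity]
        if matching:
            lines.append(sections[severity])
            lines.extend(format_change(c) for c in matching)  # Per-change markdown
    # Competitors with no detected changes get a one-line mention
    # (assumes "competitor" keys match between outputs and changes)
    changed = {c["competitor"] for c in changes}
    quiet = [o["competitor"] for o in research_outputs
             if o["competitor"] not in changed]
    if quiet:
        lines.append("### No Changes This Week")
        lines.append(f"{', '.join(quiet)}: no pricing, changelog, or hiring changes detected.")
    return "\n\n".join(lines)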
Full Working Example: 5 Competitors, Weekly Cadence, Slack-Ready Output
Putting it all together: a Monday morning trigger spawns 5 child issues simultaneously, each picked up by a research sub-agent worker. Within 6–10 minutes, all 5 complete and the parent wakes on issue_children_completed. The parent runs the aggregation, computes the diff, formats the digest, and posts it to the Paperclip issue thread and to #competitive-intel on Slack.
Timeline for a typical Monday run:
08:00:00 — Parent wakes on schedule trigger
08:00:30 — 5 child issues created (fan-out complete)
08:01:00 — Research agents pick up child issues (parallel execution begins)
08:04:15 — First child completes (competitor C — fast pricing page)
08:06:30 — All 5 children complete (fan-in triggers)
08:06:35 — Parent wakes on issue_children_completed
08:08:10 — Aggregation + digest generation complete
08:08:15 — Digest posted to issue thread and Slack channel
What the team sees in Slack at 08:08:
A pinned message with the weekly digest. No manual work required from anyone. The research lead reviews it, adds context where needed, and shares with the product team — all within the first 30 minutes of their Monday.
Cost for 5 competitors, weekly:
Estimated LLM cost per run (GPT-4o):
- 5 research sub-agents × ~8K tokens each = 40K tokens ≈ $0.12
- Parent aggregation ≈ 12K tokens ≈ $0.04
- Total per run: ~$0.16
Monthly cost: 4 runs × $0.16 ≈ $0.64 in LLM costs, plus the Paperclip subscription (which covers the platform, scheduling, and agent infrastructure). At this scale, research that previously cost 2–3 hours of PM time every week now costs under a dollar a month in API tokens.
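The same arithmetic as a sketch, assuming a blended rate of roughly $3 per million tokens (swap in your model's actual input/output pricing):
# Back-of-envelope cost model for the weekly pipeline
def estimate_monthly_cost(competitors=5, runs_per_month=4,
                          child_tokens=8_000, parent_tokens=12_000,
                          usd_per_million_tokens=3.0):
    tokens_per_run = competitors * child_tokens + parent_tokens
    cost_per_run = tokens_per_run / 1_000_000 * usd_per_million_tokens
    return {"per_run": round(cost_per_run, 2),
            "monthly": round(cost_per_run * runs_per_month, 2)}

# estimate_monthly_cost() -> {'per_run': 0.16, 'monthly': 0.62}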
Extending the Pipeline
The base pipeline covers pricing, changelog, jobs, and positioning — the core competitive signals. Several extensions are worth adding once the base pipeline is stable:
Review platform monitoring
Add G2 and Capterra to the research scope. Each sub-agent fetches the competitor's profile, extracts recent reviews (rating, review text excerpt, verified buyer), and flags review clusters, i.e. multiple reviews mentioning the same pain point (see the sketch after the snippet below). This surfaces what customers are saying about competitors right now, which is often more revealing than the competitor's own positioning copy.
# Add to sub-agent research scope
reviews_data = scrape_reviews(
platforms=["g2.com", "capterra.com"],
competitor=competitor_url,
limit=10, # Most recent reviews
extract=["rating", "excerpt", "pros", "cons", "date"]
)
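Flagging review clusters can then be as simple as counting recurring phrases across the extracted reviews. A minimal sketch; the threshold is illustrative, and a real implementation would likely have the LLM normalize pain points into shared tags first, since raw review phrasing rarely repeats verbatim:
# Cluster sketch: count recurring "cons" phrases across recent reviews
from collections import Counter

def flag_review_clusters(reviews, min_mentions=3):
    pain_points = Counter()
    for review in reviews:
        for phrase in review.get("cons", []):
            pain_points[phrase.lower().strip()] += 1
    return [{"pain_point": phrase, "mentions": count}
            for phrase, count in pain_points.most_common()
            if count >= min_mentions]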
Pricing change alerts
For competitors where pricing changes are high-stakes (you're frequently compared on price in active deals), add a real-time alerting path: if the current week's pricing data differs from last week's, trigger an immediate Slack alert rather than waiting for the Monday digest. This can be wired in as a condition in the aggregation step.
# In aggregation: check for high-severity changes before the weekly cadence
if any(c["severity"] == "high" for c in changes):
slack.post_message(
channel="#competitive-intel-alerts",
text=f"🚨 Pricing change detected: {format_pricing_alert(changes)}"
)
Sentiment trend tracking
Store 12 weeks of review data and run a sentiment trend analysis in the parent's monthly digest (driven by a separate monthly schedule). Compare sentiment scores over time: competitors whose review sentiment is declining are vulnerable in the market, while those whose sentiment is improving are gaining momentum. This is the kind of strategic signal that's nearly impossible to derive manually.
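A minimal trend computation over that stored history, assuming each week's stored output includes an average review rating (the function name and input format are illustrative):
# Trend sketch: compare recent weeks against the earlier half of the window
def sentiment_trend(weekly_ratings, window=12):
    """Positive delta = improving sentiment; negative = declining."""
    history = weekly_ratings[-window:]
    if len(history) < 4:
        return None  # Not enough history for a meaningful trend
    mid = len(history) // 2
    early = sum(history[:mid]) / mid
    recent = sum(history[mid:]) / (len(history) - mid)
    return round(recent - early, 2)

# sentiment_trend([4.2, 4.1, 4.0, 3.9, 3.8, 3.7]) -> -0.3 (declining)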
Wrapping Up
The fan-out/fan-in pattern is the foundational multi-agent coordination model for research workloads. Once you have it running for competitive research, the same architecture applies to:
- Market monitoring: one sub-agent per market segment, parent aggregates weekly market intelligence report
- Backlink and SEO monitoring: one sub-agent per domain cluster, parent identifies new competitor content
- Customer review monitoring: one sub-agent per review platform, parent surfaces emerging product gaps
The pipeline takes 2–3 hours to configure the first time. After that, it runs itself.
For the structural concepts behind multi-agent coordination in Paperclip — role isolation, budget allocation, and coordination failure modes — see Paperclip Multi-Agent Coordination. For pricing on the agent tier that supports this level of multi-agent orchestration, see the Paperclip Pricing Guide.
Last updated: May 2026