---
name: system-migration-extraction
version: 1
description: Extract ALL credentials, IPs, connection methods, and configuration from legacy systems before migration/deletion.
triggers:
  - migrate from old system
  - extract credentials from legacy
  - system migration
  - 迁移数据
  - 提取凭证
  - 旧系统数据
  - before deleting old data
---

# System Migration & Data Extraction

When migrating from one system to another (e.g., OpenClaw → Hermes), you MUST comprehensively extract all important information before the old system can be deleted.

## The Lesson Learned

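The user had 11 GB of old system data, and the AI suggested deleting it to free space. The user correctly objected: "很多重要信息你都没有很好的迁移，比如所有龙虾实例的ip以及对应都连接方式" ("You did not properly migrate a lot of important information, such as the IPs of all the lobster instances and how to connect to each of them"). Server IPs, connection methods, and credentials had NOT been migrated.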

**Never assume migration is complete until user confirms.**

## Extraction Checklist

### 1. Server/Instance Information
- [ ] All server IPs (public + private/VPN)
- [ ] Hostnames and aliases
- [ ] SSH credentials (user, password, port)
- [ ] Tailscale/VPN IPs
- [ ] Hardware specs (cores, RAM, disk)
- [ ] Working directories
- [ ] Service ports
- [ ] Status (active/deprecated)

### 2. API Keys & Credentials
- [ ] LLM API keys (OpenAI, Anthropic, etc.)
- [ ] Vision model keys
- [ ] Search API keys (Tavily, Google, etc.)
- [ ] OCR service credentials
- [ ] Bot tokens (QQ, Discord, Telegram, etc.)
- [ ] OAuth tokens
- [ ] Database credentials

### 3. Network Configuration
- [ ] Proxy settings (SOCKS5, HTTP)
- [ ] VPN configurations
- [ ] Port mappings
- [ ] Firewall/security group rules
- [ ] DNS settings

### 4. User Information
- [ ] Names, aliases, nicknames
- [ ] Contact info (email, phone, QQ)
- [ ] School/work affiliation
- [ ] Current projects/tasks
- [ ] Preferences and rules

### 5. Application Configuration
- [ ] Model configurations (which model for what)
- [ ] TTS/voice settings
- [ ] Channel/platform configurations
- [ ] Automation rules and cron jobs

### 6. Path Mappings
- [ ] Old system paths → New system paths
- [ ] Config file locations
- [ ] Data directories
- [ ] Backup locations
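
As a starting point for sections 1 and 2 of the checklist, the sketch below scans text for IPv4-like strings and credential-like key names. The regular expressions, the suffix list, and the function names `scan_text` and `scan_tree` are illustrative assumptions, not part of either system. Expect false positives (version numbers look like IPs) and misses, so treat the result as a list of places to read, not as the extraction itself.

```python
import re
from pathlib import Path

# Hypothetical patterns: tune them to the formats the old system actually uses.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SECRET_RE = re.compile(
    r"(?i)\b([\w-]*(?:api[_-]?key|token|password|passwd|secret)[\w-]*)"
    r"[\"']?\s*[:=]\s*(\S+)"
)

def scan_text(text):
    """Return (kind, line_number, match) tuples for IP-like and secret-like strings."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for ip in IPV4_RE.findall(line):
            findings.append(("ip", lineno, ip))
        for name, _value in SECRET_RE.findall(line):
            # Record only the key name; read the value from the file during review.
            findings.append(("secret", lineno, name))
    return findings

def scan_tree(root, suffixes=(".md", ".json", ".env", ".yaml", ".yml", ".txt")):
    """Scan every matching file under root; unreadable files are skipped."""
    results = {}
    for path in Path(root).rglob("*"):
        # A bare ".env" file has an empty suffix, so match it by name.
        if path.is_file() and (path.suffix in suffixes or path.name == ".env"):
            try:
                text = path.read_text(encoding="utf-8", errors="replace")
            except OSError:
                continue
            hits = scan_text(text)
            if hits:
                results[str(path)] = hits
    return results
```

Reporting key names and line numbers instead of values keeps secrets out of terminal output and logs; the values are copied into the consolidated document in a separate, deliberate step.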

## Extraction Process

1. **Inventory first**: List all config files, memory files, and data directories in the old system
   ```bash
   sudo ls -la /path/to/old/system/memory/
   sudo ls -la /path/to/old/system/scene_blocks/
   ```

2. **Parallel extraction with delegate_task**: When there are multiple files (e.g., scene_blocks), use delegate_task to read them in parallel batches of 3:
   ```python
   # Example: 9 scene_block files → 3 parallel batches
   delegate_task(tasks=[
     {"goal": "Extract from file1.md", "toolsets": ["terminal", "file"]},
     {"goal": "Extract from file2.md", "toolsets": ["terminal", "file"]},
     {"goal": "Extract from file3.md", "toolsets": ["terminal", "file"]},
   ])
   ```
   Each subagent should output structured markdown with tables.

3. **Consolidate**: Merge subagent outputs into a single comprehensive document organized by category (instances, credentials, ports, etc.)

4. **User review**: Present the document to the user for verification, since they know what's important. A correction like "虾咪也配置了QQ" ("虾咪 is also configured with QQ") means your extraction missed something.

5. **Save permanently**: Store in `~/.hermes/important/` or equivalent persistent location with clear filename like `龙虾实例与凭证完整清单.md`

6. **Update memory pointer**: Add a memory entry pointing to the consolidated file so future sessions can find it

7. **Only then discuss deletion**: Never suggest deleting old data until extraction is verified complete
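
The batching described in step 2 can be sketched as follows. The helper name `build_batches` and the file names are hypothetical; the task dictionaries copy the shape shown in the step 2 example, and each batch would then be handed to `delegate_task(tasks=batch)`.

```python
from pathlib import Path

def build_batches(files, batch_size=3):
    """Group files into task batches of at most batch_size entries each."""
    batches = []
    for start in range(0, len(files), batch_size):
        batch = [
            {"goal": f"Extract from {Path(f).name}", "toolsets": ["terminal", "file"]}
            for f in files[start:start + batch_size]
        ]
        batches.append(batch)
    return batches

# Hypothetical file names; in practice, use the inventory from step 1.
files = [f"scene_{i}.md" for i in range(1, 10)]
for batch in build_batches(files):
    # Each batch would be passed as delegate_task(tasks=batch).
    print([task["goal"] for task in batch])
```

Building the batches from the inventory, not by hand, makes it harder to skip a file silently when the count is not a multiple of three.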

## Output Format

Create a structured markdown document with:
- Clear sections for each category
- Tables for credentials and connection info
- Explicit status indicators (✅ active, ⚠️ deprecated, ❌ offline)
- Last updated timestamp
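
A minimal sketch of this format, assuming instance records are held as dictionaries with `name`, `ip`, `port`, and `status` keys. Those field names, the column set, and the `render_inventory` helper are illustrative choices, and the example rows use documentation-range IPs, not real instances.

```python
from datetime import datetime, timezone

STATUS_ICONS = {
    "active": "✅ active",
    "deprecated": "⚠️ deprecated",
    "offline": "❌ offline",
}

def render_inventory(instances, now=None):
    """Render instance records as a markdown section with a table and timestamp."""
    now = now or datetime.now(timezone.utc)
    lines = [
        "## Instances",
        "",
        "| Name | IP | SSH port | Status |",
        "| --- | --- | --- | --- |",
    ]
    for inst in instances:
        # Unknown statuses are shown as-is so nothing is hidden.
        status = STATUS_ICONS.get(inst["status"], inst["status"])
        lines.append(f"| {inst['name']} | {inst['ip']} | {inst['port']} | {status} |")
    lines += ["", f"Last updated: {now.strftime('%Y-%m-%d %H:%M')} UTC"]
    return "\n".join(lines)

# Placeholder data, not real instances.
example = [
    {"name": "example-a", "ip": "192.0.2.10", "port": 22, "status": "active"},
    {"name": "example-b", "ip": "192.0.2.11", "port": 2222, "status": "deprecated"},
]
print(render_inventory(example))
```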

## Content Deduplication

When migrating learning notes, logs, or cron-generated content, check for duplicates first:

```bash
# Count records by separator
sudo grep -c "^======" /path/to/notes.md

# Check chapter distribution
sudo grep -o "科目 - [^=]*" /path/to/notes.md | sort | uniq -c | sort -rn
```

If the same chapters repeat many times (e.g., a cron job running hourly on 3 chapters = 100+ duplicates), deduplicate before migrating:

```python
import re

# Record header: 【YYYY-MM-DD HH:MM】 followed by the rest of the line
HEADER_RE = re.compile(r'(【\d{4}-\d{2}-\d{2} \d{2}:\d{2}】[^\n]+)')
# Chapter name: everything after the first "- " in the header
CHAPTER_RE = re.compile(r'- (.+)$')

def dedup_notes(filepath, outpath):
    """Keep only the latest record per chapter; return the number of chapters kept."""
    with open(filepath, 'r', encoding='utf-8') as f:
        content = f.read()

    # Splitting on a capturing group yields
    # [preamble, header1, body1, header2, body2, ...]
    parts = HEADER_RE.split(content)

    records = {}
    # Walk (header, body) pairs; text before the first header is dropped.
    for header, body in zip(parts[1::2], parts[2::2]):
        match = CHAPTER_RE.search(header)
        if match:
            # Later records overwrite earlier ones, so the newest entry wins.
            # Headers without a "- chapter" part are skipped.
            records[match.group(1).strip()] = header + "\n" + body

    # Write deduplicated file
    with open(outpath, 'w', encoding='utf-8') as f:
        f.write(f"# Notes (deduplicated)\n\nKept {len(records)} chapters\n\n")
        for record in records.values():
            f.write("=" * 55 + "\n" + record + "\n\n")

    return len(records)
```

**Key insight**: Ask the user "要我检查是否有重复吗？" ("Do you want me to check for duplicates?") before blindly copying large note files. Deduplication can shrink a file dramatically (e.g., 1.8 MB → 39 KB).
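
To have concrete numbers ready when raising duplicates with the user, a small sketch like the one below can report how many records and distinct chapters a notes file contains. It assumes the same header shape as the deduplication code above (a `【YYYY-MM-DD HH:MM】` timestamp, then a chapter name after the first `- `); the names `chapter_counts` and `duplicate_summary` are hypothetical.

```python
import re
from collections import Counter

# Assumed header shape: 【YYYY-MM-DD HH:MM】subject - chapter
HEADER_RE = re.compile(
    r"【\d{4}-\d{2}-\d{2} \d{2}:\d{2}】[^\n]*?- (.+)$", re.MULTILINE
)

def chapter_counts(text):
    """Count how many records each chapter has."""
    return Counter(name.strip() for name in HEADER_RE.findall(text))

def duplicate_summary(text):
    """Summarize total records, distinct chapters, and redundant records."""
    counts = chapter_counts(text)
    total = sum(counts.values())
    return {
        "records": total,
        "chapters": len(counts),
        "duplicates": total - len(counts),
    }
```

A large `duplicates` value relative to `chapters` suggests the file is mostly repeated cron output and worth deduplicating before migration.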

## Scene Memory Migration

OpenClaw's `memory-tdai/scene_blocks/` contains structured topic memories (e.g., "毕业论文与FedProxV2研究.md", "Android设备与ADB调试.md"). These are valuable context that should be migrated:

```bash
# List scene blocks
sudo ls -lh /root/.openclaw/memory-tdai/scene_blocks/

# Copy to migration directory
mkdir -p ~/.hermes/openclaw_migration/memory-tdai/scene_blocks
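sudo cp /root/.openclaw/memory-tdai/persona.md ~/.hermes/openclaw_migration/memory-tdai/
# Expand the glob inside a root shell: the calling user usually cannot read /root
sudo sh -c 'cp /root/.openclaw/memory-tdai/scene_blocks/*.md "$1"' sh ~/.hermes/openclaw_migration/memory-tdai/scene_blocks/
# Copies made with sudo are owned by root; hand them back to the current user
sudo chown -R "$(id -un)" ~/.hermes/openclaw_migration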
```

The copy above includes `persona.md`, which contains the user archetype and interaction preferences. Confirm that it exists and was copied.

## Custom Skills Identification

Not all skills in workspace/skills/ are custom — many are installed from skillhub. Check for local modifications:

```bash
# Skills without a .skillhub marker are local/modified.
# Run the loop in a root shell so /root is readable and names with spaces survive.
sudo bash -c '
for dir in /root/.openclaw/workspace/skills/*/; do
    if [ ! -f "${dir}.skillhub" ]; then
        echo "[LOCAL] $(basename "$dir")"
    fi
done'
```

Prioritize migrating skills that:
1. Have custom scripts (e.g., `quark-ocr/scripts/quark_ocr.py`)
2. Contain user-specific configurations
3. Are referenced in user's workflows

## Local Memory System Alternative

When migrating from cloud-based memory services (e.g., mem0) that have request limits, consider setting up a local vector memory system. See `references/local-vector-memory.md` for:
- Chroma + embedding architecture
- Installation and usage
- API alternatives when primary embedding service is down

## Pitfalls

1. **Partial migration is worse than no migration**: the user gives up access to the old system believing the data has been migrated, while critical information is actually missing.

2. **"I already migrated the important stuff"**: you probably didn't. Memory systems capture preferences, not infrastructure details like IPs and ports.

3. **Credentials are scattered across multiple files**: check `.env`, config files, memory files, scene blocks, etc.

4. **User corrections are signals**: when the user says "你漏了X" ("you missed X"), your extraction was incomplete. Go back and do it properly.

5. **Don't suggest deletion prematurely**: disk space pressure is not an excuse to lose irreplaceable configuration data.

6. **Search API keys live in plugin configs, not top-level**: Tavily, Brave, Exa, and SearXNG keys are stored under `plugins.entries["openclaw-<name>"].config.webSearch.apiKey` in `openclaw.json`, not in the main `.env`. Always inspect the plugin entries specifically:
   ```bash
   sudo python3 -c "
   import json
   with open('/root/.openclaw/openclaw.json') as f:
       d = json.load(f)
   for k, v in d.get('plugins', {}).get('entries', {}).items():
       cfg = v.get('config', {})
       if 'webSearch' in cfg or 'apiKey' in cfg:
           print(k, json.dumps(cfg, indent=2))
   "
   ```
   Then add the keys to `~/.hermes/.env` as `TAVILY_API_KEY`, `BRAVE_API_KEY`, etc.

7. **Verify files before sending**: when transferring files from the user's PC, compute the MD5 on both sides and confirm that the digests match after download. In one case, the AI sent an outdated version of a file to the user's advisor because the download had picked up a cached copy instead of the current one.

8. **Credential typos in extracted docs**: even after careful extraction, typos happen (e.g., `MYNmym888` vs `MYNmyn888`, m vs n). When SSH login fails with "Permission denied" but you're sure the server is right, search the OpenClaw session history for the correct password:
   ```bash
   # Replace 服务器名 (server name), 服务器 (server), and IP地址 (IP address) with the real values.
   # 密码 means "password" and matches Chinese-language conversations.
   grep -h "服务器名.*密码\|password.*服务器\|ssh.*IP地址" ~/.hermes/openclaw_sessions_backup/memory_conversations/*.jsonl | head -10
   ```
   Historical conversations often contain the credentials that were actually used to log in successfully.
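
For the pitfall about verifying files before sending, a minimal sketch of the checksum comparison is shown below. It assumes the MD5 of the source file has already been computed on the user's PC and passed along as a hex string; the names `file_md5` and `verify_download` are hypothetical.

```python
import hashlib

def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_download(path, expected_md5):
    """True if the local file's MD5 equals the digest reported by the source machine."""
    return file_md5(path) == expected_md5.strip().lower()
```

If `verify_download` returns `False`, download the file again and recheck before forwarding it to anyone. MD5 is adequate here for catching stale or truncated copies, not for defending against deliberate tampering.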