ZeroClaw — Daemon

Layer 3 study notes, based on the actual source code. File: src/daemon/mod.rs

One-line definition

Daemon = a supervisor that runs multiple components concurrently inside a single process and keeps watch over them.

zeroclaw daemon
    │
    ├─ Gateway component       (HTTP server)
    ├─ Channels component      (Telegram, Discord, Slack …)
    ├─ Heartbeat component     (periodic autonomous LLM tasks)
    ├─ Scheduler component     (cron jobs)
    └─ StateWriter             (flushes daemon_state.json every 5 seconds)

run(): overall flow

pub async fn run(config: Config, host: String, port: u16) -> Result<()> {
    let mut handles: Vec<JoinHandle<()>> = Vec::new();

    // 1. Start the state-writer task (flushes JSON every 5 seconds)
    handles.push(spawn_state_writer(config.clone()));

    // 2. Gateway supervisor
    handles.push(spawn_component_supervisor("gateway", ...));

    // 3. Channels supervisor (only when channels are configured)
    if has_supervised_channels(&config) {
        handles.push(spawn_component_supervisor("channels", ...));
    }

    // 4. Heartbeat supervisor (when heartbeat.enabled = true)
    if config.heartbeat.enabled {
        handles.push(spawn_component_supervisor("heartbeat", ...));
    }

    // 5. Scheduler supervisor (when cron.enabled = true)
    if config.cron.enabled {
        handles.push(spawn_component_supervisor("scheduler", ...));
    }

    // 6. Wait for SIGINT / SIGTERM
    wait_for_shutdown_signal().await?;

    // 7. Abort every task, then wait for each one to finish
    for handle in &handles {
        handle.abort();
    }
    for handle in handles {
        let _ = handle.await; // ignore the JoinError a cancelled task returns
    }
    Ok(())
}
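The conditional spawning in run() boils down to a decision about which supervisors to start. A minimal sketch of that decision as a pure function, assuming a hypothetical `DaemonFlags` struct whose fields stand in for `has_supervised_channels(&config)`, `config.heartbeat.enabled`, and `config.cron.enabled` (`planned_components` is likewise a hypothetical name):

```rust
/// Hypothetical stand-in for the parts of `Config` that run() inspects.
struct DaemonFlags {
    has_supervised_channels: bool, // result of has_supervised_channels(&config)
    heartbeat_enabled: bool,       // config.heartbeat.enabled
    cron_enabled: bool,            // config.cron.enabled
}

/// Returns the task names in the order run() spawns them.
fn planned_components(flags: &DaemonFlags) -> Vec<&'static str> {
    // The state writer and the gateway are always started.
    let mut names = vec!["state_writer", "gateway"];
    if flags.has_supervised_channels {
        names.push("channels");
    }
    if flags.heartbeat_enabled {
        names.push("heartbeat");
    }
    if flags.cron_enabled {
        names.push("scheduler");
    }
    names
}
```

Keeping the decision separate from the actual spawn calls makes it easy to check which components a given config.toml would activate without starting any of them.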

spawn_component_supervisor: restarts with exponential backoff

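fn spawn_component_supervisor<F, Fut>(
    name: &'static str,
    initial_backoff_secs: u64,
    max_backoff_secs: u64,
    mut run_component: F,
) -> JoinHandle<()>
where
    F: FnMut() -> Fut + Send + 'static,               // called again on every restart
    Fut: Future<Output = Result<()>> + Send + 'static, // one run of the component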

Run component
    │
    ├─ Ok(())  → record "exited unexpectedly" error → reset backoff → restart
    └─ Err(e)  → record error → restart (backoff keeps doubling)

Restart delay:
  First failure:      initial_backoff (default 2 s)
  Repeated failures:  2 s → 4 s → 8 s → ... → max_backoff (default 60 s)
  Restart after a successful run: backoff is reset
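Put as a formula, the n-th consecutive failure waits min(initial · 2^(n−1), max) seconds. A minimal sketch of that rule; `restart_delay_secs` is a hypothetical helper, and the real supervisor may track the value incrementally instead of computing it from n:

```rust
/// Delay (seconds) before restarting after the n-th consecutive failure, n >= 1.
/// Hypothetical helper: min(initial * 2^(n-1), max).
fn restart_delay_secs(initial: u64, max: u64, n: u32) -> u64 {
    let initial = initial.max(1); // guard against a 0-second busy loop
    let max = max.max(initial);
    // 2^(n-1); a shift of 64 or more would overflow, so saturate instead.
    let factor = 1u64.checked_shl(n.saturating_sub(1)).unwrap_or(u64::MAX);
    initial.saturating_mul(factor).min(max)
}
```

With the defaults (2 and 60), a component that keeps failing is restarted after 2, 4, 8, 16, 32, and then 60 seconds, and every 60 seconds after that.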

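Key design point: even when a component returns Ok(()) and exits cleanly, the supervisor treats it as an "unexpected exit" and restarts it, because daemon components are supposed to run as infinite loops.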

Component state is tracked with health::mark_component_ok/error/bump_restart and written to daemon_state.json.
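The supervision loop and its health bookkeeping can be modelled synchronously with the standard library. This is a simplified sketch, not the actual spawn_component_supervisor: `supervise`, `ComponentHealth`, and the `max_runs` cap are hypothetical, and where the real code runs on tokio and sleeps between restarts, this version only records the delay it would sleep.

```rust
use std::collections::HashMap;

/// Hypothetical per-component record, similar in spirit to daemon_state.json.
#[derive(Debug, Default)]
struct ComponentHealth {
    status: &'static str, // "ok" or "error"
    last_error: Option<String>,
    restart_count: u64,
}

/// Runs `run_component` up to `max_runs` times (the cap only exists so the
/// sketch terminates) and returns the restart delays it would have slept.
fn supervise<F>(
    name: &str,
    initial_backoff_secs: u64,
    max_backoff_secs: u64,
    max_runs: usize,
    health: &mut HashMap<String, ComponentHealth>,
    mut run_component: F,
) -> Vec<u64>
where
    F: FnMut() -> Result<(), String>,
{
    let initial = initial_backoff_secs.max(1);
    let max = max_backoff_secs.max(initial);
    let mut backoff = initial;
    let mut delays = Vec::new();

    for _ in 0..max_runs {
        let entry = health.entry(name.to_string()).or_default();
        entry.status = "ok"; // mark_component_ok equivalent
        match run_component() {
            // A daemon component is meant to loop forever, so Ok(()) is still an
            // unexpected exit; it did run successfully, so the backoff is reset.
            Ok(()) => {
                entry.status = "error";
                entry.last_error = Some("component exited unexpectedly".to_string());
                backoff = initial;
            }
            Err(e) => {
                entry.status = "error";
                entry.last_error = Some(e);
            }
        }
        entry.restart_count += 1; // bump_restart equivalent
        delays.push(backoff); // real code: sleep(Duration::from_secs(backoff))
        backoff = backoff.saturating_mul(2).min(max);
    }
    delays
}
```

Doubling after each sleep means the first failure always waits the initial backoff, matching the restart-delay rules listed above.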

Signal handling

async fn wait_for_shutdown_signal() -> Result<()> {
    // SIGINT  → shut down
    // SIGTERM → shut down
    // SIGHUP  → ignored (the daemon keeps running when the terminal/SSH connection drops)
}

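Ignoring SIGHUP is the key point: the daemon keeps running even after the SSH session that started it disconnects. It also stays stable when launched under systemctl or nohup.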
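The shutdown policy can be captured as a small pure function. The `Signal` and `SignalAction` enums, `on_signal`, and `first_shutdown` are hypothetical; in a tokio-based daemon the listeners themselves would typically be registered with `tokio::signal::unix::signal(SignalKind::...)`, and registering a listener for SIGHUP is what replaces the default terminate-on-hangup behavior.

```rust
/// Hypothetical model of the signals wait_for_shutdown_signal() reacts to.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Signal {
    Int,  // SIGINT (Ctrl-C)
    Term, // SIGTERM (e.g. from systemctl stop or kill)
    Hup,  // SIGHUP (controlling terminal / SSH session closed)
}

#[derive(Debug, PartialEq)]
enum SignalAction {
    Shutdown,
    Ignore,
}

fn on_signal(sig: Signal) -> SignalAction {
    match sig {
        Signal::Int | Signal::Term => SignalAction::Shutdown,
        // Keep running when the terminal that started the daemon goes away.
        Signal::Hup => SignalAction::Ignore,
    }
}

/// Index of the first signal that ends the wait, if any.
fn first_shutdown(signals: &[Signal]) -> Option<usize> {
    signals.iter().position(|&s| on_signal(s) == SignalAction::Shutdown)
}
```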

spawn_state_writer: health state file

Every 5 seconds:
  health::snapshot_json()  →  written to daemon_state.json

{
  "components": {
    "gateway":   { "status": "ok", "restart_count": 0, ... },
    "channels":  { "status": "ok", ... },
    "heartbeat": { "status": "error", "last_error": "...", ... },
    "scheduler": { "status": "ok", ... }
  },
  "written_at": "2026-03-22T10:00:00Z"
}

Location: {config_dir}/daemon_state.json. The zeroclaw status command reads this file to display the status of each component.
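Writing the snapshot can be sketched with std alone. Here the JSON is assembled by hand for a fixed set of fields (the real health::snapshot_json() output carries more, elided as ... above), and the write goes through a temporary file plus `rename` so a reader such as zeroclaw status never sees a half-written file. That atomic-rename step is an assumption of this sketch, not something the source confirms; `build_snapshot` and `write_state_file` are hypothetical names.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Builds a simplified snapshot. Names and statuses are assumed to be plain
/// ASCII identifiers, so no JSON string escaping is done here.
fn build_snapshot(components: &[(&str, &str, u64)], written_at: &str) -> String {
    let entries: Vec<String> = components
        .iter()
        .map(|(name, status, restarts)| {
            format!("\"{name}\": {{\"status\": \"{status}\", \"restart_count\": {restarts}}}")
        })
        .collect();
    format!(
        "{{\"components\": {{{}}}, \"written_at\": \"{}\"}}",
        entries.join(", "),
        written_at
    )
}

/// Writes to `<path>.tmp` first, then renames it over `path`.
fn write_state_file(path: &Path, json: &str) -> io::Result<()> {
    let tmp = path.with_extension("json.tmp");
    fs::write(&tmp, json)?;
    fs::rename(&tmp, path)
}
```

In the daemon, a call like this would sit inside a loop that sleeps 5 seconds between writes.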

Heartbeat worker: autonomous periodic tasks

A component unique to the daemon: the LLM runs predefined tasks periodically on its own.

Execution flow

[every N minutes (default 5)]
    │
    ├─ collect_runnable_tasks(): gather the active tasks
    │
    ├─ if two_phase = true:
    │   Phase 1: ask the LLM "which tasks should run?"
    │            → only the selected tasks go on to Phase 2
    │
    └─ Phase 2: for each task
            load session_context (last 20 messages)
            crate::agent::run() → run the LLM agent
            response → delivered to the channel (Telegram, etc.)
            run result → recorded in heartbeat_runs.json
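One heartbeat tick can be sketched with the LLM calls replaced by closures. `HeartbeatTask`, `RunRecord`, and `run_tick` are hypothetical; `choose` stands in for the Phase 1 LLM decision and `execute` for crate::agent::run() plus delivery. Session-context loading and writing heartbeat_runs.json are left out.

```rust
/// Hypothetical task record; only the fields this sketch needs.
#[derive(Debug, Clone)]
struct HeartbeatTask {
    name: String,
    active: bool,
}

/// Outcome of one task in Phase 2.
#[derive(Debug, PartialEq)]
struct RunRecord {
    task: String,
    ok: bool,
}

fn run_tick<C, E>(tasks: &[HeartbeatTask], two_phase: bool, choose: C, mut execute: E) -> Vec<RunRecord>
where
    C: Fn(&[HeartbeatTask]) -> Vec<String>, // Phase 1: names the LLM picked
    E: FnMut(&HeartbeatTask) -> Result<String, String>, // Phase 2: run one task
{
    // collect_runnable_tasks(): keep only the active tasks
    let runnable: Vec<HeartbeatTask> = tasks.iter().filter(|t| t.active).cloned().collect();

    // Phase 1 (optional): let the model narrow the list down
    let selected: Vec<&HeartbeatTask> = if two_phase {
        let picked = choose(runnable.as_slice());
        runnable.iter().filter(|t| picked.contains(&t.name)).collect()
    } else {
        runnable.iter().collect()
    };

    // Phase 2: run each selected task; the real worker would also deliver the
    // reply to a channel and append the result to heartbeat_runs.json
    selected
        .into_iter()
        .map(|t| RunRecord { task: t.name.clone(), ok: execute(t).is_ok() })
        .collect()
}
```

Phase 1 costs one extra model call per tick but lets the model skip tasks that have nothing to do.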

Adaptive interval

The more consecutive_failures accumulate, the longer the interval grows (backoff)
If a high_priority task is pending, the interval is kept short
The interval always stays between min_interval_minutes and max_interval_minutes
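The source lists the adaptive rules but not the exact formula, so the sketch below assumes one: start from interval_minutes, double it per consecutive failure, drop to the minimum while a high_priority task is pending, and clamp to [min_interval_minutes, max_interval_minutes]. The function name and the doubling factor are hypothetical.

```rust
/// Hypothetical adaptive-interval rule (the doubling is an assumption).
fn next_interval_minutes(
    base: u64,
    consecutive_failures: u32,
    has_high_priority: bool,
    min: u64,
    max: u64,
) -> u64 {
    let max = max.max(min); // clamp() panics if min > max
    if has_high_priority {
        // Assumption: a pending high-priority task keeps the interval as short as allowed.
        return min;
    }
    // base * 2^failures, saturating instead of overflowing
    let factor = 1u64.checked_shl(consecutive_failures).unwrap_or(u64::MAX);
    base.saturating_mul(factor).clamp(min, max)
}
```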

Deadman switch

[heartbeat]
deadman_timeout_minutes = 60

→ If 60 minutes or more pass since the last tick:
  "⚠️ Heartbeat dead-man's switch: no tick in 60 minutes"
  → an alert goes to deadman_channel/to, or to the default channel
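The dead-man's check reduces to comparing the time since the last tick with the timeout. A minimal sketch using std::time; `deadman_alert` is a hypothetical name, and the message format copies the one above.

```rust
use std::time::{Duration, Instant};

/// Returns the alert text if no heartbeat tick has happened within `timeout`.
fn deadman_alert(last_tick: Instant, now: Instant, timeout: Duration) -> Option<String> {
    // saturating_duration_since yields zero if `now` is somehow before `last_tick`
    let silent_for = now.saturating_duration_since(last_tick);
    if silent_for >= timeout {
        let minutes = timeout.as_secs() / 60;
        Some(format!("⚠️ Heartbeat dead-man's switch: no tick in {minutes} minutes"))
    } else {
        None
    }
}
```

A check like this only helps if it runs on its own timer, independent of the tick loop it is watching.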

Session context injection

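Load the last 20 messages from {workspace}/sessions/{channel}_{to}.jsonl
and prepend them to the heartbeat task prompt

→ "Use the previous conversation as context when running the next task"
→ Skipped when there are no user/assistant messages (only heartbeat replies)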
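The session-context step can be sketched over messages already parsed from the JSONL file (JSON parsing is left out to stay dependency-free). `Message`, `session_path`, and `build_context` are hypothetical, and the `from_heartbeat` flag is an assumption about how heartbeat replies are told apart from real conversation.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical parsed line of the session JSONL file.
struct Message {
    role: &'static str,   // "user" or "assistant"
    content: String,
    from_heartbeat: bool, // assumption: heartbeat replies are flagged somehow
}

/// {workspace}/sessions/{channel}_{to}.jsonl
fn session_path(workspace: &Path, channel: &str, to: &str) -> PathBuf {
    workspace.join("sessions").join(format!("{channel}_{to}.jsonl"))
}

/// Prepends up to the last `limit` messages to the task prompt, or returns the
/// prompt unchanged when the history holds no real user/assistant exchange.
fn build_context(history: &[Message], task_prompt: &str, limit: usize) -> String {
    let recent = &history[history.len().saturating_sub(limit)..];
    let has_conversation = recent
        .iter()
        .any(|m| !m.from_heartbeat && (m.role == "user" || m.role == "assistant"));
    if !has_conversation {
        return task_prompt.to_string();
    }
    let mut out = String::from("Recent conversation:\n");
    for m in recent {
        out.push_str(&format!("{}: {}\n", m.role, m.content));
    }
    out.push('\n');
    out.push_str(task_prompt);
    out
}
```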

Heartbeat delivery settings

[heartbeat]
enabled = true
interval_minutes = 30
target = "telegram"       # delivery channel
to = "my_telegram_id"     # recipient

# Optional
two_phase = true           # let the LLM decide in Phase 1 before running
adaptive = true            # lengthen the interval after failures
min_interval_minutes = 5
max_interval_minutes = 60
deadman_timeout_minutes = 120
load_session_context = true  # inject recent conversation context
max_run_history = 100       # maximum number of run-history entries to keep

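Auto-detection: if target/to are not set, the recipient is auto-detected as Telegram's allowed_users[0]. Discord, Slack, and Mattermost cannot be auto-detected because there is no way to pin down a specific recipient.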
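The auto-detection rule can be sketched as a small resolver. `resolve_delivery` and its parameters are hypothetical; `telegram_allowed_users` stands in for the Telegram channel's allowed_users list, and treating target = "telegram" with no `to` the same as an unset pair is an assumption.

```rust
/// Returns (channel, recipient) for heartbeat delivery, if one can be determined.
fn resolve_delivery(
    target: Option<&str>,
    to: Option<&str>,
    telegram_allowed_users: &[String],
) -> Option<(String, String)> {
    match (target, to) {
        // Explicit settings always win.
        (Some(t), Some(r)) => Some((t.to_string(), r.to_string())),
        // Assumption: with no target, or target = "telegram" but no `to`,
        // fall back to the first Telegram allowed user.
        (None | Some("telegram"), None) => telegram_allowed_users
            .first()
            .map(|user| ("telegram".to_string(), user.clone())),
        // Discord, Slack, Mattermost: no single recipient can be inferred.
        _ => None,
    }
}
```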

Relationships between components

Daemon
  │
  ├─ Gateway        ← REST API, WebSocket, webhook HTTP server
  │   └─ AppState   (shares Provider, Memory, Tools)
  │
  ├─ Channels       ← long-running Telegram/Discord/Slack listeners
  │   └─ process_channel_message() → run_tool_call_loop()
  │
  ├─ Heartbeat      ← periodic timer → agent::run() → delivery to a channel
  │
  └─ Scheduler      ← cron.rs → runs scheduled jobs

Gateway and Channels are not separate processes but tasks on the same tokio runtime. Each has its own Provider and Memory instances (initialized separately in run_gateway and start_channels).

config.toml settings

[reliability]
channel_initial_backoff_secs = 2   # wait before the first restart
channel_max_backoff_secs = 60      # maximum wait between restarts

[cron]
enabled = true

[heartbeat]
enabled = true
interval_minutes = 30

Commands

# Start the daemon
zeroclaw daemon

# Check health status
zeroclaw status
# → reads daemon_state.json and prints each component's status

# Stop the daemon (admin request to the Gateway)
zeroclaw gateway --stop
# → POST /admin/shutdown (localhost only)
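The localhost-only restriction on POST /admin/shutdown amounts to a loopback check on the peer address. A minimal sketch with std::net; `admin_shutdown_allowed` is hypothetical, and how the real Gateway obtains the peer address is not shown in the source.

```rust
use std::net::{IpAddr, SocketAddr};

/// Accepts the shutdown request only when it comes from the local machine.
fn admin_shutdown_allowed(method: &str, path: &str, peer: SocketAddr) -> bool {
    let is_loopback = match peer.ip() {
        IpAddr::V4(v4) => v4.is_loopback(),
        // An IPv4-mapped IPv6 address such as ::ffff:127.0.0.1 is also local.
        IpAddr::V6(v6) => v6.is_loopback() || v6.to_ipv4_mapped().map_or(false, |v4| v4.is_loopback()),
    };
    method == "POST" && path == "/admin/shutdown" && is_loopback
}
```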

Related