Tool Call Loop and Multi-Agent Patterns

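The Essence of a Tool Call

The model cannot execute functions itself. Instead, its response carries a request of the form "call this function with these arguments". Execution is always the client's (our) job, and the result has to be sent back in messages so the model can make its next decision.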

Tool Call Loop Structure

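while True:
    response = api.call(messages)  # fetch a fresh response on every iteration

    if response.finish_reason == "stop":
        break  # the model has finished its final answer

    if response.finish_reason == "tool_calls":
        # Append the model's response once, before any tool results
        messages.append(assistant_message)
        for tool_call in response.tool_calls:
            result = execute(tool_call)
            # Append one result message per tool call
            messages.append(tool_result_message)
        # Loop continues → the model sees the results and decides what to do next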
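
The loop above can be exercised without a real API. The following is a minimal sketch, assuming a scripted stand-in model: fake_model, TOOLS, and the dict-based response shape are hypothetical and do not correspond to any particular SDK.

```python
import json

# Hypothetical tool registry: maps tool names to plain Python functions.
TOOLS = {"add": lambda a, b: a + b}


def fake_model(messages):
    """Stand-in for a real API call: requests one tool call, then answers."""
    last = messages[-1]
    if last["role"] == "tool":
        text = f"The answer is {last['content']}."
        return {"finish_reason": "stop", "content": text, "tool_calls": []}
    call = {"id": "call_1", "name": "add", "arguments": json.dumps({"a": 2, "b": 3})}
    return {"finish_reason": "tool_calls", "content": None, "tool_calls": [call]}


def tool_call_loop(messages, model=fake_model, max_turns=10):
    for _ in range(max_turns):
        response = model(messages)  # refreshed on every iteration
        if response["finish_reason"] == "stop":
            return response["content"]
        # Record the assistant turn once, then one result per tool call.
        messages.append(
            {"role": "assistant", "content": None, "tool_calls": response["tool_calls"]}
        )
        for call in response["tool_calls"]:
            result = TOOLS[call["name"]](**json.loads(call["arguments"]))
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": str(result)}
            )
    raise RuntimeError("tool call loop did not finish within max_turns")


history = [{"role": "user", "content": "What is 2 + 3?"}]
answer = tool_call_loop(history)
print(answer)
```

The max_turns cap is an addition for safety in this sketch; it turns a runaway loop into an explicit error.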

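Common Bugs

Infinite loop: if response is not refreshed inside the loop, the loop condition is always evaluated against the same stale response.
Redundant second call: calling the API again after the loop has exited regenerates a response that was already complete.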
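
Both bugs show up clearly when the API calls are counted. The sketch below is illustrative only: CountingModel is a hypothetical stub that returns just a finish reason, and the max_turns guard exists solely so the stale-response demo terminates.

```python
class CountingModel:
    """Hypothetical stub: requests a tool on the first call, then finishes."""

    def __init__(self):
        self.calls = 0

    def __call__(self, messages):
        self.calls += 1
        return "tool_calls" if self.calls == 1 else "stop"


def stale_response_loop(model, max_turns=5):
    # Bug 1: the response is fetched once and never refreshed inside the loop.
    finish_reason = model([])
    turns = 0
    while finish_reason != "stop" and turns < max_turns:  # guard so the demo ends
        turns += 1  # tool results would be appended here, but the model is never re-asked
    return turns


def double_call_loop(model):
    # Bug 2: the loop already ended on a finished response, then the API is called again.
    while model([]) != "stop":
        pass
    model([])  # redundant: regenerates an answer that already existed
    return model.calls


def correct_loop(model):
    while True:
        finish_reason = model([])  # refreshed every iteration
        if finish_reason == "stop":
            return model.calls  # use this response; no further call


print(stale_response_loop(CountingModel()))
print(double_call_loop(CountingModel()))
print(correct_loop(CountingModel()))
```

With this stub, the stale loop runs until the guard stops it, the double-call version spends three API calls, and the correct loop spends two.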

Orchestrator-Agent Pattern

A single agent is one messages array that keeps growing. When topics mix, the model drags earlier context along, which introduces noise.

In a multi-agent setup, each agent has its own independent context:

Orchestrator
    │
    ├── run_agent(task_A, tools) → result_A
    ├── run_agent(task_B, tools) → result_B
    └── aggregate(result_A, result_B) → final output

run_agent() is a Tool Call loop with its own independent messages array. Every call starts from a fresh context.
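
The difference in context growth can be made concrete by counting messages. This is a rough sketch: stub_agent_turns is a hypothetical stand-in that pretends every task costs four messages, and no model is called.

```python
def stub_agent_turns(task):
    """Hypothetical stand-in for the messages one task adds (task + tool round trips)."""
    return [
        {"role": "user", "content": task},
        {"role": "assistant", "content": f"calling a tool for {task}"},
        {"role": "tool", "content": f"raw tool output for {task}"},
        {"role": "assistant", "content": f"result for {task}"},
    ]


def single_agent(tasks):
    messages = []  # one shared array that keeps growing
    sizes = []
    for task in tasks:
        messages.extend(stub_agent_turns(task))
        sizes.append(len(messages))  # context size the model sees after this task
    return sizes


def multi_agent(tasks):
    sizes = []
    for task in tasks:
        messages = stub_agent_turns(task)  # fresh array per run_agent() call
        sizes.append(len(messages))
    return sizes


tasks = ["task_A", "task_B", "task_C"]
print(single_agent(tasks))  # [4, 8, 12]
print(multi_agent(tasks))   # [4, 4, 4]
```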

Three Key Design Decisions

1. Context: Shared vs Isolated

Between agents: isolated (each has its own independent context)
Orchestrator: collects every agent's result and merges them

2. Sequential vs Parallel

| Pattern | When to use | Python |
|---------|-------------|--------|
| Sequential | The previous result is the next input | await in order |
| Parallel | The agents are independent | asyncio.gather() |

When there are no dependencies between tasks, as in a multi-topic search, parallel is the natural choice.
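
The two patterns can be sketched with asyncio as follows. The async run_agent here is a hypothetical stand-in that sleeps instead of calling a model, and the task names are made up for illustration.

```python
import asyncio


async def run_agent(task: str, upstream: str = "") -> str:
    """Hypothetical async agent: sleeps briefly instead of calling a model."""
    await asyncio.sleep(0.01)
    return f"{task}({upstream})"


async def sequential() -> str:
    # Each step needs the previous result, so the awaits happen in order.
    outline = await run_agent("outline")
    draft = await run_agent("draft", outline)
    return draft


async def parallel(topics: list[str]) -> list[str]:
    # Independent tasks start together; gather returns results in input order.
    return await asyncio.gather(*(run_agent(topic) for topic in topics))


print(asyncio.run(sequential()))
print(asyncio.run(parallel(["topic_a", "topic_b", "topic_c"])))
```

In the parallel case the total wait is roughly that of the slowest agent rather than the sum of all of them.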

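3. Failure-Handling Boundary

Each agent handles errors internally first (tool call errors, timeouts, and so on)
The Orchestrator only needs to know that "this agent failed as a whole"
Propagating detailed errors up to the Orchestrator complicates the orchestration logic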
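
One way to draw this boundary is to have each agent return a small result object with a success flag. The sketch below is an assumption about how that might look: AgentResult, flaky_tool, and the retry count are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AgentResult:
    ok: bool
    output: str = ""


def flaky_tool(query: str) -> str:
    """Hypothetical tool that always times out for one specific input."""
    if query == "bad":
        raise TimeoutError("tool timed out")
    return f"hits for {query}"


def run_agent(task: str, retries: int = 1) -> AgentResult:
    # The agent absorbs tool-level errors itself (retry, then give up).
    for _ in range(retries + 1):
        try:
            return AgentResult(ok=True, output=flaky_tool(task))
        except (TimeoutError, ValueError):
            continue
    return AgentResult(ok=False)  # only the overall outcome crosses the boundary


def orchestrate(tasks: list[str]) -> dict:
    results = {task: run_agent(task) for task in tasks}
    return {
        "succeeded": [t for t, r in results.items() if r.ok],
        "failed": [t for t, r in results.items() if not r.ok],
    }


print(orchestrate(["cats", "bad", "dogs"]))
```

The orchestrator never sees TimeoutError; it only branches on ok, which keeps its logic flat.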

The run_agent() Abstraction

def run_agent(task: str, tools: list, system_prompt: str = "") -> str:
    messages = [{"role": "user", "content": task}]
    if system_prompt:
        messages.insert(0, {"role": "system", "content": system_prompt})

    while True:
        response = client.chat.completions.create(
            model=MODEL, messages=messages, tools=tools
        )

        if response.choices[0].finish_reason == "stop":
            return response.choices[0].message.content

        # handle the tool calls, then append the results to messages
        ...

The Orchestrator's only job is to call this multiple times.
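
A minimal orchestrator built on that idea might look like the sketch below. It assumes a stub with the same signature as run_agent() so it runs without an API key, and it aggregates by simple labeled concatenation, which is only one of several possible strategies.

```python
from typing import Callable


def stub_run_agent(task: str, tools: list, system_prompt: str = "") -> str:
    """Hypothetical stand-in with the same signature as run_agent(); no API call."""
    return f"summary of {task}"


def orchestrate(
    tasks: list[str],
    tools: list,
    agent: Callable[..., str] = stub_run_agent,
) -> str:
    # One agent call per task; each call starts from its own fresh context.
    results = [agent(task, tools) for task in tasks]
    # Simplest possible aggregation: label each result and join them.
    return "\n".join(f"[{task}] {result}" for task, result in zip(tasks, results))


print(orchestrate(["pricing", "competitors"], tools=[]))
```

Passing the agent function in as a parameter makes it easy to swap the stub for the real run_agent() later.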

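Next Steps

Points to decide when applying this to model_test:

Whether to run the search agents in parallel or sequentially
Where to place the error-handling boundary
How the Orchestrator merges results (simple concatenation vs delegating once more to the model)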

→ The concrete design will be worked out in the 설계하자 project

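Does Context Really Grow in Sequential Mode?

In single-agent sequential execution, every exchange piles up in one messages array, so the context really does grow.

Multi-agent sequential execution is different. Each agent starts with its own messages array, and the previous agent's internal reasoning and tool call round trips are discarded. Only the condensed result is passed to the next agent.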

Agent A
  messages: [task_A + tool call round trips...]
  → result_A (only the final output is extracted)

Agent B
  messages: [result_A as input, task_B...]  ← none of A's internals
  → result_B
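
The handoff in the diagram above can be sketched in a few lines. Here run_agent is a hypothetical stub that returns both its final result and its private messages, purely so the isolation can be inspected; a real agent would only return the result.

```python
def run_agent(task: str, upstream_result: str = "") -> tuple[str, list[dict]]:
    """Hypothetical agent: returns (final result, its private messages)."""
    prompt = task
    if upstream_result:
        prompt = f"{task}\n\nInput from previous agent:\n{upstream_result}"
    messages = [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": f"(tool call round trips for {task})"},
        {"role": "tool", "content": f"(raw tool output for {task})"},
    ]
    result = f"result of {task}"
    messages.append({"role": "assistant", "content": result})
    return result, messages


result_a, messages_a = run_agent("task_A")
result_b, messages_b = run_agent("task_B", upstream_result=result_a)

# B's context holds A's final result but none of A's intermediate messages.
leaked = [m for m in messages_b if "for task_A" in m["content"]]
print(len(messages_a), len(messages_b), len(leaked))
```

Agent B's context stays the same size as Agent A's, because only result_a crosses over.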

Core Trade-off: What, and How Much, to Pass to the Next Agent

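| What is passed | Context size | Information preserved |
|----------------|--------------|-----------------------|
| Conclusion only | Small | Risk of loss |
| Including intermediate reasoning | Medium | Partially preserved |
| Full raw output | Large | Effectively a single agent |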

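Deciding criterion: does the next agent need only the previous agent's conclusion, or the process as well?

The nature of the task drives this decision, so it has to be weighed task by task at design time.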
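
One way to make this decision explicit is a per-task handoff level. The following is a sketch under assumptions: AgentTrace, build_handoff, and the three level names are hypothetical, chosen to mirror the three rows of the trade-off table.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class AgentTrace:
    conclusion: str
    reasoning: list[str]
    raw_output: str


def build_handoff(
    trace: AgentTrace, level: Literal["conclusion", "reasoning", "raw"]
) -> str:
    """Pick how much of the previous agent's work the next agent receives."""
    if level == "conclusion":
        return trace.conclusion
    if level == "reasoning":
        steps = "\n".join(f"- {step}" for step in trace.reasoning)
        return f"{trace.conclusion}\n\nReasoning:\n{steps}"
    return trace.raw_output  # everything: close to running a single agent


trace = AgentTrace(
    conclusion="Use library X.",
    reasoning=["X supports async.", "Y is unmaintained."],
    raw_output="(full transcript) " * 50,
)
sizes = {
    level: len(build_handoff(trace, level))
    for level in ("conclusion", "reasoning", "raw")
}
print(sizes)
```

The character counts grow with each level, which is the context-size cost paid for preserving more information.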