Coursework · Multi-Agent Systems

MAS

Ten-agent system for evaluating real-estate purchases in Aix-en-Provence. Six public APIs, a shared call budget, 185 tests. One week.

Context

The brief was structural, not domain-specific: at least six agents, at least four public data sources, traceable decisions, and a bounded number of API calls. We picked real-estate decision support for Aix-en-Provence: given a portfolio of purchase targets, decide which ones deserve deeper analysis, value the survivors from independent angles, and return a graded recommendation per property.

Team of three. I owned the code end-to-end — agents, orchestration, data preparation, tests. My teammates owned the report and the demo. Without their work this would have been a pile of Python no one could defend in front of a jury.

The problem

Real estate combines several signals: prior transactions, neighborhood demographics, environmental risk, geographic features, parcel boundaries. Each lives in a separate public API — DVF, INSEE, BAN, IGN, Géorisques — with its own format and its own rate limit. A useful pipeline reads all of them, weighs them, justifies every decision, and stays inside a 100-call per-portfolio budget. The interesting work is the orchestration discipline, not the agents themselves.

The approach

Ten specialized agents, communicating through typed Python dataclasses — no message bus, no JSON over the wire. The orchestrator schedules them by expected value: a local DVF pre-score first (zero API calls), then BAN address validation (fails fast on bad addresses), then full enrichment only on the survivors. A shared BudgetTracker, injected into every agent, decrements the call budget before any HTTP request. When the budget is gone, the orchestrator stops cleanly.
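The budget-first scheduling described above can be sketched in a few lines. This is a minimal illustration, not the project's code: only the BudgetTracker name and the 100-call limit come from the text. The Candidate dataclass, the consume method, the BudgetExhausted exception, the 0.5 pre-score threshold, and the two stub agents are assumptions made for the example, and the stubs stand in for real HTTP calls.

```python
from dataclasses import dataclass, field


class BudgetExhausted(Exception):
    """Raised when a request would exceed the shared call budget."""


@dataclass
class BudgetTracker:
    limit: int = 100                         # per-portfolio cap named in the text
    used: int = 0
    log: list = field(default_factory=list)  # (agent, calls) pairs, for traceability

    @property
    def remaining(self) -> int:
        return self.limit - self.used

    def consume(self, agent: str, calls: int = 1) -> None:
        """Reserve calls before the HTTP request is sent."""
        if calls > self.remaining:
            raise BudgetExhausted(f"{agent} needs {calls}, {self.remaining} left")
        self.used += calls
        self.log.append((agent, calls))


@dataclass(frozen=True)
class Candidate:
    address: str
    pre_score: float  # hypothetical result of the local DVF pre-score


def validate_address(c: Candidate, budget: BudgetTracker) -> bool:
    budget.consume("BAN")
    # Stand-in for the real BAN lookup: it only rejects blank addresses.
    return bool(c.address.strip())


def enrich(c: Candidate, budget: BudgetTracker) -> dict:
    budget.consume("enrichment")
    # Stand-in for the real enrichment calls (assumed here to cost one call).
    return {"address": c.address, "pre_score": c.pre_score}


def run_portfolio(candidates, budget, min_score=0.5):
    """Return (enriched properties, whether the run stopped on budget)."""
    enriched = []
    # Stage 1: local filter, zero API calls; best pre-scores go first.
    shortlist = sorted(
        (c for c in candidates if c.pre_score >= min_score),
        key=lambda c: c.pre_score,
        reverse=True,
    )
    for c in shortlist:
        try:
            if not validate_address(c, budget):  # Stage 2: fail fast
                continue
            enriched.append(enrich(c, budget))   # Stage 3: survivors only
        except BudgetExhausted:
            return enriched, True  # stop cleanly, keep what was finished
    return enriched, False
```

Because the tracker is decremented before the request rather than after it, a run can never overshoot the cap, and the log doubles as a trace of which agent spent which call.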

Every enriched property goes through two evaluators in parallel. EvaluateurA computes a weighted average of DVF comparables; EvaluateurB fits a per-sector linear regression and adjusts the projected price by a socio-economic attractiveness index. An Arbitrage agent reconciles the two with explicit rules: a confidence-weighted average when they agree, a weighted average plus an alert on moderate divergence, and a conservative average on strong divergence. A Moderation agent then grades the result into one of four levels: information, weak recommendation, strong recommendation, or required human validation.
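To make the arbitration and moderation rules concrete, here is a sketch under stated assumptions. The three divergence regimes and the four grades come from the text. The numeric thresholds (10% and 25%), the 0.7 confidence cut-off, the mapping from regime to grade, and the reading of the strong-divergence rule as a plain unweighted mean are all hypothetical, chosen only so the example runs.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    INFORMATION = "information"
    WEAK = "weak recommendation"
    STRONG = "strong recommendation"
    HUMAN_VALIDATION = "required human validation"


@dataclass(frozen=True)
class Estimate:
    price: float       # EUR
    confidence: float  # 0..1, self-reported by the evaluator


@dataclass(frozen=True)
class Verdict:
    price: float
    divergence: float  # relative gap between the two estimates
    confidence: float
    rule: str
    alert: bool


def arbitrate(a: Estimate, b: Estimate, moderate=0.10, strong=0.25) -> Verdict:
    mean = (a.price + b.price) / 2
    divergence = abs(a.price - b.price) / mean
    total = a.confidence + b.confidence
    # Confidence-weighted average; fall back to the mean if both report 0.
    weighted = (a.price * a.confidence + b.price * b.confidence) / total if total else mean
    confidence = total / 2
    if divergence < moderate:
        return Verdict(weighted, divergence, confidence, "agreement", False)
    if divergence < strong:
        return Verdict(weighted, divergence, confidence, "moderate divergence", True)
    # Assumed reading of the strong-divergence rule: ignore the
    # self-reported confidences and take the plain mean.
    return Verdict(mean, divergence, confidence, "strong divergence", True)


def moderate_verdict(v: Verdict, min_confidence=0.7) -> Level:
    # Hypothetical mapping from arbitration outcome to grade.
    if v.rule == "strong divergence":
        return Level.HUMAN_VALIDATION
    if v.rule == "moderate divergence":
        return Level.WEAK
    if v.confidence >= min_confidence:
        return Level.STRONG
    return Level.INFORMATION
```

Keeping the rule name and the alert flag on the verdict means the final grade can always be traced back to the regime that produced it.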

Result

161 unit tests cover models, agents, traceability, and the budget tracker against synthetic fixtures. 24 end-to-end tests run the full pipeline against the real BAN, IGN, and Géorisques APIs, with no mocks and no recorded fixtures. The choice was deliberate: a mocked test suite keeps passing after an upstream API contract changes, even though the code no longer works against the real service. With real network calls, every run of the suite catches breaking changes in upstream APIs.

The output is a typed RapportFinal: per-property decisions with justifications, risk and attractiveness scores, the full state of the API budget, and the identified best opportunity in the portfolio.
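As an illustration of what such a typed report could look like, the sketch below uses frozen dataclasses. Only the RapportFinal name and the kinds of information it carries come from the text. Every field name is an assumption, and so is the definition of "best opportunity" used here (the largest relative gap between estimated and asking price), which the source does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertyDecision:
    property_id: str
    level: str                   # one of the four moderation grades
    justification: str
    risk_score: float
    attractiveness_score: float
    estimated_price: float       # EUR, after arbitration
    asking_price: float          # EUR

    @property
    def upside(self) -> float:
        """Relative gap between estimate and asking price (assumed metric)."""
        return (self.estimated_price - self.asking_price) / self.asking_price


@dataclass(frozen=True)
class BudgetState:
    limit: int
    used: int

    @property
    def remaining(self) -> int:
        return self.limit - self.used


@dataclass(frozen=True)
class RapportFinal:
    decisions: tuple             # tuple of PropertyDecision, immutable once built
    budget: BudgetState

    @property
    def best_opportunity(self) -> Optional[PropertyDecision]:
        # Hypothetical selection rule: highest upside, None for an empty portfolio.
        return max(self.decisions, key=lambda d: d.upside, default=None)
```

Freezing the dataclasses keeps the report read-only once the pipeline has produced it, which fits the traceability requirement in the brief.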

What we left out

Three things the pipeline does not do — on purpose.

EvaluateurA stays purely local. The original spec mentioned a DVF API fallback when local comparables are sparse; we widened the local search radius instead, keeping the call budget honest.
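Widening the local search radius instead of calling an API might look like the sketch below. It is an assumption-laden illustration: the radius steps, the minimum comparable count, the Sale dataclass, and the inverse-distance weighting are all hypothetical, since the text says only that the comparables are weighted and that the radius was widened.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Sale:
    lat: float
    lon: float
    price_per_m2: float  # EUR per square metre, from the local DVF extract


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def find_comparables(lat, lon, sales, radii_km=(0.5, 1.0, 2.0), min_count=5):
    """Try each radius in turn; return (comparables, radius actually used)."""
    with_distance = [(s, haversine_km(lat, lon, s.lat, s.lon)) for s in sales]
    found, radius = [], radii_km[0]
    for radius in radii_km:
        found = [(s, d) for s, d in with_distance if d <= radius]
        if len(found) >= min_count:
            break  # enough comparables, no need to widen further
    return found, radius


def weighted_price_per_m2(comparables):
    """Inverse-distance weighted mean; None when nothing was found."""
    if not comparables:
        return None
    # The 0.05 km floor keeps a sale at the same address from dominating.
    weights = [1 / max(d, 0.05) for _, d in comparables]
    total = sum(w * s.price_per_m2 for w, (s, _) in zip(weights, comparables))
    return total / sum(weights)
```

Returning the radius that was actually used lets the caller lower its confidence when the comparables came from far away, without spending a single API call.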

The 100-call budget lives in process memory, so it resets with every run and is not tracked per calendar day. A small persistent store could fix that, but it didn't fit the one-week scope.
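For reference, one possible shape for such a store is a JSON file keyed by ISO date. This was not part of the project; the DailyBudgetStore class, its methods, and the file layout are hypothetical, and the sketch ignores concurrent writers.

```python
import json
from datetime import date
from pathlib import Path


class DailyBudgetStore:
    """Persist the number of calls used per calendar day in a JSON file."""

    def __init__(self, path: Path, limit: int = 100):
        self.path = Path(path)
        self.limit = limit

    def _load(self) -> dict:
        if self.path.exists():
            return json.loads(self.path.read_text(encoding="utf-8"))
        return {}

    def used(self, day: date) -> int:
        return self._load().get(day.isoformat(), 0)

    def consume(self, day: date, calls: int = 1) -> bool:
        """Record the calls and return True, or return False if over the limit."""
        data = self._load()
        key = day.isoformat()
        if data.get(key, 0) + calls > self.limit:
            return False
        data[key] = data.get(key, 0) + calls
        self.path.write_text(json.dumps(data), encoding="utf-8")
        return True
```

A second process started on the same day would read the same file and see the calls already spent, which is exactly what the in-memory tracker cannot do.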

The entry point is a Jupyter notebook, not a CLI. Adequate for a class demo, not for a daily operator.