Coursework · Multi-Agent Systems
MAS
Ten-agent system for evaluating real-estate purchases in Aix-en-Provence. Six public APIs, a shared call budget, 185 tests. One week.
Context
The brief was structural, not domain-specific: at least six agents, at least four public data sources, traceable decisions, and a bounded number of API calls. We picked real-estate decision support for Aix-en-Provence — given a portfolio of purchase targets, decide which ones deserve deeper analysis, valuate the survivors from independent angles, and return a graded recommendation per property.
Team of three. I owned the code end-to-end — agents, orchestration, data preparation, tests. My teammates owned the report and the demo. Without their work this would have been a pile of Python no one could defend in front of a jury.
The problem
Real estate combines several signals: prior transactions, neighborhood demographics, environmental risk, geographic features, parcel boundaries. Each lives in a separate public API — DVF, INSEE, BAN, IGN, Géorisques — with its own format and its own rate limit. A useful pipeline reads all of them, weighs them, justifies every decision, and stays inside a 100-call per-portfolio budget. The interesting work is the orchestration discipline, not the agents themselves.
The approach
Ten specialized agents communicate through typed Python dataclasses, with no message bus and no JSON over the wire. The orchestrator schedules them by expected value: a local DVF pre-score first (zero API calls), then BAN address validation (fails fast on bad addresses), then full enrichment only on the survivors. A shared BudgetTracker, injected into every agent, decrements the call budget before any HTTP request. When the budget is gone, the orchestrator stops cleanly.
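The staging and the shared budget can be sketched in a few lines. This is a minimal illustration, not the project's code: only the BudgetTracker name comes from the write-up. The Property, AddressValidator, Enricher, and run_pipeline names, the pre-score threshold, and the per-property enrichment cost of 3 calls are all assumptions, and set lookups stand in for the real HTTP requests.

```python
from dataclasses import dataclass


class BudgetExhausted(Exception):
    """Raised when a call is requested that the budget cannot cover."""


@dataclass
class BudgetTracker:
    limit: int = 100  # per-portfolio call budget
    used: int = 0

    @property
    def remaining(self) -> int:
        return self.limit - self.used

    def consume(self, n: int = 1) -> None:
        # Decrement BEFORE the HTTP request, so a refused call never hits the network.
        if n > self.remaining:
            raise BudgetExhausted(f"need {n} call(s), {self.remaining} left")
        self.used += n


@dataclass(frozen=True)
class Property:
    address: str
    prescore: float  # hypothetical local DVF pre-score in [0, 1]


class AddressValidator:
    """Stand-in for the BAN agent; a set lookup replaces the HTTP call."""

    def __init__(self, budget: BudgetTracker, known: set):
        self.budget, self.known = budget, known

    def validate(self, prop: Property) -> bool:
        self.budget.consume()  # one call per address (assumed)
        return prop.address in self.known


class Enricher:
    """Stand-in for the enrichment agents; the cost per property is assumed."""

    def __init__(self, budget: BudgetTracker, cost: int = 3):
        self.budget, self.cost = budget, cost

    def enrich(self, prop: Property) -> dict:
        self.budget.consume(self.cost)
        return {"address": prop.address, "enriched": True}


def run_pipeline(portfolio, validator, enricher, min_prescore=0.5):
    """Cheapest filter first; stop cleanly once the budget runs out."""
    enriched = []
    for prop in portfolio:
        if prop.prescore < min_prescore:  # stage 1: local, zero API calls
            continue
        try:
            if not validator.validate(prop):  # stage 2: fail fast on bad addresses
                continue
            enriched.append(enricher.enrich(prop))  # stage 3: survivors only
        except BudgetExhausted:
            break  # keep what was already enriched
    return enriched
```

Because every agent holds the same tracker instance, the orchestrator never needs to count calls itself; it only has to handle the exception.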
Every enriched property goes through two evaluators in parallel. EvaluateurA computes a weighted average of DVF comparables; EvaluateurB fits a per-sector linear regression and adjusts the projected price by a socio-economic attractiveness index. An Arbitrage agent reconciles the two with explicit rules: a confidence-weighted average when they agree, an alert plus a weighted average on moderate divergence, and a conservative average on strong divergence. A Moderation agent then grades the result into one of four levels: information, weak recommendation, strong recommendation, or required human validation.
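The reconciliation rules lend themselves to a short sketch. Everything numeric here is an assumption made for illustration: the 10% and 25% divergence thresholds, the 0.7 confidence cut-off, the reading of the strong-divergence rule as a plain unweighted mean, and the mapping from arbitration outcome to moderation level. The Estimate, Verdict, arbitrate, and grade names are hypothetical as well.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class Estimate:
    price: float       # estimated price in euros
    confidence: float  # 0..1, as reported by the evaluator


@dataclass(frozen=True)
class Verdict:
    price: float
    rule: str    # which arbitration rule fired
    alert: bool


class Level(Enum):
    INFORMATION = "information"
    WEAK = "weak recommendation"
    STRONG = "strong recommendation"
    HUMAN_VALIDATION = "required human validation"


def arbitrate(a: Estimate, b: Estimate, moderate=0.10, strong=0.25) -> Verdict:
    mean = (a.price + b.price) / 2
    gap = abs(a.price - b.price) / mean  # relative divergence
    total = a.confidence + b.confidence
    # Confidence-weighted average; fall back to the mean if both confidences are 0.
    weighted = (
        (a.price * a.confidence + b.price * b.confidence) / total if total else mean
    )
    if gap < moderate:
        return Verdict(weighted, "agreement", alert=False)
    if gap < strong:
        return Verdict(weighted, "moderate_divergence", alert=True)
    # Strong divergence: ignore the self-reported confidences (assumed reading).
    return Verdict(mean, "strong_divergence", alert=True)


def grade(verdict: Verdict, confidence: float) -> Level:
    # Hypothetical mapping from arbitration outcome to the four levels.
    if verdict.rule == "strong_divergence":
        return Level.HUMAN_VALIDATION
    if verdict.rule == "moderate_divergence":
        return Level.WEAK
    return Level.STRONG if confidence >= 0.7 else Level.INFORMATION
```

Keeping the rule name on the verdict is what makes the decision traceable: the grade can always be explained by pointing at the rule that produced it.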
Result
161 unit tests cover models, agents, traceability, and the budget tracker against synthetic fixtures. 24 end-to-end tests run the full pipeline against the real BAN, IGN, and Géorisques APIs, with no mocks and no recorded fixtures. The choice was deliberate: a mocked suite keeps passing after an upstream API contract changes, even though the code no longer works against the live service. With real network calls, the suite catches breaking changes in upstream APIs on every run.
The output is a typed RapportFinal: per-property
decisions with justifications, risk and attractiveness scores,
the full state of the API budget, and the identified best
opportunity in the portfolio.
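As an illustration of what such a typed report might look like, here is a sketch. Only the RapportFinal name comes from the project; the Decision and BudgetState types, every field name, and the definition of "best opportunity" as the largest relative discount of asking price against estimate are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Decision:
    property_id: str
    level: str                   # one of the four moderation levels
    estimated_price: float       # euros
    asking_price: float          # euros
    risk_score: float
    attractiveness_score: float
    justification: str           # human-readable trace of the decision

    @property
    def discount(self) -> float:
        # Relative gap between the estimate and the asking price.
        return (self.estimated_price - self.asking_price) / self.estimated_price


@dataclass(frozen=True)
class BudgetState:
    limit: int
    used: int


@dataclass(frozen=True)
class RapportFinal:
    decisions: List[Decision]
    budget: BudgetState

    @property
    def best_opportunity(self) -> Optional[Decision]:
        # Assumed criterion: the property with the largest discount.
        return max(self.decisions, key=lambda d: d.discount, default=None)
```

Freezing the dataclasses keeps the report immutable once the pipeline has produced it, which suits a result that is meant to be audited rather than edited.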
What we left out
Three things the pipeline does not do — on purpose.
EvaluateurA stays purely local. The original spec
mentioned a DVF API fallback when local comparables are sparse;
we widened the local search radius instead, keeping the call
budget honest.
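Widening the local radius can be sketched as follows. The Sale type, the radius steps, the minimum comparable count, and the find_comparables name are assumptions for illustration; the distance is the standard haversine formula.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Sale:
    lat: float
    lon: float
    price_per_m2: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def find_comparables(lat, lon, sales, min_count=5, radii_km=(0.5, 1.0, 2.0, 5.0)):
    """Return (comparables, radius used), widening until enough are found.

    No API call is made: everything comes from the local sales list. If even
    the widest radius is too sparse, the sparse result is returned as-is.
    """
    found, radius = [], radii_km[0]
    for radius in radii_km:
        found = [s for s in sales if haversine_km(lat, lon, s.lat, s.lon) <= radius]
        if len(found) >= min_count:
            break
    return found, radius
```

Returning the radius alongside the comparables lets a downstream evaluator lower its confidence when the search had to go wide.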
The 100-call budget lives in process memory and is not persisted across runs or tracked per calendar date. A small store could fix that, but it didn't fit the one-week scope.
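For reference, the kind of small store that was left out could be as simple as a JSON file keyed by ISO date. This is a hypothetical sketch, not something the project contains: the DailyBudgetStore name, the file format, and the per-day counting are all assumptions, and it ignores concurrent writers.

```python
import json
from datetime import date
from pathlib import Path


class DailyBudgetStore:
    """Persists the number of API calls used per calendar day in a JSON file."""

    def __init__(self, path, limit: int = 100):
        self.path = Path(path)
        self.limit = limit

    def _load(self) -> dict:
        # Missing file means no calls recorded yet.
        if self.path.exists():
            return json.loads(self.path.read_text())
        return {}

    def used(self, day: date) -> int:
        return self._load().get(day.isoformat(), 0)

    def consume(self, day: date, n: int = 1) -> bool:
        """Record n calls for the given day; return False if that would exceed the limit."""
        state = self._load()
        key = day.isoformat()
        if state.get(key, 0) + n > self.limit:
            return False
        state[key] = state.get(key, 0) + n
        self.path.write_text(json.dumps(state))
        return True
```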
The entry point is a Jupyter notebook, not a CLI. Adequate for a class demo, not for a daily operator.