Product 02 · Adaptive trading agent · Paper-trade · v0

Mahoraga

Adapts in production. Never falls to the same regime twice.

Inception: Q2 2026
Status: Paper-trade · v0

Status quo

What Mahoraga replaces

Single-strategy trading bots that die the first time the regime changes against them. Static allocations that need a human to rebalance after the loss has already happened. Black-box ML agents that learn but can't show their work. Mahoraga's defining property is the opposite: every loss, every miscall, every drawdown is observed and shifts the next decision, and every shift is on the ledger.

The thesis

Mahoraga is named for the adaptive entity that cannot be killed by the same technique twice. The product carries the same property. A Gaussian-mixture regime classifier reads the market state in real time; a LinUCB contextual bandit selects the strategy arm best-suited to that regime given current context; an append-only adaptation ledger records every regime call, every arm switch, every parameter update. The regime that hurt the agent yesterday tilts its posterior away from the strategies that hurt in that regime. The adaptation is online, the exploration is confidence-bounded, and every decision is on the ledger.

In one diagram

The regime that hurt yesterday
is sampled less today.

The current market sits in one of four GMM-identified regimes (highlighted in gold below). Every closed trade updates the bandit's posterior for its regime / arm pair: losses in a regime push the bandit away from the arms that produced them the next time that regime is detected.

mahoraga.console / regime-space

0 adaptations · last 24h · Online

Current regime: Trending
Selected arm: idx_mom
Confidence: 0.00
Exploration: UCB · bounded

Bandit posterior · weight by regime · Sums to 1.00

Calm: 0.00 · Dormant
Trending: 0.00 · Current · idx_mom
Mean-reverting: 0.00 · Suppressed · loss scar
Stressed: 0.00 · Suppressed · loss scar

Recent decisions · Posterior shifts visible

14m · trending · idx_mom selected · conf 0.78
4d · stressed · gold-mr · LOSS -2.1% · posterior -0.24
22d · m-revert · idx-mom · LOSS -1.4% · posterior -0.18

What it does

6 capabilities, no overlap.

Loss-aware adaptation

Every closed loss updates the bandit's posterior for the regime / arm pair that produced it. The same arm that lost in a given regime is sampled less aggressively the next time that regime is detected. The agent does not get hit by the same combination twice.
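
As a rough illustration of the idea above, the sketch below keeps a running mean P&L per regime / arm pair and turns it into sampling weights that sum to 1. It is a simplified stand-in, not Mahoraga's actual update rule: the `RegimeArmPosterior` class, the softmax weighting and the `temperature` parameter are all assumptions made for the example.

```python
import math
from collections import defaultdict

ARMS = ["fx_trend", "gold_mr", "idx_mom", "cash_defensive"]


class RegimeArmPosterior:
    """Hypothetical helper: running mean reward for each (regime, arm) pair."""

    def __init__(self):
        self.count = defaultdict(int)
        self.mean = defaultdict(float)

    def record(self, regime, arm, pnl_pct):
        # Incremental mean update for the pair that produced the trade.
        key = (regime, arm)
        self.count[key] += 1
        self.mean[key] += (pnl_pct - self.mean[key]) / self.count[key]

    def weights(self, regime, temperature=1.0):
        # Softmax over mean rewards (an assumption, chosen so weights sum to 1).
        scores = [self.mean.get((regime, arm), 0.0) / temperature for arm in ARMS]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        return {arm: e / total for arm, e in zip(ARMS, exps)}


posterior = RegimeArmPosterior()
before = posterior.weights("stressed")
posterior.record("stressed", "gold_mr", -2.1)  # a closed loss in the stressed regime
after = posterior.weights("stressed")
untouched = posterior.weights("trending")      # other regimes are not affected
```

In this toy version a loss for `gold_mr` in the stressed regime lowers that arm's weight only in that regime; its weight in the trending regime is unchanged.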

GMM regime classifier

A Gaussian-mixture model identifies the prevailing market regime from a rolling 60-day feature window: realised vol, cross-sectional dispersion, term structure, correlation collapse. Re-fit daily; re-classified intraday.
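
To make the intraday re-classification step concrete, here is a minimal sketch of how a fitted mixture assigns a regime to a new feature vector. The component parameters below are invented for illustration (two z-scored features, diagonal covariance, equal weights); a real deployment would take them from the daily re-fit, for example via a library such as scikit-learn's `GaussianMixture`.

```python
import math

# Hypothetical fitted parameters: (realised_vol, dispersion), both z-scored.
COMPONENTS = {
    "calm":           {"weight": 0.25, "mean": (-1.0, -1.0), "var": (0.5, 0.5)},
    "trending":       {"weight": 0.25, "mean": (0.0, 1.0),   "var": (0.5, 0.5)},
    "mean_reverting": {"weight": 0.25, "mean": (0.5, -0.5),  "var": (0.5, 0.5)},
    "stressed":       {"weight": 0.25, "mean": (2.0, 2.0),   "var": (0.5, 0.5)},
}


def log_gaussian_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def regime_posterior(x, components):
    """Posterior probability of each regime given feature vector x."""
    log_joint = {
        name: math.log(c["weight"]) + log_gaussian_diag(x, c["mean"], c["var"])
        for name, c in components.items()
    }
    # Log-sum-exp for numerical stability.
    peak = max(log_joint.values())
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in log_joint.values()))
    return {name: math.exp(v - log_norm) for name, v in log_joint.items()}


probs = regime_posterior((1.9, 2.1), COMPONENTS)
current_regime = max(probs, key=probs.get)
```

The largest posterior gives the regime call; the full probability vector can also serve as context for the bandit.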

LinUCB contextual bandit

For the identified regime, a contextual bandit selects the best-performing arm given current context. Exploration is bounded by an upper-confidence interval that widens during regime transitions and tightens as the agent accumulates evidence.
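
The selection rule described above can be sketched with a textbook disjoint LinUCB, where each arm keeps its own linear model. This is an illustration under stated assumptions, not the shipped agent: the one-hot regime context, the `LinUCBArm` class and the `select` helper are hypothetical, and the inverse matrix is maintained with a Sherman-Morrison rank-one update to avoid a linear-algebra dependency.

```python
import math


class LinUCBArm:
    """One arm of a disjoint LinUCB bandit (hypothetical helper)."""

    def __init__(self, dim):
        # A starts as the identity, so its inverse is the identity too.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.a_inv]

    def ucb(self, x, alpha):
        theta = self._a_inv_times(self.b)               # ridge estimate
        a_inv_x = self._a_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))   # expected reward
        width = math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))
        return mean + alpha * width                     # upper confidence bound

    def update(self, x, reward):
        # Sherman-Morrison: inverse of (A + x x^T) from the inverse of A.
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def select(arms, x, alpha):
    """Pick the arm with the highest upper confidence bound."""
    return max(arms, key=lambda name: arms[name].ucb(x, alpha))


# Context: one-hot regime vector [calm, trending, mean_reverting, stressed].
STRESSED = [0.0, 0.0, 0.0, 1.0]
TRENDING = [0.0, 1.0, 0.0, 0.0]
arms = {name: LinUCBArm(4) for name in ["fx_trend", "gold_mr", "idx_mom", "cash_defensive"]}

arms["gold_mr"].update(STRESSED, -2.1)   # closed loss in the stressed regime
choice = select(arms, STRESSED, alpha=1.0)
```

After the loss, `gold_mr` scores below the untried arms in the stressed context, while its score in the trending context is unchanged.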

Append-only adaptation ledger

Every regime call, every arm switch, every weight update is hash-chained and exportable. Tamper-detectable on export; seven-year retention by default; PDF and CSV out.
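
A hash-chained, append-only log of this kind can be sketched in a few lines. The example below assumes SHA-256 over a canonical JSON encoding and an all-zero genesis hash; the `AdaptationLedger` class, the entry fields and the `verify` helper are illustrative names, not Mahoraga's actual schema.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed hash for the entry before the first one


def _digest(prev_hash, payload):
    # Canonical JSON so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class AdaptationLedger:
    """Hypothetical append-only ledger: each entry commits to the previous one."""

    def __init__(self):
        self._entries = []

    def append(self, payload):
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        entry = {"prev": prev, "payload": payload, "hash": _digest(prev, payload)}
        self._entries.append(entry)
        return entry

    def export(self):
        # Deep copy via JSON so callers cannot mutate the ledger itself.
        return json.loads(json.dumps(self._entries))


def verify(entries):
    """Recompute the chain; any edited, dropped or reordered entry breaks it."""
    prev = GENESIS
    for entry in entries:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


ledger = AdaptationLedger()
ledger.append({"event": "regime_call", "regime": "stressed"})
ledger.append({"event": "posterior_update", "arm": "gold_mr", "delta": -0.24})

exported = ledger.export()
intact = verify(exported)

exported[1]["payload"]["delta"] = -0.01  # tamper with an exported entry
tampered_ok = verify(exported)
```

Because each hash covers the previous hash, altering one entry invalidates every entry after it, which is what makes tampering detectable on export.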

Built-in risk controls

Per-arm drawdown caps, per-regime exposure ceilings, post-loss cooldown windows. The agent governs its own behaviour; there is no separate operator to override the controls mid-run.
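
The sketch below shows one way a per-arm drawdown cap and a post-loss cooldown could gate an arm. The default values mirror the sample configuration on this page (2.0% of NAV, 30 minutes); the `ArmRiskGate` class, its field names and the peak-to-trough drawdown definition are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ArmRiskGate:
    """Hypothetical per-arm gate: drawdown cap plus post-loss cooldown."""

    drawdown_cap_pct: float = 2.0        # % of NAV
    cooldown_seconds: float = 30 * 60    # 30-minute cooldown after a loss
    cumulative_pnl_pct: float = 0.0
    peak_pnl_pct: float = 0.0
    cooldown_until: float = 0.0
    suspended: bool = False

    def record_trade(self, pnl_pct, now):
        self.cumulative_pnl_pct += pnl_pct
        self.peak_pnl_pct = max(self.peak_pnl_pct, self.cumulative_pnl_pct)
        if pnl_pct < 0:
            # Any loss starts a cooldown window for this arm.
            self.cooldown_until = now + self.cooldown_seconds
        drawdown = self.peak_pnl_pct - self.cumulative_pnl_pct
        if drawdown >= self.drawdown_cap_pct:
            # Cap reached: the arm stays suspended until reset externally.
            self.suspended = True

    def allowed(self, now):
        return not self.suspended and now >= self.cooldown_until


gate = ArmRiskGate()
gate.record_trade(-1.4, now=0.0)
during_cooldown = gate.allowed(now=600.0)    # 10 minutes after the loss
after_cooldown = gate.allowed(now=1800.0)    # cooldown has elapsed
gate.record_trade(-0.7, now=2000.0)          # drawdown now exceeds the 2.0% cap
after_cap = gate.allowed(now=10_000.0)
```

The bandit would consult a gate like this before selecting an arm, so adaptation changes which arm is chosen but never loosens the constraint itself.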

Pluggable strategy arms (v0)

Initial library covers FX trend, gold mean-reversion, equity-index momentum, and a cash-defensive sleeve. Each arm implements a small Python protocol and runs through the same risk pipeline: no special privileges, no special exits.
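
The FAQ on this page gives the arm protocol as `propose_orders(state) -> List[Order]`. A minimal sketch of what that could look like follows; the `Order` fields, the dictionary-shaped `state`, and both example arms are assumptions, since the page does not define them.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, runtime_checkable


@dataclass(frozen=True)
class Order:
    """Hypothetical order shape; the real fields are not specified here."""
    symbol: str
    side: str       # "buy" or "sell"
    quantity: float


@runtime_checkable
class StrategyArm(Protocol):
    def propose_orders(self, state: Dict[str, float]) -> List[Order]:
        ...


class CashDefensive:
    """Stays in cash: never proposes an order."""

    def propose_orders(self, state: Dict[str, float]) -> List[Order]:
        return []


class SimpleMomentum:
    """Toy momentum arm: trade in the direction of the lookback return."""

    def __init__(self, symbol: str, quantity: float):
        self.symbol = symbol
        self.quantity = quantity

    def propose_orders(self, state: Dict[str, float]) -> List[Order]:
        lookback_return = state.get("lookback_return", 0.0)
        if lookback_return > 0:
            return [Order(self.symbol, "buy", self.quantity)]
        if lookback_return < 0:
            return [Order(self.symbol, "sell", self.quantity)]
        return []


arm = SimpleMomentum("IDX", 1.0)
orders = arm.propose_orders({"lookback_return": 0.02})
conforms = isinstance(arm, StrategyArm) and isinstance(CashDefensive(), StrategyArm)
```

Arms only propose orders; in this design the proposals would still pass through the shared risk pipeline before anything is sent to a broker.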

Specifications

Spec sheet · v1.

Regime classifier: Gaussian Mixture (GMM); 4 regimes default, 3–7 configurable
Feature window: 60-day rolling; 24 indicators · re-fit daily
Strategy arms: 4; FX trend · gold MR · idx mom · cash
Bandit: LinUCB; contextual, confidence-bounded
Exploration bound: 1-σ UCB; configurable α · auto-widens in transition
Adaptation cadence: Daily re-fit · intraday re-select
Audit ledger: Hash-chained · append-only; tamper-detectable on export
Risk controls: Built-in; per-arm DD · cooldown · exposure cap
Status: Paper-trade · v0; live deployment target Q3 2026

Integration

What integration looks like.

Mahoraga is configured in Python. Strategy arms are pluggable; the risk controls and adaptation ledger are not.

mahoraga.config.py

python

# mahoraga.config.py
from mahoraga import (
    GMMRegimeClassifier,
    LinUCBAgent,
    RiskControls,
)

classifier = GMMRegimeClassifier(
    n_regimes=4,
    feature_window="60d",
    refit_cadence="1d",
    features=["realised_vol", "dispersion",
              "term_structure", "corr_collapse"],
)

agent = LinUCBAgent(
    classifier=classifier,
    strategy_arms=[
        "fx_trend",
        "gold_mr",
        "idx_mom",
        "cash_defensive",
    ],
    confidence_alpha=1.0,             # exploration bound
    risk=RiskControls(
        per_arm_drawdown_cap=2.0,     # % of NAV
        post_loss_cooldown="30m",
        max_exposure_per_regime=15.0, # % of NAV
    ),
)

agent.run()

Design principles

What we will not compromise.

01 · Never repeat the regime that hurt you
02 · Adapt the choice, not the constraints
03 · Confidence-bounded exploration, never random
04 · Every adaptation goes on the ledger

Questions

The ones we’re asked most.

If yours isn’t here, email vendraholdings@gmail.com.

What does “adapts” actually mean here? How is this different from re-training a model overnight?

Adaptation is per-decision, not per-batch. Each new observation (regime fix, trade outcome) updates the bandit's posterior immediately; the next strategy selection uses the updated posterior. No nightly retrain, no manual redeploy. The GMM re-fits daily, but the bandit responds within minutes.

Why a contextual bandit instead of full reinforcement learning?

LinUCB gives confidence-bounded exploration with closed-form updates: it is interpretable, sample-efficient, and has provable regret bounds under its linear-reward assumption. Full RL (PPO) would need orders of magnitude more data and would lose the closed-form interpretability we need for audit. PPO is on the v2 roadmap as a separate non-linear arm, not a replacement.

What happens if the GMM mis-classifies the regime?

The bandit's confidence interval widens during regime-transition periods, automatically reducing exploration aggression. Mahoraga's per-arm drawdown caps and cooldown windows catch the rest: if adaptation is wrong and the loss reaches the cap, the arm is suspended before it compounds.

Can I see Mahoraga's regime calls in real time?

Every regime call, every strategy selection, every parameter update is in the adaptation ledger. Exportable as JSON or PDF; visualisable in the operator console as a stream alongside ACIE decisions.

Why isn't this live yet?

We require six months of paper-trade with targeted max drawdown under 4% before any client deployment. v0 is in month 3. Live deployment target: Q3 2026. Subscribers will be notified before activation.

Can I write my own strategy arms?

Yes. Strategy arms implement a small Python protocol (`propose_orders(state) -> List[Order]`). Custom arms run through the same built-in risk controls: no special privileges, no separate cooldown logic.

Will Mahoraga be available via copy-trade?

Yes, after live deployment. Subscribers will be able to copy Mahoraga-generated orders to their own broker account through the same copy-trade rail.