The problem

Hokm is a four-player, trick-taking card game with a trump suit and, crucially, hidden hands: you never see what your opponents and partner hold. That makes it a small but honest test bed for the hard part of game AI: deciding well under partial observability, where the best move depends on your beliefs about the unseen cards, not just the cards on the table.

What I built

A reinforcement-learning agent, in PyTorch, that learns to play Hokm:

  • An encoding of the observable game state — your hand, the trump suit, cards played so far, whose turn it is.
  • A learned policy trained through self-play, so the agent improves against versions of itself rather than a fixed heuristic opponent.
  • Dynamic game-state logging, so a full match can be replayed and inspected move by move.
  • An interactive interface to play against the trained agent directly.
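
To make the first bullet concrete, here is a minimal sketch of one way the observable state could be flattened into a fixed-length vector. The layout (52 bits for the hand, 4 for trump, 52 for cards already played, 4 for the seat to act) and the names card_index and encode_observation are assumptions for illustration, not the project's actual encoding.

```python
# Hypothetical observation encoding; layout and names are illustrative only.
SUITS = "SHDC"  # spades, hearts, diamonds, clubs


def card_index(card):
    """Map a (suit, rank) card, rank 2..14 with 14 = ace, to an index 0..51."""
    suit, rank = card
    return SUITS.index(suit) * 13 + (rank - 2)


def encode_observation(hand, trump, played, seat_to_act):
    """Return a 112-element list of floats describing what one player can see."""
    hand_bits = [0.0] * 52
    for card in hand:
        hand_bits[card_index(card)] = 1.0

    trump_bits = [0.0] * 4
    trump_bits[SUITS.index(trump)] = 1.0

    played_bits = [0.0] * 52
    for card in played:
        played_bits[card_index(card)] = 1.0

    turn_bits = [0.0] * 4
    turn_bits[seat_to_act] = 1.0

    # Opponents' and partner's hands never appear here: they are hidden.
    return hand_bits + trump_bits + played_bits + turn_bits
```

A list like this can be wrapped with torch.tensor(...) before being fed to a network. A richer encoding might also record who played each card, since that is what lets the agent infer which suits a player has run out of.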

Together, these form a reinforcement-learning loop under partial observability: observe the visible state, choose a card, play it, and learn from the outcome, with the policy trained against copies of itself.
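
The loop can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the project's code: trump is picked at random rather than chosen by a player, all 13 tricks are played and the hand goes to the team with the majority, one policy function plays all four seats (the self-play part), and random_policy stands in for the learned PyTorch policy. All function names are hypothetical.

```python
import random

SUITS = "SHDC"
DECK = [(s, r) for s in SUITS for r in range(2, 15)]  # rank 14 = ace


def legal_moves(hand, led_suit):
    """Follow the led suit if possible; otherwise any card may be played."""
    follow = [c for c in hand if c[0] == led_suit]
    return follow or list(hand)


def trick_winner(trick, trump):
    """trick is a list of (seat, card) in play order; return the winning seat."""
    led = trick[0][1][0]

    def strength(entry):
        suit, rank = entry[1]
        # Trump beats the led suit; off-suit discards cannot win.
        return (2 if suit == trump else 1 if suit == led else 0, rank)

    return max(trick, key=strength)[0]


def random_policy(obs, legal, rng):
    """Placeholder for the learned policy: pick any legal card."""
    return rng.choice(legal)


def play_hand(policy, rng):
    """Play one hand with the same policy in every seat (self-play)."""
    deck = DECK[:]
    rng.shuffle(deck)
    hands = [deck[i * 13:(i + 1) * 13] for i in range(4)]
    trump = rng.choice(SUITS)  # simplification: random trump
    leader, tricks_won, played, steps = 0, [0, 0], [], []
    for _ in range(13):
        trick = []
        for offset in range(4):
            seat = (leader + offset) % 4
            led = trick[0][1][0] if trick else None
            legal = legal_moves(hands[seat], led)
            # Observe: only this seat's hand plus public information.
            obs = {"hand": tuple(hands[seat]), "trump": trump,
                   "played": tuple(played), "trick": tuple(trick), "seat": seat}
            card = policy(obs, legal, rng)  # choose
            hands[seat].remove(card)        # play
            trick.append((seat, card))
            played.append(card)
            steps.append((seat, obs, card))
        leader = trick_winner(trick, trump)
        tricks_won[leader % 2] += 1  # seats 0 and 2 are partners, as are 1 and 3
    winner = 0 if tricks_won[0] > tricks_won[1] else 1
    # Learn: label every decision with its team's outcome for a later update.
    trajectory = [(obs, card, 1.0 if seat % 2 == winner else -1.0)
                  for seat, obs, card in steps]
    return trajectory, tricks_won
```

Calling play_hand(random_policy, random.Random(0)) returns 52 (observation, card, reward) tuples, one per card played. In a training setup, tuples like these would feed a policy-gradient or similar update, and the updated policy would replace random_policy for the next round of self-play.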

Outcome

A playable agent and an end-to-end loop — environment, training, logging, and a human-facing UI — small enough to reason about completely, which was the point: a clean place to build intuition for RL under imperfect information.

Tech

Python, PyTorch.