Decide under partial observability · trained by self-play
[Figure: agent-environment loop]
Loop labels: believe, play card, outcome, update π, observe (hand · trump · played)
Boxes:
the agent: policy net (PyTorch · pick a card)
the world: environment (4 players · trump suit)
reward → learn: trick & round outcome
self-play: π trains against π
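
The "pick a card" and "update π" steps can be sketched in a few lines. The diagram names PyTorch, but the sketch below deliberately uses only the Python standard library so it runs without dependencies; a small linear policy stands in for the policy net. The deck size `NUM_CARDS`, the observation encoding, the class `LinearPolicy`, and the choice of a REINFORCE-style update are all assumptions made for illustration, since the diagram only says "reward → learn".

```python
import math
import random

NUM_CARDS = 32  # hypothetical deck size; the diagram does not specify one


def masked_softmax(logits, legal):
    """Softmax restricted to legal card indices; illegal cards get probability 0."""
    top = max(logits[i] for i in legal)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - top) for i in legal}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(logits))]


class LinearPolicy:
    """Stand-in for the policy net (assumption): logits = W @ obs."""

    def __init__(self, obs_size, num_cards=NUM_CARDS, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.01, 0.01) for _ in range(obs_size)]
                  for _ in range(num_cards)]

    def probs(self, obs, legal):
        logits = [sum(wi * x for wi, x in zip(row, obs)) for row in self.w]
        return masked_softmax(logits, legal)

    def act(self, obs, legal, rng):
        """Sample a card from π, considering legal cards only."""
        p = self.probs(obs, legal)
        return rng.choices(legal, weights=[p[i] for i in legal], k=1)[0]

    def reinforce_step(self, obs, legal, card, reward, lr=0.1):
        """One REINFORCE update: W += lr * reward * grad log π(card | obs)."""
        p = self.probs(obs, legal)
        for i in legal:
            # d log π(card) / d logit_i = 1[i == card] - p_i
            g = (1.0 if i == card else 0.0) - p[i]
            for j, x in enumerate(obs):
                self.w[i][j] += lr * reward * g * x
```

In a self-play setup, the same `LinearPolicy` instance would choose cards for all four seats, and each seat's `(obs, legal, card)` tuples would be replayed through `reinforce_step` with that seat's trick or round outcome as the reward. Masking illegal cards before the softmax keeps the policy from ever sampling a card it is not allowed to play.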