[Figure: Decide under partial observability, trained by self-play. The agent (policy net π, PyTorch) observes its hand, the trump suit, and the cards played, then picks a card. The world (environment: 4 players, trump suit) returns the trick and round outcome as a reward, from which the agent learns. Loop labels: believe, play card, outcome, update π. Self-play: π trains against π.]
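The loop named in the figure labels (observe, pick a card, receive the trick and round outcome, update π, with π playing every seat) can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the actual implementation: the toy 16-card deck, the follow-suit rule, the two card features, the REINFORCE update, and every function name below are hypothetical. A linear softmax policy stands in for the PyTorch policy net so that the sketch runs with the standard library alone, and it looks only at the agent's own hand and the trump suit, not at the cards already played.

```python
import math
import random

SUITS, RANKS, PLAYERS = 4, 4, 4  # assumed toy deck: 16 cards, 4 per player


def legal_cards(hand, trick):
    """Follow the led suit when possible; otherwise any card is legal."""
    if trick:
        led = trick[0][1][0]
        same = [c for c in hand if c[0] == led]
        if same:
            return same
    return list(hand)


def trick_winner(trick, trump):
    """Highest trump wins; with no trump played, highest card of the led suit."""
    led = trick[0][1][0]

    def strength(entry):
        suit, rank = entry[1]
        return (2 if suit == trump else 1 if suit == led else 0, rank)

    return max(trick, key=strength)[0]


def features(card, trump):
    """Assumed card features: [normalized rank, is_trump]."""
    return [card[1] / (RANKS - 1), 1.0 if card[0] == trump else 0.0]


def policy(theta, cards, trump):
    """Softmax over linear scores of the legal cards."""
    scores = [sum(t * f for t, f in zip(theta, features(c, trump))) for c in cards]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def play_round(theta, rng):
    """One self-play round: every seat samples from the same policy."""
    deck = [(s, r) for s in range(SUITS) for r in range(RANKS)]
    rng.shuffle(deck)
    hands = [deck[i::PLAYERS] for i in range(PLAYERS)]
    trump = rng.randrange(SUITS)
    tricks_won = [0] * PLAYERS
    steps = []  # (seat, gradient of log pi) for each decision
    leader = 0
    for _ in range(len(deck) // PLAYERS):
        trick = []
        for k in range(PLAYERS):
            seat = (leader + k) % PLAYERS
            cards = legal_cards(hands[seat], trick)
            probs = policy(theta, cards, trump)
            i = rng.choices(range(len(cards)), weights=probs)[0]
            feats = [features(c, trump) for c in cards]
            expected = [sum(p * f[j] for p, f in zip(probs, feats))
                        for j in range(len(theta))]
            # For a linear softmax, grad log pi(a) = phi(a) - E_pi[phi].
            steps.append((seat, [feats[i][j] - expected[j]
                                 for j in range(len(theta))]))
            hands[seat].remove(cards[i])
            trick.append((seat, cards[i]))
        leader = trick_winner(trick, trump)
        tricks_won[leader] += 1
    return tricks_won, steps


def train(rounds=200, lr=0.05, seed=0):
    """REINFORCE with the mean trick count as baseline (an assumed choice)."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(rounds):
        tricks_won, steps = play_round(theta, rng)
        baseline = sum(tricks_won) / PLAYERS
        for seat, grad in steps:
            advantage = tricks_won[seat] - baseline
            theta = [t + lr * advantage * g for t, g in zip(theta, grad)]
    return theta
```

Calling `train()` returns the two learned weights. Because all four seats share `theta`, each round produces training signal from every player's decisions, which is the sense in which π trains against π.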