Fundamentals of Game Theory and Reinforcement Learning in Modern Applications
Every rational person prefers more to less, and businesses aim to maximize profit while minimizing cost. Game theory provides the tools to model the strategic behaviour underlying the interaction of agents in a game form.
Machine learning can primarily be divided into supervised, unsupervised, and reinforcement learning. Reinforcement learning is a unique type of machine learning in which the goal is to select actions that maximize future rewards, unlike supervised and unsupervised learning, which focus on function approximation and description, respectively.
This article is the second part of my Game Theory and Artificial Intelligence series. The first article focused on the basic concepts needed to give you a quick start in both domains.
In part 2, we'll dive deeper into some important concepts in game theory: the types of strategies, the meaning of equilibrium, how to ascertain the equilibrium in a strategic situation, and how the various agents move in a game. We'll then turn to reinforcement learning: its meaning, the elements of reinforcement learning, how reinforcement learning problems are characterized, and the trade-off between exploration and exploitation. Together, these provide the ingredients needed to understand the intersection of the two fields.
Agenda
Introduction to game theory and reinforcement learning
Game Theory:
- Types of strategies
- What is equilibrium
- Types of equilibrium
Reinforcement Learning:
- Elements of reinforcement learning
- How reinforcement problems are characterized
- The tradeoff between exploration and exploitation
Conclusion
Types of Strategies
A strategy is an action taken to arrive at the outcome that provides the best utility, where utility is the satisfaction derived from an outcome. Strategies come in two broad types: pure strategies, in which a player commits to a single action, and mixed strategies, in which a player randomizes over actions; both reappear in the equilibrium concepts below.

Types of Equilibrium
Equilibrium can be defined as the point of rest for the players, where no player has an incentive to deviate from their strategy.
Pure Nash equilibrium
In a pure Nash equilibrium, players adopt the strategy that maximizes their payoff. Formally, a pure Nash equilibrium is a specification of a strategy for each player such that no player gains by changing their strategy, given that the other players don't change theirs. In other words, a pure strategy is one that provides the maximum profit or the best outcome to players.
A couple of notes: our focus is on individual deviations, not group deviations, and Nash equilibria are stable by nature. Let's consider a popular example of a pure Nash equilibrium in game theory, called the Prisoner's Dilemma.


In the diagram above, both players work toward their best strategy. The best joint outcome would have been for both to deny, but it fails to be the best strategy because of the competitive nature of the game: there is a lack of trust between the prisoners, and each is trying to avoid the worst negative payoff. So both prefer to confess rather than deny, which makes confessing the better option for each, given the anticipated action of the other. The cell highlighted in red represents the equilibrium, with the best attainable payoff still being negative. In this example, every agent plays its best action to avoid regret.
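Since the payoff figure may not render everywhere, here is a minimal Python sketch of the same best-response logic. The check itself is standard; the specific payoff numbers are an assumption (commonly used Prisoner's Dilemma values, years in prison negated), not taken from the figure.

```python
from itertools import product

def find_pure_nash(payoffs):
    """Return all pure Nash equilibria of a two-player game.

    payoffs maps (row_action, col_action) -> (row_payoff, col_payoff).
    A cell is an equilibrium if neither player can gain by switching
    actions unilaterally while the other stays put.
    """
    rows = sorted({r for r, _ in payoffs})
    cols = sorted({c for _, c in payoffs})
    equilibria = []
    for r, c in product(rows, cols):
        u_row, u_col = payoffs[(r, c)]
        # Best-response checks: no profitable unilateral deviation.
        row_ok = all(payoffs[(r2, c)][0] <= u_row for r2 in rows)
        col_ok = all(payoffs[(r, c2)][1] <= u_col for c2 in cols)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria

# Assumed Prisoner's Dilemma payoffs (years in prison, negated):
pd = {
    ("confess", "confess"): (-6, -6),
    ("confess", "deny"):    (0, -10),
    ("deny",    "confess"): (-10, 0),
    ("deny",    "deny"):    (-1, -1),
}

print(find_pure_nash(pd))  # [('confess', 'confess')]
```

As described above, the unique equilibrium is mutual confession, even though mutual denial would leave both players better off.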
Mixed strategy Nash equilibrium
In a mixed strategy Nash equilibrium, players randomize over their available strategies rather than committing to a single one, choosing each with some probability to obtain the best possible expected outcome.
A couple of notes: players choose randomly among their options, and the best outcome is of mutual benefit to both players, forming the mixed strategy.

We have two equilibria in the example above: when both players make the same choice, that is, when they coordinate.
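The same find_pure_nash helper from the Prisoner's Dilemma sketch can confirm this; the coordination payoffs below are hypothetical stand-ins for the figure.

```python
# Hypothetical coordination game: matching choices pay off, mismatches don't.
coord = {
    ("left",  "left"):  (1, 1),
    ("left",  "right"): (0, 0),
    ("right", "left"):  (0, 0),
    ("right", "right"): (1, 1),
}
print(find_pure_nash(coord))  # [('left', 'left'), ('right', 'right')]
```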
More Games

In the game above, if both carry out a similar option, like left/left or right/right, the goalkeeper catches the penalty; but if the player and goalkeeper choose differently, for example a left dive against a right kick, the kicker scores and is happier.
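Notice that this game has no pure equilibrium: whatever one side does, the other wants to change. That is what forces a mixed strategy. As a minimal sketch, assuming symmetric payoffs of +1 to the kicker for a goal and -1 for a save (numbers not given in the text), the standard indifference condition pins down the equilibrium probabilities:

```python
def mixed_equilibrium_2x2(a):
    """Indifference-condition solution for a 2x2 zero-sum game.

    a[i][j] is the kicker's payoff when the kicker plays row i and the
    keeper plays column j. Each player mixes so that the opponent is
    indifferent between their two options.
    """
    denom = a[0][0] - a[0][1] - a[1][0] + a[1][1]
    p_keeper_left = (a[1][1] - a[0][1]) / denom
    q_kicker_left = (a[1][1] - a[1][0]) / denom
    return p_keeper_left, q_kicker_left

# Assumed payoffs: +1 to the kicker on a goal, -1 on a save.
kicker = [[-1, +1],   # kick left  vs keeper diving (left, right)
          [+1, -1]]   # kick right vs keeper diving (left, right)

print(mixed_equilibrium_2x2(kicker))  # (0.5, 0.5)
```

Both probabilities come out at 0.5: each side should randomize evenly, since any predictable bias would be exploited by the opponent.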

i. Iterated Removal of Dominated Strategies (IRDS)
The process involves repeatedly removing every strictly dominated strategy until none remain.
Note
I. There is no required order of elimination.
II. The same outcome is reached regardless of whether strategy one or strategy two is removed first, as the sketch below demonstrates.
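A short sketch of IRDS under strict dominance (the 2x3 payoff tables below are hypothetical). Removal cascades: deleting one dominated strategy can expose new dominated strategies, but for strict dominance the survivors are the same whatever the deletion order, which is exactly note II.

```python
def iterated_removal(row_payoffs, col_payoffs):
    """Iteratively delete strictly dominated strategies.

    row_payoffs[i][j] / col_payoffs[i][j] give each player's payoff
    when the row player picks i and the column player picks j.
    Returns the surviving (row_actions, col_actions) index lists.
    """
    rows = list(range(len(row_payoffs)))
    cols = list(range(len(row_payoffs[0])))
    changed = True
    while changed:
        changed = False
        # A row r is strictly dominated if some other surviving row r2
        # does strictly better against every surviving column.
        for r in rows[:]:
            if any(all(row_payoffs[r2][c] > row_payoffs[r][c] for c in cols)
                   for r2 in rows if r2 != r):
                rows.remove(r)
                changed = True
        for c in cols[:]:
            if any(all(col_payoffs[r][c2] > col_payoffs[r][c] for r in rows)
                   for c2 in cols if c2 != c):
                cols.remove(c)
                changed = True
    return rows, cols

# Hypothetical 2x3 game: the column player's third option is strictly
# dominated; deleting it makes the second row dominated, and so on.
row_p = [[1, 1, 0],
         [0, 0, 2]]
col_p = [[2, 3, 1],
         [2, 1, 0]]
print(iterated_removal(row_p, col_p))  # ([0], [1])
```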

Reinforcement Learning
Learning from interaction is vital and fundamental to all theories of learning and intelligence.
Our approach to learning is computational, taken from the perspective of artificial intelligence: machines are designed with the capability to solve various kinds of real-world problems in business, science, and society, which can be evaluated through mathematical analysis and computational experiments using reinforcement learning.
Reinforcement learning methods span the classic AI distinction between weak methods (general-purpose) and strong methods (built on domain-specific knowledge). Modern research focuses on the general principles of learning, search, and decision making.

Elements of Reinforcement Learning
A reward is the immediate signal the agent receives; it defines the agent's primary, short-term goal. The reward signal can be good (positive) or bad (negative) for the agent.
A value function captures the total amount of reward an agent can expect over the long run, from the present into the future. It is the agent's secondary goal and aggregates many rewards.
A policy defines the behaviour of the agent at a given time: it maps the perceived states of the environment to the actions the agent takes in response to those states. The policy could be as simple as a lookup table or as extensive a computation as a search process.
A model of the environment is optional, because an agent can also learn model-free, purely by trial and error.
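A minimal sketch tying these elements together, using a hypothetical five-state corridor environment: the policy maps states to actions, the reward is the immediate signal, the value function V accumulates long-run estimates via a simple temporal-difference update, and the environment's model is written out explicitly here even though, as noted, it is optional.

```python
import random

# Hypothetical 1-D corridor: states 0..4, reward only at the right end.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA = 0.1, 0.9           # learning rate, discount factor

V = [0.0] * N_STATES              # value function: long-run estimate per state

def policy(state):
    """A fixed stochastic policy: mostly step right."""
    return +1 if random.random() < 0.8 else -1

def step(state, action):
    """Model of the environment (optional in general; explicit here)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0   # immediate signal
    return next_state, reward

for _ in range(500):              # episodes of closed-loop interaction
    s = 0
    while s != GOAL:
        a = policy(s)
        s2, r = step(s, a)
        # TD(0) update: nudge V(s) toward the reward plus the
        # discounted value of the next state.
        V[s] += ALPHA * (r + GAMMA * V[s2] - V[s])
        s = s2

print([round(v, 2) for v in V])   # estimates grow toward the goal state
```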
Reinforcement learning is a closed-loop system because the learning system's current actions influence its future inputs.
A reinforcement learning problem is characterized by:
1. Closed-loop interaction
2. No direct instruction about which actions to take
3. Actions with extended consequences, such as delayed rewards
An agent must be able to sense the state of its environment and take goal-driven actions that affect that state.
Exploration and exploitation
Exploration helps the agent discover new possibilities; the agent stands to benefit more from its quests into new territory.
In exploitation, the agent focuses on the territory it has already captured rather than taking another risky adventure; this gives a more reliable reward in the short term.
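A common way to balance the two is an ε-greedy rule: with probability ε the agent explores a random action, and otherwise it exploits its current best estimate. A minimal multi-armed bandit sketch, with hypothetical arm probabilities:

```python
import random

# Hypothetical 3-armed bandit: true payout probabilities, unknown to the agent.
TRUE_P = [0.2, 0.5, 0.8]
EPSILON = 0.1

estimates = [0.0] * len(TRUE_P)   # running reward estimate per arm
counts = [0] * len(TRUE_P)

for t in range(10_000):
    if random.random() < EPSILON:                      # explore
        arm = random.randrange(len(TRUE_P))
    else:                                              # exploit
        arm = max(range(len(TRUE_P)), key=lambda a: estimates[a])
    reward = 1.0 if random.random() < TRUE_P[arm] else 0.0
    counts[arm] += 1
    # Incremental mean: update the chosen arm's estimate toward the reward.
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print([round(e, 2) for e in estimates], counts)        # the best arm dominates
```

With ε = 0.1 the agent spends roughly 90% of its pulls exploiting the arm it currently believes is best, while still sampling the others often enough to correct early mistakes.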
Reinforcement learning is much more goal-directed than other types of machine learning, such as supervised and unsupervised learning, because it focuses on learning how to map situations to actions for maximum numerical reward. A single-agent system can learn to optimise its behaviour by trial and error. Most of the time, however, the setting is multi-agent: agents share the environment, which makes it more complex, and the environment itself can be static or dynamic. If agents share a similar reward signal (objective), the goal is to reach the global optimum through extensive coordination; but when agents have conflicting or competitive reward signals (objectives), the solution is to look for an equilibrium, such as a Nash equilibrium. Examples of multi-agent systems include multi-robot set-ups, decentralized network routing, distributed load balancing, electronic auctions, and traffic control.
For more on reinforcement learning, check DeepMind's open course (Link) and Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto (2nd Edition) (Link).
You can also find more information on game theory here.