Fundamentals of Game Theory and Reinforcement Learning in Modern Applications

Photo by Erik Mclean on Unsplash

Every rational person prefers more to less, and businesses are out to maximize profit while minimizing cost. Game theory makes it possible to model the strategic behaviour underlying the interaction of agents in the form of a game.

Machine learning can primarily be divided into supervised, unsupervised, and reinforcement learning. Reinforcement learning is a distinctive type of machine learning in which the goal is to select actions that maximize future rewards. By contrast, supervised learning focuses on function approximation and unsupervised learning on description.

This article is the second part of my Game Theory and Artificial Intelligence series. The first article focused on the basic concepts needed to give you a quick start in both domains.

In part 2, we’ll dive deeper into some important concepts in game theory: the types of strategies, the meaning of equilibrium, how to arrive at and ascertain the equilibrium in a strategic situation, and how the various agents move in a game. These provide the ingredients needed to understand where the two fields intersect. We’ll then turn to reinforcement learning: its meaning, its elements, how reinforcement learning problems are characterized, and the trade-off between exploration and exploitation.


Introduction to game theory and reinforcement learning

Game Theory:

  • Types of strategies
  • What is equilibrium
  • Types of equilibrium

Reinforcement Learning:

  • Elements of reinforcement learning
  • How reinforcement learning problems are characterized
  • The tradeoff between exploration and exploitation


Types of strategy:

A strategy is a plan of action a player chooses in order to arrive at the outcome that provides the best utility. Utility can be defined as the satisfaction a player derives from an outcome.

Types of Equilibrium

Equilibrium can be defined as the point of rest for players, where they have no incentive to deviate from their strategy.

Pure Nash equilibrium

In a pure strategy, each player picks a single action with certainty rather than randomizing. A pure Nash equilibrium is a specification of a pure strategy for each player such that no player gains by changing his or her strategy, given that the other players don’t change their strategies. In other words, each player’s strategy gives that player the best payoff available against what the others are doing.
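The definition above can be checked cell by cell. Below is a minimal sketch, assuming a two-player game stored as two payoff matrices (rows for player 1’s actions, columns for player 2’s). The function name `pure_nash_equilibria` and the example payoffs are hypothetical, not taken from this article.

```python
from itertools import product

def pure_nash_equilibria(payoff_1, payoff_2):
    """Return every (row, col) cell where neither player gains by deviating alone.

    payoff_1[r][c] is player 1's payoff and payoff_2[r][c] is player 2's payoff
    when player 1 plays row r and player 2 plays column c.
    """
    n_rows, n_cols = len(payoff_1), len(payoff_1[0])
    equilibria = []
    for r, c in product(range(n_rows), range(n_cols)):
        # Player 1 may only change the row, holding player 2's column fixed.
        row_is_best = all(payoff_1[r][c] >= payoff_1[other][c] for other in range(n_rows))
        # Player 2 may only change the column, holding player 1's row fixed.
        col_is_best = all(payoff_2[r][c] >= payoff_2[r][other] for other in range(n_cols))
        if row_is_best and col_is_best:
            equilibria.append((r, c))
    return equilibria

# Hypothetical game in which both players prefer to pick the same option.
p1 = [[2, 0],
      [0, 1]]
p2 = [[1, 0],
      [0, 2]]
print(pure_nash_equilibria(p1, p2))  # [(0, 0), (1, 1)]
```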

A couple of notes: our focus is on individual deviation, not deviation by a group of players; and Nash equilibria are stable by nature, since no player has a reason to move away from one alone. A popular example of a pure Nash equilibrium in game theory is the Prisoner’s Dilemma.

Prisoner’s dilemma

In the diagram above, each player works toward the best strategy for themselves. The best joint outcome would have been for both to decline. However, that is not what happens, because of the competitive nature of the game: there is no trust between the prisoners, and each is trying to avoid the largest negative payoff, which comes from declining while the other confesses. So both prefer to confess rather than decline, since confessing is the better option for each given the anticipated action of the other. The cell highlighted in red represents the equilibrium, where the best payoff available is still negative. In this example, every agent plays its best action to avoid regret.
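The diagram’s numbers are not reproduced here, so the sketch below assumes common textbook payoffs (negative years in prison): -1 each if both decline, -5 each if both confess, and 0 versus -10 if only one confesses. Under those assumptions it checks that confessing is each prisoner’s best response whatever the other does, even though mutual decline pays both more.

```python
# Actions: index 0 = Decline (stay silent), index 1 = Confess.
ACTIONS = ["Decline", "Confess"]

# Assumed payoffs (negative years in prison), not taken from the diagram above.
# PAYOFF[(a1, a2)] = (payoff to prisoner 1, payoff to prisoner 2)
PAYOFF = {
    (0, 0): (-1, -1),
    (0, 1): (-10, 0),
    (1, 0): (0, -10),
    (1, 1): (-5, -5),
}

def best_response(player, other_action):
    """Action that maximizes `player`'s payoff (player is 0 or 1), given the other's action."""
    def payoff_of(action):
        profile = (action, other_action) if player == 0 else (other_action, action)
        return PAYOFF[profile][player]
    return max(range(len(ACTIONS)), key=payoff_of)

# Confessing is the best response whatever the other prisoner does.
for player in (0, 1):
    for other in (0, 1):
        print(f"Prisoner {player + 1} vs {ACTIONS[other]}: {ACTIONS[best_response(player, other)]}")

# (Confess, Confess) is an equilibrium: each action is a best response to the other.
is_equilibrium = best_response(0, 1) == 1 and best_response(1, 1) == 1
print("Both confess is an equilibrium:", is_equilibrium, PAYOFF[(1, 1)])
print("Both decline would pay each prisoner more:", PAYOFF[(0, 0)])
```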

Mixed strategy Nash equilibrium

In a mixed strategy Nash equilibrium, players randomize: each chooses among their pure strategies with certain probabilities rather than committing to a single action.

A couple of notes: players choose randomly among their options, and in equilibrium neither player can improve their expected payoff by changing their probabilities, given the other player’s mix.

The example above has two equilibria: the outcomes in which both players make the same choice, that is, when they coordinate.
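A coordination game like this one typically also has a third equilibrium in mixed strategies. As a sketch, the hypothetical helper `mixed_equilibrium_2x2` below finds a fully mixed equilibrium of a 2x2 game with integer payoffs by choosing each player’s probability so that the other player is indifferent between their two actions. The example payoffs are assumed, not taken from the example above.

```python
from fractions import Fraction

def mixed_equilibrium_2x2(A, B):
    """Fully mixed equilibrium of a 2x2 game via the indifference condition.

    A[r][c], B[r][c]: integer payoffs to player 1 (rows) and player 2 (columns).
    Returns (p, q) with p = P(player 1 plays row 0) and q = P(player 2 plays column 0),
    or None when no fully mixed equilibrium exists.
    """
    denom_p = B[0][0] - B[1][0] - B[0][1] + B[1][1]
    denom_q = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    if denom_p == 0 or denom_q == 0:
        return None
    # p makes player 2 indifferent between columns; q makes player 1 indifferent between rows.
    p = Fraction(B[1][1] - B[1][0], denom_p)
    q = Fraction(A[1][1] - A[0][1], denom_q)
    if not (0 < p < 1 and 0 < q < 1):
        return None
    return p, q

# Assumed coordination game: both prefer to match, but disagree on which match is best.
A = [[2, 0], [0, 1]]   # player 1 (rows)
B = [[1, 0], [0, 2]]   # player 2 (columns)
p, q = mixed_equilibrium_2x2(A, B)
print(p, q)  # 2/3 1/3
```

Each player puts more weight on their own favourite option, and at these probabilities neither can raise their expected payoff by changing their mix.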

More Games

In the penalty-kick game above, if the player and the goalkeeper choose the same side (left/left or right/right), the goalkeeper saves the penalty; but if they choose differently, for example the goalkeeper dives left while the player kicks right, the player scores and is happier.
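This game has no pure equilibrium, which is why the players must randomize. The sketch below assumes a simple zero-sum version in which the kicker scores (payoff 1) whenever the sides differ and is saved (payoff 0) when they match. Real scoring probabilities are not symmetric, so treat the numbers as illustrative.

```python
SIDES = ["Left", "Right"]

def kicker_payoff(kick, dive):
    """Assumed payoff: 1 if the kicker scores (sides differ), 0 if the keeper saves."""
    return 1 if kick != dive else 0

def keeper_payoff(kick, dive):
    # Zero-sum assumption: whatever the kicker wins, the goalkeeper loses.
    return 1 - kicker_payoff(kick, dive)

def has_profitable_deviation(kick, dive):
    """True if either player could do better by switching sides alone."""
    kicker_can_improve = any(kicker_payoff(k, dive) > kicker_payoff(kick, dive) for k in range(2))
    keeper_can_improve = any(keeper_payoff(kick, d) > keeper_payoff(kick, dive) for d in range(2))
    return kicker_can_improve or keeper_can_improve

# Every pure outcome leaves someone wanting to switch, so there is no pure equilibrium.
pure_equilibria = [(k, d) for k in range(2) for d in range(2) if not has_profitable_deviation(k, d)]
print("Pure equilibria:", pure_equilibria)

# If the keeper dives left half the time, both kicking sides score equally often,
# so a 50/50 mix is a best response for the kicker (and symmetrically for the keeper).
q_dive_left = 0.5
for kick in range(2):
    expected = q_dive_left * kicker_payoff(kick, 0) + (1 - q_dive_left) * kicker_payoff(kick, 1)
    print(f"Kick {SIDES[kick]}: scoring probability {expected}")
```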


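i. Iterated Removal of Dominated Strategies (IRDS)

The process involves repeatedly removing every strictly dominated strategy, that is, any strategy that gives a player a lower payoff than some other strategy no matter what the other players do, until no dominated strategy remains.

I. There is no preferred order of elimination.

II. The same outcome is reached whether strategy one or strategy two is removed first.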
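The procedure can be sketched as follows. This minimal version assumes a two-player game with payoffs stored as (row player, column player) pairs and considers only strict dominance by another pure strategy; the function names and the 3x3 example game are hypothetical. Running it with two different elimination orders illustrates the order-independence noted above. That property holds for strictly dominated strategies; with weakly dominated strategies, the order of removal can change the result.

```python
def dominated_strategies(payoffs, alive):
    """List (player, strategy) pairs that are strictly dominated among the surviving strategies.

    payoffs[r][c] = (payoff to row player 0, payoff to column player 1).
    alive = [surviving rows, surviving columns].
    """
    def pay(player, mine, theirs):
        cell = payoffs[mine][theirs] if player == 0 else payoffs[theirs][mine]
        return cell[player]

    found = []
    for player in (0, 1):
        own, others = alive[player], alive[1 - player]
        for s in own:
            # s is strictly dominated if another surviving t beats it against every opponent strategy.
            if any(t != s and all(pay(player, t, o) > pay(player, s, o) for o in others) for t in own):
                found.append((player, s))
    return found

def irds(payoffs, pick_last=False):
    """Remove one strictly dominated strategy at a time until none is left.

    pick_last changes which dominated strategy is removed first.
    """
    alive = [list(range(len(payoffs))), list(range(len(payoffs[0])))]
    while True:
        candidates = dominated_strategies(payoffs, alive)
        if not candidates:
            return alive
        player, s = candidates[-1] if pick_last else candidates[0]
        alive[player].remove(s)

# Hypothetical game: rows Top/Middle/Bottom, columns Left/Center/Right.
ROWS, COLS = ["Top", "Middle", "Bottom"], ["Left", "Center", "Right"]
GAME = [
    [(3, 1), (2, 2), (0, 0)],
    [(2, 3), (1, 2), (5, 1)],
    [(1, 0), (0, 1), (-1, 0)],
]
for pick_last in (False, True):
    rows, cols = irds(GAME, pick_last)
    print([ROWS[r] for r in rows], [COLS[c] for c in cols])  # ['Top'] ['Center']
```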

Reinforcement Learning

Learning from interaction is a foundational idea underlying nearly all theories of learning and intelligence.

Our approach to learning will be computational, from the perspective of artificial intelligence: machines are designed with the capability to solve various kinds of real-world problems, whether business, scientific, or social, and these designs can be evaluated through mathematical analysis and computational experiments using reinforcement learning.

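Approaches to reinforcement learning can rely either on strong methods, which build in a lot of specific knowledge, or on weak methods, which rely on general principles. Modern research is focused on the general principles of learning, search, and decision making.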

Elements of Reinforcement Learning

A reward is the immediate, primary, short-term goal of the agent. At each step, the reward sends the agent a good or bad signal about what just happened.

A value function estimates the total amount of reward an agent can expect to accumulate over the long run, starting from the present and looking into the future. It is the agent’s secondary goal, and a single value summarizes many future rewards.
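A common way to make "value over the long run" concrete is the discounted return, in which a reward k steps in the future is weighted by gamma to the power k; the value of a state is then the expected return from that state. The discount factor gamma = 0.9 and the reward sequences below are assumptions for illustration.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards from now on, each discounted by gamma per time step.

    G_t = r_{t+1} + gamma * r_{t+2} + gamma**2 * r_{t+3} + ...
    """
    total = 0.0
    # Work backwards so each step is G = r + gamma * G_next.
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Hypothetical reward sequences that start with the same immediate reward.
greedy_path = [1, 0, 0, 0]
patient_path = [1, 0, 0, 10]
print(discounted_return(greedy_path))   # 1.0
print(discounted_return(patient_path))  # 1 + 0.9**3 * 10, about 8.29
```

Both sequences start with the same immediate reward, but the second has a much higher return, which is the sense in which value looks beyond the next reward.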

A policy defines the agent’s behaviour at a given time. It maps the perceived states of the environment to the actions the agent takes in those states. The policy could be a simple function or lookup table, or it could involve extensive computation, such as a search process.
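Both kinds of policy can be sketched on a toy problem. The example assumes a hypothetical five-cell corridor with the goal in the last cell: `table_policy` is a plain lookup table, while `search_policy` computes its action with a short depth-limited lookahead over an assumed model of the corridor.

```python
# Hypothetical corridor: states 0..4, the goal is state 4.
GOAL = 4
ACTIONS = {"left": -1, "right": +1}

# 1) A policy as a plain lookup table: perceived state -> action.
table_policy = {0: "right", 1: "right", 2: "right", 3: "right"}

def step(state, action):
    """Assumed deterministic model: move one cell, staying inside 0..GOAL."""
    return min(max(state + ACTIONS[action], 0), GOAL)

# 2) A policy computed by search: look ahead with the model and pick the action
#    that can bring the agent closest to the goal within `depth` moves.
def search_policy(state, depth=2):
    def distance_after(s, d):
        if d == 0 or s == GOAL:
            return abs(GOAL - s)
        return min(distance_after(step(s, a), d - 1) for a in ACTIONS)
    return min(ACTIONS, key=lambda a: distance_after(step(state, a), depth - 1))

for s in range(GOAL):
    print(s, table_policy[s], search_policy(s))
```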

A model of the environment is optional, because an agent can also learn by trial and error in an environment for which it has no model.

Reinforcement learning is a closed-loop system because the learning system’s current actions influence its future inputs.
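The closed loop, and model-free learning by trial and error, can be illustrated with tabular Q-learning, a standard algorithm swapped in here for illustration (the article does not prescribe one). The corridor environment, the reward of 1 at the goal, and the hyperparameters are all assumptions. The agent never uses `env_step` as a model; it only sees the transitions its own actions produce.

```python
import random

N_STATES, GOAL = 5, 4          # hypothetical corridor: states 0..4, goal at 4
ACTIONS = (-1, +1)             # move left or right

def env_step(state, action):
    """Environment (unknown to the agent): returns next state, reward, done."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap episode length
            # Closed loop: the chosen action determines the next state the agent sees.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(Q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if Q[(state, a)] == best])
            nxt, reward, done = env_step(state, action)
            # Trial-and-error update from the observed transition only (no model used).
            target = reward if done else reward + gamma * max(Q[(nxt, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = nxt
            if done:
                break
    return Q

Q = q_learning()
greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
print(greedy)  # learned action per non-goal state (+1 means "move right")
```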

A reinforcement learning problem is characterized by:

1. Being closed-loop

2. Not having direct instructions about which actions to take

3. Actions having extended consequences

An agent must be able to sense the state of its environment and take goal-directed actions that affect that state.

Exploration and exploitation

Exploration helps the agent discover new possibilities: the agent benefits from venturing into new territory and trying actions whose rewards it does not yet know well.

Photo by Manuel Meurisse on Unsplash

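In exploitation, the agent focuses on the territory it has already captured rather than setting off on another risky adventure, choosing the actions it currently believes are best. This gives more reliable rewards in the short term, at the risk of never finding better options.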

Photo by Amir Taheri on Unsplash
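The trade-off is often illustrated with an epsilon-greedy agent on a multi-armed bandit: with probability epsilon it explores a random arm, and otherwise it exploits the arm with the highest estimated reward. This is a minimal sketch; the arm means, the noise level, and epsilon are assumed values.

```python
import random

def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Play a multi-armed bandit with epsilon-greedy action selection.

    true_means: hypothetical average reward of each arm (unknown to the agent).
    Returns the agent's estimated values and how often each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms
    counts = [0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore: try any arm
        else:
            arm = max(range(n_arms), key=lambda i: estimates[i])  # exploit: best so far
        reward = rng.gauss(true_means[arm], 1.0)                  # noisy reward
        counts[arm] += 1
        # Incremental sample average: Q <- Q + (R - Q) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 1.0])
print("estimated values:", [round(v, 2) for v in estimates])
print("pulls per arm:", counts)
```

Setting `epsilon=0.0` gives a purely exploiting agent that can stay with whichever arm happened to look good first; a small positive epsilon keeps sampling the other arms so that the estimates can improve.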

Reinforcement learning is much more goal-directed than other types of machine learning, such as supervised and unsupervised learning, because it focuses on learning how to map situations to actions so as to maximize a numerical reward. A single-agent system can learn to optimize its behaviour by trial and error. Often, however, the setting is multi-agent: several agents share the environment, which makes the problem more complex, and the environment itself may be static or dynamic. If the agents share a similar reward signal (objective), the goal is to reach the global optimum, which requires a lot of coordination. If their reward signals (objectives) conflict or compete, the solution is instead to look for an equilibrium, such as a Nash equilibrium. Examples of multi-agent systems include multi-robot set-ups, decentralized network routing, distributed load balancing, electronic auctions, and traffic control.
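For the competitive case, one classic way for agents to learn toward an equilibrium is fictitious play, swapped in here as an illustration: each agent repeatedly best-responds to the empirical frequencies of its opponent’s past actions. In two-player zero-sum games these frequencies are known to converge to a Nash equilibrium. The sketch below assumes a matching-pennies payoff matrix, the same structure as the penalty-kick game, where the equilibrium is to play each action half the time.

```python
def fictitious_play(payoff_1, payoff_2, rounds=20000):
    """Each player best-responds to the opponent's empirical action counts.

    payoff_1[r][c], payoff_2[r][c]: payoffs when player 1 plays r and player 2 plays c.
    Returns each player's empirical mixed strategy after `rounds` rounds.
    """
    n_rows, n_cols = len(payoff_1), len(payoff_1[0])
    counts_1, counts_2 = [0] * n_rows, [0] * n_cols
    for _ in range(rounds):
        # Both players choose simultaneously, using the history up to the previous round.
        row = max(range(n_rows), key=lambda r: sum(payoff_1[r][c] * counts_2[c] for c in range(n_cols)))
        col = max(range(n_cols), key=lambda c: sum(payoff_2[r][c] * counts_1[r] for r in range(n_rows)))
        counts_1[row] += 1
        counts_2[col] += 1
    return [k / rounds for k in counts_1], [k / rounds for k in counts_2]

# Matching pennies: player 1 wants the actions to match, player 2 wants them to differ.
P1 = [[1, -1], [-1, 1]]
P2 = [[-1, 1], [1, -1]]
freq_1, freq_2 = fictitious_play(P1, P2)
print(freq_1, freq_2)  # both close to [0.5, 0.5]
```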

For more on reinforcement learning, check out the DeepMind open course here and Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto (2nd Edition).

You can also find more information on game theory here.




Data Scientist |Researcher |Data Analyst @Data Science Nigeria. Master STEM Educator|Financial Engineering (MSC)in view. Entrepreneur

Michael Olanipekun
