Continuous Function for Prisoners' Dilemma


The two-player non-zero-sum game commonly known as the "Prisoner's Dilemma" has been written about extensively elsewhere, so this introduction only outlines the basic ideas. The terminology of a game with two players is taken from game theory, but the same ideas arise in economic and social situations, and even in international diplomacy.

Two Player Game with Discrete Choices

In its simplest form, each player has two possible choices, and they reveal their choices simultaneously. Their choices are to be COOPERATIVE or UNCOOPERATIVE and this results in four possible outcomes:

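Player [A] COOPERATIVE, Player [B] COOPERATIVE: Players work together; both players get the PEACE reward.
Player [A] UNCOOPERATIVE, Player [B] COOPERATIVE: Player [A] gets the TEMPTATION reward; Player [B] gets the SUCKER penalty.
Player [A] COOPERATIVE, Player [B] UNCOOPERATIVE: Player [A] gets the SUCKER penalty; Player [B] gets the TEMPTATION reward.
Player [A] UNCOOPERATIVE, Player [B] UNCOOPERATIVE: Players go to war; both players get the WAR penalty.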

Note that the action COOPERATIVE means being cooperative with the other player. A full background on the story behind the Prisoners' Dilemma can be found elsewhere, and other names are sometimes given to the outcomes. In general, though, the four possible payoffs can be ranked from most desirable to least desirable:

Rank             Payoff       Points
Most desirable   TEMPTATION   Gain 10 points
Rewarding        PEACE        Gain 5 points
Penalty          WAR          Lose 15 points
Least desirable  SUCKER       Lose 20 points

The exact point values might change depending on the circumstances, but the above point values were chosen for this simulation and (as plotted below) they give a linear (flat) surface when converted into a continuous value function. A bit of study should make it obvious that if we start in a position where everyone is COOPERATIVE, then everyone gets the PEACE reward, which is not the best option for each individual, but is the best situation for the group as a whole. One particular individual might consider switching her attitude to an UNCOOPERATIVE position, which in the short term provides this one individual with the TEMPTATION payoff (twice the PEACE value!), but the other individuals are made into SUCKERs by this action and are hit with the worst possible penalty.
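The arithmetic behind this argument can be checked with a small lookup over the discrete outcomes. The following is only a sketch for the two-player case: the point values come from the ranking above, while the enum and the function names discrete_payoff and group_payoff are hypothetical names introduced here for illustration.

```c
/* Point values from the payoff ranking above. */
#define PAYOFF_TEMPTATION  10
#define PAYOFF_PEACE        5
#define PAYOFF_WAR        (-15)
#define PAYOFF_SUCKER     (-20)

/* Hypothetical encoding of the two discrete choices. */
enum choice { UNCOOPERATIVE = 0, COOPERATIVE = 1 };

/* Payoff to the player who chooses `mine` against an opponent choosing `theirs`. */
int discrete_payoff(enum choice mine, enum choice theirs)
{
    if (mine == COOPERATIVE)
        return (theirs == COOPERATIVE) ? PAYOFF_PEACE : PAYOFF_SUCKER;
    return (theirs == COOPERATIVE) ? PAYOFF_TEMPTATION : PAYOFF_WAR;
}

/* Combined payoff of both players for the pair of choices (a, b). */
int group_payoff(enum choice a, enum choice b)
{
    return discrete_payoff(a, b) + discrete_payoff(b, a);
}
```

With these values, mutual cooperation gives the pair a total of 10 points. If one player switches to UNCOOPERATIVE, she gains 10 but the other loses 20, so the pair's total drops to -10. If both are UNCOOPERATIVE, the total is -30.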

The group as a whole is worse off if individuals choose to go down this path, so it is logical that some sort of retribution mechanism would exist to discourage the temptation. It should be clear that if a collective entity did exist that was able to perfectly reward the virtuous and punish the deceivers, then we would no longer be playing a Prisoner's Dilemma game at all, and the whole payoff structure would be different. In this simulation (and in most real-world situations) no godlike entity exists, and it is up to the individuals to devise a mechanism between themselves.

Many studies have already been done on the Iterated Prisoner's Dilemma (IPD) and the Spatial Prisoner's Dilemma (SPD) using state machines or similar ideas.

Converting Discrete Choices to Continuous Values

It is fairly simple to treat the UNCOOPERATIVE action as an input value of 0.0 to a function and the COOPERATIVE action as an input value of 1.0. The values between these two extremes can then be interpolated, so that (for example) a value of 0.5 is midway between COOPERATIVE and UNCOOPERATIVE. The result is a function of two input values, each limited to the range from 0.0 to 1.0. The following code snippet shows this expression written in the C language, giving the payoff z to Player [A]:

z = ( 0
  +     att_A     *     att_B     * PAYOFF_PEACE
  + ( 1 - att_A ) *     att_B     * PAYOFF_TEMPTATION
  +     att_A     * ( 1 - att_B ) * PAYOFF_SUCKER
  + ( 1 - att_A ) * ( 1 - att_B ) * PAYOFF_WAR
);
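To check that the interpolation reproduces the discrete game at its corners, the expression can be wrapped in a function. This is a minimal sketch, not necessarily how the simulation itself is organised: the function name pd_payoff is hypothetical, and defining the PAYOFF_ constants as macros holding the point values from the ranking is an assumption.

```c
/* Point values from the payoff ranking (defining them as macros is an assumption). */
#define PAYOFF_TEMPTATION  10.0
#define PAYOFF_PEACE        5.0
#define PAYOFF_WAR        (-15.0)
#define PAYOFF_SUCKER     (-20.0)

/*
 * Payoff to player A for attitudes att_A and att_B, each expected to lie
 * in the range 0.0 (UNCOOPERATIVE) to 1.0 (COOPERATIVE).
 * Each term weights one of the four discrete outcomes.
 */
double pd_payoff(double att_A, double att_B)
{
    return ( 0
      +     att_A     *     att_B     * PAYOFF_PEACE
      + ( 1 - att_A ) *     att_B     * PAYOFF_TEMPTATION
      +     att_A     * ( 1 - att_B ) * PAYOFF_SUCKER
      + ( 1 - att_A ) * ( 1 - att_B ) * PAYOFF_WAR
    );
}
```

At the four corners the function returns the discrete payoffs: pd_payoff(1.0, 1.0) is 5 (PEACE), pd_payoff(0.0, 1.0) is 10 (TEMPTATION), pd_payoff(1.0, 0.0) is -20 (SUCKER) and pd_payoff(0.0, 0.0) is -15 (WAR). Player B's payoff is obtained by swapping the arguments.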

The following plots show the output surface of this function. Since the payoff table is symmetric (i.e. it is a fair game), the payoff surface of one player is merely flipped on the diagonal to give the payoff of the other player. More interesting is the collective payoff, which is the sum of the two players' payoffs. The surface is flat in all cases (i.e. the function is linear), although different payoff weightings can change this. With the current set of payoff weightings, where the surface is flat, the first derivative of the individual's payoff with respect to her own attitude is constant (it is always -5), and the first derivative of the group payoff is also constant (it is always 20). Thus, we have a situation where every incremental increase in attitude (from UNCOOPERATIVE towards COOPERATIVE) leaves the individual worse off, but the group ends up better off by 4 times as much.
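The two derivatives can be confirmed by expanding the function by hand. In the sketch below, the symbols a and b stand for att_A and att_B, and P, T, S, W stand for the PEACE, TEMPTATION, SUCKER and WAR payoffs; this shorthand is introduced here only for the derivation.

```latex
\begin{align*}
z_A &= P\,ab + T\,(1-a)\,b + S\,a\,(1-b) + W\,(1-a)(1-b) \\
    &= (P - T - S + W)\,ab + (S - W)\,a + (T - W)\,b + W \\
\intertext{With $P = 5$, $T = 10$, $S = -20$, $W = -15$ the $ab$ coefficient is $5 - 10 + 20 - 15 = 0$, so}
z_A &= -5a + 25b - 15, \qquad \frac{\partial z_A}{\partial a} = -5 \\
z_A + z_B &= (-5a + 25b - 15) + (-5b + 25a - 15) = 20a + 20b - 30,
    \qquad \frac{\partial (z_A + z_B)}{\partial a} = 20
\end{align*}
```

The surface is flat precisely because P - T - S + W is zero for these point values. Any payoff weighting for which that combination is non-zero leaves an ab term in the function, and the surface then curves.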

Surface Plot of PD Function

References and Further Reading