In one of his interesting articles, first appeared on Scientific American during March ’62 and re-printed in Mathematical Puzzles and Diversions #4, Martin Gardner introduced us in the world of the computer working with nothing but matchboxes and balls.
The original idea was formulated by a biologist, Donald Michie, who almost forty years ago built the prototype of “MENACE” (“Matchbox Educable Naughts & Crosses Engine”), a primordial computer made by 300 matchboxes that was able to learn TIC-TAC-TOE.
The working principle of the machine is simple and brilliant: a real play situation, after the move of our opponent, is represented on each matchbox (e.g. empty board, or a cross in a corner and a circle in the middle, etc.).
Each square of the playing board is represented by one of nine different colours.
Inside the matchbox, at the beginning of the game, there lie a number of balls equal to the number of squares that players may occupy on their next move; and each ball has the colour of one of those squares.

For instance, the matchbox representing this situation will start with the following five balls: yellow, green, red, blue and black.
MENACE plays with the cross, and always starts the game. Hence, the first matchbox represents the empty board, and has nine different coloured balls.
How does the elaboration work? MENACE learns step by step, by using random move first, and refining his playing technique by adopting winning strategies later. The working algorithm is the following:
1) take the matchbox that represents the current playing situation;
2) draw a random ball from it, put it aside the box, and perform the move associated to that ball;
3) make your move;
4) go to 1).

 

So far, so good. Although a bit random. However, the intriguing part lies in the training of the “machine”, once the game ends:
1) If MENACE loses, eliminate all the balls that have been put aside the boxes;
2) If MENACE draws, for each ball that lies aside each box, add one ball of the same colour, and put everything back into the matchbox;
3) If MENACE wins, for each ball that lies aside each box, add three balls of the same colour, and put everything back into the matchbox.

With reference to the picture presented above, any of the MENACE’s possible moves (playing with Crosses) differing from playing in the red square, will be “punished” by eliminating the relevant red ball; this actually teaches the game and the wrong moves “did you play blue? Bad boy! Don’t do that again!”. If MENACE plays in the red square, at first at random, it is likely that it will either draw or win; hence, the “play in the red square” strategy becomes, little by little, predominant, until it becomes the only feasible, given that situation (after some hundreds game, though).
History tells us that after a good 220 games against its designer, MENACE was a professional tic-tac-toe player. The first games were Michie’s hands down, but already with the 20th play, MENACE started to draw and learnt that the best possible opening was to play in a corner. After some more games, MENACE started to win and became the ultimate tic-tac-toe champion.

This simple “prize & punishment” learning approach inspired Gardner’s fantasy and those of some thousands readers. Very few of them had so much time and matchboxes to build a MENACE, therefore they started investigating on other games. Gardner developed the Hexapawn, a game played on a 3x3 chessboard; each player has one single row of chess pawns, and his goal is to bring one of them on the other side. The strategy of this game is easy but not trivial, and can be studied and learnt using “only” 24 “memory matchboxes”.

I got inspired by these things, and since I had only two matchboxes at home, I designed a project that is slighlty different in the hardware components: I built Nimbron 2000 – a poker-card running computer that learns how to play NIM. Actually, a simplified version of NIM, using only three rows of tokens.
The rules in short: there are three rows of three tokens each. Each player eliminates from the game from one to all tokens of a row, and passes. The player eliminating the last token loses the game. (Some players play NIM in an opposite way: the player taking the last token wins; however, that does not make big differences.)


In order to build your own NIMBRON 2000, cut the 20 sheets that you find here, and grab several poker decks. You will need at least four decks, with the same back (if at all possible). Place the twenty sheets on the table in the following order:

Actually, you could use the order that you prefer, but it looks symmetric and nice this way.

The three-digit combinations on the sheets present the possible situations that may arise during the game. For instance, 223 means “there are two rows with two tokens, and one with three”, and 002 means “there is only one row, with 2 tokens”. Obviously, the order of the rows is irrelevant, therefore 023 and 302 represent the same situation; for this reason, the three-digit codes have been order from lowest to highest, just to make them more readable.

Now, place near each sheet one poker card for each of the possible options. For example, place near this sheet, representing the starting position, a face-down deck of three cards: an ace, a two and a three.

Now, place near each sheet one poker card for each of the possible options. For example, place near this sheet, representing the starting position, a face-down deck of three cards: an ace, a two and a three.

Put three rows of three tokens on the table, and your computer is ready to play!
The best way to teach it the game, is allowing it to take the first move (an experienced NIM player knows that starting means winning, but NIMBRON 2000 does not know – yet).

The NIMBRON 2000 operating algorithm is similar to the one of MENACE:

1) Find the sheet that represents the current situation of the game;
2) Draw a card at random from the covered deck near the sheet, leave the card face up close to the deck, and perform the corresponding move;
3) Make your move;
4) Start over from point 1).

To train the machine, do the following things:
1) If NIMBRON loses, remove the last card you have turned face up;
2) If NIMBRON wins, add another card of the same type for each face up card, for each of the decks.

If NIMBRON 2000 ends up on a sheet without cards left, then it resigns: it knows that its opponent is going to win anyway, unless he is a lucky monkey.
On the other hand, if a sheet only contains one option and more than one card, NIMBRON2000 starts making fun of its opponent thinking it has already won the match. A lucky monkey.
This happens, for example, when the machine learns that starting from a 011 situation it may always win.

These instructions represent the training I performed, but you can feel free to change them in order to make the training quicker or slower and/or “human-like”.

For instance:
Quick training:

1) If NIMBRON loses remove only the last card turned
2) If NIMBRON wins, add three cards of the same type for each face up card, for each deck and shuffle.

Through a quick training a winning strategy may be identified faster and implemented in a more strict way.

Out-of-memory training:
1) If NIMBRON loses, shuffle the decks.
2) If NIMBRON wins, add two cards of the same type for each face up card, for each deck and shuffle.

With this type of training, no move is ever inhibited, albeit it becomes less and less likely to happen. However, NIMBRON requires more time to be trained.

The results of some of my experiments are illustrated in the following. The first five games were mine hands down. The second game was just two moves! (Starting from NIMBRON: 133, 033, 003, 001 I win!).
During the fifth game, NIMBRON was even more dull (033, 023, 022, 012, 002, 001 I win): from012 it could have played 001 and win.

Ninth game marks the first victory for NIMBRON: 233, 123, 122, 112, 111, 011, 001 it wins! Beginner’s luck.

After only two more matches, the machine wins again: 233, 223, 222, 122, 022, 012, 001 and wins! You may note that the strategy it employed is very similar to the other one it used to win.

Around the 30th game NIMBRON starts preferring another strategy, chosen at first at random: it may have find it interesting. It starts by removing all the tokens from one row, and wind 50% of the times.

After nearly 60 games it is almost invincible. Sometimes it makes a stupid mistake that still allows me to win in extremis, but it learns from its mistakes. That’s what NIMBRON is programmed to do: learning from mistakes.

All right, so you built your poker-card-computer, and it learnt how to beat you at NIM. At the end of the experiment you might wonder: was this all worth the effort? I just taught a game to some pieces of paper.

Actually the experiment may result a little bit obsolete. It would be definitely more interesting to have a PC program that learns board games by playing itself and learning from its own mistakes.

Or even better, this simple AI mechanic could be used as the core mechanism for a board game with an artificial opponent that learns, once again, from its own mistakes.

What if “AL CABOHNE” (the BOHNANZA solitary, by Uwe Rosenberg), a game that already has an algorithm simulating an opponent, could be trained to learn from mistakes?
Or wouldn’t it be interesting to have Knizia’s LORD OF THE RINGS become more evil as you keep challenging him?

NIMBRON 2000 stares at me and says “That’s be cool!”. Maybe I played it just a little too much.



Sommario www.davincigames.com www.davincigames.com Sommario Cover Story www.davincigames.com Read and Play Golden Oldies
GịCondoR!