A Superfast Solver

It's a command line program 
Watch this Newsletter, as BCSolve is currently being updated
with even more features, and the Article will expand with
a description of the program.
Obs: New information at the end of the text.
Version:  0.82 Release.   1. Nov. 2003 
BCSolve082.zip.  160KB The Program, and some data files 

This program is released to the public domain. You
are free to distribute it, as long as you distribute
this license with it. You may use it for any
noncommercial purpose. You may not modify it or
reverse engineer it. Any damage that it causes to your
computer or anything else is your responsibility.
Copyright 2003
Bruce Cropley
cropleyb@yahoo.com.au 

From Melbourne in Australia, Bruce Cropley has been making a breathtakingly
fast puzzle solver. This amazing program is currently supporting:
SOMA, Double SOMA, SOMA+Plus, and the Bedlam cube.
Although it is a DOS like, command line program, this BCSOLVE
will produce all solutions to a given puzzle in extremely short time.
The BCSOLVE manual:
Usage
Discrete packing puzzle solver.
Version 0.82, 1 November 2003
Author: Bruce Cropley (cropleyb at yahoo.com.au)
Usage: BCSolve082 [dQS] [s piece_set] [F sols] [ffigurefile] [m model]
d: Show startup debugging (initial # moves per piece)
Q: Don't show solutions found. (for testing)
S: Show solution statistics
s: [SEDB]: Use which piece set
SOMA, Extended SOMA, Double SOMA, Bedlam
Usually defaults using the number of cubelets in the figure.
F: sols: Calculate a given number of solutions. Default is all.
f: figurefile: Solve a figure described in the figure file.
Default is a cube. (Soma and Bedlam)
m: Model to solve within the file.
Defaults to the first in the file.
The solver just searches for the first line starting
with / that contains the given model name.
Example: BCSolve082 ss fcubit3.fig >cubit3.res
Included in the zip file are a few sample figure template files; these
are example inputs to the f argument.
It uses the number of cubelets in the figure to
determine which piece set to use:
 27: Soma
 40: Soma plus
 54: Double soma
 64: Bedlam
Anything other number and it will currently refuse to solve it. I plan to eventually offer support for any combination of pieces, as apparently does ISHINO Keiichiro's solver.
Note that BCSolve only copes with figures where the entire piece set is used.
Here's a win32 executable for the BCSolve for you. It was originally built with g++ under Linux, was ported first to Mac OS 10.2 and is currently also built with g++ for windows machines.
About Solvers
These are the 5 Soma solvers that we currently know about:
1. Bundgaard/McFarren's QBASIC program: Courtney made the
algorithms, Thorleif the graphics and user interface.
This is possibly the slowest solver on the list
(1.5 seconds per figure on the average), but it's QBASIC?
and the figures are easy to show in the web.
2. Ross Canning's Windows solver: Slow (1
sec./fig.), but very easy to use, and small in size
to boot. Have the most amazing and easy graphical
user interface.
3. Bob Nungester's Windows solver: Faster (.5
sec/fig.). NOTE: all of the above are timed for a
SINGLE solution.
4. Sivy Farhi's DOS solver: Very fast. It
can solve multiple solutions in a second.
5. Bruce Cropley's BCSolve: Extremely fast, written in GNU C++
using binary bit flipping to achieve a blazing speed.
Solving multiple solutions in less than 1 second.
About BCSolve
Bruce writes (interspersed with snippets of conversation with Thorleif and Courtney):
I wrote BCSolve starting in late 2000. When I was a teenager my family played with a Soma puzzle, which I still have. I bought a Bedlam puzzle and was unable to solve it after 3 weeks of concerted effort. Being a professional software engineer, I had the idea of writing a program to put it back into the original shape. I realised that the problem was computationally very demanding and spent a couple of weeks writing an optimised by inflexible and hacky solver to just solve the Bedlam cube problem. I got it to find all 19186 solutions in a time of around 16 hours, and then forgot about it for a few years.
In August 2003, I started working on the solver again. I had the idea of using it as a way of practicing what I preach with regards to good software implementation. That is, writing test code first, refactoring, doing regular version control checkins, and other such things. For more information about my inspiration for quality software, have a look at XProgramming.com and www.extremeprogramming.org. I was also keen to write something really FAST because I was (and still am) considering consulting work speeding up software.
BCSolve doesn't have any graphics yet. It is written in ObjectOriented C++ and writes out a textual representation of all the solutions it finds.
It finds 11520 SOMA cube solutions (including all rotations and reflections) and writes them to a file (~850K) in 0.56s on my 2.4GHz Pentium 4 running Windows XP with 256MB RAM. I am currently working on trimming the solutions so that each unique solution type is found only once. This is proving very tricky. *grin*
It finds solutions to the "Bedlam" cube puzzle at a rate of around 16.5 solutions per second on the same machine. In the Bedlam puzzle, there are 13 pieces  12x5cubes and 1x4cubes. Each piece is different and there are 19186 unique solutions.
So how does it work?
Fast machine operations for exploring the search space
In C/C++ you can do a bitwise AND and bitwise OR something like this:
int a=0x06;
int b=0x14;
int a_or_b = ab; // == 0x16
int a_and_b = a&b; // == 0x04
These bitwise operations will usually convert directly to a single instruction in machine code. I didn't actually need to use any assembly code. C++, being a compiled language, converted my bitwise operations into fast machine instructions.
Now if you represent each piece placement as a 32 bit or 64 bit integer (1 bit per cubelet), and you also represent the current occupancy of the entire cube as an integer of the same size, you can use very fast low level machine instructions to place and unplace pieces from the current figure state, and to check the validity of a given piece placement.
Bits and arbitrary figures
Let's assume we want to solve any arbitrary figure, and assume we
allocate 1 bit for each small cube. We still have to formulate
connection rules, telling which bits are connected, and thus which are
available for positioning by each puzzle piece.
Initially I was only solving a Bedlam cube, so the algorithm that maps each cube into a bit of the integer only had to handle that case.
At that stage, I represented the 4*4*4 cube as
64 bits (in GNU C this is an "unsigned long long"
native type) and converted each cubelet like so:
Cubelet(x,y,z) > 1 << (x + 4*y + 16*z)
for x,y,z in [0,3]
Given:
 PiecePosition: a 64 bit integer, with 4 bits set at "1"
 Structure: another 64bitinteger, with "1"s for occupied cubelets in the current figure state
PiecePosition BITWISEAND Structure NOTEQUALS 0
THEN we can't put the piece in that position.
Note that only one of the cubelets has to collide to cause a problem.
However it doesn't really matter what the mapping
is between the cubes in the figure and the bits.
The solver only really has to do conversions between 3d and bits when mapping each possible piece position to a (bit string == integer) figure occupancy initially, and the reverse when it finds solutions.
All the collision detection, placement and unplacement code just operates on the (integer) occupancies.
The way I've coded it is that missing cubes in the figure don't get allocated bits.
Anything outside the figure is just not considered during the search, only when we're initially working out all the possible positions for each piece.
Algorithmic Optimisations
Each time you place a piece, what can you then tell about what is possible?
Each piece can be placed in many positions.
But once we have placed a piece, that reduces
the set of options for each remaining piece.
We want to minimise the number of branches that
we take at each node of the search tree.
If we cut down the options remaining at each level,
there is less to do at the later pieces if we get
that far.
Now another thing we want to do is minimise the
wasted time on searching when it is possible to
tell that there are no solutions from a given
position. The sooner we can tell that we are on a dead end branch the better.
Parity Optimisation
BCSolve doesn't use the parity optimisation yet, that could
yield substantial improvements for some piece sets such as Double SOMA. I'm not sure how much it would help with the Bedlam cube as there are two +3/3
pieces, 12 +1/1 pieces and 1 0 piece.
I guess that you can only tell that there
can be no solutions if you get further away from
the desired parity (Bedlam: 0) than it is possible
to return to with the remaining pieces.
Double SOMA optimisation
Whenever you have two or more identical pieces,
they could be swapped in a given solution.
We probably only want one such solution. One way
of ensuring that you only get one of each such
solution is to examine an ordering of the piece
occupancies before placement.
For example, if the positions are allocated a
number (eg board occupancy bit string), and
we have 2 identical piece:
Piece 1 positions
3, 5, 6
Piece 2 positions
3, 5, 6
Then we can say we only want the solutions where
piece 1 is numerically below piece 2.
ie we don't want any solution such as:
piece 1: 5, piece 2: 3
You can generalise this to N identical pieces.
In the code, when you're going through the piece 2
positions, you would just say:
if piece2.position <= piece1.position
move on to next piece2 position.
This should reduce the number of positions that
you have to recurse into for each second piece
by a factor of 2.
2^{7} = 128, so it should speed it up by about that.
All Cubelets Coverable Optimisation
At each level of the search, BITWISEOR up all the remaining positions of all the remaining pieces. BITWISEOR this with the current figure state. If there are any bits missing from the figure solution state, then there is no way that we can possibly solve the figure from here.
All Pieces Placeable Optimisation
This optimisation of the BCSolve algorithm relies upon knowing that all the pieces given are in the solution. If any of the pieces cannot be placed anywhere at any level of the search, the figure is defined to be not solvable from there, and that branch is abandoned.
Piece Must Cover Optimisation
After we've chosen a piece to place but before we start iterating through all it's possible positions:
 work out the cubelet coverage of all the remaining valid positions of all the OTHER unplaced pieces.
 The cubelets that are left MUST be covered by this piece.
 If a given position of the chosen piece doesn't cover ALL these cubelets then it cannot result in a solution, so move on to the next position.
I've actually applied this to all the remaining pieces at each level of the search, as that should reduce their moves for all subsequent levels of the search.
Display characters
I have a function which sets up a set of SOMA pieces. I pass it a character which is the first piece letter. It then uses ASCII addition to calculate each successive piece character.
So  how does your program label the pieces:
In SOMA (1234567 VLTZABP)
1,2,3,4,5,6,7
In SOMAplus (V,L,T,Z,A,B,P,D,I,Q,S)
0,1,2,3,4,5,6,7,8,9,$
In DoubleSOMA 1,2,3,4,5,6,7,V,L,T,Z,A,B,P
First set: Second set:
a,b,c,d,e,f,g ; A,B,C,D,E,F,G
(1,2,3,4,5,6,7);(1,2,3,4,5,6,7)
Nov 2004: New information.
I haven't been spending any time on my solver
over the last year. In that time, we've moved house
again, had another child and been flat out at
work as usual.
In January I made another optimisation to the solver
that sped it up by about 35% on my Mac. I haven't
tried it on my PC. I might write it up on my weblog.
The change was to the core function that partitions
the possible moves into possible and invalid moves.
In case you're interested, the optimisation went
something like this:
Old code:
for each possible move:
check if it is possible after making a new move
if (possible)
put it in array of new possible moves
else
put it in array of new impossible moves
copy new possible moves to start of array
copy new impossible moves to end of array
New code:
for each possible move:
check if it is possible after making a new move
if (possible)
put it in back in same array at the start
else
put it in array of new impossible moves
copy new impossible moves to end of array
The reason that this speeds it up so much is that it
uses less cache RAM.
By not using one of the temporary
arrays, we hit the same slot again soon after the
previous contents are moved out.
It is amazing how much faster L1 and L2 cache is than
main memory  it is comparable to the difference
between main memory and disk.
Remember to try Bruce's WebLog here.
Submitted by Bruce Cropley <cropleyb at yahoo.com.au>
Edited by Thorleif Bundgaard <thorleif@fambundgaard.dk>
BACK to news index