The background:

The Spectrum consists of a Z80 running at 3.54 MHz, a 256x192 bitmapped
display, a simple sound chip, and 48K of RAM.  The C64 consists of a 6510
running at 1 MHz, a sophisticated sound chip, 64K of RAM, and a
sophisticated video chip offering 16 colors, text modes, custom character
modes, 320x200 bitmap modes, 160x200 multicolor bitmap modes, sprites,
raster interrupts, blah blah blah and so on.

While it was clear that the 64 would outperform the Spectrum on tasks
related to text, sprites, etc., it was not clear what sort of performance
ratio existed for straight calculations and applications involving
bitmapped graphics: for example, 3D graphics programs.

A number of people claimed that the Spectrum would significantly outperform
the C64 on general calculations and 3D graphics programs; these claims
were based primarily on the 3.54:1 MHz ratio and personal bias, but
were certainly claims worth investigating.  

I began looking for insight into three main questions:

1. How do the Z80 and 6502 differ, and how does the Z80 approach to
   solving problems differ from the 6502 approach?

2. How would the Spectrum perform on the kinds of programs that I write?

3. Is there any merit to the earlier claims about the clear superiority
   of the Spectrum a) in general and b) specifically for 3D graphics?

Some technical discussion eventually ensued, and a number of code snippets
were compared.  Cycles were counted, opinions were declared, but
through it all a few things became apparent, and new insights were
gained.

General conclusions:

1. The people who were loudly extolling the virtues of the Spectrum over
   the 64 not only did not understand the broad issues (i.e. had no
   practical 6502 or C64 programming experience to compare with), they did
   not even understand the specific issues (e.g. had no practical
   experience in doing 3D graphics, or drawing lines).

2. The typical cycle ratios are around 3:1.

   Seven programs have been considered: slow multiply, block mem transfer,
   substring search, three line routines, and the fast multiply.  The slow
   multiply runs at 2:1.  The non-LDIR memory copy runs at 3:1.  The
   substring search typically runs at 3:1.  The line routine runs at
   3:1, with unrolling bringing it down to 2.7:1.  In practical use
   (e.g. a matrix multiply) the fast multiply runs at > 3:1.

   From this I conclude that typical cycle ratios will be 3:1, in particular
   for the kinds of programs I write.  Obviously, some algorithms will
   do better, and others will do worse.
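
As a sanity check, the ratios quoted above can be recomputed from the
cycle counts in the table at the end of this article.  What follows is a
small Python sketch, not part of the original discussion.  The dictionary
labels are made up for illustration, and the line-routine entries add the
two figures of each pair, which is one reading of the "average" described
in note 7.

```python
# (Z80 cycles, 6510 cycles), copied from the table at the end of the article.
# Labels are illustrative; line-routine pairs are summed (assumed reading of
# note 7: one x step followed by one x-and-y step).
COUNTS = {
    "block copy, no LDIR (per byte)": (39, 13),
    "block copy, LDIR (per byte)": (21, 13),
    "substring: successful compare": (57, 19),
    "substring: advance and loop": (61, 15),
    "substring: compare last char": (40, 18),
    "line, slope<1, looped": (73 + 111, 30 + 33),
    "line, slope<1, unrolled": (24 + 72, 6 + 30),
    "line, slope>1, unrolled": (49 + 71, 21 + 25),
}


def ratio(z80_cycles, c64_cycles):
    """Cycle ratio Z80:6510 for one routine."""
    return z80_cycles / c64_cycles


def advance_ratio(x):
    """Ratio for 'advance next substr' when x characters are skipped."""
    return (46 + 21 * x) / (11 + 9 * x)


for name, (z80, c64) in COUNTS.items():
    print(f"{name:32s} {ratio(z80, c64):.2f}")

print("advance ratios, x=0..7:",
      [round(advance_ratio(x), 2) for x in range(8)])
```

Most entries land between about 2.6 and 3.0, with the LDIR copy and the
last-character compare on the low side and "advance and loop" on the high
side.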

Conclusions to my specific questions:

1. The Z80 is based around its registers.  Algorithms which fit entirely
   within the registers do very well, especially for 16-bit applications.
   Memory access is generally done indirectly via (HL), which tends to
   favor sequential memory access.  The stack pointer is 16 bits and may
   be used for useful things.  Branching and jumping are generally
   slow, as are direct memory addressing and immediate operands, and
   indexing works quite differently from the 65xx.  The Z80 has a number
   of specialized instructions which are used in a variety of tasks, and
   lacks the range of addressing modes offered by the 65xx.

   The 65xx is based around fast access to memory, in particular zero
   page, and its index registers.  Algorithms which involve scattered
   memory accesses do very well, as do programs which make heavy
   use of branching and subroutines.  The ability to add, compare, etc.
   directly from memory (ADC $C002 : CMP $D020) means that algorithms
   involving large numbers of variables, tables, pointers, etc. will
   perform much better on the 65xx.  Immediate operations (ADC #$21)
   are significantly faster (2 cycles on 6510 vs. 7 on Z80).  Algorithms
   involving relatively few variables bog down in comparison.

   Thus, Z80 algorithms try to fit all the variables into registers,
   reduce memory access to sequential or page-aligned accesses (so that
   HL may be used), try to avoid branching/decision making, and try to
   use specialized instructions like DJNZ.
   6510 algorithms make heavy use of the index registers, zero-page
   and absolute operations, don't mind lots of branching/decision making,
   and try to avoid too many operations on variables.

2. For things like 3D graphics, the Spectrum probably has a 10%-20%
   speed advantage.  While nontrivial, it is by no means decisive.  For 
   programs involving any text, sprites, etc. the Spectrum will clearly 
   suffer.

3. You gotta be kidding.  As if. :)
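
The 10%-20% figure in item 2 is just the clock ratio divided by the cycle
ratio.  A minimal Python sketch of that arithmetic, using the 3.54 MHz and
1 MHz clock figures from the background section (the function name is
made up for illustration):

```python
# Clock figures as stated in the background section of this article.
Z80_MHZ = 3.54
C64_MHZ = 1.0
CLOCK_RATIO = Z80_MHZ / C64_MHZ


def spectrum_speedup(cycle_ratio, clock_ratio=CLOCK_RATIO):
    """Wall-clock speed of the Spectrum relative to the C64.

    cycle_ratio is Z80 cycles per 6510 cycle for the same task.
    A result above 1 means the Spectrum finishes first.
    """
    return clock_ratio / cycle_ratio


for cycle_ratio in (2.0, 2.7, 3.0, 3.2, 3.54, 4.0):
    s = spectrum_speedup(cycle_ratio)
    print(f"cycle ratio {cycle_ratio:4.2f}:1 -> "
          f"Spectrum runs at {s:4.2f}x C64 speed")
```

At a 3:1 cycle ratio the Spectrum comes out about 18% ahead.  At 3.54:1
the two machines break even, and anything above that favors the 64.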


Finally, the numbers.  All code may be found at 
		http://stratus.esam.nwu.edu/~judd/fridge/

			Shootout at the 0K Corral
			-------------------------

						(Cycles)
				Z80/Spectrum  	  C64	      ratio
				------------	  ---	      -----
8x8->16 shift&add multiply	  357/385	160/216/248	1.6-2.2

Block mem copy			  39*x    	  13*x		3
				  21*x [1]			1.62

Substring search [2] [3]  init:	  29		  4
	  successful compare:	  57		  19		3
	  advance next substr:	  46+21*x	  11+9*x	 [4]
	  advance and loop:	  61		  15		4.07
	  compare last char:	  40		  18		2.22

Line routine [5]		  73/111 [6]	  30/33		2.92 [7]
						  29/37 [8]
				  24/72 (21/68)	  6/30 (5/28)	2.7 (2.7)
				  49/71		  21/25		2.6

Fast multiply [9]		  100 (+7/3/27)	  43/25 [10]	2.3-2.6/4.2-4.4

Sprites
String print			  None offered [11]

Notes:
1. Using LDIR on Z80.

2. Given a list of null-terminated strings, find a particular string.
   The substring search involves four main processes: successful character
   compare, compare of last character, advance to next substring on mismatch,
   and advance pointer and loop for next string.  Init refers to initial
   setup (trivial).  "Advance to next substring" includes unsuccessful
   character compare.  "Advance and loop" counts cycles up to normal compare
   loop.

3. The C64 version is 28 bytes.  A 64 program would change this problem
   slightly to improve performance (strings terminated with an inverted,
   i.e. high-bit-set, final character instead of a null, etc.)

4. x=number of characters advanced.  Ratios are 4.2, 3.35, 3.03, 2.87, 2.77, 
   2.7, 2.65, 2.61 for x=0,1,2,...

5. Three separate Spectrum line routines were offered.  All comparisons
   are of equivalent routines/algorithms.  The three routines are:
	- slope<1, looped
	- slope<1, unrolled across x-pixels, counting by columns
	  (more optimized version of the above)
	- slope>1, unrolled across x-pixels
   Spectrum times are Alvin's strange "average" cycles.

6. Using Ian Collier's revised algorithm.

7. "Average" cycle times consider one step in the x-direction followed
   by one step in x and y.

8. Looped version, slope>1 (no Spectrum version offered).

9. Spectrum times are modified by +7 if a-b<0, +3 if a+b>255, and +27
   if placed in a subroutine (17 for CALL, 10 for RET).  Cycle ratios
   assume inlined routine.

10. For multiplication of constant*vector (i.e. matrix multiply, or projection)
    C64 version is 43 cycles for first multiply and 25 cycles for successive
    multiplies.  Thus, ratios are around 3.1 and 3.4 for two and three
    successive multiplies.

11. Spectrum will choke, badly.
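
For reference, the two multiply techniques in the table can be sketched
in Python.  The shift-and-add version is the textbook one.  The fast
version is assumed here to be the usual table-of-squares method,
a*b = f(a+b) - f(a-b) with f(n) = n*n/4 rounded down, which is what the
a-b<0 and a+b>255 cases in note 9 suggest; the routines actually compared
may well differ in detail.  All function and table names are illustrative.

```python
# Table of quarter squares: SQ4[n] = floor(n*n / 4) for n = 0..510.
# Indices above 255 cover the a+b>255 case mentioned in note 9.
SQ4 = [n * n // 4 for n in range(511)]


def slow_multiply(a, b):
    """8x8->16 shift-and-add multiply: one multiplier bit per pass."""
    result = 0
    for _ in range(8):
        if b & 1:          # low bit of multiplier set: add multiplicand
            result += a
        a <<= 1            # shift multiplicand left
        b >>= 1            # shift multiplier right
    return result & 0xFFFF


def fast_multiply(a, b):
    """8x8->16 multiply using two table lookups and one subtraction."""
    diff = a - b
    if diff < 0:           # the a-b<0 case: use |a-b| as the table index
        diff = -diff
    return SQ4[a + b] - SQ4[diff]


# Exhaustive check over all 8-bit operand pairs.
for a in range(256):
    for b in range(256):
        assert slow_multiply(a, b) == a * b
        assert fast_multiply(a, b) == a * b
```

The table method works because a+b and a-b always have the same parity,
so the two rounding errors cancel.  When one operand is a constant, part
of the lookup setup can be done once and reused, which would be
consistent with the 25-cycle successive multiplies described in note 10.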