The background: the Spectrum consists of a Z80 running at 3.54 MHz, a 256x192 bitmapped display, a simple beeper for sound, and 48K of RAM. The C64 consists of a 6510 running at 1 MHz, a sophisticated sound chip, 64K of RAM, and a sophisticated video chip offering 16 colors, text modes, custom character modes, 320x200 bitmap modes, 160x200 multicolor bitmap modes, sprites, raster interrupts, blah blah blah and so on.

While it is clear that the 64 will outperform the Spectrum on tasks involving text, sprites, etc., it was not clear what sort of performance ratio existed for straight calculations and for applications involving bitmapped graphics, such as 3D graphics programs. A number of people claimed that the Spectrum would significantly outperform the C64 on general calculations and 3D graphics programs. These claims were based primarily on the 3.54:1 clock-speed ratio and on personal bias, but they were certainly worth investigating.

I began looking for insight into three main questions:

1. How do the Z80 and 6502 differ? What is the Z80 approach to solving problems, and how does it differ from the 6502 approach?
2. How would the Spectrum perform on the kinds of programs that I write?
3. Is there any merit to the earlier claims about the clear superiority of the Spectrum a) in general and b) specifically for 3D graphics?

Some technical discussion eventually ensued, and a number of code snippets were compared. Cycles were counted and opinions were declared, but through it all a few things became apparent, and new insights were gained.

General conclusions:

1. The people who were loudly extolling the virtues of the Spectrum over the 64 not only did not understand the broad issues (i.e., they had no practical 6502 or C64 programming experience to compare with), they did not even understand the specific issues (e.g., they had no practical experience in doing 3D graphics or drawing lines).

2. The typical cycle ratios are around 3:1.
Seven programs have been considered: the slow multiply, a block memory transfer, a substring search, three line routines, and the fast multiply. The slow multiply runs at 2:1. The non-LDIR memory copy runs at 3:1. The substring search typically runs at 3:1. The line routine runs at 3:1, with unrolling bringing it down to 2.7:1. In practical use (e.g., a matrix multiply) the fast multiply runs at more than 3:1. From this I conclude that typical cycle ratios will be 3:1, in particular for the kinds of programs I write. Obviously, some algorithms will do better and others will do worse.

Conclusions to my specific questions:

1. The Z80 is built around its registers. Algorithms that fit entirely within the registers do very well, especially for 16-bit work. Memory access is generally done indirectly via (HL), which tends to favor sequential memory access. The stack pointer is 16 bits wide and can be put to other uses. Branching and jumping are generally slow, as are direct memory addressing and immediate operands, and indexing works quite differently (via IX/IY plus a displacement). The Z80 has a number of specialized instructions that are used in a variety of tasks, but it lacks the range of addressing modes offered by the 65xx.

The 65xx is built around fast access to memory, in particular zero page, and around its index registers. Algorithms that involve scattered memory accesses do very well, as do programs that make heavy use of branching and subroutines. The ability to add, compare, etc. directly from memory (ADC $C002 : CMP $D020) means that algorithms involving large numbers of variables, tables, pointers, etc. will perform much better on the 65xx. Immediate operations (ADC #$21) are significantly faster (2 cycles on the 6510 vs. 7 on the Z80). Algorithms involving relatively few variables bog down in comparison.
Thus, Z80 algorithms try to fit all the variables into registers, reduce memory access to sequential or page-aligned accesses (so that HL may be used), try to avoid branching/decision making, and try to use specialized instructions like DJNZ. 6510 algorithms make heavy use of the index registers, zero page, and immediate operations, don't mind lots of branching/decision making, and try to avoid too many operations on variables.

2. For things like 3D graphics, the Spectrum probably has a 10%-20% speed advantage: its 3.54:1 clock advantage is mostly eaten up by the roughly 3:1 cycle ratio. While nontrivial, this is by no means decisive. For programs involving any text, sprites, etc., the Spectrum will clearly suffer.

3. You gotta be kidding. As if. :)

Finally, the numbers. All code may be found at http://stratus.esam.nwu.edu/~judd/fridge/

Shootout at the 0K Corral
-------------------------

(Cycles)                      Z80/Spectrum       C64            ratio
                              ------------       ---            -----
8x8->16 shift&add multiply    357/385            160/216/248    1.6-2.2
Block mem copy                39*x               13             3
                              21*x [1]                          1.62
Substring search [2] [3]
  init                        29                 4
  successful compare          57                 19             3
  advance next substr         46+21*x            11+9*x         [4]
  advance and loop            61                 15             4.07
  compare last char           40                 18             2.22
Line routine [5]              73/111 [6]         30/33          2.92 [7]
                                                 29/37 [8]
                              24/72 (21/68)      6/30 (5/28)    2.7 (2.7)
                              49/71              21/25          2.6
Fast multiply [9]             100 (+7/3/27)      43/25 [10]     2.3-2.6/4.2-4.4
Sprites, string print         None offered [11]

Notes:

1. Using LDIR on the Z80.
2. Given a list of null-terminated strings, find a particular string. The substring search involves four main processes: successful character compare, compare of the last character, advance to the next substring on a mismatch, and advance the pointer and loop for the next string. "Init" refers to the initial setup (trivial). "Advance next substr" includes the unsuccessful character compare. "Advance and loop" counts cycles up to the normal compare loop.
3. The C64 version is 28 bytes. A 64 program would change this problem slightly to improve performance (e.g., terminating each string with an inverted final character instead of a null).
4. x = number of characters advanced. Ratios are 4.2, 3.35, 3.03, 2.87, 2.77, 2.7, 2.65, 2.61 for x = 0, 1, 2, ...
5. Three separate Spectrum line routines were offered. All comparisons are of equivalent routines/algorithms. The three routines are:
   - slope<1, looped
   - slope<1, unrolled across x-pixels, counting by columns (a more optimized version of the above)
   - slope>1, unrolled across x-pixels
   Spectrum times are Alvin's strange "average" cycles.
6. Using Ian Collier's revised algorithm.
7. "Average" cycle times consider one step in the x-direction followed by one step in x and y.
8. Looped version, slope>1 (no Spectrum version offered).
9. Spectrum times increase by 7 if a-b<0, by 3 if a+b>255, and by 27 if the code is placed in a subroutine (17 for CALL, 10 for RET). Cycle ratios assume an inlined routine.
10. For multiplication of a constant by a vector (e.g., a matrix multiply or a projection), the C64 version takes 43 cycles for the first multiply and 25 cycles for each successive multiply. Thus, ratios are around 3.1 and 3.4 for two and three successive multiplies.
11. The Spectrum will choke, badly.
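The "8x8->16 shift&add multiply" row times the classic slow multiply. The actual Z80 and 6502 routines are at the URL above; what follows is only a language-neutral Python sketch of the technique (not a transcription of either routine): each of the 8 multiplier bits is tested in turn, and the correspondingly shifted multiplicand is added when that bit is set.

```python
def shift_add_mul8(a: int, b: int) -> int:
    """Multiply two unsigned 8-bit values into a 16-bit product by shift-and-add."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    product = 0
    addend = a              # multiplicand, shifted left once per step
    for _ in range(8):      # one iteration per bit of the multiplier
        if b & 1:           # low multiplier bit set: add the shifted multiplicand
            product += addend
        b >>= 1             # move on to the next multiplier bit
        addend <<= 1
    return product & 0xFFFF  # 255*255 = 65025, so the result always fits in 16 bits


print(shift_add_mul8(200, 123))  # 24600
```

Because the add happens only for set bits, the running time of a real 8-bit implementation depends on the multiplier, which is presumably why the table lists more than one cycle count for each machine.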
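Note 2 breaks the substring search into four phases. The following Python sketch (not the 28-byte C64 routine) labels each phase in the comments. The table layout is an assumption made for illustration: the strings are stored back to back, each ending in a 0 byte, and the whole table is terminated by an empty string (an extra 0 byte). The function name find_string is hypothetical.

```python
def find_string(table: bytes, target: bytes) -> int:
    """Return the index of `target` in a table of null-terminated strings, or -1.

    Assumed (hypothetical) layout: the table ends with an empty string,
    i.e. an extra 0 byte after the last entry. `target` contains no 0 bytes.
    """
    pos = 0      # offset of the current string in `table`
    index = 0    # number of the current string
    while table[pos] != 0:                  # empty string marks the end of the table
        i = 0
        # successful compare: keep going while characters match
        while i < len(target) and table[pos + i] == target[i]:
            i += 1
        # compare last char: a full match must also end at the terminator
        if i == len(target) and table[pos + i] == 0:
            return index
        # advance next substr: skip the rest of this string up to its null
        while table[pos + i] != 0:
            i += 1
        # advance and loop: step past the null to the next string
        pos += i + 1
        index += 1
    return -1


print(find_string(b"LOAD\0SAVE\0LIST\0\0", b"SAVE"))  # 1
```

Each mismatch costs one pass through the "advance next substr" loop per skipped character, which is the x in note 4.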
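The line routines in note 5 step one pixel at a time along the major axis, and note 7's "average" combines the two paths through the inner loop: an x-only step and a diagonal x-and-y step. As a rough illustration only, here is the textbook Bresenham formulation for the slope<1 case in Python; Ian Collier's revised algorithm and the actual Spectrum and C64 routines may differ in detail. The function name line_points is hypothetical.

```python
from typing import List, Tuple


def line_points(x0: int, y0: int, x1: int, y1: int) -> List[Tuple[int, int]]:
    """Bresenham-style line for 0 <= slope <= 1, drawn left to right (one octant only)."""
    dx = x1 - x0
    dy = y1 - y0
    assert 0 <= dy <= dx, "sketch handles only the shallow, left-to-right octant"
    points = []
    err = dx // 2                   # error term starts at half the major-axis length
    y = y0
    for x in range(x0, x1 + 1):     # one pixel per step along x, the major axis
        points.append((x, y))
        err -= dy
        if err < 0:                 # error crossed a pixel boundary:
            y += 1                  # this becomes a diagonal (x and y) step
            err += dx
    return points


print(line_points(0, 0, 4, 2))  # [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2)]
```

Unrolling, as in the second and third Spectrum routines, removes the loop overhead from each step but leaves the same x-only/diagonal decision in place.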
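Note 9's adjustments for a-b<0 and a+b>255 suggest that the fast multiply is a table-of-squares ("quarter-square") multiply, based on the identity a*b = floor((a+b)^2/4) - floor((a-b)^2/4), which holds exactly for integers because a+b and a-b always have the same parity. That reading is an assumption; the Python sketch below only demonstrates the identity, with the list QSQ (a hypothetical name) standing in for the lookup tables a real 8-bit routine would keep in memory.

```python
# QSQ[n] = floor(n*n / 4) for n = 0..510, since a + b can reach 255 + 255 = 510.
QSQ = [n * n // 4 for n in range(511)]


def quarter_square_mul(a: int, b: int) -> int:
    """Unsigned 8x8->16 multiply using two table lookups and a subtraction."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    s = a + b          # can exceed 255 (one of the cases note 9 lists)
    d = abs(a - b)     # a - b can be negative (the other case note 9 lists)
    return QSQ[s] - QSQ[d]


print(quarter_square_mul(200, 123))  # 24600
```

With one operand held constant (note 10's constant*vector case), the setup for that operand can presumably be done once and reused, which would fit the 43-cycle first multiply versus 25 cycles for each successive one.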
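The ratios in note 4 and the 10%-20% estimate in conclusion 2 are simple arithmetic on the cycle counts. A short Python check, using the 3.54:1 clock ratio from the background section (CLOCK_RATIO, advance_ratio and spectrum_speedup are names made up for this sketch):

```python
CLOCK_RATIO = 3.54  # Spectrum MHz : C64 MHz, as stated in the background section


def advance_ratio(x: int) -> float:
    """Z80:C64 cycle ratio for 'advance next substr' after x characters (note 4)."""
    return (46 + 21 * x) / (11 + 9 * x)


def spectrum_speedup(cycle_ratio: float) -> float:
    """Wall-clock speed of the Spectrum relative to the C64 for a given cycle ratio."""
    return CLOCK_RATIO / cycle_ratio


print([round(advance_ratio(x), 2) for x in range(8)])
# [4.18, 3.35, 3.03, 2.87, 2.77, 2.7, 2.65, 2.61]
print(round(spectrum_speedup(3.0), 2))  # 1.18, i.e. about 18% faster
```

As x grows, the "advance next substr" ratio tends toward 21/9 ≈ 2.33. Inverting the second function, a 10%-20% Spectrum advantage corresponds to cycle ratios between about 3.54/1.2 ≈ 2.95 and 3.54/1.1 ≈ 3.22.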