Article 70333 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!math.ohio-state.edu!howland.erols.net!news.maxwell.syr.edu!rill.news.pipex.net!pipex!server1.netnews.ja.net!warwick!yama.mcc.ac.uk!simonc
From: simonc@jumper.mcc.ac.uk (Cookie)
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: spectrum faster in 3D
Followup-To: comp.sys.cbm,comp.sys.sinclair
Date: 25 Jun 1997 21:50:30 GMT
Organization: Sirius Cybernetics Corporation
Lines: 16
Message-ID: <5os3r6$fju@yama.mcc.ac.uk>
References: <33A9DD07.153E@skynet.be> <5oraon$sre@news.acns.nwu.edu>
NNTP-Posting-Host: jumper.mcc.ac.uk
X-Newsreader: TIN [version 1.2 PL2]
Xref: news.acns.nwu.edu comp.sys.cbm:70333 comp.sys.sinclair:40512


: If you would like to test the merits of the claim, feel free to contribute
: some code to the coding challenges.  Specifically, a fast multiply routine
: needs to be demonstrated, and a fast line routine would help the Spectrum
: cause out quite a bit.

I would suspect that the reason that the Speccy version of Carrier Command
was in 3D, and the C64 version in 2D, was because it was using solid
polygon routines, rather than wireframe drawing. Therefore the amount of
memory needed to shift came into play (assuming that the guys who
programmed it didn't want to mess around with nifty demo-coding tricks).

I can't be sure though. How many pixels could you fit per byte on the C64
display?

Simon


Article 70350 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: spectrum faster in 3D
Date: 26 Jun 1997 02:08:40 GMT
Organization: Northwestern University, Evanston, IL
Lines: 97
Message-ID: <5osiv8$eqv@news.acns.nwu.edu>
References: <33A9DD07.153E@skynet.be> <5oraon$sre@news.acns.nwu.edu> <5os3r6$fju@yama.mcc.ac.uk>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.cbm:70350 comp.sys.sinclair:40524

Hola Simon :)

In article <5os3r6$fju@yama.mcc.ac.uk>, Cookie <simonc@jumper.mcc.ac.uk> wrote:
>
>: If you would like to test the merits of the claim, feel free to contribute
>: some code to the coding challenges.  Specifically, a fast multiply routine
>: needs to be demonstrated, and a fast line routine would help the Spectrum
>: cause out quite a bit.
>
>I would suspect that the reason that the Speccy version of Carrier Command
>was in 3D, and the C64 version in 2D, was because it was using solid
>polygon routines, rather than wireframe drawing. Therefore the amount of

Heh, I just took a look at C64 Carrier Command today -- one thing's
for sure: it is QUITE lame :).  But that tells me a lot more about
the programmers than the computer.

Actually, I have a slew of games from the late 80's that are just horrible,
and basically straight conversions of some PC/Amiga game (Ultima VI comes to
mind, as does Harmony).  We're talking puke city here.  (I bought Carrier 
Command for the Amiga around, hmmm, 1990 probably).

I'm not sure about the relative Speccy/C64 capabilities regarding
filled polygons.  By that I mean that I can see it going either way. :)

>memory needed to shift came into play (assuming that the guys who
>programmed it didn't want to mess around with nifty demo-coding tricks).

Well, I'm no demo coder; I'm just a simple mathematician, and I really
don't know many tricks.  All my code does stuff the long way :).  The
magic is all contained within the mathematics/implementation.  I can't
tell you about the various demos and such that feature "3D" (in quotes
because I suspect most of the time it is more illusion than calculation), 
but I can tell you about polygonamy.

Polygonamy renders pattern-filled polygons to the hires bitmap (320x200)
screen (the pattern is an 8x8 pattern).  It of course depends on how much
is going on, but when an object is full-screen it runs at, hmm, 8-10 fps 
I think.  When things are 1/3-1/2 sized it really hauls.  (And, for what 
it's worth, on a SuperCPU it runs full-screen at a completely unusable
>128 fps :).

(All polygonamy is is a little 3D world you can walk around in, with
several rotating 3D objects populating the landscape.  The purpose
was to test out some ideas, and see just how fast a C64 could do
"real" solid 3D objects, like for a game.)

I've been rederiving all the equations and algorithms recently and there
is of course some room for improvement.  I've also been thinking about how
I'd write a Z80/Spectrum fill routine.  The Spectrum has a nice advantage
in the way the bitmap is layed out for doing fills, since it just takes
an INC to get to the next column.  The polygonamy routine fills at
four cycles/column (the maximum possible fill speed).  In principle
I would expect the Spectrum to fill at around 11 cycles/column.

On the downside, the Speccy has to do some cumbersome checking when
moving through rows, and without convenient access to multiple
tables... well, if I restricted my routines to 256x200, or
multicolor 160x200, or used character bitmaps... I couldn't say offhand
which would win -- probably a slight edge to the Z80, because of some
ancillary calculations that also need to take place, but without seeing 
some code it is of course hard to estimate.  Any Spectrum users in
the Chicago area who own Carrier Command? :)

The point being: I scoff at the idea that a good 3D Carrier Command couldn't
have been done on the 64 :).  A C64 is quite capable of doing zippy solid
polygons.

On a vaguely related subject, I found Carrier Command, like Elite, to be 
very disappointing -- a game that could have been exceptional with just
a few extra additions.  In CC, at least on my Amiga, there was one and
only one island map, and it was a map that really only allowed for
one fairly obvious strategy.  Just a simple thing like different
maps would have made such a huge difference, but no such luck.  What
a bummer.

(In Elite, the game was really fun until I realized it had no point,
that there were really only 2-3 commodities of value, etc.  Tape drive 
disease strikes again -- how cool it would have been to be able to
walk around space stations, or have really different planets so that
there was some reason to visit them, or to be able to fly different
ships, etc.  Yeah, I killed the boa, too :).

>I can't be sure though. How many pixels could you fit per byte on the C64
>display?

8 bits/byte -> 8 pixels/byte :)

40 columns x 25 rows -> 320x200 resolution in hires.  Multicolor mode
uses bit pairs to get four colors per 8x8 cell, which of course
halves the horizontal resolution to give 160x200.

	evetS-

>Simon




Article 70435 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: spectrum faster in 3D
Date: 27 Jun 1997 03:52:07 GMT
Organization: Northwestern University, Evanston, IL
Lines: 59
Message-ID: <5ovdd7$hfe@news.acns.nwu.edu>
References: <33A9DD07.153E@skynet.be> <5os3r6$fju@yama.mcc.ac.uk> <5osiv8$eqv@news.acns.nwu.edu> <33b5a33a.4541898@news.demon.co.uk>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.cbm:70435 comp.sys.sinclair:40589

In article <33b5a33a.4541898@news.demon.co.uk>,
Damien Burke <damien@jetman.d.c.u> wrote:
>On 26 Jun 1997 02:08:40 GMT, judd@merle.acns.nwu.edu (Stephen
>Judd) wrote:
>
>>On a vaguely related subject, I found Carrier Command, like Elite, to be 
>>very disappointing -- a game that could have been exceptional with just
>
>Strange. As I recall, the ST version had different maps for each
>island...

It certainly had different maps for each island, but that's not the
strategy aspect -- I mean the map containing the island chains.
As I recall there were a bunch of islands down at the bottom, where
you start, and a bunch up near the top, where he starts, and a single
line of islands in-between.  So you'd take over your islands, and
he'd take over his, you'd meet somewhere in the middle, and that was
that.

So, for instance, there wasn't even the opportunity to cut off lines
of supply, or progress through the islands in a different way, etc.
The game was reduced to an action game.

(As to taking islands, I found there was really only one worthwhile
way to do that too -- send in the mantas, kill the airdrome, kill
the missle launchers, send in the walrus.  Always the same.  A little
more balance, variety, or challenge would have been wonderful. :)

>>(In Elite, the game was really fun until I realized it had no point,
>
>Become 'Elite' is the point.

Not much of a point, eh? :)  When games become a matter of stamina,
I don't think they're so much fun any more.

>>that there were really only 2-3 commodities of value, etc.  Tape drive 
>>disease strikes again -- how cool it would have been to be able to
>>walk around space stations, or have really different planets so that
>>there was some reason to visit them, or to be able to fly different
>>ships, etc.  Yeah, I killed the boa, too :).
>
>Hence the missions. Surely the 64 version had them??

As I mentioned, it had the boa, and it also had the tribbles, but nothing 
else.  Besides, the missions were something you had to just wait for, and not
something you could actively seek out, which was also aggravating.

And stop calling me Shirley.

(And it always bugs me when there are lots of useless features in a game,
mostly because I always wish they would have put the effort into something
interesting or useful.  For Elite, a prime example is the fuel scoop,
or all the commodities there's no point trading in, all the planets there's
no point in visiting, etc. etc.  Heh, been wanting to get that off my
chest for a while :).

-S

> //// Damien Burke (replace d.c.u in address with demon.co.uk if replying)


Article 70459 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!uwm.edu!vixen.cso.uiuc.edu!ais.net!news.maxwell.syr.edu!dispatch.news.demon.net!demon!thespian.demon.co.uk!thespian.demon.co.uk!starglider
From: The Starglider <starglider@thespian.demon.co.uk>
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: spectrum faster in 3D
Date: Fri, 27 Jun 1997 10:07:37 +0100
Organization: Starglider Productions
Distribution: world
Message-ID: <HY5lvGAZL4szEw0E@thespian.demon.co.uk>
References: <33A9DD07.153E@skynet.be> <5os3r6$fju@yama.mcc.ac.uk>
 <5osiv8$eqv@news.acns.nwu.edu> <33b5a33a.4541898@news.demon.co.uk>
 <5ovdd7$hfe@news.acns.nwu.edu>
NNTP-Posting-Host: thespian.demon.co.uk
X-NNTP-Posting-Host: thespian.demon.co.uk [194.222.59.42]
MIME-Version: 1.0
X-Newsreader: Turnpike Version 3.00 <FSTrNTYmwKhND08sycjZGzM2eH>
Lines: 69
Xref: news.acns.nwu.edu comp.sys.cbm:70459 comp.sys.sinclair:40601

In article <5ovdd7$hfe@news.acns.nwu.edu>, Stephen Judd
<judd@merle.acns.nwu.edu> writes
>
>It certainly had different maps for each island, but that's not the
>strategy aspect -- I mean the map containing the island chains.
>As I recall there were a bunch of islands down at the bottom, where
>you start, and a bunch up near the top, where he starts, and a single
>line of islands in-between.  So you'd take over your islands, and
>he'd take over his, you'd meet somewhere in the middle, and that was
>that.
On the spectrum version, the islands were all fairly evenly laid out, so
it was harder to keep a chain going. So you could actually completely
miss each other.
>
>So, for instance, there wasn't even the opportunity to cut off lines
>of supply, or progress through the islands in a different way, etc.
>The game was reduced to an action game.
I can see the problem there. What version were you playing again?
>
>(As to taking islands, I found there was really only one worthwhile
>way to do that too -- send in the mantas, kill the airdrome, kill
>the missle launchers, send in the walrus.  Always the same.  A little
>more balance, variety, or challenge would have been wonderful. :)
>
>>>(In Elite, the game was really fun until I realized it had no point,
>>
>>Become 'Elite' is the point.
>
>Not much of a point, eh? :)  When games become a matter of stamina,
>I don't think they're so much fun any more.
Well, it was a great task to become Elite. I think it is the only game
that has a true "Virtual" universe.
>
>>
>>Hence the missions. Surely the 64 version had them??
>
>As I mentioned, it had the boa, and it also had the tribbles, but nothing 
>else.  Besides, the missions were something you had to just wait for, and not
>something you could actively seek out, which was also aggravating.
But you had to do well and keep trying to get the missions.  So, in a
way, you are seeking them out, as the computer will decide when you have
advanced enough to do one.
>
>And stop calling me Shirley.
>
>(And it always bugs me when there are lots of useless features in a game,
>mostly because I always wish they would have put the effort into something
>interesting or useful.  For Elite, a prime example is the fuel scoop,
>or all the commodities there's no point trading in, all the planets there's
>no point in visiting, etc. etc.  Heh, been wanting to get that off my
>chest for a while :).
The fuel scoop is VERY useful! Especially in an anarchy zone. I usually
made a STACK of cash just by flying around and taking out pirates,
collecting their cargo afterwards.

Also when I wasn't allowed into a spacestation and I was low on fuel, I
could always rely on my scoop to get the @#*! outta there!

Surely that was use enough?

-- 
             **************The Starglider****************
             *  E-Mail:starglider@thespian.demon.co.uk  *
             * Web site:http://www.thespian.demon.co.uk * 
             *   NVG UPDATES:nvg@thespian.demon.co.uk   *
             *           UIN Number:1773852             *     _WW_
             * WWPAGER:http://wwp.mirabilis.com/1773852 *    /_  _\
             ********************************************   | O  O |
___________________________________________________________oOO_\/_OOo___________


Article 70426 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.luc.edu!chi-news.cic.net!feeder.chicago.cic.net!howland.erols.net!newsfeed.internetmci.com!news.tinet.ie!newsmaster@tinet.ie
From: "Dragon" <dlawless@tinet.ie>
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: spectrum faster in 3D
Date: 26 Jun 1997 18:38:24 GMT
Organization: Telecom Eireann
Lines: 12
Message-ID: <01bc8260$9a9d4280$29e2869f@dragon>
References: <33A9DD07.153E@skynet.be> <5oraon$sre@news.acns.nwu.edu> <5os3r6$fju@yama.mcc.ac.uk>
NNTP-Posting-Host: p41-as1.dubadl.tinet.ie
X-Newsreader: Microsoft Internet News 4.70.1155
Xref: news.acns.nwu.edu comp.sys.cbm:70426 comp.sys.sinclair:40582

Hi,

the main reason 3D was hardly ever done on the C64 had much more to do with
the structure of its screen display than the speed of its processor. The
speccy resembles the PC in that it had a very fast bpp display. This is
great for 3D (less writes to memory to plot a pixel) but crap for
scrolling. The Amiga had the same problem as the C64 when Doom clones came
out on it. You had to do 8 times the memory writes on an Amiga to plot a
pixel than you would on a PC. Luckily for the C64 nearly all games in the
80's were sprite based scrollers so it had the advantage over the speccy in
that regard.



Article 70416 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!gatech!howland.erols.net!europa.clark.net!dispatch.news.demon.net!demon!jetman.demon.co.uk!not-for-mail
From: damien@jetman.d.c.u (Damien Burke)
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: spectrum faster in 3D
Date: Thu, 26 Jun 1997 21:01:04 GMT
Organization: Ha! None!
Message-ID: <33b3d5e6.291781@news.demon.co.uk>
References: <33A9DD07.153E@skynet.be> <5oraon$sre@news.acns.nwu.edu> <5os3r6$fju@yama.mcc.ac.uk> <01bc8260$9a9d4280$29e2869f@dragon>
Reply-To: damien@jetman.d.c.u
NNTP-Posting-Host: jetman.demon.co.uk
X-NNTP-Posting-Host: jetman.demon.co.uk [194.222.120.157]
X-Newsreader: Forte Agent 1.0/32.390
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Lines: 14
Xref: news.acns.nwu.edu comp.sys.cbm:70416 comp.sys.sinclair:40579

On 26 Jun 1997 18:38:24 GMT, "Dragon" <dlawless@tinet.ie> wrote:

>Hi,
>
>the main reason 3D was hardly ever done on the C64 had much more to do with
>the structure of its screen display than the speed of its processor. The
>speccy resembles the PC in that it had a very fast bpp display.

Bwahahaha. Come and play with the Speccy's screen layout and see
why I'm laughing :)
-- 
 //// Damien Burke (replace d.c.u in address with demon.co.uk if replying)
//// Spectrum pages: http://www.jetman.demon.co.uk/speccy/
//// New to this group? Read this: http://www.jetman.demon.co.uk/speccy/faq/


Article 70436 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: spectrum faster in 3D
Date: 27 Jun 1997 04:17:47 GMT
Organization: Northwestern University, Evanston, IL
Lines: 48
Message-ID: <5ovetb$i1v@news.acns.nwu.edu>
References: <33A9DD07.153E@skynet.be> <5oraon$sre@news.acns.nwu.edu> <5os3r6$fju@yama.mcc.ac.uk> <01bc8260$9a9d4280$29e2869f@dragon>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.cbm:70436 comp.sys.sinclair:40590

In article <01bc8260$9a9d4280$29e2869f@dragon>,
Dragon <dlawless@tinet.ie> wrote:
>Hi,
>
>the main reason 3D was hardly ever done on the C64 had much more to do with
>the structure of its screen display than the speed of its processor. The

Nah.  But there are a few required prerequisites:

	1) You have to understand what you are doing,
	2) You have to be motivated, because it requires QUITE a lot more
	   effort to code it up (assuming you don't want to bake bread
	   between frame updates),
	3) You have to be at least a little bit insane, and naive enough
	   to think you can actually accomplish this.

Note that #3 can in large part overcome #2 and #1.

(The biggest limitation is far and away the processor)

>speccy resembles the PC in that it had a very fast bpp display. This is

Oh my :)  Ye know not that of which ye speak :).

It's a pretty stiff competition as to which is the more cumbersome layout,
the C64 or the Speccy :).

>great for 3D (less writes to memory to plot a pixel) but crap for
>scrolling. The Amiga had the same problem as the C64 when Doom clones came
>out on it. You had to do 8 times the memory writes on an Amiga to plot a
>pixel than you would on a PC. Luckily for the C64 nearly all games in the

Although both the Spectrum and C64 use bitplanes, they use a single bitplane.
Compare with an Amiga, which can have up to six (or eight) bitplanes.

I think having a bitplane is actually a very large advantage when doing
lines, fills, etc.  This is because you can plot 8-16 pixels at a time, if
you're smart about how you design the algorithms.  It's only when you
get into texture mapping that having a 1-1 correspondence of pixels with
memory locations (ala VGA) comes in really handy.

Take a look at Carrier Command on the Amiga (or probably ST, too)
sometime -- I don't think any 8MHz PC is going to do that.

(BTW -- one other disadvantage the Amiga has is that the CHIP memory
bus still runs at 7.8MHz or whatever).

	-Steve


Article 71732 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!uwm.edu!vixen.cso.uiuc.edu!ais.net!europa.clark.net!newsfeeds.sol.net!news.pagesat.net!decwrl!tribune.usask.ca!rover.ucs.ualberta.ca!news.ucalgary.ca!srv1.freenet.calgary.ab.ca!albrecht
From: "Alvin R. Albrecht" <albrecht@freenet.calgary.ab.ca>
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (multiply)
Date: Fri, 18 Jul 1997 15:36:35 -0600
Organization: Calgary Free-Net
Lines: 81
Message-ID: <5qonj8$rf4@ds2.acs.ucalgary.ca>
References: <33A9DD07.153E@skynet.be> <33C4CD48.41C6@wi.leidenuniv.nl> <5q4jpr$4cu@news.acns.nwu.edu> <33C67EF8.3BA9@chebucto.ns.ca> <11516.imc@comlab.ox.ac.uk>
NNTP-Posting-Host: albrecht@srv1.freenet.calgary.ab.ca
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
In-Reply-To: <11516.imc@comlab.ox.ac.uk>
Xref: news.acns.nwu.edu comp.sys.cbm:71732 comp.sys.sinclair:41739


Well, I lost the version of the fast multiply that Ian wrote a while back,
but as I recall, it loaded the squares in and did a 16 bit subtraction.
This version is a little faster as the subtractions are done directly from
memory rather than loading and then subtracting.

Here's a recap, mostly so that you can follow my thinking and correct any
possible errors I've made.

a*b = [(a+b)/2]^2 - [(a-b)/2]^2

There is a problem when (a+b)/2 is calculated because if (a+b) is odd, we
lose a bit.  Notice also that (a+b) and (a-b) are odd at the same time.
If (a+b)/2 and (a-b)/2 are understood to represent the integer part only,
then the formula becomes:

a*b = [(a+b)/2]^2 - [(a-b)/2]^2 { + b }
where b is added only if (a+b) or (a-b) is odd.

Now it's necessary to know which number is "b" (the smaller one); the
following routine makes no assumptions about anything:


enter:	D=H=MSB of table of squares (least significant byte)
	    (most sig byte is stored in next 256 byte block)
	E=b  ; if L<E, they will be swapped
	L=a
exit:	AC=E*L
	D,H unchanged

MULT	LD A,L		4
	SUB E		4	; A=(a-b)
	JR NC,noswap	7/12	; if carry, then a&b need swapping
	EX DE,HL	4	; swap a&b
	CPL		4	; negate - NEG not used because
	ADD A,#01	7	; I need the carry flag reset

noswap	RRA		4
	LD L,A		4	; L=(a-b)/2
	JR C,addb	7/12	; if carry, (a-b) is odd 
	ADD A,E		4	; A=(a-b)/2+b=(a+b)/2
				; * checked all cases a&b odd/even
	LD E,A		4	; E=(a+b)/2
	LD A,(DE)	7	; A=[(a+b)/2]^2, lsb
	SUB (HL)	7	; subtract [(a+b)/2]^2, lsb
	LD C,A		4	; C=[(a+b)/2]^2 - [(a+b)/2]^2, lsb
	INC D		4	; point to table of squares, msb
	INC H		4
	LD A,(DE)	7	; do same but for msb
	SBC A,(HL)	7	; with carry if borrow above
	DEC D		4	; restore to table of squares, lsb
	DEC H		4	; for next multiply
	RET		10

addb	LD C,E		4	; C=b
	ADD A,E		4
	LD E,A		4	; E=(a+b)/2
	LD A,(DE)	7	; as above
	ADD A,C		4	; but add b this time to lsb
	SUB (HL)	7	; SUB is a 9 bit subtraction using carry
	LD C,A		4
	INC D		4
	INC H		4
	LD A,(DE)	7
	SBC A,(HL)	7
	DEC D		4
	DEC H		4
	RET		10

Time:

   91
+  10   if swap of a&b required
+  13   if (a+b) is odd
+  10   if counting RET - return from subroutine
-  12   if I know which is a&b


Alvin




Article 71809 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (multiply)
Date: 21 Jul 1997 01:08:23 GMT
Organization: Northwestern University, Evanston, IL
Lines: 47
Message-ID: <5qucq7$mah@news.acns.nwu.edu>
References: <33A9DD07.153E@skynet.be> <33C67EF8.3BA9@chebucto.ns.ca> <11516.imc@comlab.ox.ac.uk> <5qonj8$rf4@ds2.acs.ucalgary.ca>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.cbm:71809 comp.sys.sinclair:41791

In article <5qonj8$rf4@ds2.acs.ucalgary.ca>,
Alvin R. Albrecht <albrecht@freenet.calgary.ab.ca> wrote:
>
>Well, I lost the version of the fast multiply that Ian wrote a while back,
>but as I recall, it loaded the squares in and did a 16 bit subtraction.
>This version is a little faster as the subtractions are done directly from
>memory rather than loading and then subtracting.

Cool, looks good (at least, it looks like it will work :).  Just a few 
timing comments:

>enter:	D=H=MSB of table of squares (least significant byte)
>	    (most sig byte is stored in next 256 byte block)
>	E=b  ; if L<E, they will be swapped
>	L=a
>exit:	AC=E*L
>	D,H unchanged
>
>MULT	LD A,L		4
>	SUB E		4	; A=(a-b)
>
>Time:
>
>   91
>+  10   if swap of a&b required
>+  13   if (a+b) is odd
>+  10   if counting RET - return from subroutine

Absolutely must RET be counted, along with the CALL (17 cycles!  For reference,
JSR on a 6510 is 6 cycles): it is part of the routine!  The initialization
of H and D should probably also be included: +3 cycles (+11 for the loads,
but -8 if you were to get rid of the end DECs), so not such a big deal.

One of the beauties of the 6510 version is that it is so tiny -- 23 bytes --
so it is suicidal to not inline it.  If you were to inline your version,
the first RET would be replaced by a JP (also 10 cycles, right?), and the second
could presumably disappear (so the odd case would be +3 cycles basically).

So: if a subroutine, then 118-141 cycles.  If inlined, 101-107 cycles.
So, basically, 2.3:1 compared with the 45 cycles routine (and 2.5:1 when
compared with the 41 cycle version which I rarely use).  Subroutine
makes it 3:1.

Not too shabby.

-S



Article 71969 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.luc.edu!uchinews!uwvax!uwm.edu!vixen.cso.uiuc.edu!howland.erols.net!cpk-news-hub1.bbnplanet.com!news.bbnplanet.com!baron.netcom.net.uk!netcom.net.uk!server3.netnews.ja.net!news.ox.ac.uk!news
From: imc@ecs.ox.ac.uk (Ian Collier)
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (multiply)
Date: 21 Jul 1997 17:00:20 GMT
Organization: Oxford University Computing Laboratory, UK
Lines: 88
Message-ID: <11531.imc@comlab.ox.ac.uk>
References: <33A9DD07.153E@skynet.be> <33C67EF8.3BA9@chebucto.ns.ca> <11516.imc@comlab.ox.ac.uk> <5qonj8$rf4@ds2.acs.ucalgary.ca>
NNTP-Posting-Host: boothp2.ecs.ox.ac.uk
X-Local-Date: Monday, 21st July 1997 at 6:00pm BST
Xref: news.acns.nwu.edu comp.sys.cbm:71969 comp.sys.sinclair:41909

In article <5qonj8$rf4@ds2.acs.ucalgary.ca>, "Alvin R. Albrecht" <albrecht@freenet.calgary.ab.ca> wrote:
>Well, I lost the version of the fast multiply that Ian wrote a while back,
>but as I recall, it loaded the squares in and did a 16 bit subtraction.
>This version is a little faster as the subtractions are done directly from
>memory rather than loading and then subtracting.

Also, you used a trick which I should have remembered which is to have
separate tables for the LSB and MSB of the result.

>a*b = [(a+b)/2]^2 - [(a-b)/2]^2
>
>There is a problem when (a+b)/2 is calculated because if (a+b) is odd, we
>lose a bit.  Notice also that (a+b) and (a-b) are odd at the same time.
>If (a+b)/2 and (a-b)/2 are understood to represent the integer part only,
>then the formula becomes:
>
>a*b = [(a+b)/2]^2 - [(a-b)/2]^2 { + b }
>where b is added only if (a+b) or (a-b) is odd.

But if you write it as a*b = [(a+b)^2]/4 - [(a-b)^2]/4 then you don't
have this problem and I can't help feeling the result would be quicker,
though I haven't found a quicker version yet.  (The table of course
contains x^2/4 for 0<=x<512.)

>addb    LD C,E          4	; C=b
>        ADD A,E         4
>        LD E,A          4	; E=(a+b)/2
>        LD A,(DE)       7	; as above
>        ADD A,C         4	; but add b this time to lsb
>        SUB (HL)        7	; SUB is a 9 bit subtraction using carry

I don't get that.  A 9 bit subtraction using carry?

Suppose the problem is to find 55*52.  Then (a+b)/2 is 53 and (a-b)/2
is 1. The LSB of the calculation then goes like this: get 53^2 -> 249;
add 52 -> 45 (and a carry); subtract 1 -> 44 (and no carry).  The high
byte of the result is 10.  The program gives 2604 when it should have
been 2860.

Here is the amended version of my routine, given your entry conditions
except that the table now goes up to (511^2)/4, arranged with the entries
for 256-511 after those for 0-255.

MULT    LD   A,L        4
        SUB  E          4       ; A=(a-b)
        JR   NC,noneg   7/12    ; if carry then A=(b-a)
        NEG             8
noneg   LD   L,A        4
        ADD  A,E        4
        ADD  A,E        4       ; A=(a+b) MOD 256
        LD   E,A        4
        JR   NC,small   7/12
        INC  D          4       ; if a+b is in the range 256-511, point
        INC  D          4       ; to the next block of the table
small   LD   A,(DE)     7       ; A=[(a+b)^2]/4, LSB
        SUB  (HL)       7       ; subtract [(a-b)^2]/4, LSB
        LD   C,A        4       ; C=a*b, LSB
        INC  D          4       ; point to table of squares, MSB
        INC  H          4
        LD   A,(DE)     7       ; as above, subtracting in the borrow flag
        SBC  A,(HL)     7
        LD   B,A        4       ; B=a*b, MSB  (*)
        DEC  H          4
        LD   D,H        4       ; restore to table of squares, LSB
        RET             10

(*) You appeared to miss this out so add 4 to your timings.

>   91
>+  10   if swap of a&b required
>+  13   if (a+b) is odd
>+  10   if counting RET - return from subroutine
>-  12   if I know which is a&b

   100
+    3  if swap of a&b required
+    3  if (a+b) > 255
+   10  if counting RET.

About even in the average case, I would say, except that yours doesn't
work. <g>  Feel free to show that mine doesn't work either. :-)

Having said that, I realise that there is a mistake in my routine.  I have
left it in to see whether you can spot it.  It will cost 4 cycles to fix,
unless we swap some registers in the entry conditions.
-- 
---- Ian Collier : imc@comlab.ox.ac.uk : WWW page (including Spectrum section):
------ http://www.comlab.ox.ac.uk/oucl/users/ian.collier/imc.html


Article 72011 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (multiply)
Date: 24 Jul 1997 22:03:56 GMT
Organization: Northwestern University, Evanston, IL
Lines: 60
Message-ID: <5r8jgc$e6p@news.acns.nwu.edu>
References: <33A9DD07.153E@skynet.be> <11516.imc@comlab.ox.ac.uk> <5qonj8$rf4@ds2.acs.ucalgary.ca> <11531.imc@comlab.ox.ac.uk>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.cbm:72011 comp.sys.sinclair:41987

In article <11531.imc@comlab.ox.ac.uk>, Ian Collier <imc@ecs.ox.ac.uk> wrote:
>In article <5qonj8$rf4@ds2.acs.ucalgary.ca>, "Alvin R. Albrecht" <albrecht@freenet.calgary.ab.ca> wrote:
>
>>addb    LD C,E          4	; C=b
>>        ADD A,E         4
>>        LD E,A          4	; E=(a+b)/2
>>        LD A,(DE)       7	; as above
>>        ADD A,C         4	; but add b this time to lsb
>>        SUB (HL)        7	; SUB is a 9 bit subtraction using carry
>
>I don't get that.  A 9 bit subtraction using carry?
>
>Suppose the problem is to find 55*52.  Then (a+b)/2 is 53 and (a-b)/2
>is 1. The LSB of the calculation then goes like this: get 53^2 -> 249;
>add 52 -> 45 (and a carry); subtract 1 -> 44 (and no carry).  The high
>byte of the result is 10.  The program gives 2604 when it should have
>been 2860.

Apparently I have been sleeping on the job!  The idea is sound, but
of course the implementation is flawed, and in a way that I myself
have screwed up on.  (That is, doing 16-bit operations, in this
case addition and subtraction, out of order).

>
>Here is the amended version of my routine, given your entry conditions
>except that the table now goes up to (511^2)/4, arranged with the entries
>for 256-511 after those for 0-255.
>
>MULT    LD   A,L        4
>        SUB  E          4       ; A=(a-b)
>        JR   NC,noneg   7/12    ; if carry then A=(b-a)

>        NEG             8
>noneg   LD   L,A        4
>        ADD  A,E        4

Oops :)

>>   91
>>+  10   if swap of a&b required
>>+  13   if (a+b) is odd
>>+  10   if counting RET - return from subroutine
>>-  12   if I know which is a&b
>
>   100
>+    3  if swap of a&b required
>+    3  if (a+b) > 255
>+   10  if counting RET.

Let's not forget good 'ol +3 for initializing H and D :).

>
>About even in the average case, I would say, except that yours doesn't
>work. <g>  Feel free to show that mine doesn't work either. :-)
>
>Having said that, I realise that there is a mistake in my routine.  I have
>left it in to see whether you can spot it.  It will cost 4 cycles to fix,
>unless we swap some registers in the entry conditions.
	
	evetS-


Article 72084 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (multiply)
Date: 25 Jul 1997 20:06:03 GMT
Organization: Northwestern University, Evanston, IL
Lines: 141
Message-ID: <5rb0vb$ci6@news.acns.nwu.edu>
References: <33A9DD07.153E@skynet.be> <5qonj8$rf4@ds2.acs.ucalgary.ca> <11531.imc@comlab.ox.ac.uk> <5r8jgc$e6p@news.acns.nwu.edu>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.cbm:72084 comp.sys.sinclair:42064

In article <5r8jgc$e6p@news.acns.nwu.edu>, Stephen Judd <sjudd@nwu.edu> wrote:
>In article <11531.imc@comlab.ox.ac.uk>, Ian Collier <imc@ecs.ox.ac.uk> wrote:
>
>Let's not forget good 'ol +3 for initializing H and D :).

It is, of course, really quite silly for me to harp on this.  What I was
aiming for was to put the routine in some context (and in the following
context, the "prepare H and D for next multiply" method would probably
be better).

An extremely nice feature of the 6510 fast multiply is that once the
pointers are intialized, they stay initialized.  Recall that it looks
like

	STA ZP1		;3 cycles each
	STA ZP3
	NEG		;6 cycles
	STA ZP2
	STA ZP4
	LDA (ZP1),Y	;f(a+y)
	SEC
	SBC (ZP2),Y	;f(y-a)
	STA temp
	LDA (ZP3),Y	;high bytes
	SBC (ZP4),Y
			;On exit, A=high byte, temp=low byte

And, if I count cycles correctly this time, I get 43 cycles. :)

The small bonus is that the high bytes never need re-initialization.  The
large bonus is that the same pointers can be used for different Y, which
means that vector operations are improved considerably.  Whereas the
first multiply takes 43 cycles, if only Y is changed then subsequent 
multiplies take 25 cycles.

The most obvious example is a matrix multiply, e.g. a 3-d rotation.  Let

	    [A11 B12 C13]	    [x0]
	M = [D21 E22 F23]	P = [y0]
	    [G31 H32 I33]	    [z0]

For the sake of simplicity, let's assume numbers are 8-bits and the result 
of MP is 16-bits (three 16-bit numbers).  Multiplying by columns is clearly
the way to go on the 6510; at the second column, the algorithm looks like

	newx = newx + B12*y0
	newy = newy + E22*y0
	newz = newz + H32*y0

with similar operations for x0 and z0:

	LDA y0							3
	BEQ :SKIP	;Skip if zero				2

	LDY B12							3
	>>> MULT,temp1	;43 cycle multiply -> temp1,A = lo,hi 
			;Sets up ZP pointers
	TAY							2
	LDA temp1						3
	CLC							2
	ADC NEWXLO	;Running sum				3
	STA NEWXLO						3
	TYA							2
	ADC NEWXHI						3
	STA NEWXHI	;NEWXLO,NEWXHI will be transformed X	3

	LDY E22
	>>> QMULT,temp1 ;Quick 25 cycle multiply, ZP already set up!
	>>> ADD,NEWY	;Same as above -- 21 cycles

	LDY H32
	>>> QMULT,temp1
	>>> ADD,NEWZ
:SKIP

(So total time to multiply+add a column is 5+67+49+49=170 cycles)

Another place I use the multiply routine is to rotate 16-bit signed
numbers, where each matrix element is -1..1 (times 64).  So, the
thing needs to do an 8x16 multiplication, and divide the result by 64
(result is 16-bits, but intermediate calculation is 24 bits):

	(AL+256*AH) * B / 64 -> A,temp3 = lo,hi

	(fix up signs)
	LDA B
	BEQ :SKIP

	LDY AH		;256*AH*B				3+2+3
	>>> MULT,temp2						43
	STA temp3	;Upper 16-bits

	LDY AL		;+ AL*B					3+3
	>>> QMULT,temp1						25
	CLC		;temp1, A = lo,hi = low 16-bits
	ADC temp2	;add to upper 16-bits
	BCC :CONT	;Add carry if necessary
	INC temp3						8/12

:CONT	ASL temp1	;Divide by 64
	ROL
	ROL temp3
	ASL temp1
	ROL
	ROL temp3						24
						     Total time=114/118.
	LDX SIGN	;Fix up sign if necessary
	BEQ :POS
	Take 16-bit 2's complement
:POS	CLC		;Add to running sum
	ADC sumlo
	STA sumlo
	LDA temp3
	ADC sumhi
	STA sumhi
:SKIP

Well, I actually just post these for general interest's sake -- surely
the Z80 guys are getting pooped by now :)  I know I am!  But I do
wonder one thing: what do you do when you run out of registers?
For instance, I seem to recall that the multiply routine used up
all the regs on one side of the chip -- what happens in a case like
the above, where you need three bytes for the result, then to add
the result to something else?

I ask the question in the sense that I'm pretty impressed with the
ability of Z80 guys to cope with adversity, and hence wonder how
this type of problem is solved.

>>Having said that, I realise that there is a mistake in my routine.  I have
>>left it in to see whether you can spot it.  It will cost 4 cycles to fix,
>>unless we swap some registers in the entry conditions.

There is actually a very subtle error in the routine I posted.  Fair is
fair, so I'll tell you :).  If the first number (A) is zero, it fails.
This is because it takes the 2's complement of 0 and gets... 0!  But
the tables are constructed such that 2's-complement 0 = -256, so it
dies.  Make sense?  Probably not.  On the plus side, if it's zero then 
the entire calculation may be skipped :).

	evetS-


