Article 71065 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.luc.edu!chi-news.cic.net!howland.erols.net!news.mathworks.com!arclight.uoregon.edu!news.bc.net!rover.ucs.ualberta.ca!news.ucalgary.ca!srv1.freenet.calgary.ab.ca!albrecht
From: "Alvin R. Albrecht" <albrecht@freenet.calgary.ab.ca>
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (was Re: spectrum faster in 3D)
Date: Sun, 6 Jul 1997 20:31:48 -0600
Organization: Calgary Free-Net
Lines: 199
Message-ID: <5ppkh5$rla@ds2.acs.ucalgary.ca>
References: <33A9DD07.153E@skynet.be> <5os3r6$fju@yama.mcc.ac.uk> <01bc8260$9a9d4280$29e2869f@dragon> <33B577AF.7351@skynet.be> <5p66j0$471@news.acns.nwu.edu> <33ba2ee9.321664@NEWS.DEMON.CO.UK> <5pmm0e$1cm2@ds2.acs.ucalgary.ca>
Reply-To: "Alvin R. Albrecht" <albrecht@freenet.calgary.ab.ca>
NNTP-Posting-Host: albrecht@srv1.freenet.calgary.ab.ca
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
In-Reply-To: <5pmm0e$1cm2@ds2.acs.ucalgary.ca>
Xref: news.acns.nwu.edu comp.sys.cbm:71065 comp.sys.sinclair:40957


On Sat, 5 Jul 1997, Alvin R. Albrecht wrote:

First an errata:

> BRESENHAM
> ---------

In the overhead, change SLA A to ADD A,A.
This reduces cycle time in overhead to 32 cycles.

The 'LD A,B; EXX' combination has been moved to an earlier spot
so that the screen address in HL is in the active set should an
'INC L' need to be done to go to next 8-pixel column on screen.

... snip ...

> cont	EXX		4	; get at alternate set
> 	RRC B		8	; rotate pixel position

*	LD A,B		4	; A=pixel
*	EXX		4	; get back set holding screen address

> 	JP NC,noincx	10/10	; if passed through carry, need to INC L
> 	INC L		4
> noincxOR (HL)		7	; or in screen contents
> 	LD (HL),A	7	; write pixel
> 	EX AF,AF'	4	; A=d, with sign flag recovered
> 	DJNZ forlp	13/8	; keep going until dx=0

Cycle times are: 32 cycles overhead, 117/79 per pixel.

> DDA Version
> -----------

Is OK. 

> **** The overhead in calculating the fraction part of dy/dx (a special
> case division) is not shown here - I'll figure this out and post it
> tomorrow.

I need C=dy/dx, 8 bits fraction part which can be found using a
16 bit divide 256*dy/dx and storing the least significant byte of result
in C.  I did this and it turns out the overhead is ~500 cycles,
meaning this version is only faster than the Bresenham version above
if more than something like 90 pixels are plotted.

So, I tried a version using tables.

(i) dx is allowed in this domain:  0<dx<=255
256*dy is 8 bits only with lsb=0.

Since dy<dx and 0<=dy<192 (max screen resolution), number of locations in
table is Sum(1 to 192)+192*(256-192)=31009 bytes.  A wee bit large to
be practical and probably leaving strange bits of RAM free.

(ii) Break up the 256*dy into nybbles.

Suppose dy=AB (concatenation of nybbles A & B)
Then dy/dx=16*A/dx + B/dx
And 256*dy/dx=16*256*A/dx + 256*B/dx

We are guaranteed that the MSB of the division is 0, but when looking
up each term separately, there may be a carry from the fraction part 
(bits to the right of LSB) which could add up to 1-2 pixels error
on full width line.  So make the table 2 bytes per entry consisting
of the LSB of the division above and an 8 bit fraction part.  After the
sum, we want to store the LSB in C.

The table has at most 16*256=4096 entries and uses 8192 bytes.  If this
is unacceptable, you could extend this method to 2 bit pairs rather than
nybbles.

enter:  B=dx, C=dy
        E=current pixel position in byte (eg 01000000)
        HL=current screen memory address
        SP=sitting in table of y screen addresses

DRAW2	LD (temp),BC	20	; store BC in temp
	EXX		4
	LD BC,(temp)	20	; get it in alternate set

	LD L,B		4	; L=dx
	LD A,#0F	7	; get LS nybble of dy
	AND C		4
	ADD A,A		4	; double it
	ADD A,divtbl	7	; MSB of beginning of tables
	LD H,A		4
	LD E,(HL)	7	; D=LSB(256*"B"/dx)
	INC H		4	; E=8 bits fraction
	LD D,(HL)	7	; "B" is nybble name from above, not reg.

	LD A,#F0	7	; get MS nybble of dy
	AND C		4
	RLCA		4	; rotate it to bits 0-4
	RLCA		4
	RLCA		4
	RLCA		4
	ADD A,A		4	; double it
	ADD A,divtbl	7	; MSB of beginning of tables
	LD H,A		4
	LD A,(HL)	7	; A=8 bits fraction of:
	INC H		4
	LD H,(HL)	7	; H=LSB(256*"A"/dx)
	LD L,A		4	; "A" is not reg, but nybble name

	ADD HL,HL	11	; multiply this by 16
	ADD HL,HL	11
	ADD HL,HL	11
	ADD HL,HL	11
	ADD HL,DE	11	; add in other nybble result
	LD A,H		4	; keep MSB (8 bits fraction of dy/dx)
	EXX		4
	LD C,A		4

> 	XOR A		4	;(part of overhead)

Overhead: 227 cycles.


Storage:  2 bytes for temp and 8k for divtbl (divtbl must start on
a 256 byte boundary).

Cycle time:  227 overhead + 71/99 cycles per pixel.

This version is faster than the Bresenham when somewhere between 10 & 24
pixels need to be plotted (function of slope).


> 2.  The PLOT

...
> Should be tweaked for each line draw routine above so that the
> correct registers end up with the correct variables.
...

> PLOT	ADD HL,HL	11	; HL=2*y
> 	LD SP,addrtbl	10	; bottom of screen address table
> 	ADD HL,SP	11	; calculate index
> 	LD SP,HL	6	; SP points to current screen address
> 	POP HL		10	; get current screen address, x=0
> 	LD A,#07	7	; get bits 0..2 of x coord
> 	AND C		4	; they identify pixel position in byte

* Moved pixel pos to E rather than B for table below
> 	LD E,A		4	; E=pixel position in byte (0-7)

> 	LD A,C		4	; get bits 3..7 into bits 0..5
> 	RRCA		4	; they identify 8 bit pixel group
> 	RRCA		4
> 	RRCA		4
> 	AND #1F		7	; keep bits 0..5 (column)
> 	OR L		4	; OR into screen address
> 	LD L,A		4

* Here I've eliminated the loop used to find pixel position.

	LD D,pixtbl	7	; DE=location in pix table
	LD A,(DE)	7	; A=pixel pos

108 Cycles.

If you want to plot initial pixel, add:

	LD r,A		4	; move pix position to another register
	           		; which is necessary anyway since
	          		; both DRAWs require it in another reg.
	OR (HL)		7	; OR in screen contents
	LD (HL),A	7	; write pixel

126 Cycles total.

The two tables required:

a. Screen addresses (vertical resolution=192 pixels, so uses
2*192=384 bytes.  No restriction on its location - ie can cross
any 256 byte boundary)

> *** POP causes SP to move upward in memory so when next y coord is
> popped off stack, assuming dy>0, requires top of table to correspond
> to screen addresses at top of screen.  So y=0 refers to bottom of screen.
 
> addrtbl	DEFW bottom screen address, x=0
>         DEFW next to bottom, x=0
>         DEFW .... for each y coord

b. Pixel table

pixtbl    DEFB #01, #02, #04, #08, #10, #20, #40, #80

***pixtbl lies on the start of a 256 byte boundary so is located
at 0xYY00


Alvin


Article 71091 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!rutgers!news.sgi.com!howland.erols.net!rill.news.pipex.net!pipex!join.news.pipex.net!pipex!dispatch.news.demon.net!demon!delos.dra.hmg.gb!server1.netnews.ja.net!lyra.csx.cam.ac.uk!news.ox.ac.uk!news
From: imc@ecs.ox.ac.uk (Ian Collier)
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (was Re: spectrum faster in 3D)
Date: 7 Jul 1997 14:37:52 GMT
Organization: Oxford University Computing Laboratory, UK
Message-ID: <11410.imc@comlab.ox.ac.uk>
References: <33A9DD07.153E@skynet.be> <5p66j0$471@news.acns.nwu.edu> <33ba2ee9.321664@NEWS.DEMON.CO.UK> <5pmm0e$1cm2@ds2.acs.ucalgary.ca>
NNTP-Posting-Host: boothp2.ecs.ox.ac.uk
X-Local-Date: Monday, 7th July 1997 at 3:37pm BST
Lines: 87
Xref: news.acns.nwu.edu comp.sys.cbm:71091 comp.sys.sinclair:40977

In article <5pmm0e$1cm2@ds2.acs.ucalgary.ca>, "Alvin R. Albrecht" <albrecht@freenet.calgary.ab.ca> wrote:
>BRESENHAM
>---------

>Restrictions:

>dy>=0, dx>0, dy/dx<=1, dy<128, |dy-dx|<128, line lies in screen
>If necessary, break the line into two parts which can be drawn without
>repeat of the overhead and without a slope discontinuity.

Easier said than done...

If you divide d, incrE and incrNE by 2 it doesn't affect the algorithm (if
you round 1/2(2dy-dx) down) and then since you know that 0 <= dy <= dx,
dx-dy is a positive 8-bit number and I think everything turns out OK.

BTW, do you ever use C in the inner loop of this version?  If not then
you can save on EXXs by using it to hold the pixel mask.

Let's revise this version then.  The following version may plot slightly
different pixels but is still a smooth line from (x0,y0) to (x0+dx,y0+dy).
There were one or two trivial errors on your comments, apart from the EXX
thing which you corrected yourself.

Restrictions: dy>=0, dx>0, dy/dx<=1, line lies in screen.

enter: B=dx, D=dy
       HL=current screen address
       C=current pixel position within screen byte
       SP=sitting in table of y screen addresses
          (which would make calling this routine rather difficult;
          we could solve this by using IX, then at the start of the
          routine save SP and LD SP,IX)

Draw1b  LD   E,B        4    ; save copy of dx
        LD   A,B        4
        SRL  A          8    ; d=dx/2

forlp   SUB  D          4    ; d=d-dy
        JR   NC,east   12/7  ; if d>=0 goto east

neast   ADD  A,E        4    ; else d=d+dx
        EX   AF,AF'     4    ; save d
        LD   A,L        4    ; current x coordinate in bits 0-4 of L
        AND  $1F        7    ; save bits 0-4
        POP  HL        10    ; get address of next y coordinate, x=0
        OR   L          4    ; or in x coordinate
        LD   L,A        4
        JP   cont      10

east    EX   AF,AF'     4    ; save d

cont    RRC  C          8    ; rotate pixel position
        JP   NC,noincx 10/10 ; if passed through carry, need to INC L
        INC  L          4
noincx  LD   A,C        4    ; A=pixel
        OR   (HL)       7    ; or in screen contents
        LD   (HL),A     7    ; write pixel
        EX   AF,AF'     4    ; A=d
        DJNZ forlp     13/8  ; keep going until dx=0

There is an 11 cycle overhead [the djnz has an overhead of -5 cyles] and
it takes 111 cycles to plot a NE pixel and 73 cycles to plot an east pixel
(hmm, I thought the saving would be more than that).

>DDA Version
>-----------

>This one calculates the fraction part of dy/dx and adds it to a running
>variable as the x coordinate is increased by one.  If a carry is set, it's
>time to move to the next y coordinate.  Max error is 2^(-9)*256=0.5 pixel
>when line is drawn across the entire screen.

A 0.5 pixel error has me slightly worried, in case it rounds up to one pixel
and the line ends up in the wrong place.  Even if you can show that this
isn't the case, the line might end up lop-sided due to this error.

>	XOR A		(part of overhead)
>forlp	ADD A,C		4	; A=A+dy/dx
>	LD D,A		4	; save A in D
>	JP NC,noincy	10/10	; if carry, have moved to next y coord

You will want A to start off at a half (i.e., 128) rather than zero,
in order to distribute the NE moves evenly through the line.
-- 
---- Ian Collier : imc@comlab.ox.ac.uk : WWW page (including Spectrum section):
------ http://www.comlab.ox.ac.uk/oucl/users/ian.collier/imc.html


Article 71097 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (was Re: spectrum faster in 3D)
Date: 7 Jul 1997 18:05:57 GMT
Organization: Northwestern University, Evanston, IL
Lines: 146
Message-ID: <5prb65$6i1@news.acns.nwu.edu>
References: <33A9DD07.153E@skynet.be> <5p66j0$471@news.acns.nwu.edu> <33ba2ee9.321664@NEWS.DEMON.CO.UK> <5pmm0e$1cm2@ds2.acs.ucalgary.ca>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.cbm:71097 comp.sys.sinclair:40988

In article <5pmm0e$1cm2@ds2.acs.ucalgary.ca>,
Alvin R. Albrecht <albrecht@freenet.calgary.ab.ca> wrote:
>
>:-) Sorry, it's been a busy couple of weeks.  I'll start by seeing what I
>can do in the next hour or so:
>
>1.  Fast Draw

Hmmmm... actually I was hoping that you would code up the fast multiply
and string print routines, since Ian had already posted some line code.
Still, it was nice to see the trick of pulling the rows off of the stack --
the screen rows aren't quite so barfy after all.

>BRESENHAM
>---------
>
>Restrictions:
>
>dy>=0, dx>0, dy/dx<=1, dy<128, |dy-dx|<128, line lies in screen
>
>If necessary, break the line into two parts which can be drawn without
>repeat of the overhead and without a slope discontinuity.
>
>DRAW1	LD A,C		4	; A=dy
>	SUB B		4	; A=dy-dx
>	SLA A		8	; A=2*(dy-dx)
...

The real question at this point is: why all of this gobbledygook
of restrictions and doubling stuff up when all you really want to
do is initialize the counter to dx/2?  You would do better to
emulate the ROM routine.  The entire line drawing algorithm looks
like:

	a=dx/2
loop
	a=a+dy
	if a>dx then a=a-dx:y=y+1
	plot x,y

All you have to do is add dy/dx together a bunch of times until
the sum is greater than one, at which point it is time to increment y.
As was indicated earlier, you can save the a>dx check by just
subtracting dy instead and checking for a<0.

By the way, you made an error in your cycle timings -- AND #$1F
is seven cycles, not six.  Also, I take it as a given that no
interrupts will occur during the routine :).

>There is a 40 cycle overhead and it takes 117 cycles to plot a NE pixel
>and 79 cycles to plot an east pixel.

>DDA Version
>-----------
>
>This one calculates the fraction part of dy/dx and adds it to a running

Yes, these types of methods are possible, but as you indicate they require
some horrid overhead.  Moreover, the only savings over the normal algorithm
is the a=a-dx step -- a single subtraction.  So a good line algorithm
ought to look like the "fast" routine, with an added subtraction
(and perhaps an EXX or two).

>This one takes 71 cycles to plot if y coordinate not changed and
>99 cycles per pixel otherwise.
>
>Either way, tough for you to beat I think.

My boy, if there's one thing you must learn it's to do your homework.
Like so many of its predecessors, the comment is simply foolish.

In this case, I have already indicated several line routines in previous
postings.  The Elite line routine runs at about 28 cycles/pixel.
My own routine runs at 17 cycles/51 cycles for normal/inc y steps.
By using a trivial (i.e. very short) unrolling those numbers become
10 cycles/40 cycles, respectively.

7:1 cycle ratios are something I can live with.

The routines are where they have always been: on my web page, and in
C=Hacking.

I also thought you might be interested in seeing the 6510 equivalent
of your own code:

LOOP	ADC DY		;a=a+dy
	BCC :CONT
	SBC DX		;if a>0 then a=a+dx  (a started at -dx/2)
	DEY		;y=y-1
:CONT	STA TEMP
	LSR BITP
	BCS FIXCOL	;Advance pointer to next column
	LDA BITP
	ORA (POINT),Y
	STA (POINT),Y
	LDA TEMP
	DEX
	BNE LOOP

Or, if we Steveify it a little,

LOOP	ADC #DY		2
	BCC :CONT	3/2
	SBC #DX		  2
	DEY		  2
:CONT	STA TEMP	3
	LDA BITP,X	4
	BMI FIXCOL	2/3
	ORA POINT,Y	4
	STA POINT,Y	4
	LDA TEMP	3
	DEX		2
	BNE LOOP	3

So, 30 cycles normal, 33 for steps in y-direction.  In general,
running at around the anticipated 3:1.

(I have always found that the 17/50 routine provides a very noticable 
speed increase in general use, though, which is why I always use
it instead).

Well, that takes care of 1/8 of the lines one might draw. :)
(Or, since I'm sure we're being smart about how we draw lines, 1/4).

Just for fun, here's the above routine for lines with dy/dx>1

LOOP	ADC #DX		2
	BCC :CONT	3/2
	SBC #DY		  2
	LSR BITP	  5
	BCS FIXCOL	  2/3
:CONT	STA TEMP	3
	DEY		2
	LDA BITP	3
	ORA POINT,Y	4
	STA POINT,Y	4
	LDA TEMP	3
	DEX		2
	BNE LOOP	3

29/37 cycles.

Now all we need is a fast multiply (actually a matrix multiply), and
the essence of a 3D program is complete.

	evetS-


Article 71137 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!uwm.edu!vixen.cso.uiuc.edu!howland.erols.net!rill.news.pipex.net!pipex!tank.news.pipex.net!pipex!feed1.news.innet.be!INbe.net!news.belnet.be!news-zh.switch.ch!surfnet.nl!highway.leidenuniv.nl!news.wi.leidenuniv.nl!not-for-mail
From: "W.A.vanLoo" <wvloo@wi.leidenuniv.nl>
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (was Re: spectrum faster in 3D)
Date: Tue, 08 Jul 1997 12:19:02 +0200
Organization: Dept. of Mathematics & Computer Science, Leiden University, Leiden, the Netherlands
Lines: 15
Message-ID: <33C21416.41C6@wi.leidenuniv.nl>
References: <33A9DD07.153E@skynet.be> <5p66j0$471@news.acns.nwu.edu> <33ba2ee9.321664@NEWS.DEMON.CO.UK> <5pmm0e$1cm2@ds2.acs.ucalgary.ca> <5prb65$6i1@news.acns.nwu.edu>
NNTP-Posting-Host: ind302fc.wi.leidenuniv.nl
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 3.01 (X11; I; IRIX 5.3 IP22)
Xref: news.acns.nwu.edu comp.sys.cbm:71137 comp.sys.sinclair:41017

Hi guys!

I was reading your arguments about the fastest line routine

I'll have to say I'm sorry , but mine sure is a lot faster (I'm a demo
coder) , I'll post the routine on this newsgroup in a few days (maybe
this afternoon)

Check it out!

(there is no comment in the routine , you'll have to do without it.)

Bye!

Werner


Article 71151 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (was Re: spectrum faster in 3D)
Date: 8 Jul 1997 19:04:41 GMT
Organization: Northwestern University, Evanston, IL
Lines: 66
Message-ID: <5pu309$96o@news.acns.nwu.edu>
References: <33A9DD07.153E@skynet.be> <5pmm0e$1cm2@ds2.acs.ucalgary.ca> <5prb65$6i1@news.acns.nwu.edu> <33C21416.41C6@wi.leidenuniv.nl>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.cbm:71151 comp.sys.sinclair:41039

In article <33C21416.41C6@wi.leidenuniv.nl>,
W.A.vanLoo <wvloo@wi.leidenuniv.nl> wrote:
>Hi guys!
>
>I was reading your arguments about the fastest line routine
>
>I'll have to say I'm sorry , but mine sure is a lot faster (I'm a demo
>coder) , I'll post the routine on this newsgroup in a few days (maybe
>this afternoon)
>
>Check it out!
>
>(there is no comment in the routine , you'll have to do without it.)
>
>Bye!
>
>Werner

Ah, very cool, please do post it!

I only have two questions in advance:

1) Is the routine useful for anything other than demos?

and

2) Are they "real" lines or "fake" lines?  That is, can it draw lines of
   any slope or is it restricted to a subset of line slopes?

For example, I think the fastest kind of routine is

	LDA #$80
	STA COLUMN1,Y
	LDA #$40
	STA COLUMN1,Y
	INY
	LDA #$20
	...
	LDA #$01
	STA COLUMN1,Y
	LDA #$80
	STA COLUMN2,Y
	INY

and so on, i.e. a completely unrolled loop for a variety of line slopes.
(Stick an RTS into the right spot and jump into it at the right spot
to start and end at certain points).  So now it's down to 6-8
cycles/pixel.  It also only works in a character map, only draws a
subset of all possible lines (and doesn't really draw them correctly),
and completely fills memory.

On the plus side, by posting that routine it would have been serious 
brown-trousers time for the Spectrum crowd :).

(Although I think these routines have their place, I don't think they
are so useful for general applications, are fairly inelegant, and get 
far too close to precalculation for my tastes -- it's a fair question
to ask whether the above is a line routine or an animation, and, when
you get down to it, there's nothing so nifty as an aligned paragraph)

The weirdest line idea I ever heard was to use an interrupt to step
the pointer in the y-direction.

Anyways, I look forwards to your routines!

	evetS-


Article 71176 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (was Re: spectrum faster in 3D)
Date: 9 Jul 1997 07:47:43 GMT
Organization: Northwestern University, Evanston, IL
Lines: 38
Message-ID: <5pvfmv$qhr@news.acns.nwu.edu>
References: <33A9DD07.153E@skynet.be> <5prb65$6i1@news.acns.nwu.edu> <33C21416.41C6@wi.leidenuniv.nl> <5pu309$96o@news.acns.nwu.edu>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.cbm:71176 comp.sys.sinclair:41056


Oh dear... sometimes I amaze even myself with my stupidity.


In article <5pu309$96o@news.acns.nwu.edu>, Stephen Judd <sjudd@nwu.edu> wrote:
>
>For example, I think the fastest kind of routine is
>
>	LDA #$80

Fix #1: ORA COLUMN1,Y

>	STA COLUMN1,Y
	...

Possible fix #2: combine all the pixels into a single blit, i.e.

	LDA #$38
	ORA COLUMN2,Y
	STA COLUMN2,Y
	INY
	LDA #$06
	ORA COLUMN2,Y
	STA COLUMN2,Y
	INY
	LDA #$01
	...

and plot the endpoints separately, maybe with similar shifted routines.

Anyways, these routines thus are around 10 cycles/pixel maximum -- the 
minimum of my compact routine -- minus overhead of course.

The other comments stand, I think.

Now maybe I will finally be able to go to sleep :).

-S


Article 71189 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!uwm.edu!vixen.cso.uiuc.edu!howland.erols.net!europa.clark.net!chippy.visi.com!news-out.visi.com!news.IAEhv.nl!news.cs.utwente.nl!news.nic.utwente.nl!surfnet.nl!highway.leidenuniv.nl!news.wi.leidenuniv.nl!not-for-mail
From: "W.A.vanLoo" <wvloo@wi.leidenuniv.nl>
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (was Re: spectrum faster in 3D)
Date: Wed, 09 Jul 1997 14:24:43 +0200
Organization: Dept. of Mathematics & Computer Science, Leiden University, Leiden, the Netherlands
Lines: 44
Message-ID: <33C3830B.446B@wi.leidenuniv.nl>
References: <33A9DD07.153E@skynet.be> <5pmm0e$1cm2@ds2.acs.ucalgary.ca> <5prb65$6i1@news.acns.nwu.edu> <33C21416.41C6@wi.leidenuniv.nl> <5pu309$96o@news.acns.nwu.edu>
NNTP-Posting-Host: ind302ba.wi.leidenuniv.nl
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 3.01 (X11; I; IRIX 5.3 IP22)
Xref: news.acns.nwu.edu comp.sys.cbm:71189 comp.sys.sinclair:41083

Stephen Judd wrote:
> 

> Ah, very cool, please do post it!
> 
> I only have two questions in advance:
> 
> 1) Is the routine useful for anything other than demos?
> 

Yes , it is , but it can only handle a 16*16 char area. (ie 128*128
pixels)

> and
> 
> 2) Are they "real" lines or "fake" lines?  That is, can it draw lines of
>    any slope or is it restricted to a subset of line slopes?
>

Yes , they are real lines. It takes about 30 cycles to draw a pixel if I
remember correctly.
 
> On the plus side, by posting that routine it would have been serious
> brown-trousers time for the Spectrum crowd :).
>

At the moment I can't find it , I think I have it lying around my
parent's home (I'm a student , so I'm living on my own during the
weekdays , but I still have about 90% of my disks lying at my parents
place)
 
> The weirdest line idea I ever heard was to use an interrupt to step
> the pointer in the y-direction.
>

Erm , my one works with the Bresenham routine , but it is optimized to
the fullest -> if I can remember correctly I even thought of a way to
remove 3 cycles , but I haven't made it yet. I'll try adding it during
the weekend , then I'll post the total routine.


Hoiop!

Werner


Article 71262 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (was Re: spectrum faster in 3D)
Date: 11 Jul 1997 06:28:11 GMT
Organization: Northwestern University, Evanston, IL
Lines: 98
Message-ID: <5q4jpr$4cu@news.acns.nwu.edu>
References: <33A9DD07.153E@skynet.be> <5pu309$96o@news.acns.nwu.edu> <33C3830B.446B@wi.leidenuniv.nl> <33C4CD48.41C6@wi.leidenuniv.nl>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.cbm:71262 comp.sys.sinclair:41162

In article <33C4CD48.41C6@wi.leidenuniv.nl>,
W.A.vanLoo <wvloo@wi.leidenuniv.nl> wrote:
>Hi Guys!

Hi Werner! :)

>WEER
>         LDA ($60),Y		5
>         ORA TAB3,X		4
>         STA ($60),Y		5
>NIEUWX
>         INX			2
>
>         LDA D			3
>         BPL NIEUWXY2		2
>
>         ADC DY		3
>         STA D			3
>
>         TXA			2
>         AND #$07		2
>         BEQ REKEN		2
>
>         CPX X2		3
>         BCC WEER		3

So, looks to be about 39 cycles.

>WEER2
>         LDA ($60),Y  ;PLOT!!		5
>         ORA TAB3,X			4
>         STA ($60),Y			5
>NIEUWY
>         NOP				2  (INY/DEY)
>
>         LDA D				3
>         BPL NIEUWXY			2
>
>         CLC				2
>         ADC DX			3
>         STA D				3
>
>         CPY YC2			3
>         BNE WEER2			3

And this one is 35 cycles.

Almost as fast as the routines I posted -- pretty good!

>It could be a little bit faster if you did the main part like this
>
>lda tab3,x
>bmi -> new row routine

Right.  Or, just put a zero in the table at row crossings, and use a BEQ.

You could also improve it by (always) using X as the counter.  In my
routines I generally force all lines to move from left to right, so
that there are only four cases to deal with.  BLARG uses self-modifying
code for the INY/DEY, so there are just two routines: one for lines which
move in the x-direction (|slope|<1) and ones which move in the y-direction
(|slope| > 1).

dim4 uses my nifty chunky line strategy, which when slightly unrolled gives the
10/40 cycle times I mentioned.  Note that all my line algorithms and such
are on my web page and also strewn among the pages of C=Hacking.

The Elite line routine unrolls the thing a little bit to look like

	LDA #$80
	EOR ($60),Y
	STA ($60),Y
	...
	LDA #$40
	EOR ($60),Y
	STA ($60),Y

and so on -- thus it removes the bit calculations to get 28 cycles/pixel.


Now, I have a question for anyone interested: What is the Bresenham
algorithm?

I always assumed it was the strategy I use (along with just about everyone
who sits down to think up an algorithm), but I keep seeing these 2DX-DY
kinds of things that really mystify me.  What's the thinking behind the
algorithm?

(I've also heard of Bresenham circle algorithms, but have never seen
one of those either).

-S

>Hope to have kicked some butt ,

ouch! :)

>Werner van Loo , WVL/XeNoN


Article 71335 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.luc.edu!chi-news.cic.net!hookup!usc!howland.erols.net!europa.clark.net!dispatch.news.demon.net!demon!bullseye.news.demon.net!demon!newsgate.unisource.nl!surfnet.nl!highway.leidenuniv.nl!news.wi.leidenuniv.nl!not-for-mail
From: "W.A.vanLoo" <wvloo@wi.leidenuniv.nl>
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (was Re: spectrum faster in 3D)
Date: Fri, 11 Jul 1997 17:49:15 +0200
Organization: Dept. of Mathematics & Computer Science, Leiden University, Leiden, the Netherlands
Message-ID: <33C655FB.41C6@wi.leidenuniv.nl>
References: <33A9DD07.153E@skynet.be> <5pu309$96o@news.acns.nwu.edu> <33C3830B.446B@wi.leidenuniv.nl> <33C4CD48.41C6@wi.leidenuniv.nl> <5q4jpr$4cu@news.acns.nwu.edu>
NNTP-Posting-Host: ind302ac.wi.leidenuniv.nl
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 3.01 (X11; I; IRIX 5.3 IP22)
Lines: 8
Xref: news.acns.nwu.edu comp.sys.cbm:71335 comp.sys.sinclair:41238

Hi Stephen!

I'll post the bresenham routines on this newsgroup next week!
(also the circle-routines)

Bye!

Werner.


Article 71310 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!rutgers!usenet.logical.net!news.radio.cz!nntp.uio.no!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!ix.netcom.com!tor-nn1.netcom.ca!news
From: George Taylor <aa601@chebucto.ns.ca>
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (was Re: spectrum faster in 3D)
Date: Fri, 11 Jul 1997 15:44:08 -0300
Organization: Netcom Canada
Lines: 77
Message-ID: <33C67EF8.3BA9@chebucto.ns.ca>
References: <33A9DD07.153E@skynet.be> <5pu309$96o@news.acns.nwu.edu> <33C3830B.446B@wi.leidenuniv.nl> <33C4CD48.41C6@wi.leidenuniv.nl> <5q4jpr$4cu@news.acns.nwu.edu>
Reply-To: aa601@chebucto.ns.ca
NNTP-Posting-Host: hal-ns3-18.netcom.ca
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-NETCOM-Date: Fri Jul 11  2:44:58 PM EDT 1997
X-Mailer: Mozilla 3.01 (Win95; I)
Xref: news.acns.nwu.edu comp.sys.cbm:71310 comp.sys.sinclair:41203

Stephen Judd wrote:
> dim4 uses my nifty chunky line strategy, which when slightly unrolled gives the
> 10/40 cycle times I mentioned. 

And don't forget, this idea was mine :)
 
> Now, I have a question for anyone interested: What is the Bresenham
> algorithm?
> 
> I always assumed it was the strategy I use (along with just about everyone
> who sits down to think up an algorithm), but I keep seeing these 2DX-DY
> kinds of things that really mystify me.  What's the thinking behind the
> algorithm?
> 
> (I've also heard of Bresenham circle algorithms, but have never seen
> one of those either).
> 

Yes, that has puzzled me also.  I had it explained to me once, but
I can't recall it right now.  Here are the bresenham routines:
(at least some variation)
/* x0,y0 centre of circle
c color of circle
im pointer to planar bitmap (actually 320x200x256 VGA) usually 0xA000
This starts at 0,r and calculates 1/8 of the circle, going down right,
stopping before x=y */
void circle(im,x0,y0,r,c)
  Image *im;
  int x0,y0,r,c;
{
  register int x=-1,y=r+1;
  do
    { r += ((++x - ( (r >= 0) ? --y : 0) )<<1) + 1;
      wr_pixel(im,x0+x,y0+y,c);
      wr_pixel(im,x0+y,y0+x,c);

      wr_pixel(im,x0+y,y0-x,c);
      wr_pixel(im,x0+x,y0-y,c);

      wr_pixel(im,x0-x,y0-y,c);
      wr_pixel(im,x0-y,y0-x,c);

      wr_pixel(im,x0-y,y0+x,c);
      wr_pixel(im,x0-x,y0+y,c); }
   while (y > x);
}
/* and semi translated to basic:
x=-1
y=r+1
loop:
x=x+1
r=r+x+1
if r>=0 then y=y-1:r=r-2*y
plot8(x,y,c)
if y>x then goto loop: */

Judd version:
a=r/2
x=r
y=0
loop:
y=y+1
a=a-y
if a<=0 then x=x-1:a=a+x
plot8(x,y)
if y<x then loop:

They are semi similar and seem to be plotting in different directions,
but I don't really understand them right now..
This wasn't the text file I was really looking for..
also I have a full bresenham conic section drawer:
// CONIC  2D Bresenham-like conic drawer.
//	 CONIC(Sx,Sy, Ex,Ey, A,B,C,D,E,F) draws the conic specified
//	 by A x^2 + B x y + C y^2 + D x + E y + F = 0, between the
//	 start point (Sx, Sy) and endpoint (Ex,Ey).
It's quite long tho.. kinda makes sense, except a lot more
accumulator type variables being added.


Article 71368 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (was Re: spectrum faster in 3D)
Date: 12 Jul 1997 20:49:40 GMT
Organization: Northwestern University, Evanston, IL
Lines: 26
Message-ID: <5q8ql4$ji8@news.acns.nwu.edu>
References: <33A9DD07.153E@skynet.be> <33C4CD48.41C6@wi.leidenuniv.nl> <5q4jpr$4cu@news.acns.nwu.edu> <33C67EF8.3BA9@chebucto.ns.ca>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.cbm:71368 comp.sys.sinclair:41294

In article <33C67EF8.3BA9@chebucto.ns.ca>,
George Taylor  <aa601@chebucto.ns.ca> wrote:
>Stephen Judd wrote:
>> dim4 uses my nifty chunky line strategy, which when slightly unrolled gives the
>> 10/40 cycle times I mentioned. 
>
>And don't forget, this idea was mine :)

I believe what you are thinking of is the idea of inserting an rts into
an unrolled thing.  Although polygonamy used this technique, the unrolled
chunky line is a totally different thing.  Polygonamy used it for fills, 
not for drawing lines (pg doesn't draw lines, actually).  Moreover, correct 
me if I'm wrong, but that particular idea of yours came from Last Traktor, 
right?

The unrolling here is across eight x-steps, and no more.

Incidentally, I derived the chunky idea in my sleep, at least, while I
was half-asleep :).  Actually, these days, I seem to get most of my good
ideas while waking up.  I used to get them in the shower.  Strange.
The only problem is that sometimes they are a little, erm... weird,
i.e. make sense at the time in the way some dreams make sense at the
time :).

-S


Article 71342 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.luc.edu!uchinews!uwvax!uwm.edu!sol.net!newsfeeds.sol.net!europa.clark.net!news.maxwell.syr.edu!news.bc.net!rover.ucs.ualberta.ca!news.ucalgary.ca!srv1.freenet.calgary.ab.ca!albrecht
From: "Alvin R. Albrecht" <albrecht@freenet.calgary.ab.ca>
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (was Re: spectrum faster in 3D)
Date: Fri, 11 Jul 1997 06:49:50 -0600
Organization: Calgary Free-Net
Lines: 112
Message-ID: <5q5a8o$fl6@ds2.acs.ucalgary.ca>
References: <33A9DD07.153E@skynet.be> <5pmm0e$1cm2@ds2.acs.ucalgary.ca> <5prb65$6i1@news.acns.nwu.edu> <33C21416.41C6@wi.leidenuniv.nl> <5pu309$96o@news.acns.nwu.edu> <33C3830B.446B@wi.leidenuniv.nl> <33C4CD48.41C6@wi.leidenuniv.nl>
Reply-To: "Alvin R. Albrecht" <albrecht@freenet.calgary.ab.ca>
NNTP-Posting-Host: albrecht@srv1.freenet.calgary.ab.ca
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
In-Reply-To: <33C4CD48.41C6@wi.leidenuniv.nl>
Xref: news.acns.nwu.edu comp.sys.cbm:71342 comp.sys.sinclair:41249


Here we go:  one last line draw routine.  What's new:  I've adopted
Stephen's plot in a byte approach and I've dropped the stack as a source
for screen addresses, opting instead to calculate them.


enter:	A=dx/2
	B=dx/8=counter
	C=char line counter *new*
	DE=screen address offset to next char line+#0100 *new*
	HL=screen address offset to next block+#0100 *new*
	B'=dx
	C'=dy
	D'=scan line counter *new*
	HL'=screen address
	E'=(HL')=screen byte there before line drawn

DRAW	EXX		4

	SUB C		4	; A=A-dy
	JR C,fixy5	7/12	; if A<0 next y pixel
back5	SET 5,E		8	; plot bit 5

	SUB C		4	; These are all the same.
	JR C,fixy6	7/12	;   Really.
back6	SET 6,E		8	; But for different bits, o'course

	SUB C		4	; Bit 7 is special because I must
	JR C,fixy7	7/12	; write the byte to screen and 
back7	SET 7,E		8	; advance to the next column.
	LD (HL),E	7	; write byte to screen
	INC L		4	; go to next column
	LD E,(HL)	7	; get screen byte already there

	SUB C		4
	JR C,fix0	7/12
back0	SET 0,E		8

...etc for bits 1-3 and finally...

	SUB C		4
	JR C,fix4	7/12
back4	SET 4,E		8

; End of 8 pixel loop

	EXX		4	; get me the reg. set with dx/8 counter
	DJNZ DRAW	13/8	; dec by 1, loop if not done

	LD (HL),E	7	; plot byte at end (overhead)

Each fixyN is identical:

fixyN	ADD A,B		4	; A=A+dx
	LD (HL),E	7	; write byte to screen
	DEC H		4	; move up one scan line
	LD E,(HL)	7	; get screen byte there
	DEC D		4	; if D=0 then we have moved into the
	JP NZ,backN	10/10	;   next char line or block

	EXX		4	; get the set with char line counter
	DEC C		4	; if C=0 then we have moved into the
	JR Z,nxtblk	7/12	;   next block
nxtscan	PUSH DE		11	; push address offset for next char line
	EXX		4	; get back normal set with HL=screen addr
	POP DE		10	; get offset
	ADD HL,DE	11	; add offset to screen address
	LD E,(HL)	7	; get screen byte
	LD D,8		7	; update scan line counter
	JP backN	10	

nxtblk	PUSH HL		11	; push offset for next block
	LD C,8		7	; update char line counter
	EXX		4	; get back normal set with HL=screen addr
	POP DE		10	; get offset
	ADD HL,DE	11	; add offset to screen address
	LD E,(HL)	7	; get screen byte there
	LD D,8		7	; update scan line counter
	JP backN	10


I need 9 copies of this to make it work.

Timings:

- An x pixel is plotted in 19 cycles.
- One time out of 8 (after plotting bit 7), there is an additional
18 cycles to plot byte and update column.
- Therefore an x pixel takes 19+18/8=21.25 cycles

- Will move up a block 3 out of 192 times
- Will move up a char line 24-3=21 out of 192 times
- Will move up a scan line 192-3-21=168 out of 192 times
- Therefore a y pixel on average is plotted in 168/192*41 + 21/192*116 +
3/192*128 + 19 = 69.6 cycles

- Loop overhead takes 21 cycles.  By unrolling, I eliminate this.

Time to plot the line is 21.2 for East and 69.6 for NEast.
Compared to your 10 E & 40 NE version, the ratios occur in 2.1:1 to
1.74:1,  making the Spectrum from 1.66 to 2.03 times faster than the C64
at drawing lines.

Now we await the SID version.


Alvin


Article 71379 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (was Re: spectrum faster in 3D)
Date: 13 Jul 1997 00:52:58 GMT
Organization: Northwestern University, Evanston, IL
Lines: 51
Message-ID: <5q98ta$pmd@news.acns.nwu.edu>
References: <33A9DD07.153E@skynet.be> <33C3830B.446B@wi.leidenuniv.nl> <33C4CD48.41C6@wi.leidenuniv.nl> <5q5a8o$fl6@ds2.acs.ucalgary.ca>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.cbm:71379 comp.sys.sinclair:41301

In article <5q5a8o$fl6@ds2.acs.ucalgary.ca>,
Alvin R. Albrecht <albrecht@freenet.calgary.ab.ca> wrote:
>
>
>Here we go:  one last line draw routine.  What's new:  I've adopted
>Stephen's plot in a byte approach and I've dropped the stack as a source

Good; I hoped that someone would clue in to the bit unrolling ala Elite.
Since Braben and Bell seemed to be Z80 guys, I assumed the Elite routine
was pretty much translated code.

But...

>DRAW	EXX		4
>
>	SUB C		4	; A=A-dy
>	JR C,fixy5	7/12	; if A<0 next y pixel
>back5	SET 5,E		8	; plot bit 5
>
>	SUB C		4	; These are all the same.
>	JR C,fixy6	7/12	;   Really.
>back6	SET 6,E		8	; But for different bits, o'course

How were you planning on terminating the line?  Or is this supposed
to draw lines within +/-8 pixels?

Not exactly a line routine I'd want to use for a game...

(Might make for an interesting demo effect though).

If I were to do a chunky 6510 equivalent, it would look like

	SBC #blah	2
	BCC fixy1	2

four cycles.  But I wouldn't want to do that :).

Incidentally, it is possible to rewrite my 10/40 routine as an 8/43 routine. 
I also take it as obvious that if speed were the real concern, a 10/40 (note
to self: do taxes early this year) routine would be user for lines with 
slope <1/2, and an e.g. 28 cycle routine for lines with slope >1/2.  So, 
minimum of 8-10 or so cycles and a max of, hmmm... 28-32.

(And yes, I'm ignoring things like moving through columns, which
might add a cycle to the average; and yes, the 8 cpp routine has
a slightly more involved setup.)

So, why don't you fix up the routine to draw lines correctly, and
then we can compare.

-S


Article 71241 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.luc.edu!chi-news.cic.net!news.maxwell.syr.edu!news.bc.net!rover.ucs.ualberta.ca!news.ucalgary.ca!srv1.freenet.calgary.ab.ca!albrecht
From: "Alvin R. Albrecht" <albrecht@freenet.calgary.ab.ca>
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (was Re: spectrum faster in 3D)
Date: Wed, 9 Jul 1997 21:45:35 -0600
Organization: Calgary Free-Net
Lines: 163
Message-ID: <5q1lvu$13gm@ds2.acs.ucalgary.ca>
References: <33A9DD07.153E@skynet.be> <5p66j0$471@news.acns.nwu.edu> <33ba2ee9.321664@NEWS.DEMON.CO.UK> <5pmm0e$1cm2@ds2.acs.ucalgary.ca> <5prb65$6i1@news.acns.nwu.edu>
Reply-To: "Alvin R. Albrecht" <albrecht@freenet.calgary.ab.ca>
NNTP-Posting-Host: albrecht@srv1.freenet.calgary.ab.ca
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
In-Reply-To: <5prb65$6i1@news.acns.nwu.edu>
Xref: news.acns.nwu.edu comp.sys.cbm:71241 comp.sys.sinclair:41133


On 7 Jul 1997, Stephen Judd wrote:

> Hmmmm... actually I was hoping that you would code up the fast multiply
> and string print routines, since Ian had already posted some line code.
> Still, it was nice to see the trick of pulling the rows off of the stack --
> the screen rows aren't quite so barfy after all.

Hey, they're coming - it's just that the little time I have to spend
reading news is stretched to the limit by this thread :).

> The real question at this point is: why all of this gobbledygook
> of restrictions and doubling stuff up when all you really want to

Yep, I thought a good place to start would be the classic routines.  The
Bresenham & DDA are straight out of a text.  One hour of writing & coding
leaves little time for thinking.

> By the way, you made an error in your cycle timings -- AND #$1F
> is seven cycles, not six.  Also, I take it as a given that no
> interrupts will occur during the routine :).

You can do away with the stack method at a penalty of 12 cycles or so but
we can't have that!
 
> >Either way, tough for you to beat I think.
 
> My boy, if there's one thing you must learn it's to do your homework.
> Like so many of its predecessors, the comment is simply foolish.

Hey, lighten up Steve.  You're a hands on guy so I'm playing on that
plane.
 
> In this case, I have already indicated several line routines in previous
> postings.  The Elite line routine runs at about 28 cycles/pixel.

This I have to apologize to you for.  I don't always have time to go back
and read all the posts when I miss a week or so.  This time, you were
practically begging for code, so I spent time writing rather than reading.
Sorry.

> My own routine runs at 17 cycles/51 cycles for normal/inc y steps.
> By using a trivial (i.e. very short) unrolling those numbers become
> 10 cycles/40 cycles, respectively.

I had a peek at your line routine - you build up a byte before actually
plotting it - not a bad idea.  I'll see if this can be extended to
advantage in a z80 version.

In your 17/51 cycle time, I assume the 51 cycles includes the actual write
of the byte?  Would you perhaps restate that 17 cycle figure as a function
of how many bits are set in the byte?

> 7:1 cycle ratios are something I can live with. > 

Well, the first attempt is not always the best.  In the 2nd hour of
working on the problem I came up with this:

    enter:  C=dy
            B=dx/8+1 (see below)=counter
            E=dx
            A=dx/2
            HL=screen address
            SP=in address table as last time
           * D used as temp below

    DRAW1   SUB C		4	; sub dy
            JR C,fixy0		7/12	; if carry, next y coord
    back0   SET 0,(HL)		15	; plot pixel in bit 0

            SUB C		4	; sub dy
            JR C,fixy1		7/12	; if carry, next y coord
    back1   SET 1,(HL)		15	; plot pixel in bit 1

            SUB C		4	; sub dy
            JR C,fixy2		7/12	; if carry, next y coord
    back2   SET 2,(HL)		15	; plot pixel in bit 2

        .... etc ...  repeated for each bit position

            SUB C		4	; sub dy
            JR C,fixy7		7/12	; if carry, next y coord
    back7   SET 7,(HL)		15	; plot pixel in bit 7

            INC L		4	; next column
            DJNZ DRAW1		13/8	; do until ctr=0

            (..END..)

The "fixyN" are 8 identical subroutines.  They are separated so that
you know where to jump back to when done.

    fixyN   ADD A,E		4	; add dx
            LD D,A		4	; store A in temp
            LD A,#1F		7	; mask for column
            AND L		4	; keep column
            POP HL		10	; next y coord address
            OR L		4	; OR in x coord column
            LD L,A		4
            LD A,D		4	; recover A
            JP backN		10

This requires a little explanation.  The counter has been initialized to
dx/8 rather than dx.  This means it only checks whether the loop is
complete following every 8 pixels plotted.  The 3 bits shifted out
determine where in the routine the draw should be entered (at all "SUB 
C" points).  Entering at a place other than the beginning means that the
leftover number of pixels after dx>>3 will be plotted.  This also means
that counter=dx/8+1.  Now, there is also the matter that the first point
plotted will not necessarily be in the bit position indicated above.  My
brain is a little foggy right now, but I think that means 8*8=64 copies
total.  That's 8896 bytes - awful!  It may be better to accept a small
overhead in writing a header that modifies the code and stick to 8 copies.

This works out to 26 + 17/8 (the "INC L; DJNZ" are done once per 8 pixels)
= 28.1 cycles per x pixel.  To plot a new y coord, add 5 (successful JR in
DRAW routine) + 51=56 cycles.

This 28.1/84.1 and your 17/51 gives a ratio of 1.65:1/1.65:1.  A 1.65:1
ratio is something I can live with :-).  I'm not done yet:  I have another
idea or two and I'll try your idea as well.

> I also thought you might be interested in seeing the 6510 equivalent
> of your own code:
 
> Or, if we Steveify it a little,
> 
> LOOP	ADC #DY		2
> 	BCC :CONT	3/2
> 	SBC #DX		  2
> 	DEY		  2
> :CONT	STA TEMP	3
> 	LDA BITP,X	4
> 	BMI FIXCOL	2/3
> 	ORA POINT,Y	4
> 	STA POINT,Y	4
> 	LDA TEMP	3
> 	DEX		2
> 	BNE LOOP	3
> 
> So, 30 cycles normal, 33 for steps in y-direction.  In general,
> running at around the anticipated 3:1.

Actually, no.   The Y coordinate cannot simply be decremented to move up
the bitmap (if it could, I'd simply DEC H rather than fool with the
stack).  You'll have to look up the lsb/msb from two tables and copy
them into the "ORA POINT(,Y)" & "STA POINT(,Y)" instructions.  The time
for a 6502 to do this is quite large in comparison with the z80's POP.
Also when fixing the column (you omitted above) you'll have to reload the 
LSB of both instructions above.  You're looking
at <2:1 ratios on average here.

> Now all we need is a fast multiply (actually a matrix multiply), and
> the essence of a 3D program is complete.

Coming, and I see Ian's started already, but I want to see how fast I can
make the draw first and compare that to your unrolled version.


Alvin


Article 71257 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.luc.edu!chi-news.cic.net!howland.erols.net!newspump.sol.net!sol.net!spool.mu.edu!newsspool.sol.net!munnari.OZ.AU!news.mel.connect.com.au!harbinger.cc.monash.edu.au!news.cs.su.oz.au!metro!metro!seagoon.newcastle.edu.au!cc.newcastle.edu.au!ecbm
From: "Bruce R. McFarling" <ecbm@cc.newcastle.edu.au>
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (was Re: spectrum faster in 3D)
Date: Thu, 10 Jul 1997 19:26:53 +1000
Organization: The University of Newcastle
Lines: 27
Message-ID: <Pine.PMDF.3.95.970710192206.675995039C-100000@cc.newcastle.edu.au>
References: <33A9DD07.153E@skynet.be> <5p66j0$471@news.acns.nwu.edu> <33ba2ee9.321664@NEWS.DEMON.CO.UK> <5pmm0e$1cm2@ds2.acs.ucalgary.ca> <5prb65$6i1@news.acns.nwu.edu> <5q1lvu$13gm@ds2.acs.ucalgary.ca>
NNTP-Posting-Host: cc.newcastle.edu.au
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
In-Reply-To: <5q1lvu$13gm@ds2.acs.ucalgary.ca>
Xref: news.acns.nwu.edu comp.sys.cbm:71257 comp.sys.sinclair:41153

On Wed, 9 Jul 1997, Alvin R. Albrecht wrote:

>... 
> In your 17/51 cycle time, I assume the 51 cycles includes the actual write
> of the byte?  Would you perhaps restate that 17 cycle figure as a function
> of how many bits are set in the byte?

	6502 loops almost always *have* to write the byte, so writing is
normally included.  Not to say that I'm answering this specific question,
tho.

> You'll have to look up the lsb/msb from two tables and copy
> them into the "ORA POINT(,Y)" & "STA POINT(,Y)" instructions.
	Huh?  Where can you get a 6502 with an addr(,Y) address mode?
I'll have to say that it's a great idea: index the address off a zero page
location indicated in the Y register is one of the address modes that the
6502 doesn't have.

	And I'm happy to see code coming out from both sides. It's
informative information (as opposed to the more common type of
information).

Virtually,

Bruce R. McFarling, Newcastle, NSW
ecbm@cc.newcastle.edu.au


Article 71266 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (was Re: spectrum faster in 3D)
Date: 11 Jul 1997 07:11:58 GMT
Organization: Northwestern University, Evanston, IL
Lines: 28
Message-ID: <5q4mbu$4q0@news.acns.nwu.edu>
References: <33A9DD07.153E@skynet.be> <5prb65$6i1@news.acns.nwu.edu> <5q1lvu$13gm@ds2.acs.ucalgary.ca> <Pine.PMDF.3.95.970710192206.675995039C-100000@cc.newcastle.edu.au>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.cbm:71266 comp.sys.sinclair:41167

In article <Pine.PMDF.3.95.970710192206.675995039C-100000@cc.newcastle.edu.au>,
Bruce R. McFarling <ecbm@cc.newcastle.edu.au> wrote:
>On Wed, 9 Jul 1997, Alvin R. Albrecht wrote:
>
>>... 
>> In your 17/51 cycle time, I assume the 51 cycles includes the actual write
>> of the byte?  Would you perhaps restate that 17 cycle figure as a function
>> of how many bits are set in the byte?
>
>	6502 loops almost always *have* to write the byte, so writing is
>normally included.  Not to say that I'm answering this specific question,
>tho.

Well, the idea behind the chunky plotting is that something like

	LDA bitp
	ORA blah
	STA blah

is very time consuming, and very wasteful since often there will be 
several pixels set in a particular byte.  So what it does is calculate
the full byte before plotting it.  So you get a bunch of faster
calculations (the 17 cycles), but a somewhat longer calculation when
the actual byte needs plotting (the 51 cycles), which happens when
you either pass through a column or inc/decrement the y-coordinate.

-S


Article 71265 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (was Re: spectrum faster in 3D)
Date: 11 Jul 1997 07:03:56 GMT
Organization: Northwestern University, Evanston, IL
Lines: 108
Message-ID: <5q4lss$4o2@news.acns.nwu.edu>
References: <33A9DD07.153E@skynet.be> <5pmm0e$1cm2@ds2.acs.ucalgary.ca> <5prb65$6i1@news.acns.nwu.edu> <5q1lvu$13gm@ds2.acs.ucalgary.ca>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.cbm:71265 comp.sys.sinclair:41166

In article <5q1lvu$13gm@ds2.acs.ucalgary.ca>,
Alvin R. Albrecht <albrecht@freenet.calgary.ab.ca> wrote:
>On 7 Jul 1997, Stephen Judd wrote:
>
>> Hmmmm... actually I was hoping that you would code up the fast multiply
>> and string print routines, since Ian had already posted some line code.
>
>Hey, they're coming - it's just that the little time I have to spend
>reading news is stretched to the limit by this thread :).

Priorities, man, priorities!  Surely we can come up with some little
things to cut out of the daily routine to make more time for this critically
important thread!  For instance, do you have a job?  Do you eat?  Sleep?

(Luckily, of course, I have for some unknown reason saved all of the
posts, so no fear!  They can be read at leisure over the next few
years or so.)

>> The real question at this point is: why all of this gobbledygook
>> of restrictions and doubling stuff up when all you really want to
>
>Yep, I thought a good place to start would be the classic routines.  The

Well, take the question as, "What is all of this gobbledygook that always
seems to accompany a Bresenham routine?"

i.e. what in the world are these textbooks suggesting (and why)?

>Bresenham & DDA are straight out of a text.  One hour of writing & coding

I guess I don't know what DDA is, either.

>> is seven cycles, not six.  Also, I take it as a given that no
>> interrupts will occur during the routine :).
>
>You can do away with the stack method at a penalty of 12 cycles or so but
>we can't have that!

Yep, I agree, it's far too useful, and I don't think the Spectrum would
have a soundtrack (or some other interrupt-driven event) going (since,
of course, you need a decent sound chip for that :).

>> >Either way, tough for you to beat I think.
> 
>> My boy, if there's one thing you must learn it's to do your homework.
>> Like so many of its predecessors, the comment is simply foolish.
>
>Hey, lighten up Steve.  You're a hands on guy so I'm playing on that

* Steve starts floating upwards

>plane.
> 
>> In this case, I have already indicated several line routines in previous
>> postings.  The Elite line routine runs at about 28 cycles/pixel.
>
>This I have to apologize to you for.  I don't always have time to go back

Bah, don't start getting all mushy on me, or I might have to start being
nice or pleasant or something.

>In your 17/51 cycle time, I assume the 51 cycles includes the actual write
>of the byte?  Would you perhaps restate that 17 cycle figure as a function
>of how many bits are set in the byte?

Well, think 17 cycles for each bit, plus an extra 34 cycles.  And yes,
the reason the 51 is so high (and the 17 is so low) is because I keep
pushing off the actual plotting.

>Well, the first attempt is not always the best.  In the 2nd hour of
>working on the problem I came up with this:

Well, I did put a condition in the challenge that said "A routine
you might use in a game", which this one (like the 2-10 cycle routine
I posted) really doesn't meet of course (but which the 10/40 routine
does meet).

Still, they are fun to think of, eh? :)

>> I also thought you might be interested in seeing the 6510 equivalent
>> of your own code:
> 
>> Or, if we Steveify it a little,
>> 
>> LOOP	ADC #DY		2
>> 	BCC :CONT	3/2
>> 	SBC #DX		  2
>> 	DEY		  2
>> :CONT	STA TEMP	3
>> 	LDA BITP,X	4
>> 	BMI FIXCOL	2/3
>> 	ORA POINT,Y	4
>> 	STA POINT,Y	4
>> 	LDA TEMP	3
>> 	DEX		2
>> 	BNE LOOP	3
>> 
>> So, 30 cycles normal, 33 for steps in y-direction.  In general,
>> running at around the anticipated 3:1.
>
>Actually, no.   The Y coordinate cannot simply be decremented to move up

What I meant of course was the C64 equivalent of your algorithm.  The
purpose of this challenge was to test the assertion that the Spectrum
could do things like draw lines many times faster than the 64.

-S


Article 71358 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.luc.edu!chi-news.cic.net!su-news-hub1.bbnplanet.com!cpk-news-hub1.bbnplanet.com!news.bbnplanet.com!news.maxwell.syr.edu!news.bc.net!rover.ucs.ualberta.ca!news.ucalgary.ca!srv1.freenet.calgary.ab.ca!albrecht
From: "Alvin R. Albrecht" <albrecht@freenet.calgary.ab.ca>
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (was Re: spectrum faster in 3D)
Date: Sat, 12 Jul 1997 10:29:12 -0600
Organization: Calgary Free-Net
Lines: 69
Message-ID: <5q8bg4$f10@ds2.acs.ucalgary.ca>
References: <33A9DD07.153E@skynet.be> <5pmm0e$1cm2@ds2.acs.ucalgary.ca> <5prb65$6i1@news.acns.nwu.edu> <5q1lvu$13gm@ds2.acs.ucalgary.ca> <5q4lss$4o2@news.acns.nwu.edu>
NNTP-Posting-Host: albrecht@srv1.freenet.calgary.ab.ca
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
In-Reply-To: <5q4lss$4o2@news.acns.nwu.edu>
Xref: news.acns.nwu.edu comp.sys.cbm:71358 comp.sys.sinclair:41286


On 11 Jul 1997, Stephen Judd wrote:

> Well, take the question as, "What is all of this gobbledygook that always
> seems to accompany a Bresenham routine?"
 
> i.e. what in the world are these textbooks suggesting (and why)?

I suspect that the Bresenham and this version are equivalent.  Why they
left it in that form, I don't know:  maybe it's better suited for drawing
anti-aliased lines (the decision variable, d, is proportional to the error
in the pixel approximation.  I think the constant of proportionality is
constant for all lines - that's something I'd have to go over the
derivation to find out).

And you can extend it to other conic sections.
 
> >Bresenham & DDA are straight out of a text.  One hour of writing & coding

> I guess I don't know what DDA is, either.

DDA = digital differential analyser (I think).  A fancy term concocted to
scare away the lay people.  It actually means that you're using
the differentials dy,dx rather than deltay & deltax.

> >> is seven cycles, not six.  Also, I take it as a given that no
> >> interrupts will occur during the routine :).

> >You can do away with the stack method at a penalty of 12 cycles or so but
> >we can't have that!

> Yep, I agree, it's far too useful, and I don't think the Spectrum would
> have a soundtrack (or some other interrupt-driven event) going (since,
> of course, you need a decent sound chip for that :).

:-)  The new version doesn't touch the stack so that I can still play
interrupt driven music on my 2068's AY chip.

> Well, I did put a condition in the challenge that said "A routine
> you might use in a game", which this one (like the 2-10 cycle routine
> I posted) really doesn't meet of course (but which the 10/40 routine
> does meet).

The latest version 21.25/69.6 does meet the criteria.
 
> Still, they are fun to think of, eh? :)

Fun?  Stressful I think.  I am merrily working on a 30 cycle version
that's "going to kick some butt" then you come along and say yours is
10/40.  I work on a 20 cycle version and bring it to completion and
another bloke wants to use the SID (!) to draw lines.  Then another wants
to unroll the mother of all loops and "plot" jsr's and rts's to achieve
21/24.  Aargh!

> What I meant of course was the C64 equivalent of your algorithm.  The
> purpose of this challenge was to test the assertion that the Spectrum
> could do things like draw lines many times faster than the 64.

I thought you meant:  what if the Spectrum had a 6502 then your code would
look like this and you'd get a 3:1 ratio as expected.  In actual fact, the
code wouldn't work :).  I'm still not clear what you meant, but no matter.
Remember that cycle ratios of > 3.54:1 need to be achieved to make the C64
faster than the Spectrum.


Alvin


Article 71380 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (was Re: spectrum faster in 3D)
Date: 13 Jul 1997 01:40:58 GMT
Organization: Northwestern University, Evanston, IL
Lines: 70
Message-ID: <5q9bna$qiq@news.acns.nwu.edu>
References: <33A9DD07.153E@skynet.be> <5q1lvu$13gm@ds2.acs.ucalgary.ca> <5q4lss$4o2@news.acns.nwu.edu> <5q8bg4$f10@ds2.acs.ucalgary.ca>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.cbm:71380 comp.sys.sinclair:41302

In article <5q8bg4$f10@ds2.acs.ucalgary.ca>,
Alvin R. Albrecht <albrecht@freenet.calgary.ab.ca> wrote:
>On 11 Jul 1997, Stephen Judd wrote:
>
>> Well, take the question as, "What is all of this gobbledygook that always
>> seems to accompany a Bresenham routine?"
> 
>> i.e. what in the world are these textbooks suggesting (and why)?
>
>I suspect that the Bresenham and this version are equivalent.  Why they
>left it in that form, I don't know:  maybe it's better suited for drawing

Well, as usual I just get the feeling that it's a complicated way of
doing something simple.

>> Well, I did put a condition in the challenge that said "A routine
>> you might use in a game", which this one (like the 2-10 cycle routine
>> I posted) really doesn't meet of course (but which the 10/40 routine
>> does meet).
>
>The latest version 21.25/69.6 does meet the criteria.

Except that you wouldn't put it in a game :).

Also, it's only half a routine.  Slope>1 is a slightly different ballgame,
as you've probably anticipated.

>10/40.  I work on a 20 cycle version and bring it to completion and
>another bloke wants to use the SID (!) to draw lines.  Then another wants
>to unroll the mother of all loops and "plot" jsr's and rts's to achieve
>21/24.  Aargh!

Well, I think "something you could use in a game" is a good boundary
condition, and the reason I put it in is to avoid these pissing contests 
that result in a really fast routine that has an extraordinarily limited
practical application, if any (although they are sometimes of theoretical/
academic interest).

(BTW, some of my own routines fall into that category, although I have
avoided them in these comparisons.  That is, they're great for a demo,
but don't cut it in the real world.  I will let the philosophers out
there try to think about how a computer can be considered a real world :).

>> What I meant of course was the C64 equivalent of your algorithm.  The
>> purpose of this challenge was to test the assertion that the Spectrum
>> could do things like draw lines many times faster than the 64.
>
>I thought you meant:  what if the Spectrum had a 6502 then your code would

Nope.  The Spectrum display layout is not well-suited to the 6502.

>code wouldn't work :).  I'm still not clear what you meant, but no matter.

Well, like I said: those were C64 implementations of the methods your
code used to calculate lines.  They also demonstrate that a 30ish cpp 
routine is a very reasonable benchmark for the C64, for lines in both 
the x- and y- direction.

>Remember that cycle ratios of > 3.54:1 need to be achieved to make the C64
>faster than the Spectrum.

Remember also that the claims being made some months back were not
"The C64 is faster than the Spectrum at drawing lines" but rather
"The Spectrum's fast Z80 gives it an enormous advantage for drawing
lines."  (Along with mathematical calculations, software sprites,
and so on.)

We shall see.

-S


Article 71509 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!uwm.edu!vixen.cso.uiuc.edu!howland.erols.net!cpk-news-hub1.bbnplanet.com!news.bbnplanet.com!baron.netcom.net.uk!netcom.net.uk!server3.netnews.ja.net!news.ox.ac.uk!news
From: imc@ecs.ox.ac.uk (Ian Collier)
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (was Re: spectrum faster in 3D)
Date: 15 Jul 1997 12:37:57 GMT
Organization: Oxford University Computing Laboratory
Lines: 75
Message-ID: <11489.imc@comlab.ox.ac.uk>
References: <33A9DD07.153E@skynet.be> <33C3830B.446B@wi.leidenuniv.nl> <33C4CD48.41C6@wi.leidenuniv.nl> <5q5a8o$fl6@ds2.acs.ucalgary.ca>
NNTP-Posting-Host: gruffle.comlab.ox.ac.uk
X-Modified-on: Tuesday, 15th July 1997 at 1:37pm BST
X-Local-Date: Tuesday, 15th July 1997 at 1:19pm BST
Xref: news.acns.nwu.edu comp.sys.cbm:71509 comp.sys.sinclair:41479

In article <5q5a8o$fl6@ds2.acs.ucalgary.ca>, "Alvin R. Albrecht" <albrecht@freenet.calgary.ab.ca> wrote:
>enter:  A=dx/2
>        B=dx/8=counter
>        C=char line counter *new*
>        DE=screen address offset to next char line+#0100 *new*
>        HL=screen address offset to next block+#0100 *new*
>        B'=dx
>        C'=dy
>        D'=scan line counter *new*
>        HL'=screen address
>        E'=(HL')=screen byte there before line drawn

Hey, you didn't use the A' register! :-)

>fixyN   ADD A,B         4       ; A=A+dx
>        LD (HL),E       7       ; write byte to screen
>        DEC H           4       ; move up one scan line
>        LD E,(HL)       7       ; get screen byte there
>        DEC D           4       ; if D=0 then we have moved into the
>        JP NZ,backN     10/10   ;   next char line or block
>        EXX             4       ; get the set with char line counter
>        DEC C           4       ; if C=0 then we have moved into the
>        JR Z,nxtblk     7/12    ;   next block
>nxtscan PUSH DE         11      ; push address offset for next char line
>        EXX             4       ; get back normal set with HL=screen addr
>        POP DE          10      ; get offset
>        ADD HL,DE       11      ; add offset to screen address
>        LD E,(HL)       7       ; get screen byte
>        LD D,8          7       ; update scan line counter
>        JP backN        10      
>nxtblk  PUSH HL         11      ; push offset for next block
>        LD C,8          7       ; update char line counter
>        EXX             4       ; get back normal set with HL=screen addr
>        POP DE          10      ; get offset
>        ADD HL,DE       11      ; add offset to screen address
>        LD E,(HL)       7       ; get screen byte there
>        LD D,8          7       ; update scan line counter
>        JP backN        10

Firstly, rather than writing this lot out I'd probably do the char-line and
block calculations mathematically.  This frees up some registers and reduces
the obligation to keep lots of counters.

fixyN   ADD A,B         4       ; A=A+dx
        LD (HL),E       7       ; write byte to screen
        DEC H           4       ; move up one scan line
        LD E,(HL)       7       ; get screen byte there
        DEC D           4       ; if D=0 then we have moved into the
        JP NZ,backN     10/10   ;   next char line or block
        LD D,8          7       ; update scan line counter
        LD E,A          4       ; save accumulator
        LD A,L          4       ; move up one character line
        SUB 32          7
        LD L,A          4
        JR C,endN       12/7    ; end if crossing a block
        LD A,H          4       ; otherwise reverse the last 8 "DEC H"s
        ADD 8           7
        LD H,A          4
endN    LD A,E          4       ; restore accumulator
        LD E,(HL)       7       ; fetch screen byte
        JP backN        10

So 168 times out of 192, this takes 41 cycles (including the 5 needed to
jump out of the main draw routine); it takes 100 cycles 3 times out of 192,
and for the remaining 21 times it takes 110 cycles.  Hah!  I have actually
made it faster.  :-)

Secondly, we can avoid the expense in memory terms of all these fixyN
routines by having them called instead of jumped to.  This costs 3 cycles
for each move East and 4.3 for each move Northeast, but saves you from
potentially being bitten by the program getting too long for the relative
jumps.  It's all a big trade-off between many factors.
-- 
---- Ian Collier : imc@comlab.ox.ac.uk : WWW page (including Spectrum section):
------ http://www.comlab.ox.ac.uk/oucl/users/ian.collier/imc.html


Article 71731 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!uwm.edu!vixen.cso.uiuc.edu!ais.net!europa.clark.net!news-feed.inet.tele.dk!enews.sgi.com!decwrl!tribune.usask.ca!rover.ucs.ualberta.ca!news.ucalgary.ca!srv1.freenet.calgary.ab.ca!albrecht
From: "Alvin R. Albrecht" <albrecht@freenet.calgary.ab.ca>
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (was Re: spectrum faster in 3D)
Date: Fri, 18 Jul 1997 15:03:10 -0600
Organization: Calgary Free-Net
Lines: 175
Message-ID: <5qolko$gqg@ds2.acs.ucalgary.ca>
References: <33A9DD07.153E@skynet.be> <33C3830B.446B@wi.leidenuniv.nl> <33C4CD48.41C6@wi.leidenuniv.nl> <5q5a8o$fl6@ds2.acs.ucalgary.ca> <11489.imc@comlab.ox.ac.uk>
Reply-To: "Alvin R. Albrecht" <albrecht@freenet.calgary.ab.ca>
NNTP-Posting-Host: albrecht@srv1.freenet.calgary.ab.ca
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
In-Reply-To: <11489.imc@comlab.ox.ac.uk>
Xref: news.acns.nwu.edu comp.sys.cbm:71731 comp.sys.sinclair:41738


On 15 Jul 1997, Ian Collier wrote:


Wouldn't you know it?  I wrote almost exactly the same code for updating
the y row.  It's below:

The variables needed are now reduced to:

> >enter:  A=dx/2
> >        B=dx/8=counter
> >        B'=dx
> >        C'=dy
> >        D'=scan line counter *new*
> >        HL'=screen address
> >        E'=(HL')=screen byte there before line drawn

To make the discussion below make more sense, here's a snippit of the draw
subroutine:

DRAW	EXX

	SUB C		; A=A-dy
	JR C,fixy3	; if carry, fix y row
back3	SET 3,E		; plot bit 3
 
	SUB C		; etc. for all pixels
	JR C,fixy4
back4	SET 4,E

(...)

	SUB C		; bit 7 is special as byte must be drawn
	JR C,fixy7	; and must advance to next column
back7	SET 7,E
	LD (HL),E
	DEC L		; *** Funny nobody else noticed this, but
			; the pixels are being plotted right to left
			; so this routine draws from right to left,
			; bottom to top.  Thus, 'DEC L' not 'INC L'.
			; It's just a matter of changing some INCs
			; and DECs and SUBs to ADDs to change the
			; line drawing direction if necessary.
	LD E,(HL)

	SUB C		; finish for all 8 pixels
	JR C,fixy0
back0	SET 0,E

(..)

	SUB C
	JR C,fixy2
back2	SET 2,E

	EXX		; finished 8 pixel group
	DJNZ DRAW	; if dx/8 counter is not zero, go back

	EXX		; after loop, be sure to write
	LD (HL),E	; final byte

	RET

All fixyN are identical.

fixyN   ADD A,B         4       ; A=A+dx
        LD (HL),E       7       ; write byte to screen
        DEC H           4       ; move up one scan line
        LD E,(HL)       7       ; get screen byte there
        DEC D           4       ; if D=0 then we have moved into the
        JP NZ,backN     10/10   ;   next char line or block

	LD D,8		7	; update scan line counter
	LD E,A		4	; save A
	LD A,L		4	; goto next char line
	SUB #20		7
	LD L,A		4
	JR C,block	7/12	; if carry, have moved to next block
	LD A,D		4	; add 8 to scan line
	ADD A,H		4
	LD H,A		4
block	LD A,E		4	; recover A
	LD E,(HL)	7	; get screen byte stored here
	JP backN	10

41*168/192 + 107*21/192 + 96*3/192 + 19 = 68.1 cycles for NE.
and 21.25 cycles for E as shown before.

> Secondly, we can avoid the expense in memory terms of all these fixyN
> routines by having them called instead of jumped to.  This costs 3 cycles
> for each move East and 4.3 for each move Northeast, but saves you from
> potentially being bitten by the program getting too long for the relative
> jumps.  It's all a big trade-off between many factors.

Yes, definitely preferrable, but only if those few cycles aren't
important.  Then there's only one copy of fixyN for all the DRAWs (see
below) which saves on the memory.  There won't be any trouble with the
relative jumps as half the fixyN can appear above the DRAW and half below.

Then there's a choice between a memory hungry 21.25/68.1 and a less memory
hungry 24.25/72.4.

This method can also be sped up in the x direction by replacing the SET
b,E instructions with an INC E instead.  In the fixy part, E is used as an
index into a table to get the byte written.  E would then have to be
reinitialized to the correct spot in the table to begin counting from a
place with the correct starting pixel set (another table lookup?).  One
time out of 8, this would also have to be done in the main loop, making
the savings per E pixel less than 4 cycles (probably about 2).

Anyway, that would be the fastest one could get with this method on a Z80
as we're left with 3 instructions per E pixel, 2 out of 3 of which are 4
cycles (the fastest possible).  The factor dragging it down with respect
to the 6502 is the relatively slow "jr".

Another method I would investigate is trying to find out how many x pixels
are plotted before a y pixel is changed (ie - calculate dx/dy with
perhaps logs?).  Then you could end up plotting several pixels or even
bytes without any decisions (jr's).  Although I know the version here
isn't the fastest possible, I'll stay satisfied with it.  Another useless
piece of info for the interested reader:  it can also be easily modified
to draw patterned lines.


This is actually the 2nd unrolled version (the one plotting points
directly on the screen was also unrolled).  And, as it happens, the
original version of Elite by Braben & Bell was written for the BBC
computer, another 6502 based machine, so the line drawing routine in it
was actually written for the 6502 and is not a translated z80 routine.

How this thing works:  I need 8 copies of the DRAW routine.  Each copy has
a different starting pixel (the one above starts plotting at bit 3; I also
need versions that plot starting on all other bits).

Notice that the counter is dx/8 and therefore these DRAW routines only
draw multiples of 8 pixels in the horizontal direction starting at any
horizontal coordinate. To be practical, I also need to draw an additional
0-7 pixels depending on the remainder of dx/8.  For this reason, I need
another 8 copies of DRAW above, though slightly modified.  The terminating
loop counter (EXX; DJNZ) is replaced with a JP to the correct DRAW
routine.  These routines are entered at any of the "SUB C" locations above
depending on 'dx mod 8' (remainder of the division).

The DRAW above takes 50 bytes.  The fixyN takes 25 bytes.  Therefore the
21.25/68.1 version needs less than 16*(50+25)=1200 bytes.  The 24.25/72.4
version needs less than 16*50+25=825 bytes.  (Less than because the
modified DRAWs for 0-7 pixels need just 7 bits rather than 8).

As for overhead to glue this together, it shouldn't take much.  Just the
use of a couple of calculated gotos which can be stored in the DE/HL
registers and acted upon using JP (HL).

This line draw is good, of course, for lines with slope between 0 and -1.
The version with slopes between 1 and infinite (the y coordinate changes
more often than the x coordinate) needs a slightly different routine.
Such a routine would be written such that overhead cycles are shuffled
from the y coordinate calculation to the x coordinate calculation.
Anyhow, starting with a 68.1 cycle y coordinate, I don't think I'd have
much trouble getting down to 56 cycles to get that 2:1 ratio with the so
far mentioned 28 cycle y coordinate on the C64.  I know that's a lot of
hot air before it's shown, but using the 68.1 cycles, the ratio is 2.43:1,
nowhere near the 3.54:1 ratio needed to make the Spectrum and C64 the same
speed.

And just a caution:  these cycle ratios have little to do with a
comparison of a z80 and a 6502 as all code is tied very tightly to each
machine's display file organization.  With the Spectrum, at least, there
was little thought put into how neatly the z80 could access the display
file.


Alvin


Article 71775 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (was Re: spectrum faster in 3D)
Date: 20 Jul 1997 22:06:11 GMT
Organization: Northwestern University, Evanston, IL
Lines: 150
Message-ID: <5qu24j$ive@news.acns.nwu.edu>
References: <33A9DD07.153E@skynet.be> <5q5a8o$fl6@ds2.acs.ucalgary.ca> <11489.imc@comlab.ox.ac.uk> <5qolko$gqg@ds2.acs.ucalgary.ca>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.cbm:71775 comp.sys.sinclair:41768


In article <5qolko$gqg@ds2.acs.ucalgary.ca>,
Alvin R. Albrecht <albrecht@freenet.calgary.ab.ca> wrote:
>...
>
>Then there's a choice between a memory hungry 21.25/68.1 and a less memory
>hungry 24.25/72.4.
>...
>
>How this thing works:  I need 8 copies of the DRAW routine.  Each copy has
>a different starting pixel (the one above starts plotting at bit 3; I also
>need versions that plot starting on all other bits).
>...
>0-7 pixels depending on the remainder of dx/8.  For this reason, I need
>another 8 copies of DRAW above, though slightly modified.  The terminating
>...
>The DRAW above takes 50 bytes.
>The fixyN takes 25 bytes.  Therefore the

And are there not eight of them?

>21.25/68.1 version needs less than 16*(50+25)=1200 bytes.  The 24.25/72.4
>version needs less than 16*50+25=825 bytes.  (Less than because the
>modified DRAWs for 0-7 pixels need just 7 bits rather than 8).

Sixteen copies of a 250 byte routine gives me 4k.  Plus another set,
for lines with negative slope.  Plus another set of routines for stepping 
in the y-direction, and we get, what?  A complicated routine, taking 10k?
14k?  No exclusive-ORing?

Oh bother: now we're into just the kind of contest I detest, and exactly the 
kind of routines I (usually :) detest.  Oh well... A similar 64 routine
looks like

CONT1	SBC DX (or SBC #DX)	3 (2)
	BCS CONT2		2/3
FIX1	ADC DY (or #DY)			3 (2)
	STA TEMP			3
	TXA				2
	EOR #$7F			2
	EOR BITP,Y (or (BITP),Y)	4 (5)
	STA BITP,Y			4 (5)
	LDX #$7F			2
	DEY				2
	LDA TEMP			3

CONT2	SBC DX
	BCS CONT3
FIX2	ADC DY
	STA TEMP
	TXA
	EOR #$3F
	...
	LDX #$3F
	...
CONT8	SBC DX
	BCS NEXTX
	...
NEXTX	Fix up column using your favorite method
	DEC COUNT
	BEQ DONE
	JMP CONT1
	fix up last points

(Actually FIX1 uses LDA #$80 instead of the TXA/EOR business,
but it's written out for completeness, blah blah blah)

Each CONT is 4 bytes, and each FIX is 18 bytes (or 16 if using zp-indirect).
The setup just jumps to the right starting spot in the routine.

Times are 6 (or 5) cycles per step in X, plus 24 (or 23) per step in Y.
This routine is actually the 10/40 routine in "column format", i.e. with
a separate routine (e.g. a 30-cycle, or the 10/40) to handle the last 
points in the x-direction.

Compare 5/28 (or 6/30 or...) with your own numbers: 21/68 (or 24/72).
By my calculator, these are around -- surprise surprise -- 2.8:1 to 3.2:1.

And this on an algorithm very well-suited to the Z80.  Similarly, we
can recall that your first algorithm, when translated directly to the
C64, ran at right around 3:1.

>The version with slopes between 1 and infinite (the y coordinate changes
>more often than the x coordinate) needs a slightly different routine.
>Such a routine would be written such that overhead cycles are shuffled
>from the y coordinate calculation to the x coordinate calculation.
>Anyhow, starting with a 68.1 cycle y coordinate, I don't think I'd have
>much trouble getting down to 56 cycles to get that 2:1 ratio with the so

Good; please do write one.  Here's a routine in the manner you might be
wont to use:

CONT11	TAX		2
	LDA #$80	2
	EOR BITP,Y	4
	STA BITP,Y	4
	DEY		2
	TXA		2
	SBC DX		3 (2)
	BCC FIX1	2/3

CONT12	TAX
	LDA #$80
	...

FIX1	ADC DY
CONT21	TAX
	LDA #$40
	EOR ...

21 cycles, plus 4 for changes in x.  A more sane algorithm might loop
with Y, and use Y as the counter as well; these would make a 24 cycle
loop (plus 4 for changes in x).  More insane versions are also
available upon demand, but I think they are better suited to contests
than a real program.

>far mentioned 28 cycle y coordinate on the C64.  I know that's a lot of
>hot air before it's shown, but using the 68.1 cycles, the ratio is 2.43:1,

Looks like 3.2:1 so far.  The 28 cycles was for slope<1 (and was compact).

>nowhere near the 3.54:1 ratio needed to make the Spectrum and C64 the same
>speed.

Hmmmm...  You've seemed so insistent with this comment (and its implication)
that I decided I better go through the archives to check if anyone had
claimed that any software routines were going to be faster on the C64 than on
the Spectrum.  I couldn't find anyone, although there was one dashing young 
fellow who predicted that the typical cycle ratio would be around 3:1.

And, while I was in there, I came across some interesting declarations:
---
>I'm expecting a 2:1 cycle ratio for tasks (z80:6502) most of the time and
>at other times as low as 1.5:1 or maybe less.

>In graphics applications, I expect the z80 to be even faster than 2:1
>because these apps require fast access to the full 64k in a way that's
>not necessarily regular.

>Arbitrary
>precision arithmetic is one that I believe will be more than 2:1 but not
>much more and I think you'd be hard-pressed to find many apps that fall
>into this category.

>I'm expecting something less than 2:1 clock ratios on the average with
>ratios of 1.5:1 achieved in some classes of problems.  I do concede
>clock ratios of up to 2.5:1 (maybe 3:1) on rare occasions.
---

	evetS-


Article 71841 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!math.ohio-state.edu!howland.erols.net!cpk-news-hub1.bbnplanet.com!news.bbnplanet.com!baron.netcom.net.uk!netcom.net.uk!server3.netnews.ja.net!news.ox.ac.uk!news
From: imc@ecs.ox.ac.uk (Ian Collier)
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (was Re: spectrum faster in 3D)
Date: 21 Jul 1997 17:09:56 GMT
Organization: Oxford University Computing Laboratory, UK
Lines: 55
Message-ID: <11532.imc@comlab.ox.ac.uk>
References: <33A9DD07.153E@skynet.be> <5q5a8o$fl6@ds2.acs.ucalgary.ca> <11489.imc@comlab.ox.ac.uk> <5qolko$gqg@ds2.acs.ucalgary.ca>
NNTP-Posting-Host: boothp2.ecs.ox.ac.uk
X-Local-Date: Monday, 21st July 1997 at 6:09pm BST
Xref: news.acns.nwu.edu comp.sys.cbm:71841 comp.sys.sinclair:41806

In article <5qolko$gqg@ds2.acs.ucalgary.ca>, "Alvin R. Albrecht" <albrecht@freenet.calgary.ab.ca> wrote:
>	LD A,D		4	; add 8 to scan line

I believe this is the only difference between your fixyN and mine (you
remembered that D=8).

>41*168/192 + 107*21/192 + 96*3/192 + 19 = 68.1 cycles for NE.
>and 21.25 cycles for E as shown before.

So how come you get different answers for the timings? ;-)

>> Secondly, we can avoid the expense in memory terms of all these fixyN
>> routines by having them called instead of jumped to.  This costs 3 cycles
>> for each move East and 4.3 for each move Northeast, but saves you from
>> potentially being bitten by the program getting too long for the relative
>> jumps.  It's all a big trade-off between many factors.

>Yes, definitely preferrable, but only if those few cycles aren't
>important.  Then there's only one copy of fixyN for all the DRAWs (see
>below) which saves on the memory.  There won't be any trouble with the
>relative jumps as half the fixyN can appear above the DRAW and half below.

It's a very close call, but I believe you are correct.

On the other hand, using the call, you could write a routine that plots a
line 256 pixels long without using a loop at all, and then to draw any line
simply jump into the routine at an appropriate value of x after poking in a
RET at another appropriate value of x.

>How this thing works:  I need 8 copies of the DRAW routine.  Each copy has
>a different starting pixel (the one above starts plotting at bit 3; I also
>need versions that plot starting on all other bits).

>Notice that the counter is dx/8 and therefore these DRAW routines only
>draw multiples of 8 pixels in the horizontal direction starting at any
>horizontal coordinate. To be practical, I also need to draw an additional
>0-7 pixels depending on the remainder of dx/8.  For this reason, I need
>another 8 copies of DRAW above, though slightly modified.

I'm not sure you need these.  You need eight routines, one _finishing_
at each possible position, and then to start the line you jump into the
appropriate routine at the appropriate bit-set instruction.

>                                      With the Spectrum, at least, there
>was little thought put into how neatly the z80 could access the display
>file.

Little thought from whom?  My opinion is that it was designed for speed
in printing characters, so they did in fact think about how the Z80 could
access it.  I can't think of any other reason for the layout - surely in
hardware terms a linear scan would have been easier?  But that would make
your line routine slower, not faster.
-- 
---- Ian Collier : imc@comlab.ox.ac.uk : WWW page (including Spectrum section):
------ http://www.comlab.ox.ac.uk/oucl/users/ian.collier/imc.html


Article 72163 of comp.sys.cbm:
Path: news.acns.nwu.edu!newsfeed.acns.nwu.edu!news.ece.nwu.edu!news.cse.psu.edu!uwm.edu!homer.alpha.net!mvb.saic.com!nntprelay.mathworks.com!howland.erols.net!infeed2.internetmci.com!newsfeed.internetmci.com!in3.uu.net!207.167.14.9!scanner.worldgate.com!news.agtac.net!rover.ucs.ualberta.ca!news.ucalgary.ca!srv1.freenet.calgary.ab.ca!albrecht
From: "Alvin R. Albrecht" <albrecht@freenet.calgary.ab.ca>
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: Re: Elite line routine (was Re: spectrum faster in 3D)
Date: Sat, 26 Jul 1997 13:26:02 -0600
Organization: Calgary Free-Net
Lines: 251
Message-ID: <5rdj1u$ias@ds2.acs.ucalgary.ca>
References: <33A9DD07.153E@skynet.be> <5q5a8o$fl6@ds2.acs.ucalgary.ca> <11489.imc@comlab.ox.ac.uk> <5qolko$gqg@ds2.acs.ucalgary.ca> <5qu24j$ive@news.acns.nwu.edu>
Reply-To: "Alvin R. Albrecht" <albrecht@freenet.calgary.ab.ca>
NNTP-Posting-Host: albrecht@srv1.freenet.calgary.ab.ca
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
In-Reply-To: <5qu24j$ive@news.acns.nwu.edu>
Xref: news.acns.nwu.edu comp.sys.cbm:72163 comp.sys.sinclair:42173


On 20 Jul 1997, Stephen Judd wrote:

> >21.25/68.1 version needs less than 16*(50+25)=1200 bytes.  The 24.25/72.4
> >version needs less than 16*50+25=825 bytes.  (Less than because the
> >modified DRAWs for 0-7 pixels need just 7 bits rather than 8).
 
> Sixteen copies of a 250 byte routine gives me 4k.  Plus another set,
> for lines with negative slope.  Plus another set of routines for stepping 
> in the y-direction, and we get, what?  A complicated routine, taking 10k?
> 14k?  No exclusive-ORing?

Not quite.  Ian's right: after some head scratching, 8 copies are needed.
So that's 8*(50+25)=600 bytes for the 21.25/68.1.  There are four copies
of this to cover all line drawing directions with |slope|<=1.  I'd use a
different version for |slope|>1 (below) which is faster.  So that's
4*600=2400 bytes plus whatever is below.

This version can be easily modified to do XORing or patterned lines.  If
E=0 rather than E=(HL) while doing the SET b,E instructions, you wind up
with a byte containing pixels that are to be plotted.  The actual
XOR/patterned line business will add ~2 cycles to the E direction.  Those
2 cycles can be shaved off by adopting the INC E rather than SET b,E
method I mentioned before.

That's the fastest this method can get on the z80.  However, it's not the
fastest method.  I'll post another version tomorrow which I hope will
get faster times.

> Oh bother: now we're into just the kind of contest I detest, and exactly the 
> kind of routines I (usually :) detest.  Oh well... A similar 64 routine

:) You gotta get your hands a little dirty when writing games.  But don't
be pointing accusing fingers at me for being such a memory hog; after
all, you are using absolute locations on the zero page - wouldn't that be
a no-no in a systems environment?

> looks like

...snip...

I'm going to have to spend some time thinking about this one - still
haven't quite got the hang of 6502 mnemonics.
 
> Times are 6 (or 5) cycles per step in X, plus 24 (or 23) per step in Y.
> This routine is actually the 10/40 routine in "column format", i.e. with
> a separate routine (e.g. a 30-cycle, or the 10/40) to handle the last 
> points in the x-direction.

I took a closer look at your 10/40 routine (isn't that 13/40?).  Here's
the xloop:

xloop	LSR chunk
     	BEQ fixc
      	SBC dy
     	BCC fixy

	DEX
	BNE xloop

which turns into 8/?  when it is unrolled.  Have you forgotten to include
a few cycles in your timings?  Unless I'm mistaken, the BEQ fixc is taken
once every eight pixels.  The fixc portion of your code seems to take at
minimum 38 cycles.  Shouldn't it be 8+38/8=12.75 cycles in the E
direction?  This is how I'm counting E cycles in the z80 version.


> Compare 5/28 (or 6/30 or...) with your own numbers: 21/68 (or 24/72).
> By my calculator, these are around -- surprise surprise -- 2.8:1 to 3.2:1.

That depends on whether we're counting cycles in the same way.  Let me
know if I'm mistaken above.

These "typical" cycle ratios you quote don't apply to the draw routine as
any implementation depends on each machine's display file organization
(see below for what I mean).

> And this on an algorithm very well-suited to the Z80.  Similarly, we

Really?  In the z80 version, one out of 3 instructions is a branch for
each E pixel.  In the C64 version, 1/2 of the instuctions is a branch.
That would indicate to me that this particular algorithm is well suited to
a 6502.

Stay tuned for another approach :).

> can recall that your first algorithm, when translated directly to the 
> C64, ran at right around 3:1.

Again, careful with the comparisons.  The cycle ratios apply to a Spectrum
& C64, not a z80 & 6502 - much has to do with the display files.

> >The version with slopes between 1 and infinite (the y coordinate changes
> >more often than the x coordinate) needs a slightly different routine.
...
> >Anyhow, starting with a 68.1 cycle y coordinate, I don't think I'd have
> >much trouble getting down to 56 cycles to get that 2:1 ratio with the so

> Good; please do write one.

Here's a version that allows XORing/general patterned lines.


DRAW	EX AF,AF'	4	; save A
	LD A,(HL)	7	; get screen byte
	OR E		4	; OR in pixel (or XOR, do pattern here)
	LD (HL),A	7	; write to screen
	EX AF,AF'	4	; restore A

	DEC H		4	; move up a row
	SUB dx		4
	JP NC,up1	10/10	; if carry, next x pixel
	ADD A,dy	4
	RRC E		8	; rotate pixel
	JP NC,up1	10/10	; if carry, next column
	INC L		4

up1	EX AF,AF'		; again, write pixel
	LD A,(HL)
	OR E
	LD (HL),A

	LD A,H		4	; once in 8 times, DEC H not good enough
	ADD A,7		7	; assume scan line - add 7
	LD H,A		4
	LD A,L		4
	SUB #20		7
	JP NC,scnline	10/10	; if carry, move up a block
	LD A,H		4
	SUB 8		7
	LD H,A		4

scnline EX AF,AF'	4	; restore A
	SUB dx		4	; back to normal here (above)
	JP NC,up2	10/10
	ADD A,dy	4
	RRC E		8
	JP NC,up2	10/10
	INC L		4

up2	EX AF,AF'		; same as DRAW
	LD A,(HL)
	OR E
	LD (HL),A
	EX AF,AF'

	DEC H
	SUB dx
	JP NC,up3
	ADD A,dy
	RRC E
	JP NC,up3
	INC L

up3	....			; up3 to up7 are the same as up2

	DJNZ DRAW


Time is 26+18+(36*21/24+51*3/24)/8=48.7 cycles in y direction.  Add 
22+4/8=22.5 if increasing x.

Here's yours.  The version I wrote above unrolls with respect to the goofy
vertical display file organization rather than w.r.t. the horizontal
pixel.  (Ahh, what a relief it would be just to be able to decrement a
variable to travel up the rows :(

And alongside is the equivalent z80 code just to emphasize the impact of
display file organizations on the cycle times.

> CONT11	TAX	2	EX AF,AF'	4
> 	LDA #$80	2	SET 7,(HL)	15
> 	EOR BITP,Y	4
> 	STA BITP,Y	4
> 	DEY		2	DEC L		4
> 	TXA		2	EX AF,AF'	4
> 	SBC DX		3 (2)	SUB dx		4 ;dx in some reg.
> 	BCC FIX1	2/3	JP C,FIX1	10/10

> CONT12	TAX
> 	LDA #$80
> 	...
> 
> FIX1	ADC DY
> CONT21	TAX
> 	LDA #$40
> 	EOR ...
> 
> 21 cycles, plus 4 for changes in x.  A more sane algorithm might loop

I do believe you have forgotten to update the column.  Doesn't that
involve a write into the LSBs of all those EOR BITP,Y / STA BITP,Y
instructions?  Doesn't that take a whole lot of cycles even after dividing
by 8?  Or should they be EOR (BITP),Y / STA (BITP),Y at 5 a piece instead?

The z80 equivalent above doesn't suffer from that problem.  The changing
in x coordinate takes 4 (for addition) + time to update column/8.  Which
gives it a 41/4 compared to your incomplete 21/4 : a little less than 2:1.

Add 6 cycles to the 41 if you want XORing.  Before I compare, let's see
some corrected cycle times.

I can also write a version (for the Spectrum) that sets pixels only that
will run at 39 plus 4.5 cycles if increase in the x direction.  Let me
know if you want to see it.  This is the fastest I think I can get for
slopes>1.

> >nowhere near the 3.54:1 ratio needed to make the Spectrum and C64 the same
> >speed.
 
> Hmmmm...  You've seemed so insistent with this comment (and its implication)

That's what you have to reach.

> >I'm expecting a 2:1 cycle ratio for tasks (z80:6502) most of the time and
> >at other times as low as 1.5:1 or maybe less.

You haven't seen the worst (and common) cases yet :).
 
> >In graphics applications, I expect the z80 to be even faster than 2:1
> >because these apps require fast access to the full 64k in a way that's
> >not necessarily regular.

I was referring to sprites, btw, but let's see how this draw routine turns
out once we have all the misunderstandings cleared away.
 
> >Arbitrary
> >precision arithmetic is one that I believe will be more than 2:1 but not
> >much more and I think you'd be hard-pressed to find many apps that fall
> >into this category.
 
> >I'm expecting something less than 2:1 clock ratios on the average with
> >ratios of 1.5:1 achieved in some classes of problems.  I do concede
> >clock ratios of up to 2.5:1 (maybe 3:1) on rare occasions.

I still stand by these - after I'm finished with your probs, I'll start
turning out what I think will be difficult problems for the 6502.

The 3:1 ratios you have been claiming I don't agree with - see other posts
as I have time to churn them out.

I'd like to clear the air since I have been accused (by someone else) of
claiming some huge speed advantage for the z80 over the 6502.  Since I
believe in a 2:1 physical clock ratio, a typical 2:1 cycle ratio in
software means the z80 & 6502 typically have a similar throughput.


Alvin


Article 72336 of comp.sys.cbm:
Path: news.acns.nwu.edu!merle!judd
From: judd@merle.acns.nwu.edu (Stephen Judd)
Newsgroups: comp.sys.cbm,comp.sys.sinclair
Subject: The End (was Re: Elite line routine)
Date: 31 Jul 1997 03:21:21 GMT
Organization: Northwestern University, Evanston, IL
Lines: 165
Message-ID: <5rp0bh$sh1@news.acns.nwu.edu>
References: <33A9DD07.153E@skynet.be> <5qolko$gqg@ds2.acs.ucalgary.ca> <5qu24j$ive@news.acns.nwu.edu> <5rdj1u$ias@ds2.acs.ucalgary.ca>
Reply-To: sjudd@nwu.edu (Stephen Judd)
NNTP-Posting-Host: merle.acns.nwu.edu
Xref: news.acns.nwu.edu comp.sys.cbm:72336 comp.sys.sinclair:42358

In article <5rdj1u$ias@ds2.acs.ucalgary.ca>,
Alvin R. Albrecht <albrecht@freenet.calgary.ab.ca> wrote:
>
>
>On 20 Jul 1997, Stephen Judd wrote:
>
>> >21.25/68.1 version needs less than 16*(50+25)=1200 bytes.  The 24.25/72.4
>> >version needs less than 16*50+25=825 bytes.  (Less than because the
>> >modified DRAWs for 0-7 pixels need just 7 bits rather than 8).
> 
>> Sixteen copies of a 250 byte routine gives me 4k.  Plus another set,
>> for lines with negative slope.  Plus another set of routines for stepping 
>> in the y-direction, and we get, what?  A complicated routine, taking 10k?
>> 14k?  No exclusive-ORing?
>
>Not quite.  Ian's right: after some head scratching, 8 copies are needed.
>So that's 8*(50+25)=600 bytes for the 21.25/68.1.  There are four copies

Not only is only one copy needed, you seem to have missed the essential 
point that each of your y-updates requires 25 bytes, and there are eight 
of them; hence 250 bytes per routine.  The statement above was simply a
correction of your calculation, which assumed 16 copies.

>That's the fastest this method can get on the z80.  However, it's not the
>fastest method.  I'll post another version tomorrow which I hope will
>get faster times.

You should of course feel free to do so, but I think my postings are at
an end; see below.

>> Oh bother: now we're into just the kind of contest I detest, and exactly the 
>> kind of routines I (usually :) detest.  Oh well... A similar 64 routine
>
>:) You gotta get your hands a little dirty when writing games.  But don't
>be pointing accusing fingers at me for being such a memory hog; after
>all, you are using absolute locations on the zero page - wouldn't that be
>a no-no in a systems environment?

I am not at all sure what you are attempting to imply.  What else would I
do with a zero page location?

As to what I dislike, it is the fact that this has turned into a contest
at all.  That is, you seem to concentrate on "winning", but I'm not quite
sure how/what you're trying to win.  My own goal was to gain insight into
a number of issues, by asking for some code that I could compare with
my own.  It now seems to be an iteration:

	wait 1-2 weeks for posting with new and faster routine
	follow up with corrections and 6510 version of routine
	repeat until 0=1

where each successive iteration involves ever longer, more unrolled, more 
cumbersome code.  There's nothing wrong with that but it doesn't seem to 
be generating any new insights, and there are any number of things I would 
rather do than grade code (*shudder*), rehash old routines (ugh), or
engage in insubstantial polemics (anathema).

I have been looking for insight into three main questions:

1. How do the Z80 and 6502 differ, and what is the Z80 approach to solving
   problems/how does it differ from the 6502 approach?

2. How would the Spectrum perform on practical problems, specifically the 
   kinds of programs that I write/have written?

3. Is there any merit to the earlier claims about the clear superiority
   of the Spectrum a) in general and b) specifically for 3D graphics?

On the whole I have learned a number of things, and overall it has been
a useful and enlightening experience.  My conclusions are in a following
post, along with my answers to the above questions, but first a few things 
need tidying up:

>> Times are 6 (or 5) cycles per step in X, plus 24 (or 23) per step in Y.
>> This routine is actually the 10/40 routine in "column format", i.e. with
>> a separate routine (e.g. a 30-cycle, or the 10/40) to handle the last 
>> points in the x-direction.
>
>I took a closer look at your 10/40 routine (isn't that 13/40?).  Here's
>the xloop:
>
>...
>
>which turns into 8/?  when it is unrolled.  Have you forgotten to include

That looks like the dim4 routine, which is clearly not a 10/40 routine.
As I indicated in the quote above, the code I posted was 10/40 unrolled across
columns; as I indicated many weeks back, the 10/40 routine is a slightly 
unrolled version of the dim4 routine.  It looks like:

	SBC dy
	BCS :CONT1
	ADC DX
	STA TEMP
	LDA #$7F
	...
:CONT1	DEX
	BEQ DONE

>> Compare 5/28 (or 6/30 or...) with your own numbers: 21/68 (or 24/72).
>> By my calculator, these are around -- surprise surprise -- 2.8:1 to 3.2:1.
>
>These "typical" cycle ratios you quote don't apply to the draw routine as
>any implementation depends on each machine's display file organization
>(see below for what I mean).

Last time I checked these articles have been in two newsgroups: one devoted
to the Spectrum, and the other to the C64.  You are basically arguing a
moot point, to nobody in particular.

>> And this on an algorithm very well-suited to the Z80.  Similarly, we
>
>Really?  In the z80 version, one out of 3 instructions is a branch for

Small number of variables, simple adds and subtracts, sequential memory
access... that's about as Z80 as it gets.

>Time is 26+18+(36*21/24+51*3/24)/8=48.7 cycles in y direction.  Add 
>22+4/8=22.5 if increasing x.

Bizarre numbers, but 49/22 compared with 21/4...  2.6:1.  Whaddaya know.

>> 
>> 21 cycles, plus 4 for changes in x.  A more sane algorithm might loop
>
>I do believe you have forgotten to update the column.  Doesn't that

I have/did not.

>> >nowhere near the 3.54:1 ratio needed to make the Spectrum and C64 the same
>> >speed.
> 
>> Hmmmm...  You've seemed so insistent with this comment (and its implication)
>
>That's what you have to reach.

I do?  Since when?  You seem to know a lot about what I have to do.  Is
that how one "wins" this game?

>> >I'm expecting a 2:1 cycle ratio for tasks (z80:6502) most of the time and
>> >at other times as low as 1.5:1 or maybe less.
>
>You haven't seen the worst (and common) cases yet :).

I have now seen more than seven different algorithms, addressing a fairly
broad range of computing problems.  I have also computed the cycle ratios
for those algorithms.  Want to guess what they've come out to be?

>The 3:1 ratios you have been claiming I don't agree with 

That is because you start with the answer you are looking for, and then
gather evidence to support it; as a general tip, you would do better to
do things in the opposite order.

>- see other posts as I have time to churn them out.

Ummmm...  I could read the Economist, read great works of literature,
write killer 64 programs, listen to great music, do great mathematics,
investigate the mysteries of the universe... or I could read the wordly
Wisdom of Alvin, as you find the time in your busy and important schedule 
to churn it out.  Not exactly a tough choice, eh? :)

You are, of course, free to have the last word(s) :).

-Steve