
   [Previous] [Next] [Current Results] [Get Thread] [Author Profile]
   [Post] [Post] [Reply]
   [LINK][LINK]
     _________________________________________________________________
                                      
  Article 2 of 9
  
Subject:      Re: Elite line routine (was Re: spectrum faster in 3D)
From:         imc@ecs.ox.ac.uk (Ian Collier)
Date:         1997/07/08
Message-Id:   <11421.imc@comlab.ox.ac.uk>
Newsgroups:   comp.sys.cbm,comp.sys.sinclair
[More Headers]

In article <5prc2d$6vh@news.acns.nwu.edu>, sjudd@nwu.edu (Stephen Judd) wrote:
>Well, if they are a shift and add type routine I don't doubt it,
>especially if it's 16 bits.  But I am referring to the fast multiply,
>a useful algorithm posed about a month back:

>       let f(x)=x^2/4
>       then a*b = f(a+b) - f(a-b)

>So, with a simple table of squares you can do an 8x8->16 integer multiply
>in around 45 cycles on a C-64.  That's the algorithm I'm waiting to
>see the Z80 version of.

Off the top of my head, since I have pretty much forgotten what went on
back then, how about this.  I assume both 8-bit numbers are positive.  The
overflows could get hairy, and to some extent it depends on which cases
are going to be most common.  I will try to minimise the average.  Here
goes...

         ; D and E hold the 8-bit numbers
         LD   A,D          4
         SUB  E            4
         JR   NC,nocarry1  12/7
         NEG               8      ; A = |D-E|
nocarry1 ADD  A            4
         LD   L,A          4
         LD   H,tbl_hi     7
         JR   NC,nocarry2  12/7
         INC  H            4      ; HL = 256*tbl_hi + 2*|D-E|
nocarry2 LD   C,(HL)       7
         INC  L            4
         LD   B,(HL)       7      ; BC = f(D-E)
         LD   H,tbl_hi     7
         LD   A,E          4
         ADD  D            4      ; A = D+E
         JR   NC,nocarry3  12/7
         INC  H            4
         INC  H            4
nocarry3 ADD  A            4
         LD   L,A          4
         JR   NC,nocarry4  12/7
         INC  H            4      ; HL = 256*tbl_hi + 2*(D+E)
         CCF               4      ; carry always reset for SBC
nocarry4 LD   E,(HL)       7
         INC  L            4
         LD   D,(HL)       7
         EX   DE,HL        4      ; HL = f(D+E)
         SBC  HL,BC        15     ; HL = f(D+E) - f(D-E)
         ; HL holds the result

This is more or less one linear program.  Tricks such as jumping out will
improve the timing only if the branches never rejoin the program (for
example: JR NC followed by two INCs takes either 12 or 15, but jumping
out and then back in will take either 10 or 20, which is worse on average).
This means you could write a program with 8 different return points which
takes a constant time (apart from the NEG thing) and saves between 1 and 5
cycles for each of the three branches.  You can even shave 1-2 cycles off
the NEG thing in the same manner by having a table of squares of negative
numbers.

Anyway, the time taken by this program is

8+{12,15}+15+{12,11}+33+{12,15}+8+{12,15}+37

which ranges from 148 to 158.
--
---- Ian Collier : imc@comlab.ox.ac.uk : WWW page (including Spectrum section):
------ http://www.comlab.ox.ac.uk/oucl/users/ian.collier/imc.html

   [LINK][LINK]
     _________________________________________________________________
                                      
   [Previous] [Next] [Current Results] [Get Thread] [Author Profile]
   [Post] [Post] [Reply]
     _________________________________________________________________
                                      
   Home   Power Search   Post to Usenet   Ask DN Wizard   Help
   Give us feedback!  |  Advertising Info  |  Press Releases  |  Jobs  |
   Policy Stuff
   Copyright © 1995-97 Deja News, Inc. All rights reserved.
