[Gllug] Re: Dual core AMDs

Robert Newson ran at bullet3.fsnet.co.uk
Thu Jul 14 23:57:13 UTC 2005


Christian Smith wrote:

...
> Ah, but it did save a branch instruction, which was at least 2 (3?) cycles
> per loop. There was none of this modern pipelining technology back then,
> you know(*).
...

> (*) Actually, the Z80 did make a few instructions execute 1 cycle slower,
>     so that it could get the next instruction in flight, thus forming a
>     primitive 2 stage pipeline under some circumstances. Can't remember
>     the instructions, but they could have been some memory writes.

Forgot to mention: the 6502 seemed much more clock-cycle efficient, with 
instructions taking from 2 to 7 clock cycles (but generally in the 2-4 
range), whereas Z80 instructions range from 4 to 23 clock cycles (with 
indexed addressing particularly penalised at 15-23 cycles[1]).

How did the 6502 manage this?  Simple: by pipelining[2] (and synchronous 
memory accessing[3]) - as each instruction was finishing off, the next 
instruction was being loaded.  The only problem comes with branches[4] that 
are taken, which cause a bubble in this pipeline: the next op code is 
fetched while the branch instruction is being interpreted, and has to be 
discarded once the correct address is put into PC.
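
To put a number on that bubble, here is a minimal sketch in Python.  The 
cycle counts are the usual documented 6502 figures (branch: 2 cycles not 
taken, 3 taken, 4 if the taken branch crosses a page; DEX: 2 cycles), not 
something from this thread, and the function names are made up for 
illustration.

```python
def branch_cycles(taken, page_crossed=False):
    """Cycles for a 6502 relative branch (documented timings, see lead-in)."""
    cycles = 2                # opcode fetch + offset fetch
    if taken:
        cycles += 1           # prefetched next opcode is discarded
        if page_crossed:
            cycles += 1       # high byte of PC needs fixing up
    return cycles


def dex_bne_loop_cycles(count):
    """Total cycles for 'loop: DEX / BNE loop' entered with X = count (1-255)."""
    total = 0
    x = count
    while True:
        x = (x - 1) & 0xFF    # DEX
        total += 2
        taken = x != 0        # BNE branches while X is non-zero
        total += branch_cycles(taken)
        if not taken:
            return total


if __name__ == "__main__":
    print(dex_bne_loop_cycles(10))
```

Every pass round the loop except the last pays the extra cycle for the 
discarded fetch, which is the cost the unrolling mentioned above avoids.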

[1] the Z80 was an 8080 "clone", but AFAICR, the 8080 had no index regs, and 
so they seem to have been "cobbled" into the instruction set by being a 
prefix (byte) specifying the index reg to the (HL) addressing op code, along 
with the byte displacement.

[2] The overlap of fetching the next memory location while interpreting the 
current data from memory minimizes the operation time of a normal 2- or 
3-byte instruction and is referred to as pipelining.  (Synertek 
SY6500/MCS6500 Microcomputer family programming manual, August 1976)

[3] every time the clock ticked, a byte of data was fetched from, or written 
to, memory.  A quirk of the asynchronous memory access of the Z80 (or so I 
was told by a reliable source - ie my older brother) was that a 3.5MHz Z80 
could actually run faster than a 4MHz Z80, because the designer of the 4MHz 
system might put in an extra wait state (clock cycle) on each memory access 
to ensure the data was stable before being used (memory access was not as 
fast then as it is now).

[4] including subroutines and JMP instructions (as they're branches that are 
always taken)
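
A rough check of the wait-state claim in [3], as a Python sketch.  It 
assumes the standard Z80 machine-cycle lengths (4 T-states for an opcode 
fetch, 3 for a memory read or write) and assumes the 4MHz design adds one 
wait state to every memory access; the names below are hypothetical.

```python
M1_FETCH_T_STATES = 4   # Z80 opcode fetch (M1) machine cycle
MEM_RW_T_STATES = 3     # Z80 memory read/write machine cycle


def cycle_time_us(t_states, clock_mhz, wait_states=0):
    """Duration of one machine cycle in microseconds."""
    return (t_states + wait_states) / clock_mhz


if __name__ == "__main__":
    for name, t in (("fetch", M1_FETCH_T_STATES), ("read/write", MEM_RW_T_STATES)):
        slow_clock = cycle_time_us(t, 3.5)                 # no wait states
        fast_clock = cycle_time_us(t, 4.0, wait_states=1)  # one wait state
        print(f"{name}: 3.5MHz {slow_clock:.3f}us, 4MHz+wait {fast_clock:.3f}us")
```

Under those assumptions every memory cycle is shorter on the 3.5MHz part.  
Cycles spent on internal operations are not stretched by wait states, so 
the overall difference depends on the instruction mix.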


Another thing I noticed about the two processors was that the 6502 did 
subtraction by negative addition (ie adding the 1's complement of the number 
being subtracted plus the Carry flag - thus the carry flag had to be set 
before a [sequence of] subtraction[s]), whereas the Z80 seemed to actually 
do subtraction (requiring the carry flag to be cleared before a [sequence 
of] subtraction[s] - or using the SUB instruction for the first one when 
possible).
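
The two carry conventions can be sketched in Python like this.  It models 
8-bit binary mode only (no decimal mode, no other flags), and the function 
names are made up for illustration rather than taken from any emulator.

```python
def sbc_6502(a, m, carry):
    """6502-style SBC: A + (1's complement of M) + C.

    Returns (result, carry_out); carry_out == 1 means "no borrow".
    """
    total = a + (m ^ 0xFF) + carry
    return total & 0xFF, 1 if total > 0xFF else 0


def sbc_z80(a, m, carry):
    """Z80-style SBC: A - M - C.

    Returns (result, carry_out); carry_out == 1 means "borrow".
    """
    total = a - m - carry
    return total & 0xFF, 1 if total < 0 else 0


def sub16(sbc, initial_carry, x, y):
    """16-bit x - y built from two 8-bit SBCs, low byte first."""
    lo, c = sbc(x & 0xFF, y & 0xFF, initial_carry)
    hi, c = sbc(x >> 8, y >> 8, c)
    return (hi << 8) | lo


if __name__ == "__main__":
    # 6502: set carry first (SEC); Z80: clear carry first
    print(hex(sub16(sbc_6502, 1, 0x1000, 0x0001)))
    print(hex(sub16(sbc_z80, 0, 0x1000, 0x0001)))
```

Both give the same difference; only the sense of the carry between the 
bytes is inverted, which is why one needs the carry set beforehand and the 
other needs it cleared.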

-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug



