I saw a video on YouTube from Computerphile, where one of the guys who invented the ARM CPU said that back in 1986 their CPU was cheaper than the 68000 and the 80386 because of the RISC architecture. If this is the case, what took them so long to catch on? Was ARM unavailable in Japan at the time? Did the 32-bit data bus require too much external circuitry?
Re: why it took so long to catch on: an instruction set not compatible with x86 (i.e. couldn't run most existing software during that time period) sure makes this difficult. For another example, see the PowerPC.
In 1986 the x86 first-mover advantage wasn't insurmountably huge yet. But the 68k's might have been.
In performance-per-cost terms, the Acorn Archimedes was neither drastically better (similar MIPS/MHz and MHz) nor cheaper than 68k- or x86-based machines. (A fair comparison: Amiga 1000 (1985; 256K RAM) for $1300 or Amiga 2000 (1987; 512K RAM) for $1500, vs. Archimedes 305 (1987; 512K RAM) for £800 ≈ $1300.)
Whenever I've sat down and actually run the numbers on real-world performance ... RISC architectures have always been a good deal more marketing woo than actually drastically better. Even the much-vaunted DEC Alpha seems to have really only been impressive because it was easier to make their design run at higher total power dissipation than other architectures.
ARM has single-cycle memory access. I don't see how the 68000 could be close at all, unless the Archimedes had slow memory.
RISC architectures consume craptons more memory for instruction encoding than CISC ones. This has always been their Achilles' heel. (It wasn't until Thumb and MIPS16 that they actually addressed this crippling design flaw.)
(edit: the RISC central thesis is "what if each instruction did less? then we should be able to execute more of them". It turns out that when you make each instruction do less and you make the instructions all the same (larger) size, you end up limited by memory bandwidth.)
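A hand-written sketch of what that costs in practice (not compiler output; register choices are arbitrary) - bumping a 32-bit counter whose address is already in a register:
Code:
; 68000: one 2-byte instruction, read-modify-write straight to memory
    addq.l  #1,(a0)

; ARMv2-style load/store: three 4-byte instructions, 12 bytes of code
    ldr     r1,[r0]
    add     r1,r1,#1
    str     r1,[r0]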
Synthetic comparison: ARMv2 is 0.5 MIPS/MHz; 68k is 0.2 MIPS/MHz.
More-nearly-real-world comparison: ARMv2 is about 300 Dhrystones/MHz, while the 68k is about 250 Dhrystones/MHz. (edit2: fix shared error in order of magnitude)
The original ARM design was genius, but ARM probably wouldn't have been a choice for any Japanese systems in the mid to late 80s. Acorn was a relatively small British firm, and ARM chips initially were built primarily for their own computer. From Wikipedia, it's not clear to me that they started widely licensing them out until the early 90s - and at that point, other chips were more than catching up to ARM's offerings on many fronts. And remember, hardware is usually in development for several years before release.
The common narrative is that Nintendo chose the 65816 for the SNES in the hopes of continuity with the NES: either for backwards compatibility, or at least for the sake of familiarity and code reuse for existing developers. And Sega chose the 68k for the MD because they'd been using it in arcade boards since the mid-80s, before ARM was a thing; it was also a widely available and kind of beloved chip used in pretty much all top-of-the-line desktops and workstations at the time, so developers had a good chance of being familiar with it.
Accepting the (debatable) premise that RISC ideas were what gave ARM the edge, well...pretty much every console manufacturer did start using RISC chips around the early to mid-90s, and really didn't stop until the most recent console generation. It's just that by the time that became an option there were better options on the market than ARM.
The 3DO (1993) ran on an ARM chip.
MIPS chips are RISC chips and probably the purest examples of them, being directly descended from the researchers at Stanford who pioneered the RISC idea in the first place. Nintendo partnered with SGI (the world leaders in the 3D space at the time) for the N64, and all of SGI's workstations ran on MIPS, so no surprise that the N64 did too.
Sony also chose MIPS for the PlayStation. I can only speculate as to why - maybe to keep the hardware in line with the SGI workstations many devs probably were using, maybe because at that point in time it's where they got the best bang for their buck.
Sega went with Hitachi's SuperH architecture for the 32X and Saturn. As I understand it SuperH is also RISC-y, but has higher code density than ARM (before Thumb) and MIPS, so maybe that choice was to maximize the memory bandwidth they had. Or maybe it was because Sega preferred to work with and could get better deals from a fellow Japanese company like Hitachi. They continued using SuperH for the Dreamcast as well.
adam_smasher wrote:
Sony also chose MIPS for the PlayStation. I can only speculate as to why - maybe to keep the hardware in line with the SGI workstations many devs probably were using, maybe because at that point in time it's where they got the best bang for their buck.
They had at least some prior experience with MIPS:
Sony NEWS. It's hard to say how much this influenced the decision.
Because ARM was this tiny little British company from the people that made the Spectrum and nobody had heard of them. Raise your hand if you're not a Pom and you know what an Acorn Computer is... exactly. Back when they made it, it wasn't really that fast nor that powerful. Second, if I wanted to make and sell a computer I'd need tools, assemblers, compilers, books, documentation, etc. The 68K has them in spades, just sitting around. Motorola have a road map: there's the 68010, the FPU, the 68020, and the 68030 is expected in QX of Year Y. x86 has them in spades, the 6502 has them in spades. I also want to make sure that when I order 2 million I will get 2 million chips; Motorola will look at an order of 2 million and say "next Wednesday". Acorn, next summer.
Third, I want to make sure the company will be alive next year so I can still get their chips. There are a lot of factors when choosing a CPU; bang per clock is actually not a big one.
I also think we are now in a post-RISC world. I mean, MIPS is dead, SPARC is a memory, ARM now has NEON and Java VM bytecode instructions so it's long past being "RISC", PPC is dead, and scalar supercomputers are a distant memory.
It will be interesting to see what Apple do with their new ARM Macs... hopefully they just implode. But if the rumors of 64 cores are correct, I imagine they might need to cut them down a bit. However, losing SIMD is going to slay their video encoding times...
Oziphantom wrote:
MIPS is dead
I'd say "in terminal condition in the hospital" but not fully dead yet. Who knows, maybe they can manage to carve out a niche away from the bully that is ARM.
Quote:
SPARC [...] PPC
POWER is bizarrely not dead, since IBM just released new POWER9 chips using 14nm FinFETs.
Even more bizarre, PPC is also not so dead—you can still buy them from NXP at unreasonably high prices.
SuperH, Alpha, and PA-RISC all seem pretty dead. SuperH at least has a modern BSD-licensed softcore for it, but that's not very interesting.
Oh, right, I completely forgot about Itanium - the reason that MIPS elected to roll over and die (sigh). It's officially being taken off life support.
As new things go, there's Mill and RISC-V, but... the former looks too weird to predict anything about, and the latter seems to be deliberately making some bad decisions (partial counterargument).
Quote:
ARM now has NEON and Java VM bytecode instructions so it's long past being "RISC".
It seems like what we've basically learned over the past forty years of ISA design is not that "CISC" or "RISC" was right, but that our assumptions as to what the useful primitives are were flawed.
POWER is alive as long as IBM wants it so. They're definitely moving in the right direction, just lacking lower cost options - I'd be running a Raptor right now if the cost was more reasonable.
On the PS1, ability to get a decent compiler was part of the decision. They hired Cygnus to do a GCC port, them being the prime team available for such things back then. There's a lot of interesting history there.
When the RISC ideas were first floated, they were really good ones. CPUs were blowing a huge portion of their die space on underused instructions, decoding logic, and microcode, and there just weren't enough transistors left for pipelining. Plus memory access speeds were roughly on par with CPU speed, so memory bandwidth wasn't such a massive bottleneck. There was a reason that everyone except for Intel abandoned CISC architectures in the early 90s: PowerPC and MIPS chips were crushing everyone else on speed and efficiency.
Modern manufacturing processes mean that there are now way more transistors available to CPU designers, and the heavier decoding costs for CISC are now negligible. Combine that with Intel's heroic, best-in-the-business efforts to pipeline, reorder, and branch-predict the hell out of the execution stream, plus a more compact code representation that makes better use of memory bandwidth, and suddenly RISC turned out to be a bit of a dead end.
adam_smasher wrote:
Plus memory access speeds were roughly on par with CPU speed, so memory bandwidth wasn't such a massive bottleneck.
Each RISC instruction does less on average than a CISC instruction. RISC can run faster, but they also
have to run faster to get comparable performance. And most of the tricks that can be applied to improve RISC performance work without too much variation in CISC contexts too.
There weren't all that many "internal processing cycles" after the original 8086, 80286, and 68k—it was either unimportant for performance and didn't matter that it was dispatched to microcode, or it was low-hanging fruit for optimization.
It's not just Intel's x86 efforts for comparison here; Dhrystone results across different CPUs in any given year are usually fairly comparable, regardless of ISA. As much as it sounded at the time like RISC should have been a huge improvement, it just doesn't seem to have worked out that way.
Oziphantom wrote:
Raise your hand if you're not a Pom and you know what an Acorn Computer is...
This question is slightly broken because I think you kinda have to be a pom to know what pom means? (I had to
look it up.) I do know about Acorn computers, but I also read a lot of British computer mags when I was a kid, plus you're talking to a crowd that has an interest in at least one old computer already.
It's pretty interesting to me that both PS4 and XBone switched to x86 for this generation. Even on the PS3 and XBox 360 they made the tandem choice of PowerPC (...though beyond that choice the two architectures differed quite significantly). Seems like whatever the economic pressures that led to that decision for both Microsoft and Sony had them thinking the same way, at least for these last two.
I was under the impression you live in the Commonwealth? Am I mistaken?
The point was that this is a forum of computer nerds with a taste for the exotic, and even here only so many have heard of the Acorn Archimedes - which further proves how obscure it is.
The main power of RISC is the lower TDP. So when IBM made the Cell processor, they had power limits and heat limits forced upon them by the form factor of a console, so they used a PowerPC core to get the power down. The 360 having PPC let it fit three cores (six hardware threads).
I think the switch to x86 was because PC gaming came on strong at the end of the PS3/360 life cycle and it became a power race once more; they didn't just have to compete against each other, they had to compete against Steam sales that gave you the same game for $10 and with better graphics. For the lay customer it's "well yeah, but that's PPC, that's what Macs had back in the day (ironically, even they switched to x86 and are now going back to RISC), and we all know Macs suck at gaming; it's not the same, my 6-core Intel beats your 6-core PPC", versus now "no, it's x86 vs x86". The details of Ivy Bridge, Sandy Bridge, Haswell, et cetera are lost on the average customer.
psycopathicteen wrote:
I saw a video on YouTube from Computerphile, where one of the guys who invented the ARM CPU said that back in 1986 their CPU was cheaper than the 68000 and the 80386 because of the RISC architecture. If this is the case, what took them so long to catch on? Was ARM unavailable in Japan at the time? Did the 32-bit data bus require too much external circuitry?
It's because of the RAM requirement: it needs single-cycle-access RAM, and this is why the Acorn Archimedes was so expensive at the time, with 512 KB/1 MB (and even 2 MB) of 140 ns SRAM for stock machines (8 MHz).
You can consider the CPU cheaper than a 68k, but unfortunately not the whole machine, because of the cost of the RAM.
Oziphantom wrote:
Because ARM was this tiny little British company from the people that made the Spectrum and nobody had heard of them.
Yeah, pretty much that. I don't know what the situation was in 1986 or whenever, but I'm sure it was a huge deal that 68000 was licensed to many manufacturers. I don't have a lot of 68K machines, but it seems like whenever I look at a board with one, I don't know if I've ever seen an actual Motorola, it's always a Hitachi or something else. If ARM was single-source at the time (I have no idea actually), there better be a huge advantage to designing it into something, because you could potentially be waiting a long time to get the parts you need for production.
Quote:
I also think we are now in a post-RISC world. I mean, MIPS is dead, SPARC is a memory, ARM now has NEON and Java VM bytecode instructions so it's long past being "RISC", PPC is dead, and scalar supercomputers are a distant memory.
Don't forget all the embedded computers we are surrounded with, it's out of sight, out of mind. Lots of stuff embedded in ASICs and stuff that we use all the time, they're just not "personal computers" of course. I've heard that MIPS pretty much dominates routers and such, 6502 at some point was big in automotive, a while back Chuck Peddle said he was designing a USB 3.0 controller that had something like 6 6502 cores running inside it, and in recent years I'd heard that Renesas is the company that was selling more CPUs than anybody else. I'm a big tech dork and I'd have to say Renesas is a company that I've heard of, but that's about the extent of my familiarity.
Renesas is basically Hitachi, as far as I know. They're big on selling 68k emulation cores in embedded systems.
And yeah, Wikipedia link:
https://en.wikipedia.org/wiki/Renesas_Electronics
Oziphantom wrote:
Because ARM was this tiny little British company from the people that made the Spectrum and nobody had heard of them.
It's irrelevant to the discussion, but Sinclair made the Spectrum, not Acorn. Chris Curry did leave Sinclair to become one of the founders of Acorn, but they were unquestionably different companies.
memblers wrote:
Don't forget all the embedded computers we are surrounded with, it's out of sight, out of mind.
And not just household things, but also in industrial automation and control, healthcare, the military.. Sometimes all you want is a small, stable, timing critical system where you know how every cycle is spent.
If I was designing a CPU in the late 80s for a game system, I would probably design a 16-bit RISC CPU with 8 registers, a 16-bit data bus, and 16-bit addressing.
Ehh.... even in hindsight, I don't think that's a good compromise. RISC architectures really benefit from having more registers (every time you have to spill to memory hurts them a lot more, because they don't have complex addressing modes), and with only 8 you'd match x86-32, which may have the most limited register count of anything that's not accumulator-based (6502, 12/14/16-bit PIC, 8051).
(8080/Z80 is its own funny thing, with 4 to 15 registers depending on specific variant (prime registers, IX, IY) and how you count (split 8-bit/fused 16-bit).)
The 68k had 16, as does ARM, x86-64, 24-bit PIC, MSP430. SPARC, MIPS, SuperH, PPC, AVR have 32.
You might take a closer look at the MSP430 ISA.
Even the Super FX had 16 registers. However, it was severely hobbled by its 8-bit instruction size, which may have been due to targeting one instruction per cycle (kinda) while using SNES cartridge memories (outside the instruction cache it was 3-5 depending on speed, and 6-10 would have been even worse). I wonder what it would have cost to go to 16-bit instructions with 16-bit or dual 8-bit memories and just use the bottom bit of SNES addresses as a half-word strobe or chip select (this would also have massively improved pixel buffer throughput). On the other hand, the instruction cache might have felt even smaller unless it were expanded... Also, reading from RAM was horribly expensive because there was no data cache and no way to preload - I think the PLOT circuitry probably took precedence here.
It also had echoes of accumulator-based design, in that it defaulted to using R0 if you didn't specify FROM, TO, or WITH. A number of instructions used 4-bit opcodes and 4-bit operands (to specify a register), and using R0 as both the other operand and the destination, or even just one or the other, could be much faster than using arbitrary registers for both.
...and I just realized that with 8 registers, you could specify both operands and the destination in a single 16-bit instruction and still have 7 bits of opcode left... on the other hand, I've programmed the Super FX before, and I was very glad I had as many registers as I did...
I think having zero page addressing on a RISC cpu is a good idea.
Or at least fast [Rn+dd] addressing, as 68000 and MIPS have. Under classic Mac OS on 68K, each running application got its own 32K of space in RAM for a struct called its "A5 world." This was used to store global variables and the like, as the actual low memory was reserved for the operating system.
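A rough sketch of what that looks like (register choices and the offset are arbitrary): a global in the A5 world is a single base+displacement access, and MIPS does the same trick with its global pointer.
Code:
; 68000: load a 32-bit global stored at offset -1024 in the A5 world
    move.l  -1024(a5),d0

; MIPS: the same idea, via the $gp (global pointer) register
    lw      $t0, -1024($gp)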
To paraphrase Colin Chapman, "All CPUs gain weight". So RISC moves towards CISC; CISC never moves towards RISC. The lines get blurry. Is an AVR really RISC, or is it just that it's 8-bit and therefore not really an x86, thus RISC? It almost seems RISC means "it's not an x86/A64".
To get around the RAM speeds and because the PS2 is built like an Amiga, it has Scratch Pad RAM for high-speed access. Sadly ( well actually kind of worked out better in the long run) the PSP didn't get Scratch Pad RAM... nor did the PS3, which would have really helped for those Float<>Int conversions...
Oziphantom wrote:
To paraphrase Colin Chapman, "All CPUs gain weight". So RISC moves towards CISC; CISC never moves towards RISC. The lines get blurry. Is an AVR really RISC, or is it just that it's 8-bit and therefore not really an x86, thus RISC? It almost seems RISC means "it's not an x86/A64".
That the 12/14/16-bit PICs' documentation refers to it as a RISC ISA is endlessly amusing to me.
"Not a 68k, not an x86, not a PDP11, not a VAX. Wait, what are those last two?" Now I want to see a RISC machine that runs out of drum memory.
tepples wrote:
Or at least fast [Rn+dd] addressing, as 68000 and MIPS have. Under classic Mac OS on 68K, each running application got its own 32K of space in RAM for a struct called its "A5 world." This was used to store global variables and the like, as the actual low memory was reserved for the operating system.
Yeah, there would be a [5bit + Rn] addressing mode. Or 6bit, if there's enough space for it.
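Purely hypothetical, just to show the bit budget works out for a 16-bit instruction word with 8 registers and a [6-bit + Rn] load (this is not any real ISA):
Code:
; hypothetical encoding sketch:
;   [15..12] opcode  (4 bits)
;   [11..9]  Rd      (3 bits, destination register)
;   [8..6]   Rn      (3 bits, base register)
;   [5..0]   disp    (6 bits, unsigned displacement)
;
; e.g. "load r2, [r5 + 12]"  ->  opcode | 010 | 101 | 001100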
Oziphantom wrote:
Is an AVR really RISC, or is it just that it's 8-bit and therefore not really an x86, thus RISC? It almost seems RISC means "it's not an x86/A64".
The STM8 is advertised as an 8-bit CISC processor despite a strong resemblance to the 6502, with A, X, Y, SP, PC, and condition registers plus a zero page. Compared to the 6502 it feels like the STM8 has an instruction for any operation you can think of, plus a dozen you'd never think to use. It also has 18 different addressing modes, so it seems STMicro is able to tell CISC from RISC.
I've always wondered but never asked: is the CISC/RISC distinction pretty much arbitrary? The problem is that there seems to be no real cutoff (or at least I'm not aware of one); people don't seem to know where the 65XX family falls.
I've got to say, seeing all these old, no longer used processor architectures is a bit depressing.
People often like to say that the Saturn/PS1/N64 was the last console generation where the hardware was meaningfully different, and while the graphics are much more similar, I'd probably give this distinction to the Dreamcast/PS2/GameCube/Xbox; it's cool how every single system used a different processor architecture, and it kind of shows how different the landscape was then, I think. ARM went up through the ranks on its own merit, but the one thing that's always bothered me is x86; we're still using the same architecture as a now 40-year-old processor, although that might not be too fair of a comparison because of how much has been added. I don't know if it's lasted over the years because it's truly a great design, or rather because of Intel's massive market share and the desire to keep backwards compatibility.
I think "RISC" means no instructions take longer than 2 cycles, and operands have to fit inside the instruction word.
If cycle count is your primary factor here, then ARMv4 as implemented in ARM7TDMI is not RISC because loads take 3 cycles, and multiplies may take a while depending on the size of the numbers. Nor would most modern CPUs, as L1 and especially L2 cache misses have a huge pile of wait states.
Espozo wrote:
I've always wondered but never asked: is the CISC/RISC distinction pretty much arbitrary? The problem is that there seems to be no real cutoff (or at least I'm not aware of one); people don't seem to know where the 65XX family falls.
I've got to say, seeing all these old, no longer used processor architectures is a bit depressing.
People often like to say that the Saturn/PS1/N64 was the last console generation where the hardware was meaningfully different, and while the graphics are much more similar, I'd probably give this distinction to the Dreamcast/PS2/GameCube/Xbox; it's cool how every single system used a different processor architecture, and it kind of shows how different the landscape was then, I think. ARM went up through the ranks on its own merit, but the one thing that's always bothered me is x86; we're still using the same architecture as a now 40-year-old processor, although that might not be too fair of a comparison because of how much has been added. I don't know if it's lasted over the years because it's truly a great design, or rather because of Intel's massive market share and the desire to keep backwards compatibility.
x86 is seen as a really bad architecture, the textbook example of what not to do (although the 68K gets thrown around a bit as well), given the 8008->8080->8086->186->286->386... nature of it. Even back then it was not seen as being good; it's just that Intel was able to keep the backwards compatibility, which is the main selling point of an IBM compatible, and they were able to make chips with the most bang. They cost a lot, but business doesn't really care about cost the way we do: a $5000 computer is still cheaper than a person, and having a computer that can do twice as much as a $3000 one is still a plus on paper. They were also able to make chips in small, predictable increments, with new epochs every few years, and one could mostly drop a new chip into an old machine to get more life/oomph out of it. I.e. putting a P233 into my P133 computer will actually get me more performance; putting a 16 MHz 6502 into my Commodore 64 won't make an iota of difference.
I think the 65XX range is a CISC design. It's register-memory rather than register-register. The Z80 is also CISC, even though it is register-register, as it has microcode. I think not having microcode is seen as a big part of being RISC. From memory there is a lot of debate over whether the 6502's decode ROM counts as microcode; the different-length instructions mean that the chip has logic to handle different-length fetches (1/2/3-byte instructions), while RISC has fixed-length fetches, which puts the instruction decode directly into the opcode.
RISC very approximately means:
- no microcode / microops (unlike x86; note that NEC V20/30 x86 clones had none)
- register-register architecture
** What memory addressing modes are present are often very simple, rarely anything fancier than the 6502's ADDRESS,X
- instructions are constant length. Any operation that was too big to fit into a single instruction gets broken into multiple real instructions.
** What do I mean by multiple "real" instructions? All PICs are constant-length (12/14/16/24 bit) and have two pipeline stages (fetch / execute). There are several 16-bit-instruction PIC ops that use the contents of the fetch stage as an additional parameter to the execute stage; the second op is explicitly encoded as a NOP that has lots of "don't care" bits in it. In contrast, MIPS has separate real instructions for "load the upper half" and "load the lower half" of a word (see the sketch after this list).
- instructions are, in the absence of cache stalls, constant duration. Multiply and divide instructions require so much logic that they break this frequently. A single-stage unsigned 32-bit multiplier uses drastically more transistors than the entirety of the 6502; a rolled-up one only requires an adder, a barrel shifter, and a bunch of AND gates, but then you need 32 cycles for it.
- all registers are equally capable (even if some are reserved for stack pointer and/or program counter). SIMD instructions almost always break this; a 32-bit register is really rather too small for that. Even a 64-bit register is a little cramped. (see SSE vs AVX)
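To make the MIPS example above concrete, loading a full 32-bit constant really is two separate, independently decoded instructions (the constant here is arbitrary):
Code:
; MIPS: build 0x12345678 in $t0 using two "real" instructions
    lui  $t0, 0x1234         ; upper half: $t0 = 0x12340000
    ori  $t0, $t0, 0x5678    ; lower half: $t0 = 0x12345678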
Accumulator-based architectures don't really fit into the RISC vs CISC paradigm.
Oziphantom wrote:
x86 is seen as a really bad architecture, the textbook example of what not to do (although the 68K gets thrown around a bit as well), given the 8008->8080->8086->186->286->386...
68000 really? I don't think it's a bad architecture, it's just that the performance is overrated because 68K nuts always go the extra mile optimizing the shit out of everything.
Natural evolution of ISAs is deemed 'ugly' because it wasn't designed to be The Right Thing In The First Place.
Entirely aside from the question of What The Right Thing Is, this is as absurd as complaining about many other failures to predict the future. It's the hardware version of
perfect is the enemy of good [enough].
Backwards compatibility does cause suboptimal things and complexity. CISC is deemed worse than RISC because it (theoretically) "wastes" the transistors that aren't doing things all the time ... but e.g. with early RISC architectures that just meant that they failed to identify other constraints (such as memory bandwidth, cache size, and heat dissipation)
psycopathicteen wrote:
Oziphantom wrote:
x86 is seen as a really bad architecture, the textbook example of what not to do (although the 68K gets thrown around a bit as well), given the 8008->8080->8086->186->286->386...
68000 really? I don't think it's a bad architecture, it's just that the performance is overrated because 68K nuts always go the extra mile optimizing the shit out of everything.
I forget the actual number, but it's something like 60-80% of the die is used on making the perfectly balanced, orthogonal instruction set. This makes yields low and the number of chips per wafer low, and hence made the chip large, expensive, power-hungry and hot. It's a dream to program for - you can optimise till the cows come home - but when one looks at the actual amount of silicon doing work, it's not that good. To which the argument is: work out what instructions you actually need, then reduce the die to only do those things, and get cheaper and faster CPUs as a result. Having (A7)+ is nice, but do you really need it, or would (A0)+ and (A1)+ have done for the most part?
To be fair, these CPUs were designed in a different era with different problems. RAM was very expensive, so having more complex instructions that saved RAM was a huge bonus. Assembly was hand-generated (i.e. assembled by hand) and there were no macros, so microcode is basically a hardware macro. This saved on programming time.
I wonder if the 68000 could've used a PLA instead of a ROM if the developers had had time to figure out the patterns. Say you had "add.w $10(d0.w,a2),d5"; it could've broken it down like this:
1) Fetch "$10(d0.w, a2)"
2) then do "add.w operand, d5"
Register selection bits would work independently from the opcode.
Sure, you could probably make logic gates to handle the cases. It's just a lot harder to make changes to and test, as each time you add an instruction or fix a thing you need to change the logic structure and solve it again. These days you just put the logic into VHDL and let the computers crunch it for a few hours; back in '79, not so much. However, ROM is pretty compact, as it's a nicely formatted grid - compare an 8K ROM vs the PLS100N: the 8K holds a lot more info for probably similar die size. ROM also has a fixed propagation delay, while the optimised gates would have different delays, so you would have to gate them, which then needs another clock...
I'm thinking there must be some patterns that the 68000 engineers took advantage of. The transistor count was estimated to be 60-70k transistors, and given that instruction words are 16 bits (with a few exceptions), that would leave only about 1 transistor per possible instruction word. Maybe the register selection fields are ignored by the microcode ROM address generator, and maybe there are microcode instructions that specifically pull register fields from the instruction register.
1 transistor per instruction? You clearly have no idea how a CPU works. I mean, if you want to OR all 16 bits you might be able to get away with 1 transistor, but that is a very, very rare case. The CPU has T-states, so it needs to take the instruction, decode what it needs, set up the T-state counter, then expand out to the hundred or so internal signals that it needs to generate from the instruction word, then step through the counter while activating the right internal control lines per T-state, and exit the T-state at the right count.
For example, the 6502 takes its 8-bit instruction code through a 21x130 decode ROM, i.e. it takes 21 bits in (instruction, inverted instruction, clock, T counter) and generates 130 control signals for parts of the CPU to "do the instruction".
I meant having an individual microcode for every single variation of every instruction would've been impossible.
Quote:
RAM was very expensive so having more complex instructions that saved RAM was a huge bonus.
I don't think it was for saving RAM; 68k-generated code is not known to be compact.
In fact, a main feature of the 68k was working with slow RAM, but this was only true at the start of its life, and the argument quickly became void when other chips needed faster access to the main RAM, like DMA or the video chip (if it has no dedicated RAM); the ST and Amiga use way faster RAM than the 68k needs.
The 68k was a good CPU at the time for workstations, as opposed to its competitors, which were faster but also really way more expensive.
Was there any real reason why the 8086 and 68000 needed 4 cycles to access memory, other than having a higher MHz rating than competitors?
psycopathicteen wrote:
Was there any real reason why the 8086 and 68000 needed 4 cycles to access memory, other than having a higher MHz rating than competitors?
Someone who knows the 68K might have a better answer, but my guess is that it was limited by the speed of the most affordable memory of its time (which seems to be 1979), in proportion to what seemed practical for their CPU logic. About all you can do at that point is make the data bus wider.
It comes down to how they take their clocks and how they are marketed.
So a 6502 takes a 2-phase non-overlapping clock; this basically gives it a 2 MHz clock from a 1 MHz clock. This means its clock drivers are more complicated. The Z80, for example, takes a 4 MHz clock, single phase. The 6510 fixed the clock issue and generates the clocks internally from a single input clock. So from the 1 MHz clock the 6502 gets 4 events; from the 4 MHz clock the Z80 gets 4 events.
You can't just have something happen from nothing, you need an "event" to trigger it, and to gate the logic steps.
Say LDA #4
Phi2 Hi: You need to put the PC on the address bus
Phi2 Lo: You need to read the data from the data bus
Phi1 Hi: decode the opcode
Phi1 Lo: increment the PC
Phi2 Hi: put the PC on the address bus
Phi2 Lo: read the data from the data bus
Phi1 Hi: set the A with the value
Phi1 Lo: increment the PC
The extra cycles aren't about waiting for RAM - you can just slow down the clock if you want to use slower RAM.
Take the Z80
Clock 0 : put PC on address bus
Clock 1 : increment PC
Clock 2 : read value from data bus
Clock 3 : Do RAM refresh, decode instruction
Clock 4 : put PC on address bus
Clock 5 : increment PC
Clock 6 : read value from data bus into A
If you want to do more things at once, you need more adders and the like on the die. If you do it step by step you can use the ALU to increment the PC as well as do an ADD instruction. Each bit of parallelisation requires more die, which drops the number of CPUs per wafer, and more gates means a higher chance of a chip failing and hence fewer working chips, so that pushes the price up even more.
Your max clock speed is determined by whichever step takes the longest. So you might break a step down into smaller steps, which makes the operation take more steps overall but allows a higher clock.
Sophie Wilson seems to have just given a talk about the design of the ARM
https://hackaday.com/2018/05/08/sophie- ... efficient/
I saw that they talked about the Dhrystone benchmark again. Was that another rigged "see how well a CPU can emulate 68000 instructions 1:1" benchmark? Is it doing math with a bunch of random absolute long addressing?
Dhrystone is a synthetic computing benchmark program developed in 1984 by Reinhold P. Weicker intended to be representative of system (integer) programming. The Dhrystone grew to become representative of general processor (CPU) performance. The name "Dhrystone" is a pun on a different benchmark algorithm called Whetstone.
[...]
The Dhrystone benchmark contains no floating point operations, thus the name is a pun on the then-popular Whetstone benchmark for floating point operations. The output from the benchmark is the number of Dhrystones per second (the number of iterations of the main code loop per second).
[...]
Dhrystone remains remarkably resilient as a simple benchmark, but its continuing value in establishing true performance is questionable. It is easy to use, well documented, fully self-contained, well understood, and can be made to work on almost any system. In particular, it has remained in broad use in the embedded computing world, though the recently developed EEMBC benchmark suite, HINT, Stream, and even Bytemark are widely quoted and used, as well as more specific benchmarks for the memory subsystem (Cachebench), TCP/IP (TTCP), and many others.
[...]
Dhrystone may represent a result more meaningfully than MIPS (million instructions per second) because instruction count comparisons between different instruction sets (e.g. RISC vs. CISC) can confound simple comparisons. For example, the same high-level task may require many more instructions on a RISC machine, but might execute faster than a single CISC instruction.
Dhrystones actually measure integer computational ability. MIPS don't.
I know RISC CPUs need more instructions to load and store memory, but once the memory is loaded into registers, the amount of instructions needed evens out with CISC CPUs.
That is simply not true.
(edit) There are just too many variables for such a flat statement to possibly be true. 8-bit PIC is more-or-less RISC (albeit accumulator-based, so...) but it does a lot less per instruction than any real 32-bit ISA.
VLIW, approximately the CISCiest of CISC things, does tremendously more per instruction during every instruction. By definition. Many modern ISAs include SIMD instructions, which are CISCy, and their corresponding truly-RISCy things are horrifically more verbose than natively supporting SIMD.
Depending on your CISC architecture, your task, whether it's you or a compiler (/how good the compiler is) writing the machine code...
CISC architectures don't just have more powerful addressing modes. They also often have instructions to automate loops, string operations, block memory transfers... even polynomial evaluation.
And while the number of instructions might in some cases be comparable, the number of bytes often isn't - variable-length encodings sometimes let CISC ISAs use a single byte for a common instruction that on the RISC side would require four.
Here's a neat paper I found when googling, inspired by this topic. In their measurements they found that CISC generally did tend towards rather denser code. Depending on the problem and the architectures in question, you're often looking at half as much.
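As one hand-picked illustration of that single-byte-vs-four-bytes point (not taken from the paper, just the usual textbook case):
Code:
; x86 (32-bit): increment a register - a single 1-byte instruction (0x40)
    inc  eax

; classic ARM: the same operation is always a 4-byte instruction (0xE2800001)
    add  r0, r0, #1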
Comparing ARM to the 68000, I just don't see that many instructions the 68000 has that the ARM can't do. I never saw anything as complex as polynomial evaluation in one instruction on the 68000.
If you actually want to see how the 68k is competitive with ARMv1 despite being 8 years older (using half as many instructions per dhrystone), you'll need to look at the actual resulting machine code for both machines.
Otherwise... you're just letting your preconceptions blind you.
What do you mean by "preconceptions"? You think I'm not looking up the actual instruction sets?
Fact: ARMv1 and 68k both clocked at 8 MHz calculate a comparable number of Dhrystones per second, despite the ARM executing ≈250% as many instructions (per the MIPS/MHz figures above: 0.5/0.2 = 2.5x the instructions for only 300/250 = 1.2x the Dhrystones). Therefore the ARM must be doing roughly 50% of the work per instruction (1.2/2.5 ≈ 0.5).
If you want to see how this is true, you need to actually look at the sequence of instructions that are used in the dhrystone benchmark on these two machines, not just look at the available instructions.
Quote:
If you want to see how this is true, you need to actually look at the sequence of instructions that are used in the dhrystone benchmark on these two machines.
I don't know if that can be found on the internet.
If a cross-compiler targeting each architecture is available on the Internet, and the
source code of Dhrystone is available on the Internet, then the assembly code resulting from compiling Dhrystone can be calculated from these.
adam_smasher wrote:
Here's a neat paper I found when googling, inspired by this topic. In their measurements they found that CISC generally did tend towards rather denser code. Depending on the problem and the architectures in question, you're often looking at half as much.
The problem is that this code density is measured from compiled C code; to me that's unreliable, because it depends on how good the compiler is for each CPU, and that can drastically reduce the final code density.
The 68000 would've had a head start with compilers.
psycopathicteen wrote:
The 68000 would've had a head start with compilers.
Yes, like all mature CPUs, but it's not fair to compare code density with, for example, an IA-64, where memory size is nowadays not really the problem but parallelising the processing and other modern things are; to summarise, the code is less dense but more efficient.
Of course code size in memory is still a problem, particularly instruction cache, L2 cache, and L3 cache.
tepples wrote:
Of course code size in memory is still a problem, particularly instruction cache, L2 cache, and L3 cache.
I agree, but code efficiency is more important than code density here; it's not a bottleneck, because the huge main RAM is always there if needed.
This is why modern CPUs are CRISC and not CISC only.
You have it backwards: CISC ISAs are as CISCy as ever. RISC chips have gradually turned into CRISC chips, in large part due to code density issues.
Modern CPUs are *insanely* good at reordering, parallelizing, and fusing instructions together once they get inside the pipeline. Nothing - *nothing* - a CPU can do internally comes close, performance-wise, to the cost of hitting RAM. When writing tight code (at least on a first pass) I view branchless operations on registers as effectively *free*; my big concerns are always always always branch prediction failures and cache misses.
Quote:
RISC chips have gradually turned into CRISC chips, in large part due to code density issues.
I do not agree; it's more because RISC CPUs have a hard time reaching a high frequency, due largely to their reduced pipelines.
I would like to see how a 68000 expert would pull off "add r0, r1, r2, asl #8" in one cycle.
That's literally out of scope? This isn't about "all instructions are faster on ISA X than on ISA Y". It's that the two ISAs provide different instructions and different tasks can be done in different ways. You wouldn't want to do that on 68k unless you were specifically writing an ARM emulator on the 68k.
Comparing a single unique unchanging instruction speed is more or less useless in all contexts. There is literally no context in which that is a useful metric. (This is specifically why MIpS is a useless metric)
8086 IDIV was abominably slow. But you wouldn't claim that the XT ran at 24KIpS as a result, that's just misleading.
There's no way it's just 20% faster at the same megahertz speed.
Your disbelief doesn't change the fact that the recorded Dhrystones per MHz for ARMv1 are only 20% higher than the 68k.
If you want to argue with it, you need to actually dig up the corresponding assembly for the algorithms on the two ISAs. Only then can you make a meaningful comment about the relative performance.
Otherwise... you're just letting your preconceptions blind you.
It might help as a baseline to hand-compile Dhrystone to idiomatic 6502, 65816, LR35902, and Z80 assembly, so they can be compared with GCC and LLVM output for m68k and ARM. Has this been done publicly?
Dhrystones can't measure how fast a collision detection routine written in assembly code is.
Your point being?
A Dhrystone is an arbitrary metric. It is less misleading than pure MIPS. It does not indicate suitability for other tasks other than those measured. It is definitely not a good metric of floating point capabilities, which are explicitly out of scope.
Some things benefit from certain CISC features. (For example: 68k register autoincrement). Sometimes the simplicity gained in early RISC designs outweighs that. Sometimes they don't. But replacing the metric "dhrystones/sec" with the metric "bounding box collisions/sec" doesn't tell you more, it just tells you something different.
The N64 has a very fast RISC CPU. And yet it is completely hamstrung by having high latency and limited bandwidth to main system memory. If you just counted operational cycles without paying attention to memory latency, you'd get a horrifically misleading picture.
The earliest ARM CPUs were only used in Acorn's Archimedes, which had graphical capabilities somewhere between EGA and VGA, and sound capabilities comparable to the Amiga OCS. But when I look at the very few demos that were posted to Pouet and viewable on youtube ... it doesn't even come close to its contemporaries. Is this a fair metric? Who knows, but on top of the other metrics, all I'm hearing is the same thing over and over:
RISC promised us that simpler instruction sets were a key to performance. Other factors have always overwhelmed any resulting gain.
I never said MIPS was accurate.
Then why do you insist that this ARM should be drastically faster than the 68000? It looks like your sole metric for believing so is the relative MIPS of the two.
This 68000 instruction takes 12 cycles:
move.w (a0)+,(a1)+
The ARM needs 2 instructions to perform this:
ldrh r2,(r0),#2
strh r2,(r1),#2
Both of these are 3 cycles, which adds up to 6 cycles, which is still half as many cycles.
There's no point in continuing to argue with you while you're:
1- Using manuals pertaining to newer versions of ARM than what was contemporary
2- Using instructions that didn't exist on those contemporary versions of ARM
3- Using the naive slow way of doing things on 68k
If you want to find out why the relative performance is what it is, you must find the corresponding disassemblies. Have you even looked at what the Dhrystone metric is measuring?
I sense substantial rectal discomfort again...
One can have fast instructions, but as reality has shown, you do lose a significant amount of performance simply to needing a whole lot more of those instructions to do a variety of tasks. Those benchmarks help to judge such things: they compare getting some sort of task done, rather than the most basic steps needed to perform it.
I'm kinda curious what the end goal is with the argument. Hypothetically, let's say there was a way to definitively prove that ARM or 68k had, e.g., a 20% better power/cost ratio than the other, in a given year, in a given country, whatever. If you had such an answer, what would you wish to do with it? I feel like we passed the point of "I'm just curious" several pages ago, and the motives at this point are a big mystery to me.
If you're looking to learn how to write efficient code for an old ARM or 68k setup there's probably a much more direct way to learn about it than arguing about which is better. Same deal if you're trying to build something new with old parts, etc.
If you want to decide for yourself whether some decision made by Sega or whomever 30 years ago was objectively right or wrong, I mean, at best that's a "curiosity" question, but even if you think you could definitively compare the CPUs quantitatively, for the actual decision being made there's enough other important economic factors to make such a comparison almost meaningless. Business relationships, factory location, scale of production, logistics of supply, this stuff is way more than enough to skew the actual practical cost of making this choice well beyond whatever the raw computing power difference is worth. We're having such a hard time being quantitative about it now with total hindsight, and it would have been much harder to compare at the time. Other factors were much more important to the decision; the best a CPU maker could demonstrate would just be a reasonably competitive amount of power.
...and if you're not actually being specific about these things and want to argue about very vague matters like RISC vs CISC, I don't see what kind of comparison you could possibly hope to make. The difference between the architectures is interesting to talk about, but posing it as an argument about which is quantitatively better? It's been really weird to spectate this.
Kinda the bottom line is just that they're both practical CPU architecture types that have both remained competitive, which is why neither has died off. In the same respect they've both adapted over time as well to remain competitive, which is part of why the definitions for these architecture types are increasingly vague. Part of remaining competitive is about other things besides cost and computing power, too. There are a lot of other practical factors with a CPU, but also even running a business takes a lot more than making a good product, and it's all relevant to the answer to the question of why to use one or the other.
lidnariq wrote:
There's no point in continuing to argue with you while you're:
1- Using manuals pertaining to newer versions of ARM than what was contemporary
2- Using instructions that didn't exist on those contemporary versions of ARM
3- Using the naive slow way of doing things on 68k
If you want to find out why the relative performance is what it is, you must find the corresponding disassemblies. Have you even looked at what the Dhrystone metric is measuring?
Well then every document for the ARM2 has to be wrong, because every one I can find says it can post-increment by a constant.
How is my example a naive approach? You kept repeating how "some 68000 instructions require 2 ARM instructions and that's a big deal", so I showed you an example of a complex instruction the 68000 has, and the ARM is STILL twice as fast. Now you're saying that complex instructions on the 68000 don't matter because they're slow anyway and that good programmers would make better use of simpler instructions, which ironically was one of the reasons why the RISC architecture was invented in the first place.
I think in some cases an instruction such as
Code:
ADD Dn,<ea>
can give the edge to the 68K; still, I wonder why you're trying to compare these two CPUs. The ARM is a real 32-bit CPU released in 1985 with a 32-bit-wide bus, while the 68000 is a 1979 CPU with "only" a 16-bit-wide bus (and that is a strong difference).
The reason I'm comparing the two now is that I doubt the Dhrystone benchmark is accurate. The fact that the memory bus is 2x as wide and 4x as fast at the same MHz, but only shows up as 20% faster on the Dhrystone test, makes me suspect they didn't use a good enough compiler for the ARM. I know "RISC" means that it sometimes has to use more instructions, but I doubt that ARM ever needs 4 times as many instructions as the 68000.
It might've been an edge case where something needed exactly 15 general purpose registers, and having just 13 general purpose registers brought the ARM down.
Are you questioning the accuracy of the Dhrystone benchmark or the particular compilers used when generating the particular Dhrystone results you're measuring?
If the former: no one's under any illusion that the Dhrystone benchmark is measuring anything besides performance on the Dhrystone benchmark. It's meant to be roughly representative of an "average" program, which might not correspond to the sort of programs you're writing.
If the latter: I'd be surprised if compilers for the two different architectures differed in quality all that much, but anything's possible. It's arguable that part of what makes Dhrystone useful is measuring the quality of compiled code using contemporary compilers, too - theoretical max performance is far less important to most people in most cases than the sort of performance they'll actually tend to get with real software compiled with real compilers. But as many people in this thread have pointed out already, the only way for you to get any answers that'd satisfy you about this is to look at the generated output yourself.
In any case: the question of why you care is still kinda open.
Quote:
In any case: the question of why you care is still kinda open.
I don't actually care that much. I just had to explain because people were misunderstanding what I was saying.