|Architosh News Reports|
|Architosh Staff ([email protected])|
The G4 processor and QuickDraw 3D: Speed Shootout
We at Architosh just received our first G4 350 MHz machine. Fearing that such a low megahertz number might mean slower speeds, the first thing we did was test it against a G3 and a Pentium III machine to get a feel for the G4's speed potential. Our tests were timed by stopwatch, so they are not perfectly accurate, but they are good enough to show the trend.
How Fast Will VectorWorks and QuickDraw 3D Rendering Speed up with a G4?
Here's a quick disclaimer. This chart is for one test only. It represents just a "quick test" to get a feel for the G4's floating-point potential and its ability to push pixels around. It's also very good news for those of you out there thinking that a 350 MHz G4 might not be worth getting simply because the megahertz number is low.
Also, below we have technical explanations from our contacts at Diehl Graphsoft about QuickDraw 3D and OpenGL rendering and what is going on mathematically with the chip.
G4 350MHz versus G3 233MHz versus Pentium III 450MHz
Shorter bars are better. Time in seconds it took to process a VectorWorks rendering in Final Hidden Line mode. QuickDraw 3D scores are noted below. The file was a 750K file, run in VectorWorks 8.1 and 8.5.1 on Mac and Windows. Notes on the machines:
PB G3/233MHz, 64MB RAM, Mac OS 8.6, VM-on, 30MB dedicated to VectorWorks.
Gateway 450 Pentium III 450MHz, 128MB RAM, Windows 98SE, dynamic memory assignments to VectorWorks.
Power Macintosh G4/350MHz, 128MB RAM, Mac OS 8.6, VM-on, 30MB dedicated to VectorWorks.
G4 up to Twice as Fast!!
Using a large urban model (cityscape) with numerous blocks of buildings, made up of miscellaneous high towers, we ran a Final Hidden Line mode rendering to get a good test of the machine's floating-point power. Then we did a "QuickDraw 3D Interactive" mode rendering of the same image from the same exact point in space (300' above plane-x looking at 200' above plane-x across the cityscape).
Both of these rendering tests did substantially better on the G4 machine than on the PowerPC G3 and Pentium III. Most importantly, the G4 was up to twice as fast as the Pentium III on the Hidden Line mode rendering tests (scores shown above), despite the Pentium III's 100MHz clock-speed advantage. On the QuickDraw 3D rendering test the scores for the G3 and Pentium III were very close at approximately 8 1/2 seconds (G3) and 7 seconds (Pentium III). On the G4 the tests took approximately 4 seconds. With such short time intervals this QuickDraw 3D rendering test was harder to stopwatch accurately, so in all cases we ran the tests a minimum of two times to get an average. The G4 was noticeably faster at QuickDraw 3D rendering in interactive mode as well, with the computer being able to zoom through the cityscape in flybys with the city fully rendered. This was simply not possible on the G3 or Pentium III, which would default to wire-frame during this process. (Note: this was with the default settings in VectorWorks.)
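For readers who want the ratios behind those QuickDraw 3D times, here is a minimal sketch in Python. The times are the approximate stopwatch figures quoted above; the `speedup` helper and the dictionary name are our own illustrative choices, not part of any benchmark tool.

```python
# Approximate QuickDraw 3D stopwatch times (seconds) quoted in the article.
quickdraw_times = {
    "G3 233MHz": 8.5,
    "Pentium III 450MHz": 7.0,
    "G4 350MHz": 4.0,
}


def speedup(baseline_seconds, candidate_seconds):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


g4_seconds = quickdraw_times["G4 350MHz"]
for machine, seconds in quickdraw_times.items():
    if machine != "G4 350MHz":
        print(f"G4 vs {machine}: {speedup(seconds, g4_seconds):.2f}x faster")
```

Because the inputs are hand-timed and rounded, the resulting ratios should be read as rough indications only.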
What Makes the G4 So Fast?
This is the golden question for PC users who just can't come to grips with why a slower processor (in megahertz) can be a faster processor in real life tests. Chips are complicated items and there are many reasons why the G4 was twice as fast as a PIII 450 MHz chip in a Gateway machine at this specific VectorWorks rendering test. One of the key reasons is explained in the diagram below (taken from Apple's brochure on the G4). The G4 may chew the fat slower but its mouth is up to four times as big. This is true of all G4's compared to Pentium III's, regardless of the other factors affecting performance in rendering.
Another way of explaining how megahertz doesn't equal processor speed to PC and Mac users that don't understand it is through a 'roller coaster' analogy. After all, almost everyone has been to a theme park and stood in long lines. Take this situation as an analogy of the lines of computer code waiting in line for the processor to process.
The Roller Coaster Analogy for Chip Performance: G4 vs PIII
In this analogy the chip (microprocessor) is the roller coaster ride itself. The code (the little lines of zeros and ones) is the people in line to go on the roller coaster ride. Pretend there are two identical roller coasters, each with identical turns, bends, and ups and downs -- the exact same track. Now imagine that the roller coaster named 'G4' actually moves the train carrying the people (which, remember, are the code) a bit slower than the roller coaster named 'Pentium III'.
Question: If there are 1000 people in each line for each coaster which roller coaster will finish first giving everyone a ride? (bear with me now, this is a simplification)
Answer: It depends on the size of the trains on each coaster. If the Pentium III and G4 roller coasters each have a train with a maximum capacity of 25 people, then the Pentium III will finish first -- presuming each train is filled to capacity on every trip on both coasters, all other factors equal. But if the G4 roller coaster has a train with a maximum capacity of 100 people, then the G4 will finish first, even though the Pentium III roller coaster is actually faster at completing one trip (again, all other factors equal).
Further Explanations. It also depends on whether the trains are filled to capacity each trip. As the diagram above demonstrates, the G4 has the capacity to carry four times the amount of code through the processor in one cycle (in our case the cycle speed is 350MHz). This partly explains why the G4 beat the pants off the Pentium III, despite the Pentium III's megahertz advantage (that is, despite how fast its train is in the analogy above). But this is only one part of the story.
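The arithmetic behind the roller coaster race can be sketched in a few lines of Python. This is only the analogy in code form, not a model of either chip: the `ride_time` function is hypothetical, the clock speeds in MHz simply stand in for "trips per unit of time", and every train is assumed to leave full.

```python
import math


def ride_time(people, train_capacity, trips_per_unit_time):
    """Time to give everyone one ride, assuming every train leaves full."""
    trips = math.ceil(people / train_capacity)
    return trips / trips_per_unit_time


PEOPLE = 1000

# Same train size on both coasters: the faster coaster (Pentium III) wins.
g4_small_train = ride_time(PEOPLE, train_capacity=25, trips_per_unit_time=350)
p3_small_train = ride_time(PEOPLE, train_capacity=25, trips_per_unit_time=450)

# G4 with a train four times as big: the slower coaster now wins easily.
g4_big_train = ride_time(PEOPLE, train_capacity=100, trips_per_unit_time=350)

print(f"G4, 25 seats:   {g4_small_train:.4f}")
print(f"PIII, 25 seats: {p3_small_train:.4f}")
print(f"G4, 100 seats:  {g4_big_train:.4f}")
```

With equal trains the G4 needs the same 40 trips at a slower pace, so it loses; with the bigger train it needs only 10 trips, which more than makes up for the slower pace.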
Other Factors - Refining the Roller Coaster Analogy
Another key reason is the speed of the BUS between the chip and its cache, along with the size of the chip's caches. On processors, cache is a small amount of memory that temporarily holds code for reuse, and the cache connects to the processor's execution units via a BUS (an electronic roadway). On the G4 both caches are bigger (holding more code) than on the Pentium III [Editor's note 1, see below]. One way of thinking about this is to imagine a tram ride that goes from the end of the roller coaster ride straight to the beginning of the line for the roller coaster. The tram is only so fast (in even multiples of the roller coaster's speed) at getting people back to the line, and it has a limited capacity.
In real life computing many bits of code are reused in the processing cycle. The chip's Level 1 and Level 2 caches temporarily hold code for processing or re-processing. It's as if those 1000 people in line are wearing two different color shirts (blue and white), with everyone in a blue shirt needing to go on the ride multiple times. Again, imagine the race between both roller coasters, with 1000 people (blue and white shirts in equal amounts) needing to complete a process wherein 'blue shirted' people need to go on the ride exactly two times. They are ordered in line the same way for each coaster.
Question 2: Which coaster will finish first giving everyone the proper number of rides if the "trams" on the G4 and Pentium III roller coasters run at the same speed (1/2 that of the roller coaster itself) but the G4 has a tram with twice the capacity? [Editor's note 1, see below]
Answer 2: The G4. In this particular example the G4 (and the G3 as well) has bigger caches than the Pentiums, and these help make the processing go quicker, despite the slower MHz speeds, because they get more 'blue shirted' people back in line faster.
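To see why a bigger "tram" matters, here is a toy cache simulation in Python. It is a sketch under loud assumptions: the least-recently-used eviction policy, the cache sizes of 4 and 8 items, and the access pattern are all made up for illustration and do not describe the real G4 or Pentium III caches. A "miss" is a rider who found no seat on the tram and has to come back the slow way.

```python
from collections import OrderedDict


def count_misses(accesses, cache_size):
    """Count accesses that miss a least-recently-used cache of the given size."""
    cache = OrderedDict()
    misses = 0
    for item in accesses:
        if item in cache:
            cache.move_to_end(item)  # mark as most recently used
        else:
            misses += 1
            cache[item] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used item
    return misses


# A loop that reuses the same 8 pieces of code ten times over.
accesses = list(range(8)) * 10

small_cache_misses = count_misses(accesses, cache_size=4)
large_cache_misses = count_misses(accesses, cache_size=8)

print(f"Misses with room for 4 items: {small_cache_misses}")
print(f"Misses with room for 8 items: {large_cache_misses}")
```

In this contrived case the small cache misses on every single access because each item is evicted just before it is needed again, while the cache that holds the whole working set misses only on the first pass.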
In real life, when the cache (tram) is filled to capacity, code has to return all the way back to main RAM (random access memory). It's like saying, "Sorry buddy, you have to take that slow Greyhound bus back to camp and come back in the morning." The electronic road to RAM is the main BUS -- which on PC's has tended to be a bit faster, but only a little. This BUS's speed, too, must divide into the processor's speed in even multiples. In the case of the 350 MHz G4, the 100 MHz BUS divides into the processor speed 3.5 times -- not quite even but close enough. Technically this means that in the time it takes that Greyhound bus to make one journey back to the park, the roller coaster will make 3.5 trips (3.5 complete cycles). This is why memory is important to processing speed: it keeps those trams fat with blue-shirted people! (Or, more precisely, it keeps those execution units fat with code to process.)
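The 3.5 figure is simply the processor clock divided by the BUS clock. A minimal Python sketch, with a hypothetical helper name and using only the 350 MHz and 100 MHz figures from the text:

```python
def cpu_cycles_per_bus_cycle(cpu_mhz, bus_mhz):
    """Processor cycles that elapse during one cycle of the main BUS."""
    return cpu_mhz / bus_mhz


# Figures from the article: a 350 MHz G4 on a 100 MHz BUS.
g4_ratio = cpu_cycles_per_bus_cycle(cpu_mhz=350, bus_mhz=100)
print(f"Coaster trips per Greyhound bus trip: {g4_ratio}")
```

The higher this ratio, the more processor cycles are potentially spent waiting each time code has to come from main RAM instead of the cache.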
OK, enough of this analogy; it is definitely an oversimplification ... but I believe it suffices in communicating one very important thing about microprocessors: megahertz isn't the only big determinant of a chip's performance. Like a roller coaster giving rides to lines of code, the size of the coaster train, the speed at which code can get back in line, and the amount of code that gets sent packing back to main RAM instead of getting shuttled back to the start of the line all greatly determine the effective speed of a processor. For those who don't yet understand things like backside cache, Level 1 cache and BUS speeds, and how they all affect chip performance, use the 'roller coaster' analogy as a way of simplifying the way a microprocessor works -- giving lines of code a ride through its processing units (the things that do the math).
Well, I hope you enjoyed this analogy, but here are some more technical explanations, from someone who knows a great deal.
Technical Explanations for G4 Speeds
According to Sean Flaherty, chief technology officer at Diehl Graphsoft Inc., the rendering process in the VectorWorks rendering examples explained above (Final Hidden Line and QuickDraw 3D) falls roughly into three major steps: (1) "preparation of the geometry for rendering (filling the pipeline), (2) processing of the geometry (sorting, clipping, etc.), (3) conversion of the results back into VectorWorks objects."
As one might guess, much of the time in the Final Hidden Line mode rendering is spent in the middle step, i.e., processing the geometry. In a hidden line mode diagram the mathematics are vector-based, making floating-point performance crucial. This is where the G4 processor and its 128-bit vector unit and AltiVec instruction set can make a big difference. NOTE: VectorWorks does not yet take advantage of AltiVec instructions in the G4.
QuickDraw 3D, as Sean explains it, is different as it is a raster rendering engine, "so it spends much of its time manipulating big images in memory. This is where the AltiVec optimizations come into play, allowing greater bandwidth with image processing engines like QuickDraw 3D." Sean mentioned that this is an area where AltiVec can provide greater benefits than in its vector rendering because it allows parallel 'integer' operations.
Another key area in the G4 versus Pentium III debate is memory managers. As larger models are created in VectorWorks and then rendered at high resolutions with textures and so forth, efficient use of memory becomes very important. You end up dipping into virtual memory, and on the Mac platform access to virtual memory is not as efficient as on Windows; plus, the user must manually adjust the memory allocation for VectorWorks on the Mac to give it sufficient memory to handle large models. As Sean mentioned, Mac OS X will make a "huge leap forward for Mac users" with its modern memory management and virtual memory. At present, however, Windows 98 (and NT) provide better memory management, aiding in the process of rendering very large files.
Lastly, Sean said one very interesting thing about the G4 processor. And that is that in the software development community the "G3/G4 chip architecture is known for its incredible math performance, both integer and floating-point." He added that he finds this ironic because the Mac still carries the stigma of not being a 'serious' scientific platform. Most of you doing serious 3D and scientific imaging on the Mac probably could agree with that assessment.
Well, the purpose of testing the G4 in this limited way was to get an idea of how its floating-point performance compares with that of the G3 and Pentium III processors -- using a real world example in VectorWorks. Please bear in mind, however, that there are many other factors involved in determining rendering performance and in getting great performance out of any CAD machine on any platform. All this test confirms is that when it comes to doing math, the G4 processor appears to be clearly capable of living up to its intended reputation as the first 'supercomputer' on a chip (even if the 'supercomputer' status it refers to is based on outdated 1996 US Government standards).
Also bear in mind that we did not test an Athlon chip-based PC (as we don't have access to one, yet). From our look at SPEC-95 processor scores, the new AMD Athlon processor may be one wicked fast chip (as it clearly beats the Pentium III's in numerous tests, as published in magazines). It may be that, when it comes to VectorWorks rendering, the AMD Athlon is the faster ride!
For questions and comments please email us at: [email protected]
1. Technically, the Pentium III comes in two versions, one whose Level 2 cache operates at full processor speed and one whose cache operates at 1/2 processor speed. Those that operate at full speed have a 256KB Advanced Transfer Cache (on-die full speed Level 2 cache). The 450 MHz PIII's, built on 0.25 micron process technology, do not have this, but instead use a half-speed 512KB Level 2 cache. Hence, the Gateway 450 PIII machine we tested complies with our little analogy and technical descriptions. Not all Pentium III's are created equal -- and I'm not talking about Coppermine.