Author Topic: ps3 + Tiger = <3  (Read 10992 times)
James217
Sr. Member
****
Posts: 374



« Reply #60 on: August 20, 2005, 12:09:30 AM »

Look, I don't know where this started, but this isn't good for incoming people, everyone hates grudges. Just leave each other alone before Hawk steps in, and if this keeps up he will. Keep your stuff to PMs if you wanna fight. I'm not taking sides, neither one of you is acting like adults, so keep it out of the forums. Post corrections, ideas, anything intellectual, but personal attacks have no place in an intellectual forum such as this.
Logged

Blarg!
rusty
Guest
« Reply #61 on: August 20, 2005, 03:06:54 AM »

Quote from: James217
Look, I don't know where this started, but this isn't good for incoming people, everyone hates grudges. Just leave each other alone before Hawk steps in, and if this keeps up he will. Keep your stuff to PMs if you wanna fight. I'm not taking sides, neither one of you is acting like adults, so keep it out of the forums. Post corrections, ideas, anything intellectual, but personal attacks have no place in an intellectual forum such as this.

On reflection my previous reply was a bit harsh.  I don't have a grudge against anybody, and I haven't said anything that was not my honest opinion.  But then....I'm not the one telling people what they should and should not do from behind the safety of a computer.

And I stand by my remark about it being unfair to expect Stigmata to offer anything of substance to a technical debate of this sort, especially when it's a subject matter he has little or no experience in dealing with.  He may take it as an insult, but that's not how it was meant.

Now...back to the suitability of the Cell running OSX...anything to add to my previous post on this or are you done with taking the moral high ground?




« Last Edit: August 20, 2005, 03:31:05 AM by rusty » Logged
Shmi
Guest
« Reply #62 on: August 20, 2005, 03:31:29 AM »

Even if it is currently impractical/impossible, it'd still be cool to have sooner or later. And yeah, that is true about most people not caring HOW it works, as long as it does.
Logged
rusty
Guest
« Reply #63 on: August 20, 2005, 03:43:57 AM »

Quote from: Shmi
Even if it is currently impractical/impossible, it'd still be cool to have sooner or later. And yeah, that is true about most people not caring HOW it works, as long as it does.

Amen brother...
Logged
lg_alucard
Guest
« Reply #64 on: August 20, 2005, 07:51:51 PM »

Theoretically IBM could make a supercomputer that would make the top 100 list by using 24 cell processors, whereas all the supercomputers on the list of top 500 contain 600-2000+ processors.

http://en.wikipedia.org/wiki/Cell_processor

The part that really applies to this debate is the section on Software Engineering.

Also, IBM has submitted a Linux kernel patch for the Cell and is maintaining GDB, and Sony is maintaining the GCC/Binutils for the Cell.  Now that being said, if I had a Cell processor machine there would be nothing holding me back from running say... Gentoo on it, compile times would be holy geez amazing.

Another point that I'd like to make here is this.  OS X relies HEAVILY on open source projects, like KDE's KHTML (browser engine), FreeBSD (what Darwin is based off of), GCC (compiler), etc.  Now if Apple decided to use the Cell (nope but it'd be cool anyway) they'd basically just have to use the GCC compiler and have IBM develop a kernel patch for the Mach kernel.

I should stop ranting... got my wisdom teeth removed yesterday so now I'm on powerful pain killers... oxycodone...  but still just check out that Wikipedia entry.

Quote from: stigmata
The Cell is in-order only; it uses eight processing cores (even dual-core is going to be a big step), seven of which are SIMD-units (none of which use VMX - which would completely invalidate four years' worth of Apple's vector optimisations);

That argument about why they shouldn't use the Cell is completely irrelevant, as Pentium chips lack AltiVec as well.
Logged
rusty
Guest
« Reply #65 on: August 20, 2005, 09:00:19 PM »


Quote
Quote
The Cell is in-order only; it uses eight processing cores (even dual-core is going to be a big step), seven of which are SIMD-units (none of which use VMX - which would completely invalidate four years' worth of Apple's vector optimisations);

That argument about why they shouldn't use the Cell is completely irrelevant, as Pentium chips lack AltiVec as well.

That little comment may have been less to do with the suitability of the Cell, and more to do with Stig's extreme dislike for me and anything that I say :)
Logged
lg_alucard
Guest
« Reply #66 on: August 20, 2005, 09:20:21 PM »

Oops, forgot that I had cut out this long rant I wrote at the top of all of that, which says that I'm a Cell fanboy and am on rusty's side (though that should be obvious) on this one.
Logged
stigmata
Hero Member
*****
Posts: 2223


Why are you looking at my Macintosh?


WWW
« Reply #67 on: August 20, 2005, 10:49:52 PM »

Quote
I'm not taking sides, neither one of you is acting like adults

Lucky me, being a minor. =)

Quote
That argument about why they shouldn't use the Cell is completely irrelevant, as Pentium chips lack AltiVec as well.

Not really. Apple's going to end up supporting SSE and VMX concurrently in their releases of OS X. It'd just add further complication to support yet another vector standard, at which point we move into the realm of Extreme Fucking Stupidity.

Quote
Now if Apple decided to use the Cell (nope but it'd be cool anyway) they'd basically just have to use the GCC compiler and have IBM develop a kernel patch for the Mach kernel.

They already do use the GCC compiler, but you've raised a valid point: GCC is completely unsuited to compiling for the Cell processor, since it assumes the minimum number of processors are available. Furthermore, GCC isn't aware of the seven SIMD cores, and so cannot possibly compile code that will run particularly well on them - especially if that code isn't well suited to vector processing, such as AI or otherwise branchy code.

Quote
and more to do with Stig's extreme dislike for me and anything that I say

Half-right. I don't dislike you as a person, despite appearances: it's the crap you're speaking that I object to.

Quote
Nothing against you Stig, but he's the only one providing documentation.

Like that Yonah technical manual of his. Great reading, that.

All the information I've referred to is in the public domain. You'll find the wikipedia entry on the Cell processor that lg_alucard linked to very informative, and this will make a good read too, as will this if you want some comparative dirt on the Xenon.

Quote
As for the suitability of running OSX, the PPE can farm out processes/threads to each of the SPE's dynamically as well as taking care of some of the more general OS tasks.

Invalid argument: what the hell is the point of farming out a branchy thread to an SPE? You're looking at a performance loss by doing so, and you can't assume that everything can be done more quickly by a vector processor than by a traditional one. I point you to AI, Physics and control software in particular, as well as anything that uses branchy logic. After all, you can't use SIMD processing (Single Instruction, Multiple Data) when the same instruction cannot be applied to every element of the data for the desired result.
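
To make the SIMD point concrete, here is a minimal C++ sketch (not code from this thread; `Vec4`, `scalar_branchy` and `simd_style_select` are made-up names, and a plain array stands in for a real vector register). When elements of the same vector need different code paths, a SIMD unit typically has to compute both paths for every lane and then pick a result per lane with a mask, so the work on the discarded path is wasted:

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4-wide "vector" standing in for one SIMD register.
using Vec4 = std::array<float, 4>;

// Scalar version: each element follows its own branch.
Vec4 scalar_branchy(const Vec4& x) {
    Vec4 out{};
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i] < 0.0f) {
            out[i] = -x[i] * 2.0f;  // path A
        } else {
            out[i] = x[i] + 1.0f;   // path B
        }
    }
    return out;
}

// SIMD-style version: both paths are evaluated for all lanes,
// then a per-lane mask selects which result to keep.
Vec4 simd_style_select(const Vec4& x) {
    Vec4 path_a{}, path_b{}, out{};
    for (std::size_t i = 0; i < x.size(); ++i) path_a[i] = -x[i] * 2.0f;
    for (std::size_t i = 0; i < x.size(); ++i) path_b[i] = x[i] + 1.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const bool mask = x[i] < 0.0f;
        out[i] = mask ? path_a[i] : path_b[i];
    }
    return out;
}
```

Both functions return the same values; the difference is that the second one always pays for both paths, which is one reason heavily branchy code tends to gain little from vector units.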

Quote
Sony have been using the best SIMD microarchitecture for five years (running at 150MHz) so why re-invent the wheel?

Best in what respect? A microarchitecture that doesn't support double-precision arithmetic certainly isn't as good as double-precision VMX for scientific computing, so I'd be interested in hearing how it's better.

It's also important to note that they have reinvented the wheel with respect to OS X and the PowerPC: by changing away from VMX and OOO processing, Sony and IBM have completely invalidated the optimisations that have already been done for the G4 and G5. That is, not only would Apple now have to reoptimise for a new vector microarchitecture, but they'd also have to substantially alter their instruction scheduling as well - for a good few years, OS X on a Cell would run like sludge, and there's no way to avoid that. The Cell does support VMX, using a single very simple VMX unit on the PPE, but you're looking at AltiVec performance well below the G4 or G5.

Quote
In fact it is exactly this sort of dynamic load distribution that makes the Cell capable of running multiple OSes.

Let's also note that the Cell has no built-in multiuser support, but let's not go there.

Quote
But perhaps if you look up the MIPS documentation for the R4000 FPU that might be a good place to start

Why bother, when the Cell isn't based on the R4000?
Logged

Quote from: Draliseth
Listen to Stigmata.
Quote from: auric
DON'T listen to stigmata...
lg_alucard
Guest
« Reply #68 on: August 21, 2005, 12:14:12 AM »

SSE1/2/3 are all far inferior to the VMX SIMD.  Also, I just saw this and thought it was interesting...

Quote from: wikipedia
Additionally, IBM has included a VMX (AltiVec) unit in the Cell PPE.

Also, the GCC that I was referring to would be suitable for use with the Cell since I was speaking of the modified GCC specifically for the Cell being maintained by Sony.
Logged
stigmata
Hero Member
*****
Posts: 2223


Why are you looking at my Macintosh?


WWW
« Reply #69 on: August 21, 2005, 01:46:16 AM »

Quote
Additionally, IBM has included a VMX (AltiVec) unit in the Cell PPE.

I know, and I mentioned it in my last post.

Quote
Also, the GCC that I was referring to would be suitable for use with the Cell since I was speaking of the modified GCC specifically for the Cell being maintained by Sony.

Why would they maintain GCC for compatibility with the Cell when they already have a proprietary compiler of their own?
Logged

rusty
Guest
« Reply #70 on: August 21, 2005, 08:57:30 PM »

Argh...there's no point in quoting anything you've said Stig.  You obviously missed the entire point of why many modern processors that use OOE have such heavy BP hardware and the subsequent effect on the instruction pipeline.

Physics is NOT branch heavy...do I have to copy and paste my integration code for a rigid body? Actually...screw it...I'll do that and you can show me where the branch heavy code is.  Here you go:

Quote
void
CAsRigidBody::Update(
                    real lrDeltaTime )
{

    AsReal            lrOneOverMass;
    CAsV3d          lTotalDisp;
    CAsV3d          lTmpVec;
    CAsMatrix3x3    lOrient;
    CAsV3d          ldRight, ldUp, ldAt, lNorm;
    CAsQuat         lQOrientation;

    // Only update if the body is actually active.
    if (!IsActive())
    {
        // Set forces to zero???
        return;
    }
   

    lrOneOverMass = 1.0f / mrMass;
   
    // ================================================================
    // Effects of the linear force, impulse and any translations
    // done to the body.
    // ================================================================
   
    // Add linear force to the impulse, giving us the total
    // impulse applied this frame.
    lTmpVec = mTotalForce * lrDeltaTime;
    mTotalImpulse += lTmpVec;
   
   
    // Add the effects of the force to the velocity
    mLinearVel += (mTotalImpulse * lrOneOverMass);

    // Use the translation and velocity applied this frame
    // to give us the amount that we've moved this frame.
    lTotalDisp = mTotalTranslate;
    lTotalDisp += (mLinearVel * lrDeltaTime);
   
    // Clear the impulse and force accumulators as well as
    // the translation accumulator.
    mTotalImpulse.Zero();
    mTotalForce.Zero();
    mTotalTranslate.Zero();

    // Update the position of the body
    mPositionOld = mBodyMatrix.mPos;
    mBodyMatrix.mPos += lTotalDisp;
                                   
   
    // ================================================================
    // Effects of the torque and angular impulses applied to the
    // body
    // ================================================================     
    lTmpVec = mTotalTorque * lrDeltaTime;
    mTotalImpulseAngular += lTmpVec; 
   
    mAngularMomentum += mTotalImpulseAngular;
    mAngularVel = TransformPoint(mTensorInvWorld, mAngularMomentum);

    // Clamp crazy angular velocities
    lNorm = mAngularVel;
    if (!IsEffectivelyZero(lNorm.GetLength()))
    {
        AsReal  lrClamp = lNorm.NormaliseGetLength();
        if (lrClamp > krClampAngularVel)
        {
            mAngularVel = lNorm * krClampAngularVel;
            mAngularMomentum *= 0.95f;
        }
    }

    // Update the euler orientation of the body
    mOrientation += (mAngularVel * lrDeltaTime);
    mOrientation.SetX( SgPhys_NormaliseRotation( mOrientation.X()) );
    mOrientation.SetY( SgPhys_NormaliseRotation( mOrientation.Y()) );
    mOrientation.SetZ( SgPhys_NormaliseRotation( mOrientation.Z()) );       
   

    // Clear the angular effects
    mTotalTorque.Zero();
    mTotalImpulseAngular.Zero();
   
    // Rotate the body, around the axis of the angular velocity and by an angle prop. to the mag. of angularVel.
    // The cross product method - cross product gives direction that tip of vector is moving - just add this onto vector.

    ldAt    = CrossProduct(mBodyMatrix.mAt,    mAngularVel);
    ldUp    = CrossProduct(mBodyMatrix.mUp,    mAngularVel);
    ldRight = CrossProduct(mBodyMatrix.mRight, mAngularVel);
   
    lTmpVec = ldAt * lrDeltaTime;
    mBodyMatrix.mAt -= lTmpVec;
   
    lTmpVec = ldUp * lrDeltaTime;
    mBodyMatrix.mUp -= lTmpVec;
   
    lTmpVec = ldRight * lrDeltaTime;
    mBodyMatrix.mRight -= lTmpVec;
   
    mBodyMatrix.OrthoNormalise();


     // Build the inverse matrix
    mBodyMatrixInv = mBodyMatrix;
    mBodyMatrixInv.Invert();   

    // Build a matrix that takes into account for the center of mass.
    CalcLocalMatrix();
           
       
    // ================================================================
    // Effects of the torque and angular impulses applied to the
    // body
    // ================================================================     

    // Hmmm...should lOrient be the body or local matrix??? Does it make a
    // difference?
    lOrient.Set( mBodyMatrix.mRight, mBodyMatrix.mUp, mBodyMatrix.mAt );

    mTensorInvWorld = Multiply(lOrient.GetTranspose(),  mTensorInv);
    mTensorInvWorld = Multiply( mTensorInvWorld, lOrient); 

    //
    // Update the linear velocity and the speed.
    //
    mLinearVelDir = mLinearVel;
    if (mLinearVel.IsEffectivelyZero())
    {
        mrSpeed = 0.0f;
        mLinearVelDir.Zero();
    }
    else
    {
        mrSpeed = mLinearVelDir.NormaliseGetLength();
    }

    // Reset the velocity constraint.
    mbVelocityConstraintSet = false;
}

There are six possible branches in this one function, which will only be called once a frame for each instance of a rigid body.  In that time the BPB will be completely invalidated by all of the code called before and after this function. So the BP hardware incurs penalties (extremely heavy ones on modern desktop processors) every time this function is called.

BP only works on a very localised view of the code, not the entire executable.  I could go into an A.I. example of why branch prediction hardware is also a big problem if you really want me to.

And yes...the SPE's support double precision floats, but that means nothing.  Floating point numbers may be precise but they are very, VERY inaccurate and are order dependent when it comes to minimizing rounding error during operations.

Double precision floats increase this problem because of the nature of the mantissa/exponent normalisation. 
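
The order dependence mentioned above is easy to reproduce. The sketch below is illustrative only (`sum_in_order` is a hypothetical helper, and it assumes the compiler is not reassociating floating-point maths, e.g. no `-ffast-math`). It adds the same three values in two different orders, and the effect shows up in both single and double precision:

```cpp
#include <vector>

// Adds the values strictly left to right, rounding after every addition.
template <typename T>
T sum_in_order(const std::vector<T>& values) {
    T total = T(0);
    for (T v : values) {
        total += v;
    }
    return total;
}

// Same three numbers, two orders:
//   {1e30, 1, -1e30}  -> 0   (the 1 is rounded away when added to 1e30)
//   {1e30, -1e30, 1}  -> 1   (the large terms cancel first, so the 1 survives)
```

A wider type moves the point at which small terms get rounded away, but the result still depends on the order of the operations.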

And you should look at the R4000 SPECIFICATIONS for a good example of what a good instruction set does and what the VU/SPE vector operations are based on.  Compare it against the AltiVec and SSEx instruction sets and you can easily see a huge difference in functionality.

« Last Edit: August 23, 2005, 08:07:04 PM by rusty » Logged
stigmata
Hero Member
*****
Posts: 2223


Why are you looking at my Macintosh?


WWW
« Reply #71 on: August 22, 2005, 06:10:11 AM »

Quote
Argh...there's no point in quoting anything you've said Stig.

Oh, man, that hurt me badly. Excuse me while I cry into my own shoulder, because nobody loves me enough to lend me their own.

Quote
You obviously missed the entire point of why many modern processors that use OOE have such heavy BP hardware and the subsequent effect on the instruction pipeline.

...

Physics is NOT branch heavy...do I have to copy and paste my integration code for a rigid body? Actually...screw it...I'll do that and you can show me where the branch heavy code is.  Here you go:

I'm aware that many modern processors use out-of-order execution, and why they have such heavy BP hardware. I'm aware that the removal of this hardware substantially reduces the size and complexity of the instruction pipeline, but what I'm not aware of is how this kind of radical simplification of the processor can have any benefit for Apple's professional market. Sure, what you're looking at is this lovely next-gen processor: unfortunately, it's one that was designed for gaming, and what we do not see in gaming machines is a general design approach that allows the processor to be equally fast in a large number of tasks.

Code doesn't have to be branch-heavy to be unsuited to processing by one of the Cell's SPEs. The fact that the Cell has seven of the damn things is only useful if you can break down the processor's load into eight separate threads, a task that's completely useless for AI, control, physics and so on - especially when those tasks all have to share the same L2 cache of the 970 (a paltry 512kb), or even less per SPE. You're looking at a performance loss against similar processors that provide significantly more L2 cache than that, especially the Xenon, POWER5 and even the Pentium.

Even apart from the pathetic amount of cache that the Cell packs, you're looking at a performance hindrance as well for code that does better with OOE: video encoding and decoding, in particular. Again, you're looking at yet another loss for the scientific market: while the Cell sports a massive 256 GFlop predicted performance for single-precision floating-point maths, single-precision floats are nowhere near precise enough for scientific uses. Instead, we have to look to double-precision maths, which is where the 970 really shines. Look at the raw numbers:

Cell
256 GFlop single-precision floating point performance @ 4.0Ghz
~26 GFlop double-precision floating point performance @ 4.0Ghz

PowerPC 970
19GFlop double-precision floating point performance @ 2.5Ghz

Take a look at that again. The 970 has a double-precision floating-point performance 75% that of the entire Cell at 60% of the clockspeed - considerably more efficient, especially considering that's only for a single processor, and not in the most recent incarnation. What's that again? A single, general purpose PowerPC core beats an eight-core gaming processor running at almost twice the clockspeed in scientific benchmarks?
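
For anyone who wants to check the arithmetic, here is a small sketch using only the figures quoted in this post (26 GFlop at 4.0Ghz for the Cell, 19 GFlop at 2.5Ghz for the 970). The struct and function names are invented for the example, and the numbers themselves are taken as given, not verified:

```cpp
#include <cassert>

// Hypothetical record for one quoted figure: peak GFlops at a given clock.
struct ChipFigure {
    double gflops;
    double ghz;
};

// Throughput normalised by clock speed (GFlops per GHz).
double gflops_per_ghz(const ChipFigure& chip) {
    assert(chip.ghz > 0.0);
    return chip.gflops / chip.ghz;
}

// How large `part` is relative to `whole`, as a fraction.
double fraction_of(double part, double whole) {
    assert(whole > 0.0);
    return part / whole;
}

// Figures as quoted in the post (double precision).
const ChipFigure kCellQuoted{26.0, 4.0};
const ChipFigure kPpc970Quoted{19.0, 2.5};
```

With these inputs, `fraction_of(19.0, 26.0)` is about 0.73 and `fraction_of(2.5, 4.0)` is 0.625, which the post rounds to 75% and 60%. Per clock, the quoted figures work out to 6.5 GFlops per GHz for the Cell and 7.6 for the 970.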

Would you be stupid enough to sacrifice Apple's beachhead in the scientific community just to have a Cell in your box?

Quote
And yes...the SPE's support double precision floats, but that means nothing.  Floating point numbers may be precise but they are very, VERY inaccurate and are order dependent when it comes to minimizing rounding error during operations.

My god. Double-precision floating points are accurate enough to model the entire universe and predict climate change, but it's not accurate enough for you? Just what the hell do you do in your free time - calculate infinity to the nth place?

Quote
And you should look at the R4000 SPECIFICATIONS for a good example of what a good instruction set does and what the VU/SPE vector operations are based on.  Compare it against the AltiVec and SSEx instruction sets and you can easily see a huge difference in functionality.

I note that the PowerPC 970 is considerably more efficient in dealing with high-precision maths; it really depends on what you're looking for, doesn't it?

Good luck with that gaming machine of yours. I prefer to be able to do more than just that on my own.
Logged

rusty
Guest
« Reply #72 on: August 22, 2005, 07:01:49 AM »

Quote
I'm aware that many modern processors use out-of-order execution, and why they have such heavy BP hardware. I'm aware that the removal of this hardware substantially reduces the size and complexity of the instruction pipeline, but what I'm not aware of is how this kind of radical simplification of the processor can have any benefit for Apple's professional market. Sure, what you're looking at is this lovely next-gen processor: unfortunately, it's one that was designed for gaming, and what we do not see in gaming machines is a general design approach that allows the processor to be equally fast in a large number of tasks.

No...it was designed for far more than just gaming. This was the Cell consortium's design brief...design a fast, efficient processor that is able to be used for a wide variety of applications. Go take a look at this

Quote
Code doesn't have to be branch-heavy to be unsuited to processing by one of the Cell's SPEs. The fact that the Cell has seven of the damn things is only useful if you can break down the processor's load into eight separate threads, a task that's completely useless for AI, control, physics and so on

Wow...you talk as if you have some sort of experience in this sort of thing.  I've managed to fit rigid body physics, vehicle dynamics and deformable collision data into the 16Kb scratchpad + 4K of LS memory for the VU of the PS2.  So...how is 256K not going to be enough for me or anybody else?

Face it...you don't have a clue on what you're talking about.  You don't know ANYTHING useful about programming/software engineering, never mind something that demands a wide range of expert knowledge in many disciplines such as games programming.


Quote
- especially when those tasks all have to share the same L2 cache of the 970 (a paltry 512kb), or even less per SPE. You're looking at a performance loss against similar processors that provide significantly more L2 cache than that, especially the Xenon, POWER5 and even the Pentium.

Ahh...you see...this is where the DMAC comes into play. I explained its function before, or did you skip past this part?  You HAVE to process the data in small chunks because at the end of the day, the bigger the data chunk, the more likely you are to suffer from system bus lock-ups, delaying the operation of the other cores.
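
The chunking idea can be sketched in plain C++. This is only an analogy, assuming a tiny fixed-size buffer in place of an SPE's local store and `std::copy_n` in place of real DMA transfers; `kLocalFloats` and `scale_in_chunks` are invented for the example, and a real Cell program would use the platform's DMA facilities and usually double-buffer the transfers:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical local-store capacity, in floats. The value 64 is arbitrary.
constexpr std::size_t kLocalFloats = 64;

// Scales a large array by streaming it through a small local buffer,
// one chunk at a time, instead of touching main memory element by element.
void scale_in_chunks(std::vector<float>& data, float factor) {
    float local[kLocalFloats];
    for (std::size_t offset = 0; offset < data.size(); offset += kLocalFloats) {
        const std::size_t count = std::min(kLocalFloats, data.size() - offset);
        const auto chunk = data.begin() + static_cast<std::ptrdiff_t>(offset);

        std::copy_n(chunk, count, local);   // stand-in for "DMA in"
        for (std::size_t i = 0; i < count; ++i) {
            local[i] *= factor;             // work done entirely in the local buffer
        }
        std::copy_n(local, count, chunk);   // stand-in for "DMA out"
    }
}
```

The last chunk may be shorter than the buffer, which is why `count` is recomputed on every pass.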

Again, this is something you would know if you had any experience in writing software on an architecture that requires any form of parallelism.


Quote
you're looking at a performance hindrance as well for code that does better with OOE: video encoding and decoding, in particular.

Again this is mere supposition on your part, much like everything else you've said. How do you know it's better Stig? Where are the figures? The reasons as to why OOE is better?  Sell it to us all son...none of us are convinced.


Quote
Again, you're looking at yet another loss for the scientific market: while the Cell sports a massive 256 GFlop predicted performance for single-precision floating-point maths, single-precision floats are nowhere near precise enough for scientific uses. Instead, we have to look to double-precision maths, which is where the 970 really shines. Look at the raw numbers:

Cell
256 GFlop single-precision floating point performance @ 4.0Ghz
~26 GFlop double-precision floating point performance @ 4.0Ghz

PowerPC 970
19GFlop double-precision floating point performance @ 2.5Ghz

Take a look at that again. The 970 has a double-precision floating-point performance 75% that of the entire Cell at 60% of the clockspeed - considerably more efficient, especially considering that's only for a single processor, and not in the most recent incarnation. What's that again? A single, general purpose PowerPC core beats an eight-core gaming processor running at almost twice the clockspeed in scientific benchmarks?


Peak SIMD performance for a 1.8Ghz 970 is apparently only 14.4Gflops using the TWO AltiVec cores inside of the processor. Where are you getting your numbers from?  Mine are from here

Besides...if we take your own viewpoint from previous threads on this, all that really matters is that the Cell is faster on double-precision math overall compared to the 970.  Wasn't it you that said that people don't care HOW it's faster, only that it is?

I'm not saying any more on this.  Really.  I'm arguing about software engineering practices with somebody who has NO experience in this sort of field, and well...it's a waste of effort on my part.

As for the Cell v 970 v AS-32 debate....that's another subject matter and for another thread.  You bowed out of the thread I started on Parallel Processing CPUs, so I'll wait for you to start a thread of your own.

« Last Edit: August 22, 2005, 08:33:11 AM by rusty » Logged
stigmata
Hero Member
*****
Posts: 2223


Why are you looking at my Macintosh?


WWW
« Reply #73 on: August 23, 2005, 08:42:49 AM »

Quote
Wow...you talk as if you have some sort of experience in this sort of thing.

Wow, so do you. The wonderful thing about the internet is not what job you've held in the past, but what you can say about reality - we've been here before. And no, I don't want to see any more pictures of your desk.

Quote
I've managed to fit rigid body physics, vehicle dynamics and deformable collision data into the 16Kb scratchpad + 4K of LS memory for the VU of the PS2.  So...how is 256K not going to be enough for me or anybody else?

I was under the impression that the purpose of "next gen" processors was to support "next gen" features, like soft-body physics and considerably more realistic physics. Now, forgive me, but it's been a while since I sat down at a PS2 and said to myself, "Jesus! That's some amazing soft-body physical modelling right there!"

The point is that the Cell is going to be directly competing against other processors of the same generation, and the single lead that it has - potentially superior multithreading ability, depending on circumstance - is heavily dependent on the code's ability to be processed in parallel. In cases like physics modelling, where that's not possible, you're left with a physics model of unprecedented complexity sharing the same 512kb of cache with the AI and control code, and not the 256kb of one of the SPE's, since we've already agreed that you cannot relegate physics code to a SIMD-only core without an obscene performance loss.

So that's 512kb of cache you have to work with, shared between those three core tasks in the game. Compare that to the 2Mb you find in any other modern, gaming-oriented processor, and you see why I have an issue here: the Cell cannot take advantage of either its SIMD capabilities or its 256kb of cache per SPE. You are, in short, stuck with the same crappy L2 you have on a G5.

Quote
Face it...you don't have a clue on what you're talking about.  You don't know ANYTHING useful about programming/software engineering, never mind something that demands a wide range of expert knowledge in many disciplines such as games programming.

There was a time when I would have cared when someone said something like that. But when it's coming from someone that's so unaware of the specs of the proprietary hardware they code for; someone that thinks they've proven their profession by taking a photo of someone's desk and posting it; someone that claims they know the most intimate technical details of a processor and then links to the specs for another, entirely different architecture as evidence; someone that's unaware of the limitations of SIMD processing... then I simply don't give a shit.

Quote
Peak SIMD performance for a 1.8Ghz 970 is apparently only 14.4Gflops using the TWO AltiVec cores inside of the processor. Where are you getting your numbers from?  Mine are from here

For a professional software engineer, I'm surprised you can't tell the difference between 1.8Ghz and 2.5Ghz. Here's a hint: one of them is much bigger.

Now, looking at what you said, I have two issues.

Firstly, no PowerPC, of any make, has ever included two AltiVec cores per processor. I think you might mean two FPUs per core, which the G5 does have but are not the same thing. Link, and quote:

Quote
Multiple pipelined execution units, branch prediction, and a SIMD, or vector processing (Altivec) unit

I'm sorry, you were saying something about not knowing what you were talking about? If you seriously can't tell the two apart, I'm very worried for you.

Now, not to get sidetracked, my second issue with the above quote of yours: if you take a 1.8Ghz G5 that clocks 14.4 GFlops, then clearly the floating-point performance of the processor is a function of its clockspeed (do this at home, kids: divide 14.4 by 1.8 to get 8). Now, extrapolate out to a clockspeed of 2.5Ghz (multiply 8 by 2.5) to get a predicted performance of exactly 20 GFlops.

What's the real performance of the 970 at 2.5Ghz? The machine is generally conceded to reach about 19-20 GFlops using VMX, per processor - the first link I have to hand is this, since I don't have the original one to hand at the moment. The progression is linear, and I honestly have no idea how someone that's apparently studied physics and maths for long enough to create physics engines professionally could fail to spot a relationship as simple as that. It's a linear progression, for god's sake...
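
The extrapolation above, written out as a sketch. It assumes, as the post does, that peak throughput scales linearly with clock speed; `extrapolate_linear` is a hypothetical helper, and the inputs are the figures quoted in this thread:

```cpp
#include <cassert>

// Scales a known peak figure to another clock speed, assuming
// throughput is directly proportional to clock speed.
double extrapolate_linear(double known_gflops, double known_ghz, double target_ghz) {
    assert(known_ghz > 0.0);
    const double gflops_per_ghz = known_gflops / known_ghz;  // 14.4 / 1.8 = 8
    return gflops_per_ghz * target_ghz;                      // 8 * 2.5 = 20
}
```

Calling `extrapolate_linear(14.4, 1.8, 2.5)` gives roughly 20 GFlops, matching the hand calculation. This is a theoretical peak, so measured figures may well come in lower.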

Quote
Again this is mere supposition on your part, much like everything else you've said. How do you know it's better Stig? Where are the figures? The reasons as to why OOE is better?  Sell it to us all son...none of us are convinced.

First, lay off the intentionally offensive and demeaning language.

Secondly, if you've ever dealt with video in your life you'd realise that it's impossible to stream data to a processor, and have it constantly processing. I needn't go into much detail, surely, but OOE was a solution developed to tackle the type of stalls that occur when the processor must wait for extra data before it can execute an instruction, typically as a result of the disparity between the FSB and processor's clocks. For a processor like the Cell, you face the exact same difficulties - lessened marginally by a shorter instruction pipeline, but worsened by the ridiculous clockspeed at which the Cell operates. An in-order-only processor inevitably suffers from considerable stalling as it awaits new data from the bus, especially an eight core processor - that's one metric shittonne of data you need to be constantly streaming, and it's simply not possible to run at full capacity without OOE.

Quote
Besides...if we take your own viewpoint from previous threads on this, all that really matters is that the Cell is faster on double-precision math overall compared to the 970.  Wasn't it you that said that people don't care HOW it's faster, only that it is?

But it isn't faster than a single 2.7Ghz G5 following the same progression as above... so, yes, I'll stick by that.

Quote
I'm not saying any more on this.  Really.  I'm arguing about software engineering practices with somebody who has NO experience in this sort of field, and well...it's a waste of effort on my part.

This coming from someone that can't even recognise a linear progression? I'm sorry, but that just doesn't sting.

Quote
You bowed out of the thread I started on Parallel Processing CPUs, so I'll wait for you to start a thread of your own.

Why bother? You'll just keep trying to tell me that you can vectorise Physics engines, after all.





An interesting sidenote: Where did you get that physics code from further up the page? Is it GPL'd?
« Last Edit: August 23, 2005, 08:44:49 AM by stigmata » Logged

Liratheal
Full Member
***
Posts: 176

You rev your engine too much to have a penis.


« Reply #74 on: August 23, 2005, 11:39:09 AM »

Mm'kay. I got to page three of this topic and got bored with reading Stigmata and Rusty's discussion, riveting and enthralling as it is.

I think you're both underestimating one thing. The sheer cool factor of an octo-core machine.

Example.

"I have an AMD FX 55 etc etc"
"I have an Octo-thread machine etc etc"
"..."

See the ownage in that statement?

On topic:

Eh. The PS3 with OSX would just be a glorified PB, without the mobility.

Although. It would be pretty.
Logged

Quote from: Jack
People who use Access have big penises
Quote from: souless_samurai
due to the market being saturated with old people who just wont die...