Optimization ideas needed


#1
undo off
(
	with redraw off
	for i = 1 to total do
	(
		l = copy ny
		l.transform = pos[i]
		attach olleaves l
	)
)
Hi.. This loop is from a kind of scatter script, where "pos" is an array of matrix3 values and "ny" is the object I'm scattering (attaching to "olleaves").

This works, but is extremely slow. Any clues on how to speed it up? (The help file dried out quite quickly on optimization, and this forum houses brain capacity beyond my imagination.)

Thx in advance… Steffen…


#2

I did a quick test with spheres, and replaced your loop code:

ny.transform = pos[i]
meshop.attach olleaves ny deleteSourceNode:false

Taking out the copy made it 3 times quicker on average :)
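For context, here is a sketch of how those two lines drop into the original loop (reusing the thread's variable names; deleteSourceNode:false keeps "ny" alive, so no copy is needed):

```maxscript
undo off
(
	with redraw off
	for i = 1 to total do
	(
		-- move the source object itself, then attach a copy of its mesh
		ny.transform = pos[i]
		meshop.attach olleaves ny deleteSourceNode:false
	)
)
```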


#3

Hi Dave…

Your answer came just as fast as your solution ;) Thanks so much!!!
I thought I'd have to resort to some kind of face-array stuff to avoid loading "ny" all the time…

This one is going public, so I'm not the only one thankful ;)


#4

Great stuff :)


#5

Hi Dave…

For whatever crazy reason, I'm afraid your loop doesn't do the job when implemented in the rest of my code… In a test I actually get a slowdown from 8 minutes to 10 minutes, but this could be due to something else, dunno??? (7,500 iterations.) Anyway, these runtimes are crazy compared to the realtime feedback in, for instance, the Scatter utility, so there must be something hidden under the hood…


#6

OK, 7500 sounds like a lot!

Well I don’t have much experience with large datasets, but I would suspect that the problem might lie in the fact that as an object’s size increases, it becomes increasingly expensive to perform operations on it.

Why don’t you try doing, say, 10 iterations of 750 (creating 10 new objects), then a further 10 iterations to attach those 10 new objects together.

It may be that the square root of 7500 is the best split to go for (i.e., ~86 loops, then ~86 more to attach).

You could store a timestamp after each pass of the loop to prove (or disprove) the point.
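A rough sketch of that batching idea, reusing the thread's variable names (ny, pos, olleaves) and assuming for simplicity that the total count is a perfect square; untested:

```maxscript
(
	batchSize = 86  -- roughly sqrt of 7500
	partials = #()
	undo off
	(
		with redraw off
		for b = 1 to batchSize do
		(
			-- first pass: build a small partial mesh from one batch
			part = copy ny
			part.transform = pos[(b - 1) * batchSize + 1]
			for i = 2 to batchSize do
			(
				l = copy ny
				l.transform = pos[(b - 1) * batchSize + i]
				attach part l
			)
			append partials part
		)
		-- second pass: attach the small partial meshes together,
		-- so no single attach target ever gets huge until the end
		for p = 2 to partials.count do attach partials[1] partials[p]
	)
)
```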

Dave


#7

The trouble with attachments is that the Undo buffer fills up really fast: each attach operation accumulates the previous version of the mesh, so it grows exponentially.

Thus, if you do not intend to perform an undo afterwards, you should wrap the code in with undo off (). See the “How to Make It Faster?” chapter in the MAXScript Reference: the second example under “Disable Undo system when possible” (at least in the Max 2008 and 2009 versions) shows the difference. On the ancient 800MHz system where the original test was performed about 5 years ago, it made the code 20 times faster…
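In other words, a minimal sketch of the pattern, using the thread's variable names:

```maxscript
with undo off
(
	for i = 1 to total do
	(
		l = copy ny
		l.transform = pos[i]
		attach olleaves l  -- no undo record accumulated per attach
	)
)
```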


#8

Hi noontz,
I may have found something that could help you with the speed issue: use “maxOps.CloneNodes” in place of “copy”.
Here are two sample scripts and a comparison.

First version with “copy” -> 107.109 seconds


 (
 	-- init number of scattered objects
 	iNumObjects = 10000
 	
 	-- init the transform matrix array
 	am3Transform = #()
 	am3Transform[iNumObjects] = matrix3 0
 	
 	-- build the transform matrix array
 	for i = 1 to iNumObjects do
 	( 
 		am3Transform[i] = matrixFromNormal(normalize (random [-1, -1, -1] [1, 1, 1]))
 		am3Transform[i] = translate am3Transform[i] (random [-100, -100, -100] [100, 100, 100])
 	)
 	
 	-- create sample object to scatter
 	oSample = box()
 	
 	-- init the scattered objects array
 	aoScatter = #()
 	aoScatter[iNumObjects] = oSample
 	
 	iTimeStart = timeStamp()
 	
 	-- scatter sample object
 	undo off
 	(
 		with redraw off
 		for i = 1 to iNumObjects do
 		(
 			oTemp = copy oSample
 			oTemp.transform = am3Transform[i]
 			aoScatter[i] = oTemp
 		)
 	)
 	
 	iTimeStop = timeStamp()
 
 	format "Processing took % seconds
" ((iTimeStop - iTimeStart) / 1000.0)
 )
 
 -- Processing took 107.109 seconds

 

Alternate version with “maxOps.CloneNodes” -> 46.828 seconds


 (
 	-- init number of scattered objects
 	iNumObjects = 10000
 	
 	-- init the transform matrix array
 	am3Transform = #()
 	am3Transform[iNumObjects] = matrix3 0
 	
 	-- build the transform matrix array
 	for i = 1 to iNumObjects do
 	( 
 		am3Transform[i] = matrixFromNormal(normalize (random [-1, -1, -1] [1, 1, 1]))
 		am3Transform[i] = translate am3Transform[i] (random [-100, -100, -100] [100, 100, 100])
 	)
 	
 	-- create sample object to scatter
 	oSample = box()
 	
 	-- init the scattered objects array
 	aoScatter = #()
 	aoScatter[iNumObjects] = oSample
 	
 	iTimeStart = timeStamp()
 	
 	-- scatter sample object
 	
 	-- NOTE: do not use undo off, according to the MAXScript reference it crashes
 	-- Max if an undo is performed after this operation.
 	with redraw off
 	(
 		for i = 1 to iNumObjects do
 		(
 			maxOps.CloneNodes oSample offset:[0,0,0] expandHierarchy:false cloneType:#copy newNodes:&aoTemp
 			aoTemp[1].transform = am3Transform[i]
 			aoScatter[i] = aoTemp[1]
 		)
 	)
 		
 	iTimeStop = timeStamp()
 
 	format "Processing took % seconds
" ((iTimeStop - iTimeStart) / 1000.0)
 )
 
 -- Processing took 46.828 seconds

 

The alternate version took less than half the time.
I hope this helps.


#9

Hi all…

Thanks for your efforts. I always feel a good spirit in this forum!

@ Bobo: You've saved my ### before, but as you can see in the initial code, I already went through the help content (unless I'm using the wrong syntax/scope, of course) ;)

@ Dave & SyncView: I'll go through both, and maybe a combo, and see what happens…

I'll be back with some results & code. (I'm a little busy ATM, so please give me a few days.)

Thx again to all three of you!


#10
with redraw off
for i = 1 to total do
(
	maxOps.CloneNodes ny offset:[0,0,0] newNodes:&l
	l = l[1]
	undo off
	(
		l.transform = pos[i]
		attach olleaves l
	)
)

Well, that maxOps call cut the runtime in half, as pointed out! Great! I moved the undo off into the loop, as there's a known issue with it and maxOps.CloneNodes.

@ Dave: The attach iterations scale linearly: 10,000 iterations take 100 times longer than 100 iterations, so there doesn't seem to be any gain in grouping them.


#11

Well, here in London it’s a bank holiday (so no work), and it’s pissing down with rain (so no play).

I’ve also been working on a timestamping / benchmarking struct in the past few weeks, so this seemed like a perfect case study opportunity!

The results are actually really interesting but with the concept and data in front of you, perhaps not that surprising.

The good news is that I’ve managed to speed things up 30 times!

Turns out the slowdown IS caused by the size of the mesh you're attaching to, so when in a loop, this has a rather nasty cumulative effect.

Also, the square root DOES turn out (4 times out of 5, anyway) to be the most efficient number of batches to split the loop into.

Anyway - I put the results up on my blog, so feel free to take a look.

Cheers,
Dave


#12

Hot damn, I love optimised code. :D

Perhaps we should have a general discussion of what makes things faster/less memory intensive.


#13

The MAXScript Help file is a good start, then; see the “How to Make It Faster?” topic %)


#14

Oh, I'm aware of that; I just think it would be interesting to see what other methods have come up in people's scripting journeys, to expand on the manual's recommendations.


#15

Alright… here are a few I recall off the top of my head; I haven't dug into my scripts to find old comments on oddball speed-ups.

At one point I needed random point3 integers, lots and lots of them. The problem is, random <point3> <point3> makes the values within those point3s floats.
So…



 startTime = timeStamp()
 for i = 1 to 3000000 do (
 	randP3 = random [0,0,0] [10,10,10]
 	randP3.x = randP3.x as integer
 	randP3.y = randP3.y as integer
 	randP3.z = randP3.z as integer
 )
 format "time: %
" ((timeStamp() - startTime) / 1000.0)
 time: 16.854
 
 startTime = timeStamp()
 for i = 1 to 3000000 do (
 	randP3x = random 0 10
 	randP3y = random 0 10
 	randP3z = random 0 10
 	randP3 = [randP3x,randP3y,randP3z]
 )
 format "time: %
" ((timeStamp() - startTime) / 1000.0)
 time: 12.629
 

Subtle, but I actually had to generate tens of millions, at which point it becomes a little less subtle. This is on a slower machine, though :)


Here’s another favorite… UI updates. UI updates are slow.
ProgressBars are a very popular way of showing, well, progress. But updating them all the time is slow. Update them only periodically: for example, only when the change would actually be visible in the progress bar (depending on its width).


 rollout test "test" (
 	progressbar pb_test
 )
 createDialog test
 pb = test.pb_test -- pre-initialize, lest we incur a speed hit from getting it out of the rollout every time
 
 startTime = timeStamp()
 for i = 1 to 1000000 do (
 	pb.value = (i / 1000000.0) * 100
 )
 format "time: %
" ((timeStamp() - startTime) / 1000.0)
 time: 29.012
 
 
 startTime = timeStamp()
 for i = 1 to 1000000 do (
 	if (mod i 7500 == 0) then ( -- 7500 in this case depending on progressbar width of 133px
 		pb.value = (i / 1000000.0) * 100
 	)
 )
pb.value = 100 -- make sure it does read 100% at the end
 format "time: %
" ((timeStamp() - startTime) / 1000.0)
 time: 2.043
 
 -- let's get rid of that mult as well
 startTime = timeStamp()
 for i = 1 to 1000000 do (
 	if (mod i 7500 == 0) then ( -- 7500 in this case depending on progressbar width of 133px
 		pb.value = (i / 10000.0)
 	)
 )
pb.value = 100
format "time: %
" ((timeStamp() - startTime) / 1000.0)
 time: 1.943
 

Another one, no code for obvious reasons… If you have an intense loop with a lot of if-tests to validate a value (for example, to prevent out-of-bounds accesses), consider try/catch. Typically try/catch is slower than if-tests, but with enough if-tests, try/catch may just win. Pretty rare, and probably not recommended practice, but if you're going for speed…
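A purely illustrative sketch of the idea (the array and the deliberately out-of-range indices are hypothetical):

```maxscript
(
	data = #(1, 2, 3, 4, 5)
	total = 0
	for i = 1 to 100000 do
	(
		idx = random -2 8  -- sometimes out of bounds on purpose
		-- instead of: if idx >= 1 and idx <= data.count then ...
		try ( total += data[idx] ) catch ( total += 0 )
	)
)
```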


An old favorite… always name the objects you create, as Max will otherwise waste time trying to find a unique name for the new object (obviously, only do this if you have no fear of creating non-unique names). This assumes CloneNodes cannot be used for whatever reason.


 delete objects
 startTime = timeStamp()
 undo off (
 	for i = 1 to 5000 do (
 		newSphere = geoSphere()
 	)
 )
 format "time: %
" ((timeStamp() - startTime) / 1000.0)
 time: 60.226
 
 delete objects
 startTime = timeStamp()
 undo off (
 	for i = 1 to 5000 do (
 		newSphere = geoSphere name:("mySphere" + i as string)
 	)
 )
 format "time: %
" ((timeStamp() - startTime) / 1000.0)
 [color=Blue]time: 34.149[color=#fffffe]
 

[/color][/color]


#16

Now that’s what I’m talking about! :D


#17

Large meshes can get exponentially slower for certain operations. In a script I wrote to export grids from TINs using RayMeshGridIntersect, I resorted to splitting up the TIN before doing the intersections, which greatly reduced the running time of the script.

When it comes to copying mesh data, I immediately thought that perhaps it would be faster to use a TriMesh instead of a node, but I'm not sure how that would affect your transforms.

Someone mentioned if-statements; sometimes it is not convenient to structure your code optimally to minimize if-statement overhead (e.g., by duplicating a lot of similar, large loops for different outcomes). On one such occasion I used a function variable with great success: prepare a predefined function for each outcome, then assign the appropriate one to a variable before you start the loop. The loop can then handle different cases without any (unnecessary) if-statements.
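A minimal sketch of that function-variable trick (all names hypothetical):

```maxscript
(
	fn doubleIt v = v * 2
	fn offsetIt v = v + 10
	useDouble = true
	-- pick the function once, outside the loop...
	processFn = if useDouble then doubleIt else offsetIt
	-- ...so the loop body itself needs no if-test at all
	data = for i = 1 to 10 collect i
	results = for v in data collect (processFn v)
)
```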


#18

Hi everybody…

Thanks so much for following up! I'm monster busy and won't have time to implement and test before next week, as mentioned to Dave in a PM (anyone interested in optimization should check his blog), but please keep the ideas rolling… I'm all ears (but no brains ATM).


#19

If you haven’t already seen from the link above, I’ve updated my TimeStamper struct. It provides a set of useful methods and properties for all things timing, including:

- basic start / stop, then print / alert the result
- multiple starts, then print / alert the average / total
- differences between different TimeStampers (for comparison tasks)
- full reporting on all timed tasks, including a breakdown of which iterations were fastest, that can be used straight off the bat in Excel (see graph below)

You can run multiple timers (each stored in a variable), and the base messaging functions print stuff in English, i.e. “Processing ‘Task Name’ took 7.81 seconds”.

Here’s the dump of the tests run above, copied to Excel with a graph made from the columns:


The project home page is here: http://www.keyframesandcode.com/code/development/maxscript/time-stamper/

Ideas and suggestions welcome.

Cheers,
Dave


#20

Thought I’d bump this thread just to thank the previous poster (Dave) for the meshop.attach optimisation… from 215 seconds down to 1 second… quality! :)