I’ll test all these potential optimizations when I get back home from work. I appreciate all the tips so far!
Yeah, it works pretty well with just the height criterion. Here is a test made by the guy I helped:
Would you try both functions if you get some time?
It would be great to have your opinion and findings.
a slowdown for the in-place method doesn’t make sense to me. it’s probably some issue specific to the max .net sdk.
if removing unnecessary multiplications and divisions doesn’t really affect the performance, that means 99% of the time the method is doing something else. but i don’t see anything other than taking a value from one piece of memory and putting it in another. in the c++ sdk that’s almost nothing and doesn’t take any time.
I tried both versions and I get same result as PolyTools3D - the in-place version is about 60% slower for me.
I believe this is because no value data is actually stored on the IPoint3:
bFlags = (dotnetClass "System.Reflection.BindingFlags")
allFields = dotnet.CombineEnums bFlags.Public bFlags.NonPublic bFlags.Instance
fields = ((((dotnetclass "Autodesk.Max.GlobalInterface").Instance.Point3.Create 1 1 1).GetType()).GetFields allFields)
for p in fields do print (p.ToString())
"Point3* unmanaged_"
"Boolean owning_"
The only data is a pointer to a native Point3. Each access requires dereferencing a native pointer, which I assume is slower from a c# assembly.
Nevertheless, the method runs very fast (1-2 ms for 40000 vertices on my machine), and if you need it any faster than this, you might as well start over in C++.
This runs about 18 times faster (yes, this is still C# code):
//define this somewhere in your class
struct Point3
{
    public float X;
    public float Y;
    public float Z;
}

Point3* mapP = (Point3*)(mapVerts[0] as Autodesk.Max.Wrappers.Point3).INativeObject__Handle;
Point3* meshP = (Point3*)(mesh.GetVert(0) as Autodesk.Max.Wrappers.Point3).INativeObject__Handle;
for (int i = 0; i < numMapVerts; i++)
{
    mapP->X = 1;
    if (meshP->Z > HeightThreshold)
    {
        mapP->Y = Math.Max(0, mapP->Y - FadeInValue);
    }
    else
    {
        mapP->Y = Math.Min(1, mapP->Y + FadeInValue);
    }
    mapP->Z = mapP->Y;
    mapP++;
    meshP++;
}
for 1000 iterations:
previous fastest version: ~1800ms
this version: ~100ms
It requires you to reference Autodesk.Max.Wrappers in your project and to compile with unsafe code enabled (the /unsafe compiler flag).
I wouldn’t recommend doing this in production code, I don’t know what issues could occur.
The C# takes 3ms to process a 100K mesh (3000ms for 1000 calls)
How much is “almost nothing” in C++? Just curious to know how much faster is the native code.
this is almost the same as what the c++ code has to do, except that mesh verts and map verts are already arrays of Point3.
i don’t think c++ code can do it much faster. only getting the mesh might be a little faster.
another thing is updating the render mesh after you set these colors and forcing a rebuild of the cached one. i’m not sure it’s possible to rebuild only the colors, so the update will take much more time than the calculation.
I was actually expecting the C++ version to be 3-5 times faster, but 20 times faster is still a huge difference.
Although this function involves very common operations, I suppose the difference could be much larger than that in other cases.
Well, I guess a more flexible language comes with its price. What a shame that we can’t maintain and compile a single C++ version.
Thank you Rotem and Denis for your input, and thank you Håvard for sharing this.
c++ version gives me 0.00027 sec for 100K mesh… but my machine is pretty old.
as i said, the function doesn’t really do anything other than get and set values in memory.
def_visible_primitive(getMeshMapVerts, "getMeshMapVerts");
Value* getMeshMapVerts_cf(Value **arg_list, int count)
{
    check_arg_count_with_keys(getMeshMapVerts, 1, count);
    if (!is_mesh(arg_list[0])) return &undefined;
    int vnum = 0;
    Mesh* mesh = arg_list[0]->to_mesh();
    int channel = key_arg_or_default(channel, Integer::intern(1))->to_int();
    if (mesh->mapSupport(channel))
    {
        vnum = min(mesh->numVerts, mesh->getNumMapVerts(channel));
        UVVert *mv = mesh->mapVerts(channel);
        for (int k=0; k < vnum; k++)
        {
            // empty branch: only reads the mesh vertex height, as the real test would
            if (mesh->verts[k].z) { }
            float y = min(1.0f, max(0.0f, mv[k].y));
            mv[k] = Point3(y,y,y);
        }
    }
    return Integer::intern(vnum);
}
it’s not exactly the same, but it does the same operations in the way most native to the c++ sdk.
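For anyone who wants to time the bare loop outside of Max, here is a minimal standalone sketch of the same height-based fade over plain arrays. It does not use the Max SDK: `Vec3` is a hypothetical stand-in for Point3, and `fadeByHeight`, `heightThreshold` and `fadeValue` are names made up for this example, mirroring the logic of the C# loop above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the SDK's Point3: three packed floats.
struct Vec3 { float x, y, z; };

// Fades each map vertex in place based on the height (z) of the matching
// mesh vertex. Returns the number of vertices processed.
std::size_t fadeByHeight(const std::vector<Vec3>& meshVerts,
                         std::vector<Vec3>& mapVerts,
                         float heightThreshold, float fadeValue)
{
    assert(fadeValue >= 0.0f);  // assumption: the fade step is non-negative
    const std::size_t n = std::min(meshVerts.size(), mapVerts.size());
    for (std::size_t i = 0; i < n; ++i) {
        Vec3& mv = mapVerts[i];
        mv.x = 1.0f;
        if (meshVerts[i].z > heightThreshold)
            mv.y = std::max(0.0f, mv.y - fadeValue);  // above threshold: fade out
        else
            mv.y = std::min(1.0f, mv.y + fadeValue);  // below threshold: fade in
        mv.z = mv.y;
    }
    return n;
}
```

Wrapping a call to `fadeByHeight` in a timer (for example `std::chrono::steady_clock`) with 100K elements should give a rough lower bound for the calculation itself, without the mesh access or the render mesh update.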
Hmm, I thought I would try to optimize it with SIMD for the fun of it, but I realize now it’s really not an ideal candidate due to the minimal amount of work performed inside the loop and the AoS layout of the Point3s.
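To illustrate the layout point, here is a small sketch comparing the array-of-structures (AoS) layout with a structure-of-arrays (SoA) alternative. `Vec3`, `Vec3SoA`, `fadeAoS` and `fadeSoA` are hypothetical names for this example only; the Max SDK stores map verts as AoS, so the SoA variant would require copying the data first, which would likely cost more than the loop itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// AoS element (hypothetical Point3 stand-in): x, y, z interleaved in memory,
// so consecutive y values are sizeof(Vec3) bytes apart.
struct Vec3 { float x, y, z; };

// SoA: one contiguous array per component, which is the layout that
// SIMD loads and stores work best with.
struct Vec3SoA {
    std::vector<float> x, y, z;
};

// Fade-in step on the AoS layout: every access is strided.
void fadeAoS(std::vector<Vec3>& v, float fadeValue)
{
    for (Vec3& p : v) {
        p.x = 1.0f;
        p.y = std::min(1.0f, p.y + fadeValue);
        p.z = p.y;
    }
}

// Same step on the SoA layout: each array is walked contiguously,
// so a compiler has a much easier time vectorizing it.
void fadeSoA(Vec3SoA& v, float fadeValue)
{
    const std::size_t n = v.y.size();
    assert(v.x.size() == n && v.z.size() == n);  // all components same length
    for (std::size_t i = 0; i < n; ++i) {
        v.x[i] = 1.0f;
        v.y[i] = std::min(1.0f, v.y[i] + fadeValue);
        v.z[i] = v.y[i];
    }
}
```

Both functions compute the same values; only the memory access pattern differs.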
Wouldn’t this be an ideal candidate for parallelization through TPL?
http://msdn.microsoft.com/en-us/library/dd460717(v=vs.110).aspx
Probably only for meshes with huge amounts of vertices.
I would bet that the overhead of parallelization would be almost as high or higher than the runtime of the single-threaded function, though it’s easy enough to test.
With 40k verts using Parallel.For I got ~23 seconds on 10k iterations, and ~37 seconds using the single-threaded version. So that is pretty cool.
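For comparison with the TPL test, the same idea can be sketched in C++ by splitting the vertex range into contiguous chunks, one per thread. This is only a rough equivalent under stated assumptions: it uses `std::thread` rather than a thread pool, so thread creation cost is paid on every call, and `Vec3`, `fadeRange` and `fadeParallel` are hypothetical names, not SDK functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the SDK's Point3.
struct Vec3 { float x, y, z; };

// Processes map vertices in [begin, end). Chunks never overlap, so
// threads need no synchronization beyond the final join.
void fadeRange(const Vec3* meshVerts, Vec3* mapVerts,
               std::size_t begin, std::size_t end,
               float heightThreshold, float fadeValue)
{
    assert(begin <= end);
    for (std::size_t i = begin; i < end; ++i) {
        Vec3& mv = mapVerts[i];
        mv.x = 1.0f;
        if (meshVerts[i].z > heightThreshold)
            mv.y = std::max(0.0f, mv.y - fadeValue);
        else
            mv.y = std::min(1.0f, mv.y + fadeValue);
        mv.z = mv.y;
    }
}

// Splits the work into numThreads contiguous chunks and waits for all of them.
void fadeParallel(const std::vector<Vec3>& meshVerts, std::vector<Vec3>& mapVerts,
                  float heightThreshold, float fadeValue, unsigned numThreads)
{
    const std::size_t n = std::min(meshVerts.size(), mapVerts.size());
    numThreads = std::max(1u, numThreads);
    const std::size_t chunk = (n + numThreads - 1) / numThreads;  // ceiling division
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // no work left for further threads
        workers.emplace_back(fadeRange, meshVerts.data(), mapVerts.data(),
                             begin, end, heightThreshold, fadeValue);
    }
    for (std::thread& w : workers) w.join();
}
```

Calling `fadeParallel(meshVerts, mapVerts, threshold, fade, std::thread::hardware_concurrency())` and timing it against a single `fadeRange` call over the whole range is an easy way to check whether the threading overhead pays off at a given vertex count.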