Make it faster: reading binary data


#1

Hi,
I’m working on a script to read SRTM data and use it to build terrains in 3ds Max. I’m using .hgt files, which are readily available online. A sample can be downloaded here. Documentation is here.
It works great, but it’s a bit slow. One reason is that the data consists of 16-bit integers: I have to read two bytes from the file and combine them into a 16-bit integer to get each value. Also, the data is stored big-endian, while BitConverter on my machine reads little-endian, so I have to reverse the order of the bytes before converting them.
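To make the conversion concrete, here is the byte arithmetic as a small Python sketch (Python is used only to illustrate the logic, not as a replacement for the MAXScript code). It assumes each sample is a signed 16-bit integer stored high byte first; the function names are made up for this example.

```python
from __future__ import annotations


def decode_be_int16(hi: int, lo: int) -> int:
    """Combine a big-endian byte pair (high byte first) into a signed 16-bit value."""
    value = (hi << 8) | lo  # unsigned result in 0..65535
    # Two's-complement sign fix: values >= 0x8000 are negative.
    return value - 0x10000 if value >= 0x8000 else value


def decode_samples(raw: bytes) -> list[int]:
    """Decode a buffer of big-endian int16 samples, keeping them in file order."""
    return [decode_be_int16(raw[i], raw[i + 1]) for i in range(0, len(raw) - 1, 2)]


print(decode_samples(bytes([0x01, 0x2C, 0xFF, 0xFF, 0x80, 0x00])))
# -> [300, -1, -32768]
```

Written out like this, "reversing the bytes" is only a question of which byte of the pair gets shifted left by 8.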

So here’s what I’ve got.

function fn_getHgtDataSample hgtFilePath dataCount:160000 =
  (
  	local hgtBinStream = (dotNetClass "System.IO.File").open hgtFilePath (dotNetClass "System.IO.FileMode").open --open the hgt file as a filestream
  	local hgtReadMethod = (hgtBinStream.GetType()).GetMethod "Read" --using the normal .read method didn't work: http://forums.cgsociety.org/showthread.php?f=98&t=1108923
  		
  	local theBuffer = dotnetobject "System.Byte[]" (dataCount * 2) --set up a buffer to get the requested amount of data. Each data-item consists of 2 bytes
  	local isLittleEndian = (dotNetClass "System.BitConverter").IsLittleEndian --the hgt data is big-endian. We need to reverse the byte order to get good data
  	local toInt16 = (dotNetClass "System.BitConverter").ToInt16 --this method converts two bytes (8 bits each) to a 16-bit signed integer
  	
  	hgtReadMethod.invoke hgtBinStream #(theBuffer, 0, theBuffer.Length) --read dataCount samples (2 bytes each) from the start of the file
  	local arrData = #()
  	if isLittleEndian then
  	(
  		(dotNetClass "System.Array").Reverse theBuffer --reversing the WHOLE buffer swaps the bytes of each sample, but also reverses the sample order
  		arrData = for i = theBuffer.Length-2 to 0 by -2 collect toInt16 theBuffer i --so walk backwards to get the samples in file order
  	)
  	else arrData = for i = 0 to theBuffer.Length-2 by 2 collect toInt16 theBuffer i --big-endian machine: bytes are already in the right order
  		
  	hgtBinStream.close()
  	arrData
  )
  (
  	clearListener()
  	gc()
  	local st = timeStamp()
  	local mem = heapFree
  	local hgtPath = @"some\path\to\the\file\here\N34W119.hgt"
  	local theData = fn_getHgtDataSample hgtPath dataCount:160000 
  	format "Samples read: %\n" theData.count
  	format "Time: % ms, memory: %\n" (timestamp()-st) (mem-heapfree)
  )
  

Right now I’m streaming the data. An alternative I’ve found is a BinaryReader. It has a ReadInt16 method that reads an int16 directly from the data, but it always reads little-endian, and this data is big-endian.
The sample above reads 160,000 data points, which means a grid of 400 by 400. Other functions arrange the data in an actual grid and build the meshes. Getting the data takes a little over 1 second on my machine, and I’d like to make it faster.
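One caveat about the grid, assuming these are SRTM3 tiles: a .hgt file is a row-major square grid of 1201 × 1201 samples (1201 · 1201 · 2 = 2,884,802 bytes, about 2.75 MB). The first 160,000 samples in the file therefore span roughly 133 full rows of 1201 rather than a 400 × 400 square. The Python sketch below (hypothetical helper names) infers the side length from the file size and reads a true rectangular window, using `struct`'s `>` prefix for the big-endian order that BinaryReader.ReadInt16 does not offer.

```python
from __future__ import annotations

import math
import struct


def grid_width(file_size: int) -> int:
    """Infer the side length of a square .hgt grid from its size in bytes (2 bytes per sample)."""
    side = math.isqrt(file_size // 2)
    if side * side * 2 != file_size:
        raise ValueError(f"{file_size} bytes is not a square grid of 16-bit samples")
    return side


def read_window(raw: bytes, width: int, row0: int, col0: int,
                rows: int, cols: int) -> list[list[int]]:
    """Return a rows x cols block of big-endian int16 samples from a row-major grid."""
    block = []
    for r in range(row0, row0 + rows):
        start = (r * width + col0) * 2  # byte offset of the window's first sample in row r
        chunk = raw[start:start + cols * 2]
        # '>' = big-endian, 'h' = signed 16-bit integer
        block.append(list(struct.unpack(f">{cols}h", chunk)))
    return block
```

For example, `read_window(raw, grid_width(len(raw)), 0, 0, 400, 400)` would return the top-left 400 × 400 block of a tile whose bytes were loaded with `open(path, "rb").read()`.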


#2

This looks like a massively parallel task, so if you want high performance I’d recommend doing it with CUDA/OpenCL and the Max SDK.


#3

I’m afraid the SDK is out of my league.
I was hoping for some pointers on either reading the data faster or converting the raw data into the values I need faster. But if no obvious improvements are possible within MAXScript, I’ll make do with what I have now.


#4

Try these two things:

netbytes = (dotnetclass "System.IO.File").ReadAllBytes @"c:\temp\N34W119.hgt" asdotnetobject:on
mxsbytes = (dotnetclass "System.IO.File").ReadAllBytes @"c:\temp\N34W119.hgt" 

The difference between them is that the first one doesn’t convert the result to an MXS value (an array of integers).
That conversion takes about 90% of the total reading time.
The best you can get is about half the time of the second call, because combining two bytes into one integer gives you an array half as long.
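The point about conversion cost carries over to other scripting environments: the per-element work is what is slow, so the goal is to let native code convert the whole buffer in one call and hand back half as many values as there are bytes. As a Python analogy only (not a MAXScript technique, and the function name is hypothetical), the standard `array` module can do this:

```python
import array
import sys


def hgt_samples(raw: bytes) -> array.array:
    """Convert a big-endian int16 byte buffer in bulk instead of with a per-sample loop."""
    samples = array.array("h")  # 'h' = signed short (2 bytes on common platforms)
    samples.frombytes(raw)      # reinterprets the bytes in native byte order, no per-element script work
    if sys.byteorder == "little":
        samples.byteswap()      # swap the two bytes of every sample in place
    return samples
```

Here the only per-element work happens inside `frombytes` and `byteswap`, both implemented in C, and the result has `len(raw) // 2` entries.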


#5

Thanks again for the suggestion, Denis.
I’ve implemented the ReadAllBytes method like so:

function fn_getHgtDataSample hgtFilePath dataCount:160000 =
 (
 	local netbytes = (dotnetclass "System.IO.File").ReadAllBytes hgtFilePath asdotnetobject:on
 	local isLittleEndian = (dotNetClass "System.BitConverter").IsLittleEndian --the hgt data is big-endian. We need to reverse the byte order to get good data
 	local toInt16 = (dotNetClass "System.BitConverter").ToInt16 --this method converts two bytes (8 bits each) to a 16-bit signed integer
 	local arrData = #()
 	if isLittleEndian then
 	(
 		(dotNetClass "System.Array").Reverse netbytes --swaps each sample's bytes, but the whole file now runs back to front
 		arrData = for i = netbytes.Length-2 to netbytes.Length-dataCount*2 by -2 collect toInt16 netbytes i --walk backwards from the end to get the first dataCount samples in file order
 	)
 	else arrData = for i = 0 to dataCount*2-2 by 2 collect toInt16 netbytes i --big-endian machine: no reversal needed
 	arrData
 )

So instead of streaming the bytes, I’m loading the entire file into memory at once. That’s OK, since my files are all 2.75 MB, which isn’t too large. There’s hardly any speed difference, though: it still takes a little over 1 second. It is easier on memory than the streaming solution, however.
I guess the bulk of the processing time goes into converting two bytes into an integer.
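A detail worth keeping in mind with the `Reverse` call used in both functions: reversing the whole buffer does swap the two bytes of every sample, but it also reverses the order of the samples. With ReadAllBytes, that means the first `dataCount` values read after the reversal come from the end of the tile, back to front. The Python sketch below (hypothetical names, with a three-sample buffer chosen only for illustration) mimics both approaches:

```python
from __future__ import annotations


def reverse_whole_buffer(raw: bytes) -> list[int]:
    """Mimic Array.Reverse on the whole buffer, then little-endian int16 reads at 0, 2, 4, ..."""
    rev = raw[::-1]
    return [int.from_bytes(rev[i:i + 2], "little", signed=True)
            for i in range(0, len(rev) - 1, 2)]


def decode_in_order(raw: bytes) -> list[int]:
    """Decode each 2-byte pair as big-endian, keeping the samples in file order."""
    return [int.from_bytes(raw[i:i + 2], "big", signed=True)
            for i in range(0, len(raw) - 1, 2)]


raw = bytes([0x00, 0x01, 0x00, 0x02, 0x00, 0x03])  # samples 1, 2, 3 in big-endian
print(reverse_whole_buffer(raw))  # -> [3, 2, 1]
print(decode_in_order(raw))       # -> [1, 2, 3]
```

Reading the reversed buffer from the end backwards (starting at index `Length - 2` and stepping by -2) restores file order without an extra pass.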


#6

Here is how you can do it with an on-the-fly (compiled in memory) assembly:

fn readFileOpsAssembly =
 (
 	local source  = ""
 	source += "using System;\n"
 	source += "public class ReadFileOps\n"
 	source += "{\n"
 	source += "	public Int16[] ReadFileShort(string file)\n"
 	source += "	{\n"
 	source += "		byte[] data = System.IO.File.ReadAllBytes(file);\n"
 	source += "		int len = Buffer.ByteLength(data);\n"
 	source += "		Int16[] result = new Int16[len / 2];\n"
 	source += "		for (int k = 0, i = 0; k + 1 < len; k += 2, i++)\n"
 	source += "		{\n"
 	-- .hgt samples are big-endian: the first byte of each pair is the high byte; the (Int16) cast keeps negative values signed
 	source += "			result[i] = (Int16)(data[k] << 8 | data[k + 1]);\n"
 	source += "		}\n"
 	source += "		return result;\n"
 	source += "	}\n"
 	source += "}\n"
 
 	local csharpProvider = dotnetobject "Microsoft.CSharp.CSharpCodeProvider"
 	local compilerParams = dotnetobject "System.CodeDom.Compiler.CompilerParameters"
 
 	compilerParams.GenerateInMemory = on
 	local compilerResults = csharpProvider.CompileAssemblyFromSource compilerParams #(source)
 	
 	compilerResults.CompiledAssembly.CreateInstance "ReadFileOps"
 )
 fileops = readFileOpsAssembly()
 /*
 bb = fileops.ReadFileShort @"c:\temp\N34W119.hgt"
 */

As I said, it’s about two times faster.

P.S. Double-check that I got the little-endian conversion right.
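On the P.S.: a quick way to check a byte-order formula is to feed it a pair whose value is known. Assuming SRTM's convention (big-endian, signed 16-bit, with -32768 marking voids), the bytes `0x01 0x2C` encode 300. A formula of the form `data[k + 1] << 8 | data[k]` treats the second byte as the high one and gives 11265 instead, while `data[k] << 8 | data[k + 1]` followed by a signed 16-bit cast gives 300. A Python sketch of both (function names hypothetical):

```python
def low_byte_first(data: bytes, k: int) -> int:
    # data[k + 1] << 8 | data[k]: the SECOND byte is the high byte (little-endian reading)
    return (data[k + 1] << 8) | data[k]


def high_byte_first(data: bytes, k: int) -> int:
    # data[k] << 8 | data[k + 1]: the FIRST byte is the high byte (big-endian, as in .hgt)
    v = (data[k] << 8) | data[k + 1]
    return v - 0x10000 if v & 0x8000 else v  # same effect as a C# (Int16) cast


pair = bytes([0x01, 0x2C])
print(low_byte_first(pair, 0), high_byte_first(pair, 0))  # -> 11265 300
```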


#7

The bulk of the time is spent converting .NET object(s) to MXS value(s). That’s why an SDK solution can change the performance dramatically.


#8

Yes, exactly. And thanks for the piece of code. It performs like you said.