Point Colour Conversion Efficiency


#3

how many lines of data does the original file have?

Anywhere from 10,000 to 300 million plus. :wink: The big files take forever and sometimes crash due to memory. Task Manager also shows only one thread being used.

Another problem with my script is that you can only do one file at a time…ideally i want to put some batch support in there…like picking a directory of files and letting it go through and convert each file. I’ve been attempting that one, without success. There is a sample script in the help for loading max files…doing something…then closing them, so i figured that was a good place to start.
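
A rough sketch of the batch idea, assuming the single-file conversion is wrapped in a function — here a hypothetical convertFile taking an input path and an output path (getSavePath and getFiles are standard MXS):

(
	-- pick a folder and convert every .txt file in it, writing each result next to the original
	dir = getSavePath caption:"Pick the folder of point cloud files"
	if dir != undefined do
	(
		for f in getFiles (dir + @"\*.txt") do
		(
			outFile = getFilenamePath f + getFilenameFile f + "_converted.txt"
			convertFile f outFile -- hypothetical single-file conversion function
		)
	)
)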


#4

does that mean the average file has millions of lines of data?


#5

i don’t think you can do it with MXS. your method crashes because it runs out of memory…

Another problem with my script is that you can only do one file at a time…ideally i want to put some batch support in there…

forget about this problem. compared to the first one, it’s not a problem at all.


#6

i would try to do it using a c# dll… but we will try to make an on-the-fly assembly.
wait a bit…


#7

well… i have a result.
a 1,000,000-line file converts to the right format in ~10 sec with no memory leak.
so hopefully 100,000,000 lines should be processed in ~2 min… Slow? Yes. But why do you make such big files? :wink:


global LargeFileOps = 
(
	source = ""
	source += "using System;
"
	source += "using System.Text;
"
	source += "using System.IO;
"
	source += "public class LargeFileOps
"
	source += "{
"
	source += "	public int MakeLargeFile(string sourceFile, string line, int count)
"
	source += "	{
"
	source += "		StringBuilder sb = new StringBuilder(); 
"
	source += "		for (int k = 0; k < count; k++)
"
	source += "		{
"
	source += "			sb.AppendLine(line);
"
	source += "		}
"
	source += "		StreamWriter sw = new StreamWriter(sourceFile);
"
	source += "		sw.Write(sb);
"
	source += "		sw.Close();
"
	source += "		return count;
"
	source += "	}
"
	source += "	public int ConvertCloud(String sourceFile, String targetFile)
"
	source += "	{
"
	source += "		String[] data = File.ReadAllLines(targetFile);
"
	source += "		if (data != null)
"
	source += "		{
"
	source += "			StringBuilder sb = new StringBuilder();
"
	source += "			String format = \"{3} {4} {5}\
\";
"
	source += "			char[] sp = new char[] { ' ' };
"
	source += "			foreach (String d in data)
"
	source += "			{
"
	source += "				String[] str = d.Split(sp, StringSplitOptions.RemoveEmptyEntries);
"
	source += "				if (str.Length == 6)
"
	source += "				{
"
	source += "					sb.AppendFormat(format, Array.ConvertAll(str, new Converter<String, String>(CloudColor)));
"
	source += "				}
"
	source += "			}
"
	source += "			StreamWriter sw = new StreamWriter(sourceFile);
"
	source += "			sw.Write(sb);
"
	source += "			sw.Close();
"
	source += "			return data.Length;
"
	source += "		}
"
	source += "		else return -1;
"
	source += "	}
"
	source += "	public static String CloudColor(String st)
"
	source += "	{
"
	source += "		return (String.Format(\"{0:0.000}\", Single.Parse(st)/255.0).ToString());
"
	source += "	}
"
	source += "}
"


	csharpProvider = dotnetobject "Microsoft.CSharp.CSharpCodeProvider"
	compilerParams = dotnetobject "System.CodeDom.Compiler.CompilerParameters"

	compilerParams.ReferencedAssemblies.AddRange #("System.dll")

	compilerParams.GenerateInMemory = on
	compilerResults = csharpProvider.CompileAssemblyFromSource compilerParams #(source)
	
	assembly = compilerResults.CompiledAssembly
	assembly.CreateInstance "LargeFileOps"
)

(
	LargeFileOps.MakeLargeFile @"c:\temp\cloud.txt" "0 0 0 -20 100 255" 1000000

	gc()
	t1 = timestamp()
	m1 = heapfree
	d = LargeFileOps.ConvertCloud @"c:\temp\cloud_cc.txt" @"c:\temp\cloud.txt"
	format "data: %\n" d
	format "C#  >> time:% memory:%\n" (timestamp() - t1) (m1 - heapfree)
)

i’m too lazy to copy/paste a line 1,000,000 times to make a test file, so i added MakeLargeFile to the class :slight_smile:


#8

what things can be improved? does anyone have an idea?

#1 for really large files we need a #progress event
#2 can we find anything better than String.Split?
#3 is there anything faster than Array.ConvertAll?

anything else?


#9

C# should be the faster solution, as denis has said, but FYI your code could be more memory efficient by replacing

theText += theVal as string + " "

with

append theText (theVal as string + " ")

or alternatively removing the theText variable altogether and formatting straight to your output file.

Or again: formatting to a stringstream and then writing the result to the file every N iterations, where N = 1,000 or 100,000 or so, to reduce the number of individual write calls.
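
A rough sketch of that buffered-write idea, assuming inFile and outFile are already open file streams and a hypothetical convertLine function turns one input line into one output line:

(
	buffer = stringstream ""
	n = 0
	while not eof inFile do
	(
		format "%\n" (convertLine (readLine inFile)) to:buffer
		n += 1
		if n == 10000 do -- flush every 10,000 lines instead of writing per line
		(
			format "%" (buffer as string) to:outFile
			buffer = stringstream ""
			n = 0
		)
	)
	format "%" (buffer as string) to:outFile -- flush whatever is left
)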


#10

everything you said is right. but our case is very, very problematic, and simple mxs tricks can’t solve it.
try, for example, simply creating > 1,000,000 strings in mxs and watch the memory leak.


#11

as i said, i don’t see a solution using pure mxs… but…
if anyone wants to play:

do the same but using memstream

don’t make any arrays or strings in your method

it’s possible. it shouldn’t eat memory. it will be slow… but at least you have a chance to see it finish… :slight_smile:
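
Not the memstream route, but one possible streaming sketch using .NET stream objects from MXS; note it still creates MXS strings for every line, so it only avoids holding the whole file in memory:

fn convertCloudStreaming inFile outFile =
(
	-- read, convert and write one line at a time instead of loading the whole file
	reader = dotnetobject "System.IO.StreamReader" inFile
	writer = dotnetobject "System.IO.StreamWriter" outFile
	while reader.Peek() >= 0 do
	(
		tokens = filterString (reader.ReadLine()) " "
		if tokens.count == 6 do
		(
			for i = 4 to 6 do tokens[i] = formattedPrint ((tokens[i] as float) / 255.0) format:".3f"
		)
		outLine = ""
		for t in tokens do outLine += t + " "
		writer.WriteLine (trimRight outLine)
	)
	reader.Close()
	writer.Close()
)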


#12

Denis, I think the results of your method are different from the original code. It should leave the first 3 values untouched but still write them back to the output file.

Regarding performance, this yields a significant improvement:

source += "	public int ConvertCloud(String sourceFile, String targetFile)
"
	source += "	{
"
	source += "		String[] data = File.ReadAllLines(sourceFile);
"
	source += "		if (data != null)
"
	source += "		{
"
	source += "			char[] sp = new char[] { ' ' };
"
	source += "			string fmt = \"F3\";
"
	source += "			for (int d = 0; d < data.Length; d++)
"
	source += "			{
"
	source += "				String[] str = data[d].Split(sp, StringSplitOptions.RemoveEmptyEntries);
"
	source += "				if (str.Length == 6)
"
	source += "				{
"
	source += "					str[3] = (Single.Parse(str[3]) / 255f).ToString(fmt);
"
	source += "					str[4] = (Single.Parse(str[4]) / 255f).ToString(fmt);
"
	source += "					str[5] = (Single.Parse(str[5]) / 255f).ToString(fmt);
"
	source += "					data[d] = String.Join(\" \",str);
"
	source += "				}
"
	source += "			}
"
	source += "			File.WriteAllLines(targetFile, data);
"
	source += "			return data.Length;
"
	source += "		}
"
	source += "		else return -1;
"
	source += "	}
"

The reasons for improved performance are:

  1. The data is changed in place, making less work for the garbage collector.
  2. Using File.WriteAllLines takes care of newline characters for me.
  3. Array.ConvertAll was converting all 6 values instead of just the last 3.
  4. Single.ToString(<string> format) is less expensive than String.Format(<string> format, <object> obj)
  5. String.Join is faster than String.Format for this number of elements

#13

LO,
i absolutely agree with everything…


#14

K…you guys are light years ahead of me here, but it seems like you are getting it to work. I can’t really make any sense of what you guys have coded here though…so i’m not quite sure how to create a script that i can run to test it.

Lo is right…the first three values are untouched, and the last three are RGB values that are essentially being converted to float values.
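
In other words, each of the last three values just gets divided by 255 and printed to three decimals; for example, with the test line used in the code above (plain MXS, purely to show the arithmetic):

formattedPrint (-20 / 255.0) format:".3f" -- "-0.078"
formattedPrint (100 / 255.0) format:".3f" -- "0.392"
formattedPrint (255 / 255.0) format:".3f" -- "1.000"
-- so "0 0 0 -20 100 255" becomes "0 0 0 -0.078 0.392 1.000"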

For clarity’s sake, a new file doesn’t need to be made with the converted data if that would speed things up. I did it that way because i don’t really know another way. So if it’s easier to open a file…change it…then save it, that’s fine by me. I really appreciate you guys having a look at it…even though i don’t get what you’re doing! :slight_smile:


#15

Use the code Denis posted and replace the relevant part with the method I posted.

Unfortunately, writing back into the same file would introduce no speedup at all, though nothing prevents you from specifying the same file as both source and target.


#16

What would really speed things up would be if you could use files that store the values as bytes instead of ascii.


#17

LO, i couldn’t beat your version… i don’t see any way to improve it.
so, using lo’s version, i added the progress-changed event… here is a sample of how to use it:


global CloudsAssembly
fn CreateCloudsAssembly forceRecompile:on =
(
	if forceRecompile or not iskindof ::CloudsAssembly dotnetobject or (::CloudsAssembly.GetType()).name != "Assembly" do
	(

source = ""
source += "using System;
"
source += "using System.Text;
"
source += "using System.IO;
"
source += "public class Clouds
"
source += "{
"
source += "	public delegate void ProgressHandler(object sender, ProgressEventArgs e);
"
source += "	public event ProgressHandler ProgressChanged;
"
source += "	private bool _reportProgress = false;
"
source += "	public bool ReportProgress 
"
source += "	{
"
source += "		get { return _reportProgress; }
"
source += "		set { _reportProgress = value; } 
"
source += "	}
"
source += "	private int _reportRate = 1;
"
source += "	public int ReportRate 
"
source += "	{
"
source += "		get { return _reportRate;  } 
"
source += "		set { _reportRate = value; } 
"
source += "	}
"
source += "	private void onUpdateProgress(string file, float progress, int size)
"
source += "	{
"
source += "		if (ProgressChanged == null) return;
"
source += "		ProgressEventArgs args = new ProgressEventArgs(file, progress, size);
"
source += "		ProgressChanged(this, args);
"
source += "	}
"
source += "	public int MakeTestCloud(string sourceFile, string line, int count)
"
source += "	{
"
source += "		string[] data = new string[count];
"
source += "		for (int k = 0; k < count; k++) data[k] = line;
"
source += "		File.WriteAllLines(sourceFile, data); 
"
source += "		return count;
"
source += "	}
"
source += "	public int ConvertCloud(string sourceFile, string targetFile)
"
source += "	{
"
source += "		string[] data = File.ReadAllLines(targetFile);
"
source += "		if (data != null)
"
source += "		{
"
source += "			char[] sp = new char[] { ' ' };
"
source += "			string fmt = \"F3\";
"
source += "			for (int k = 0; k < data.Length; k++)
"
source += "			{
"
source += "				string[] str = data[k].Split(sp, StringSplitOptions.RemoveEmptyEntries);
"
source += "				if (str.Length == 6)
"
source += "				{
"
source += "					str[3] = (Single.Parse(str[3]) / 255f).ToString(fmt);
"
source += "					str[4] = (Single.Parse(str[4]) / 255f).ToString(fmt);
"
source += "					str[5] = (Single.Parse(str[5]) / 255f).ToString(fmt);
"
source += "					data[k] = String.Join(\" \", str);
"
source += "				}
"
source += "				if ((ReportProgress) && ((k+1) % ReportRate == 0))
"
source += "				{
"
source += "					onUpdateProgress(targetFile, 100f * (k+1) / data.Length, data.Length);
"
source += "				}
"
source += "			}
"
source += "			File.WriteAllLines(sourceFile, data);
"
source += "			if (ReportProgress) onUpdateProgress(targetFile, 100f, data.Length);
"
source += "			return data.Length;
"
source += "		}
"
source += "		else return -1;
"
source += "	}
"
source += "	public class ProgressEventArgs : EventArgs
"
source += "	{
"
source += "		private string _file;
"
source += "		public string File
"
source += "		{
"
source += "			get { return _file; }
"
source += "			private set { _file = value; }
"
source += "		}
"
source += "		private float _progress;
"
source += "		public float Progress
"
source += "		{
"
source += "			get { return _progress; }
"
source += "			private set { _progress = value; }
"
source += "		}
"
source += "		private int _size;
"
source += "		public int Size
"
source += "		{
"
source += "			get { return _size; }
"
source += "			private set { _size = value; }
"
source += "		}
"
source += "		public ProgressEventArgs(string file, float progress, int size)
"
source += "		{
"
source += "			File = file;
"
source += "			Progress = progress;
"
source += "			Size = size;
"
source += "		}
"
source += "	}
"
source += "}
"

		csharpProvider = dotnetobject "Microsoft.CSharp.CSharpCodeProvider"
		compilerParams = dotnetobject "System.CodeDom.Compiler.CompilerParameters"

		compilerParams.ReferencedAssemblies.AddRange #("System.dll")

		compilerParams.GenerateInMemory = on
		compilerResults = csharpProvider.CompileAssemblyFromSource compilerParams #(source)
		
		if (compilerResults.Errors.Count > 0 ) then
		(
			errs = stringstream ""
			for i = 0 to (compilerResults.Errors.Count-1) do
			(
				err = compilerResults.Errors.Item[i]
				format "Error:% Line:% Column:% %
" err.ErrorNumber err.Line err.Column err.ErrorText to:errs 
			)
			MessageBox (errs as string) title: "Errors encountered while compiling C# code"
			format "%
" errs
			undefined
		)
		else
		(
			assembly = compilerResults.CompiledAssembly
		)
	)
)
CloudsAssembly = CreateCloudsAssembly()

(
	Clouds = CloudsAssembly.CreateInstance "Clouds"
	Clouds.ReportProgress = on
	Clouds.ReportRate = 1000
	
	fn updateProgress s e =
	(
		format "% %
" e.Progress e.Size e.File
	)
	dotnet.removeAllEventHandlers Clouds
	dotnet.addEventHandler Clouds "ProgressChanged" updateProgress

	Clouds.MakeTestCloud @"c:\temp\cloud.txt" "0 0 0 -20 100 255" 100000
	
	Clouds.ConvertCloud @"c:\temp\cloud_cc.txt" @"c:\temp\cloud.txt"
)


#18

OMG! That’s definitely a first :cool:


#19

i have a little improvement… :wink:
if our file format is correct (all values are separated by a single space), we don’t need the
StringSplitOptions.RemoveEmptyEntries option for the split. splitting without the option makes the function a little faster.

if the file is not correct, using StringSplitOptions.RemoveEmptyEntries doesn’t help anyway in some cases (if a tab is used in place of a space, for example).


#20

Wow! That works wayyyyyyyyy better than mine. It seems a lot faster as well. One last thing: i commented out the part where you were creating the test file. This line:


 Clouds.MakeTestCloud @"c:\temp\cloud.txt" "0 0 0 -20 100 255" 100000
 

I just need to know how to make the input file and the output file selectable from a user standpoint. This line:


 Clouds.ConvertCloud @"c:\temp\cloud_cc.txt" @"c:\temp\cloud.txt"
 

Just wanted to say thanks again you guys…i really appreciate the help.


#21

it’s really WOW.
.NET gives us a chance. there is no way to do it using only pure MXS.
is it faster? yes, it’s 10 times faster.
but the performance is not really the issue. the memory leaking is the real problem.

are you asking how to launch the open/save dialogs? give me a break… that’s not really a question you need to ask anyone…
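
For what it’s worth, a minimal sketch of that part, using the standard MXS file dialogs together with the Clouds instance created above:

(
	inFile = getOpenFileName caption:"Pick the point cloud to convert" types:"Text files (*.txt)|*.txt|All files (*.*)|*.*"
	if inFile != undefined do
	(
		outFile = getSaveFileName caption:"Save converted cloud as" types:"Text files (*.txt)|*.txt"
		-- note the argument order: ConvertCloud writes to its first argument and reads from its second
		if outFile != undefined do Clouds.ConvertCloud outFile inFile
	)
)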

