Find and Replace in huge .txt files


RGhost
11 November 2010, 10:29 AM
fn test =
(
    start = timeStamp()
    f = "d:\\test.txt"
    fw = "d:\\test_w.txt"
    local contents = (dotNetClass "System.IO.File").ReadAllLines f

    for i = 1 to contents.count do
    (
        if (matchPattern contents[i] pattern:"*text*") then
        (
            print contents[i]
            --contents[i] = replaceByString contents[i] "bla2"
        )
    )

    (dotNetClass "System.IO.File").WriteAllLines fw contents
    end = timeStamp()
    format "Processing took % seconds\n" ((end - start) / 1000.0)
)
test()


Hello. I wrote this function to find and replace some words in my text files. Each of my text files is about 200 MB, which is why I need to speed up this process. But it seems I'm stuck :( I'd be glad to hear any suggestions on how to speed this up.

JHN
11 November 2010, 10:50 AM
Maybe it's better to keep all the processing in dotnet.
Or maybe even look at Python, it's supposed to be really fast at processing text files.

-Johan
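
For reference, here is a rough sketch of what the Python route might look like. It is not anyone's tested solution from this thread: the function name `replace_in_file`, the UTF-8 default encoding, and the paths in the usage note are assumptions for illustration. It reads the input one line at a time and writes to a second file, so the whole 200 MB never has to sit in memory. It only finds search text that does not span a line break.

```python
def replace_in_file(src_path, dst_path, search, replacement, encoding="utf-8"):
    """Copy src_path to dst_path, replacing every literal occurrence of search.

    Returns the number of lines that were changed.
    The encoding default is an assumption; pass the real one for your files.
    """
    changed = 0
    # Iterating over the file object yields one line at a time, so memory use
    # stays small regardless of file size. newline="" keeps the original
    # line endings (\n or \r\n) untouched.
    with open(src_path, "r", encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in src:
            if search in line:
                line = line.replace(search, replacement)
                changed += 1
            dst.write(line)
    return changed
```

Usage would be something like `replace_in_file("d:\\test.txt", "d:\\test_w.txt", "text", "bla2")`, mirroring the paths in the first post. If the files are not valid UTF-8, a single-byte encoding such as `"latin-1"` may be a safer choice.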

garryclarke
11 November 2010, 10:54 AM
Perl is very good at processing text files. It's pretty easy to use as well.

I've used it to auto-insert tags into VRML files.

lo
11 November 2010, 04:44 PM
fn test =
(
    start = timeStamp()
    f = "d:\\test.txt"
    fw = "d:\\test_w.txt"
    while (heapFree < ((getfileSize f) * 4)) do heapSize += 10000000
    local contents = (dotNetClass "System.IO.File").ReadAllLines f

    for i = 1 to contents.count do
    (
        if (matchPattern contents[i] pattern:"*text*") then
        (
            contents[i] = substituteString contents[i] "text" "bla2"
        )
    )

    (dotNetClass "System.IO.File").WriteAllLines fw contents
    contents = #()
    gc light:true
    end = timeStamp()
    format "Processing took % seconds\n" ((end - start) / 1000.0)
)
test()

If you make sure there's enough memory in your heap, it's not that slow IMO. I tried this on a 120 MB file which had lots of occurrences of "text" in it, and it took 27 seconds, which I think is reasonable within the bounds of MAXScript considering the size of the file. How long is it taking you to execute with your 200 MB files?

denisT
11 November 2010, 06:39 PM
I wrote this function to find and replace some words in my text files. Each of my text files is about 200 MB, which is why I need to speed up this process. But it seems I'm stuck :( I'd be glad to hear any suggestions on how to speed this up.

What can you store in a 200 MB text file and use with Max?!

denisT
11 November 2010, 07:32 PM
You have to stay with a C#/.NET solution and not go back and forth between .NET and MXS.

global FileAssembly
fn CreateFileAssembly forceRecompile:on =
(
    if forceRecompile or not iskindof ::FileAssembly dotnetobject or (::FileAssembly.GetType()).name != "Assembly" do
    (

        source = ""
        source += "using System;\n"
        source += "using System.IO;\n"
        source += "using System.Text.RegularExpressions;\n"
        source += "class FileIO\n"
        source += "{\n"
        source += "static public void ReplaceInFile(string fileIn, string searchText, string replaceText)\n"
        source += "{\n"
        source += " StreamReader reader = new StreamReader(fileIn);\n"
        source += " string content = reader.ReadToEnd();\n"
        source += " reader.Close();\n"
        source += " content = Regex.Replace(content, searchText, replaceText);\n"
        source += " StreamWriter writer = new StreamWriter(fileIn);\n"
        source += " writer.Write(content);\n"
        source += " writer.Close();\n"
        source += "}\n"
        source += "}\n"

        csharpProvider = dotnetobject "Microsoft.CSharp.CSharpCodeProvider"
        compilerParams = dotnetobject "System.CodeDom.Compiler.CompilerParameters"

        compilerParams.ReferencedAssemblies.Add("System.dll")

        compilerParams.GenerateInMemory = true
        compilerResults = csharpProvider.CompileAssemblyFromSource compilerParams #(source)

        FileAssembly = compilerResults.CompiledAssembly
        FileAssembly.CreateInstance "FileIO"
    )
)
global FileIO = CreateFileAssembly()
global replaceInFile = FileIO.ReplaceInFile


For a 210 MB file where every third word has to be replaced, it takes 15 seconds on my machine for an uncached file and 5 seconds for a cached one.
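
One thing worth keeping in mind with the C# above: `Regex.Replace` interprets `searchText` as a regular-expression pattern, so characters such as `.` or `(` carry special meaning rather than matching themselves. The following Python sketch illustrates the difference between pattern and literal replacement. It is only an illustration, not part of the original solution; the helper names `replace_regex` and `replace_literal` and the sample string are made up for the example.

```python
import re


def replace_regex(content, pattern, replacement):
    """Treat pattern as a regular expression, as Regex.Replace does in .NET."""
    return re.sub(pattern, replacement, content)


def replace_literal(content, search, replacement):
    """Treat search as plain text by escaping regex metacharacters first."""
    # A callable replacement keeps any backslashes in the replacement literal.
    return re.sub(re.escape(search), lambda _match: replacement, content)


sample = "shader a.b and shader aXb"
print(replace_regex(sample, "a.b", "Z"))    # the dot matches any character
print(replace_literal(sample, "a.b", "Z"))  # only the literal "a.b" is replaced
```

On the .NET side, the corresponding options would be to pass the search text through `Regex.Escape` first, or to use `String.Replace` when no pattern matching is needed.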

RGhost
11 November 2010, 08:37 PM
What can you store in a 200 MB text file and use with Max?!
I'm patching .mi files because of shortcomings in the 3ds Max exporter. :/


denisT: thank you for the example, I'll try it.
lo: thank you. I'll keep Max's memory limitations in mind.
