Find and Replace in huge .txt files


RGhost
11-08-2010, 09:29 AM
fn test =
(
    start = timeStamp()
    f = "d:\\test.txt"
    fw = "d:\\test_w.txt"
    local contents = (dotNetClass "System.IO.File").ReadAllLines f

    for i = 1 to contents.count do
    (
        if (matchPattern contents[i] pattern:"*text*") then
        (
            print contents[i]
            --contents[i] = replaceByString contents[i] "bla2"
        )
    )

    (dotNetClass "System.IO.File").WriteAllLines fw contents
    end = timeStamp()
    format "Processing took % seconds\n" ((end - start) / 1000.0)
)
test()


Hello. I wrote this function to find and replace some words in my text files. Each of my text files is about 200 MB, which is why I need to speed up this process. But it seems I'm stuck :( I'd be glad to hear any suggestions on how to speed it up.

JHN
11-08-2010, 09:50 AM
Maybe it's better to keep all the processing in dotnet.
Or maybe even look at Python; it's supposed to be really fast at processing text files.

-Johan
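
As a rough illustration of the Python route suggested above, here is a minimal sketch that streams the file line by line instead of loading all of it into memory. The function name `replace_in_file_streaming` is hypothetical, UTF-8 encoding is an assumption about the input, and the search string is assumed to be literal text that never spans a line break.

```python
def replace_in_file_streaming(src_path, dst_path, search, replacement):
    """Copy src_path to dst_path, replacing the literal string `search`.

    Works one line at a time, so memory use stays small even for very
    large files. Returns the number of replacements made.
    Assumptions: UTF-8 input, and `search` never spans a line break.
    """
    count = 0
    # newline="" keeps the original line endings (\n or \r\n) untouched.
    with open(src_path, "r", encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            hits = line.count(search)  # non-overlapping matches, like replace()
            if hits:
                line = line.replace(search, replacement)
                count += hits
            dst.write(line)
    return count
```

Called as `replace_in_file_streaming("d:\\test.txt", "d:\\test_w.txt", "text", "bla2")`, it would mirror the file names used in the first post. Writing to a separate output file avoids corrupting the input if the run is interrupted.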

garryclarke
11-08-2010, 09:54 AM
Perl is very good at processing text files. It's pretty easy to use as well.

I've used it to auto-insert tags into VRML files.

lo
11-08-2010, 03:44 PM
fn test =
(
    start = timeStamp()
    f = "d:\\test.txt"
    fw = "d:\\test_w.txt"
    while (heapFree < ((getfileSize f) * 4)) do heapSize += 10000000
    local contents = (dotNetClass "System.IO.File").ReadAllLines f

    for i = 1 to contents.count do
    (
        if (matchPattern contents[i] pattern:"*text*") then
        (
            contents[i] = substituteString contents[i] "text" "bla2"
        )
    )

    (dotNetClass "System.IO.File").WriteAllLines fw contents
    contents = #()
    gc light:true
    end = timeStamp()
    format "Processing took % seconds\n" ((end - start) / 1000.0)
)
test()

If you make sure there's enough memory in your heap, it's not that slow IMO. I tried this on a 120 MB file which had lots of occurrences of "text" in it, and it took 27 seconds, which I think is reasonable within the bounds of MAXScript considering the size of the file. How long is it taking you to execute with your 200 MB files?

denisT
11-08-2010, 05:39 PM
I wrote this function to find and replace some words in my text files. Each of my text files is about 200 MB, which is why I need to speed up this process. But it seems I'm stuck :( I'd be glad to hear any suggestions on how to speed it up.

What can you store in a 200 MB text file and use with Max?!

denisT
11-08-2010, 06:32 PM
You have to stay with a C#/.NET solution and not go back and forth between .NET and MAXScript.

global FileAssembly
fn CreateFileAssembly forceRecompile:on =
(
    -- Compile the C# helper only when forced or when no valid assembly is cached.
    if forceRecompile or not iskindof ::FileAssembly dotnetobject or (::FileAssembly.GetType()).name != "Assembly" do
    (
        source = ""
        source += "using System;\n"
        source += "using System.IO;\n"
        source += "using System.Text.RegularExpressions;\n"
        source += "class FileIO\n"
        source += "{\n"
        source += "static public void ReplaceInFile(string fileIn, string searchText, string replaceText)\n"
        source += "{\n"
        source += "    StreamReader reader = new StreamReader(fileIn);\n"
        source += "    string content = reader.ReadToEnd();\n"
        source += "    reader.Close();\n"
        -- Note: Regex.Replace treats searchText as a regular expression pattern.
        source += "    content = Regex.Replace(content, searchText, replaceText);\n"
        -- The result is written back over the input file.
        source += "    StreamWriter writer = new StreamWriter(fileIn);\n"
        source += "    writer.Write(content);\n"
        source += "    writer.Close();\n"
        source += "}\n"
        source += "}\n"

        csharpProvider = dotnetobject "Microsoft.CSharp.CSharpCodeProvider"
        compilerParams = dotnetobject "System.CodeDom.Compiler.CompilerParameters"

        compilerParams.ReferencedAssemblies.Add("System.dll");

        compilerParams.GenerateInMemory = true
        compilerResults = csharpProvider.CompileAssemblyFromSource compilerParams #(source)

        FileAssembly = compilerResults.CompiledAssembly
    )
    -- Return an instance even when the cached assembly is reused.
    ::FileAssembly.CreateInstance "FileIO"
)
global FileIO = CreateFileAssembly()
global replaceInFile = FileIO.ReplaceInFile


For a 210 MB file where every third word has to be replaced, it takes 15 seconds on my machine for a non-cached file and 5 seconds for a cached one.
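
One detail worth keeping in mind with the helper above: Regex.Replace interprets the search text as a regular expression pattern, so characters such as `.` or `(` in the search string would not be matched literally unless they are escaped. The sketch below shows the difference using Python's `re` module as a stand-in; the function names `replace_pattern` and `replace_literal` are hypothetical and not part of the posted code.

```python
import re


def replace_pattern(content, pattern, replace_text):
    """Treat `pattern` as a regular expression (metacharacters are active)."""
    return re.sub(pattern, replace_text, content)


def replace_literal(content, search_text, replace_text):
    """Treat `search_text` as plain text, even if it contains metacharacters."""
    # re.escape neutralises regex metacharacters in the search text.
    pattern = re.escape(search_text)
    # A function replacement keeps backslashes in replace_text from being
    # interpreted as escape sequences or group references.
    return re.sub(pattern, lambda _match: replace_text, content)
```

With the input `"a.b axb a.b"` and the search string `"a.b"`, the pattern version also rewrites `axb` because `.` matches any character, while the literal version leaves it alone. In .NET, Regex.Escape plays the role that `re.escape` plays here.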

RGhost
11-08-2010, 07:37 PM
What can you store in a 200 MB text file and use with Max?!
I'm patching .mi files because of 3ds Max exporter shortcomings. :/


denisT: thank you for the example, I'll try it.
lo: thank you. I'll keep Max's memory limitations in mind.
