Bsearch to replace finditem with huge array?


#1

I want to use bsearch to replace finditem function,blow is the origin way:

Fn Find_string_in_array arr str=
(
	count=0
	for i=1 to arr.count do
	(
		if findstring arr[i] str!=undefined then
		(
			count+=1
		)
	)
	count
)

key_string="C:\Users\Administrator\Personal\WeChat Files"
need_to_find=#(
"C:\Users\Administrator\Personal",
"C:\Users\Administrator\Personal\3ds Max 2020",
"C:\Users\Administrator\Personal\3ds Max 2021",
"C:\Users\Administrator\Personal\3dsMax",
"C:\Users\Administrator\Personal\Apowersoft",
"C:\Users\Administrator\Personal\Autodesk Application Manager",
"C:\Users\Administrator\Personal\WeChat Files",
"C:\Users\Administrator\Personal\WeChat Files\test.max",
"C:\Users\Administrator\Personal\Windows Live"
)

print (Find_string_in_array need_to_find key_string)

–results
–2 matches
/*
“C:\Users\Administrator\Personal\WeChat Files”,
“C:\Users\Administrator\Personal\WeChat Files\test.max”,
*/

It will too slow for a huge array,I am trying to use bsearch to do the same job,but I don’t know how to use it.

Fn bsearch_fn=
(
	--how to compare the string?
)
Fn Find_string_use_bsearch arr str=
(
	count=0
	compare=bsearch str need_to_find Comp_fn
	if compare!=undefined then count+=1
	count
)
print (Find_string_use_bsearch need_to_find key_string)

Any help would be appericate!


#2

this is one way to do it, if you can guarantee the strings are already sorted you can forgo the qsort stage…

  (
    	
    	key_string="C:\Users\Administrator\Personal\WeChat Files";
    	strings = #(    "C:\Users\Administrator\Personal", "C:\Users\Administrator\Personal\3ds Max 2020",
    						"C:\Users\Administrator\Personal\3ds Max 2021", "C:\Users\Administrator\Personal\3dsMax",
    						"C:\Users\Administrator\Personal\Apowersoft", "C:\Users\Administrator\Personal\Autodesk Application Manager",
    						"C:\Users\Administrator\Personal\WeChat Files", "C:\Users\Administrator\Personal\WeChat Files\test.max",
    						"C:\Users\Administrator\Personal\Windows Live" );
    	
    	fn string_indexing_sort_compfn v1 v2 string_array: = stricmp string_array[v1] string_array[v2];
    	fn string_indexing_search_compfn v1 v2 string_array: = stricmp v1 string_array[v2];
    	indexing = for i = 1 to strings.count collect i;
    	
    	qsort indexing string_indexing_sort_compfn string_array:strings;
    	key_index = bsearch key_string indexing string_indexing_search_compfn string_array:strings;

    )

#3

If the
key_string="C:\Users\Administrator\Personal\3ds Max"

then the key_index returns undefined,in my example with findstring function,returns

"C:\Users\Administrator\Personal\3ds Max 2020"
"C:\Users\Administrator\Personal\3ds Max 2021"

:disappointed_relieved:


#4

i’m afraid it’s difficult to do with besearch , the function just search the first element matched
If i need to do this , I will use regular expression , here is a reference

findstrings=@"C:\Users\Administrator\Personal\WeChat Files"
aa=#(
@"C:\Users\Administrator\Personal",
@"C:\Users\Administrator\Personal\3ds Max 2020",
@"C:\Users\Administrator\Personal\3ds Max 2021",
@"C:\Users\Administrator\Personal\3dsMax",
@"C:\Users\Administrator\Personal\Apowersoft",
@"C:\Users\Administrator\Personal\Autodesk Application Manager",
@"C:\Users\Administrator\Personal\WeChat Files",
@"C:\Users\Administrator\Personal\WeChat Files\test.max",
@"C:\Users\Administrator\Personal\Windows Live"
)
bb=aa as string
cc=substituteString bb "\\" "/"
ss=dotNetObject "System.Text.RegularExpressions.Regex" "" (dotnetclass "System.Text.RegularExpressions.RegexOptions").IgnoreCase
dd=ss.Split cc (substituteString findstrings "\\" "/")
sel=#()
for i = 2 to dd.count do 
if dd[i][1] ==  "\"" then append sel findstrings
	 else append sel (findstrings + substituteString (filterString dd[i] "\",")[1] "/"  "\\"  ) 
sel

#5

how “huge” are your arrays?

do you sure you need “find a sub string” instead of “check matching a pattern”?

in many cases matchpattern works faster than findstring


#6

Over 500,000 items ,I’ve tried matchpattern,it’s faster than findstring,I want to cost less time to finish the job if possible.


#7

are you always looking from the beginning of path?

at least it looks like from your examples

(
	t0 = timestamp()
	h0 = heapfree

	x = "hjihfidjsfhjihfidjsfhjihfidjsfhjihfidjsfhjihfidjsfhjihfidjsfhjihfidjsfidjsfhjihfidjsfhjihfidjsfhjihfidjsfhjihfidjsf"
	
	for k=1 to 500000 do
	(
		(substring x 1 x.count) == x
	)

	format "time:% heap:%\n" (timestamp() - t0) (h0 - heapfree)
)

this is the worst scenario in our case… but it’s only ~1 sec. Is one sec a problem?


#8

I don’t know how an exact string comparison is done in MXS … maybe using memcmp it might be faster … if we can’t use the SDK, at least .NET methods can give us comparable speed.


#9

if we need to look for substring, then something simple that comes to mind is the Boyer-Moore Algorithm

it is somewhere in my archive, but… since it is in the archive, this means that it was not really in demand


#10

Another method, which seems reasonable, is to organize everything like a TREE (which makes sense for paths). BRANCH search can be more efficient


#11

here is a fast reference, I use stringstream to build a 500000 elements string , with 100000 matched , less than 3000ms , I insert 100 other strings , less than 400ms , test on max 2021

(
	aa=#(
	@"C:\Users\Administrator\Personal",
	@"C:\Users\Administrator\Personal\3ds Max 2020",
	@"C:\Users\Administrator\Personal\3ds Max 2021",
	@"C:\Users\Administrator\Personal\3dsMax",
	@"C:\Users\Administrator\Personal\Apowersoft",
	@"C:\Users\Administrator\Personal\Apowersoft",
	@"C:\Users\Administrator\Personal\Autodesk Application Manager",
	@"C:\Users\Administrator\Personal\WeChat Files",
	@"C:\Users\Administrator\Personal\WeChat Files\test.max",
	@"C:\Users\Administrator\Personal\Windows Live"
)
global cc=""
for j = 1 to 10 do cc+=substituteString aa[j] "\\" "/"+","
	fn test5d =
	(
		local ss = stringstream"",fmt ="%"
		for i = 1 to 1000 do format fmt cc to:ss  -- for 500000 elements , set i = 1 to 50000 , if use old version max , may very slow
		ss as string
	)
	st = timestamp()
	global bb = test5d()
	format "Build Time: %ms\n" (timestamp()-st)
	aa
)
(
	st = timestamp()
	findstrings=@"C:\Users\Administrator\Personal\WeChat Files"
	ss=dotNetObject "System.Text.RegularExpressions.Regex" "" (dotnetclass "System.Text.RegularExpressions.RegexOptions").IgnoreCase
	dd=ss.Split bb (substituteString findstrings "\\" "/")
	sel=#()  -- if just need the count , no need below , the time will hundreds ms with 500000 elements , the count is dd.count - 1
	ss1 = stringstream""
	for i = 2 to dd.count  do format  (findstrings + (substituteString (substring dd[i] 1 (ss.Match dd[i] ",").Index)  "/" "\\") + ",") to:ss1
	sel=ss.split ss1 ","
	deleteitem sel sel.count
	format "Search Time: %ms\n" (timestamp()-st)
	sel.count
)

#12

It seems too complex…


#13

Where do you get that big array of filepaths from?
Maybe you could just use .net .GetFiles with a mask instead? or mxs built-in getFiles <wild_card_filename_string>


#14

Thanks for the code,I tried on old version,3ds max 2013,

500000 elements string:

Build Time: 121261ms
Search Time: 11569ms

I will try all code in this post,thanks for all


#15

I use a .dll file read all files and folders of a driver such as D:,then I want to count some directories or files which I need,all directories or files stored in a big array,that’s why I need a fastest way to do this job.


#16

In order to use bsearch() you first need to sort the array, and that will be a killer, moreover if you use qsort().

I don’t think this is a good case for using bsearch().


#17

This takes just 188ms for 500K items in an unsorted array, which in my opinion is pretty fast.

(
	fn FindStringInArray arr str =
	(
		str += "*"
		count=0
		
		for i in arr where matchpattern i pattern:str do count+=1
		
		return count
	)
)

What is your performance goal?


#18

this is exactly what I showed above … I don’t see performance issues either


#19

OK,I did a test,it’s really fast,the matchpattern is the best solution,thank you all again.