quickly find duplicated names in array of objs


#1

I have an array of objects
some objects have the same name
what is the fastest way to identify the duplicated names?

I am trying to understand if/how bsearch can be applied in this situation but the documentation is unclear.


#2
uniques = #()
dups = #()
for node in nodes where not (appendifunique uniques node.name) do appendifunique dups node.name
dups

(written blind)

i cannot guarantee that it’s the fastest way, but it’s fast enough, clear and compact :slight_smile:


#3

Nice Denis. My own solution for this is easily 3x longer. I think I need to start thinking differently.

Your solution works if I change ‘nodes’ to ‘objects’


#4

nodes in my script means ‘a list of any specified objects which have a “name” property’. (selection, materials, modifiers, etc.)


#5

the binary search solution would look a bit like this…


   strings = #("grandad", "table","immutable","mutable","immutable","Dictionaries","Vectors",
   			"mutable","immutable","Dictionaries","Vectors")
   
   struct hstr ( str, hash )
   	
   fn sortcomparefn i j values: =
   (
   	v1 = values[i].hash; 
   	v2 = values[j].hash;
   	if v1 < v2 then 1 else if v1 > v2 then -1 else 0;
   )	
   
   fn hashstring str =
   (
   	h = 2166136261;
   	-- FNV-1: multiply by the prime, then XOR in the next character
   	-- (bit.or would quickly saturate the hash to all ones)
   	for i = 1 to str.count do
   		h = bit.xor (h * 16777619) (bit.charAsInt str[i]);
   	h;
   )
   
   fn CollectHStr stringlist = for s in stringlist collect hstr s (hashstring s);
   
   fn FindUniqueBinaryStylee stringlist =
   (
   	nStrs = stringlist.count;
   	
   -- hash the strings and create an indexing array
   	
   	hashedstrings = CollectHStr stringlist;
   	indexes = #{1..nStrs} as array;
   
   -- sort the indexes
   	
   	qsort indexes sortcomparefn values:hashedstrings;
   
   -- collect uniques using binary search
   	
   	for i = 1 to nStrs where i == bsearch i indexes sortcomparefn values:hashedstrings collect i
   )
   
   uniques = FindUniqueBinaryStylee strings;
   duplicates = (#{1..strings.count} - uniques as bitarray) as array;

not as concise as denis’s “brute force” method but would win hands down when the count gets high.



#6

i’m absolutely sure that for long lists a solution like this:

names = for node in nodes collect node.name
sort names -- note: sort is case-sensitive while stricmp is not; sort a lower-case copy for a fully case-insensitive result
dups = #()
for k = 2 to names.count where stricmp names[k-1] names[k] == 0 and (k == names.count or stricmp names[k] names[k+1] != 0) do append dups names[k-1]

will be faster


#7

and my solution above can be made faster… half of the string comparisons are not necessary.
pseudo code is:

if the current item in the list is the same as the previous one, and the previous one was not already added to the list of dups, add the current item
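A minimal sketch of that pseudocode, in Python purely for illustration (the thread’s real snippets are MAXScript): one pass over a sorted copy, appending to `dups` only at the first repeat of each run.

```python
def find_dups(names):
    """Scan a sorted copy once; add a name to dups only the first
    time it repeats, so each duplicated name is reported once."""
    s = sorted(names, key=str.lower)
    dups = []
    prev_added = False
    for i in range(1, len(s)):
        if s[i].lower() == s[i - 1].lower():
            if not prev_added:           # previous was not added yet
                dups.append(s[i])
                prev_added = True
        else:
            prev_added = False
    return dups
```

This matches the point made above: once a run of equal names has contributed one entry, the remaining comparisons against that run can skip the append.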

#8

calling a method from the BIT structure is slow

getting a structure member property is slow

but hashing is the right idea. sorting and comparing integers is faster than sorting and comparing anything else


#9

i have two functions written in C++ - find uniques and find dups
they use almost the same algorithm:

from the list of items being processed, collect the list of their hash values

sort the linked lists by the hashed values

iterate linearly through the sorted hash values to find the hits
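The three steps above can be sketched like this (Python for illustration only; the actual functions are C++). Note that unequal items can in principle share a hash, so production code should confirm real equality on a hash match before declaring a duplicate.

```python
def uniques_and_dups(items):
    """Hash every item, sort indices by hash, then scan the sorted
    run linearly: first item of each hash run is unique, the rest
    are treated as duplicates."""
    hashes = [hash(x) for x in items]                           # 1. collect hash values
    order = sorted(range(len(items)), key=lambda i: hashes[i])  # 2. sort by hash
    uniques, dups = [], []
    for k, i in enumerate(order):                               # 3. linear scan for hits
        if k > 0 and hashes[i] == hashes[order[k - 1]]:
            dups.append(items[i])
        else:
            uniques.append(items[i])
    return uniques, dups
```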


#10

easily changed but I’m not a big fan of max’s hash function

   strings = #("grandad", "table","immutable","mutable","immutable","Dictionaries","Vectors",
   			"mutable","immutable","Dictionaries","Vectors")
   
   fn sortcomparefn i j values: =
   (
       v1 = values[i]; 
       v2 = values[j];
       if v1 < v2 then 1 else if v1 > v2 then -1 else 0;
   )    
   
   fn FindUniqueBinaryStylee stringlist =
   (    
       nStrs = stringlist.count;
       
   -- hash the strings and create an indexing array    
       
       hashedstrings =  for s in stringlist collect getHashValue  s 73856093;
       indexes =  #{1..nStrs} as array;
   
   -- sort the indexes    
       
       qsort indexes sortcomparefn values:hashedstrings;
   
   -- collect uniques using binary search,  collects a unique value or the first value in a run of duplicates 
       
       for i = 1 to nStrs where i == bsearch i indexes sortcomparefn values:hashedstrings collect i
   )    
   
   uniques = FindUniqueBinaryStylee strings;
   duplicates = (#{1..strings.count} - uniques as bitarray) as array;

#11

the problem with this method is that you can only search for exact coincidences. so if you want to search strings ignoring case you have to hash their lower-case versions, for example… and if you search for close points (point2, point3, point4, color…) you have to round them first.
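The normalization idea can be sketched like this (Python for illustration; `norm_key` and its `tol` parameter are made up for this example): lower-case strings for case-insensitive matching, and round coordinates to a tolerance so "close enough" points collide on purpose.

```python
def norm_key(value, tol=1e-3):
    """Map a value to a canonical key before hashing, so near-equal
    values produce equal keys."""
    if isinstance(value, str):
        return value.lower()            # case-insensitive string match
    # treat any sequence as point2/point3/point4/color-style coordinates,
    # quantized to the given tolerance
    return tuple(round(c / tol) for c in value)
```

Feed `norm_key(value)` (or its hash) into the sort-and-scan pass instead of the raw value.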


#12

in c++ i’m using std::hash


#13

not had any issues using this method with points, normals and colours. It’s the same method used in our exporter, and that hasn’t failed yet… he says, tempting fate :slight_smile: as for the exact coincidences, you can vary the hash to account for those. I use my own hash functions for the most part. Besides, the OP did ask how binary search could be used in this situation! :wink:


#14

i said if you want to find ‘close enough’ points you have to round them anyhow before hashing…


#15

if you want to use a “linear” cull with the sorted hash array, it can be done like this:

strings = #("grandad", "table","immutable","mutable","immutable","Dictionaries","Vectors","Dictionaries","Vectors",
   			"mutable","immutable","Dictionaries","Vectors","Dictionaries","Vectors", "happy go lucky")
   
   fn sortcomparefn i j values: =
   (
   	v1 = values[i]; 
   	v2 = values[j];
   	if v1 < v2 then 1 else if v1 > v2 then -1 else 0;
   )	
   
   fn CopyUnique indices hashes =
   (
   	unique = #();
   	unique.count = indices.count;
   	j = 1;
   	unique[j] = indices[j];
   	for i = 2 to indices.count where hashes[unique[j]] != hashes[indices[i]] do
   	(
   		j += 1;
   		unique[j] = indices[i];
   	)
   	unique.count = j;
   	unique;
   )
   
   fn FindUniqueNotInABinaryStylee stringlist =
   (	
   	nStrs = stringlist.count;
   	
   -- hash the strings and create an indexing array the sort the indexes	
   	
   	hashedstrings =  for s in stringlist collect getHashValue  s 73856093;
   	indexes =  #{1..nStrs} as array;
   	qsort indexes sortcomparefn values:hashedstrings;
   
   -- collect uniques using binary search,  collects a unique value or the first value in a run of duplicates 
   	
   	--for i = 1 to nStrs where i == bsearch i indexes sortcomparefn values:hashedstrings collect i
   		
   	CopyUnique indexes hashedstrings;
   )	
   
   uniques = FindUniqueNotInABinaryStylee strings;
   duplicates = (#{1..strings.count} - uniques as bitarray) as array;

but I doubt it will win much over the binary search; I wouldn’t be surprised if the bsearch variation turned out faster.


#16

This simple solution worked for me. It finds duplicate-named objects, stores them in an array, then appends a random number string to each duplicate’s name.

uniques = #()
dups = #()
for i in selection where not (appendifunique uniques i.name) do appendifunique dups i
clearselection()
select dups
for i in dups do i.name = i.name + "_" + ((random 1 10000) as string)

#17

It is not safe to use random if you need names that are guaranteed unique.
We’ve discussed several times how to make a unique name while keeping it simple. The best solution is to use the built-in uniqueName method.

The script might look like this:

fn makeUniqueName name trim:"_0123456789" post:"" numDigits: = 
(
	_name = trimright name trim 
	uniqueName (_name + post) numDigits:numDigits
)

#18

Very cool. That’s why communities are awesome, always helpful info!

Thanks!