# Thread: Set Membership Cycles: The Axiom of Regularity in ZF Set Theory.

1. So here's an attempt at a traversing algorithm. My apologies for the psuedo-code/C++ mixture here. My questions:

1. Have I got the recursion right?
2. Is this a relatively efficient algorithm?

Code:
initialize check_stack = NULL;
initialize assemblies;

boolean preorder(node x, check_stack)
{
check_stack.push(x);
if there are any repeats in check_stack
then
check_stack.pop();
return FALSE;
else
boolean check_children = TRUE;
for all y in x.children
{
check_children = check_children AND preorder(y, check_stack);
}
check_stack.pop();
return check_children;
end if
}

int main(void)
{
boolean all_checked_out = TRUE;
for all x in assemblies
{
all_checked_out = all_checked_out AND preorder(x, check_stack);
}
print(all_checked_out);
return 0;
}

2. Originally Posted by Ackbeet
So here's an attempt at a traversing algorithm. My apologies for the psuedo-code/C++ mixture here. My questions:

1. Have I got the recursion right?
2. Is this a relatively efficient algorithm?

Code:
initialize check_stack = NULL;
initialize assemblies;

boolean preorder(node x, check_stack)
{
check_stack.push(x);
if there are any repeats in check_stack
then
check_stack.pop();
return FALSE;
else
boolean check_children = TRUE;
for all y in x.children
{
check_children = check_children AND preorder(y, check_stack);
}
check_stack.pop();
return check_children;
end if
}

int main(void)
{
boolean all_checked_out = TRUE;
for all x in assemblies
{
all_checked_out = all_checked_out AND preorder(x, check_stack);
}
print(all_checked_out);
return 0;
}
Hmm I tend to avoid custom stacks if possible just to reduce complexity (hence why I usually write depth-first rather than breadth-first). So I have some difficulty following your code, which I think reflects more on me than on the code. A question on logistics: Is it easy to parse through a stack in your language of choice, e.g., to see whether it contains a certain object, or to check for duplicates? Easy both in terms of writing the code, and in terms of time complexity, as in how would the speed compare to iterating through an array, etc.

I will need time to read the code, and probably will be busy today, also I won't mind sharing the recursive approach I mentioned earlier without a custom stack, when I have time to implement.

3. Hmm I tend to avoid custom stacks if possible just to reduce complexity...
I would, too, except that at each node, I've got to know the entire list of ancestor nodes in order to check for duplicates. A stack seems the most appropriate data structure there.

Question as to parsing a stack: I've thought about that. I think that should be easy enough:

Initialize a second stack.
Pop all the values of the first stack into the second stack. As you go, also put the items in an array.
Pop all the values of the second stack back into the first stack. (Two order reversals gets you back to the original order.)
Check the array for duplicates.

I suppose you could argue that the way to go here would be to simply use an array the whole time, and pass an index into the sub-function as well as a reference to the array. That might be faster, I agree. But it's been so long since I've done C++ that I didn't want to trust my syntax with pointers and such. The important point, of course, is that at each node we're processing, we know exactly what path along the directed (hopefully) acyclic graph we traversed to get there.

Theoretically, it should take n(n-1)/2 comparisons to check for duplicates in an array of size n, correct?

I would be interested in seeing your implementation - in psuedo-code. I'm not particularly interested in the innards of Java, I'm afraid.

Thanks again!

4. Originally Posted by Ackbeet
I would, too, except that at each node, I've got to know the entire list of ancestor nodes in order to check for duplicates. A stack seems the most appropriate data structure there.

...
Understandable points regarding stacks. But with limited experience I can't say for sure how appropriate it is to use a stack here.

Originally Posted by Ackbeet
Theoretically, it should take n(n-1)/2 comparisons to check for duplicates in an array of size n, correct?
I thought the exact same thing at first, but I think we can do much much better by using a boolean array, where each item in your inventory has a unique id that gets mapped to an index in the array; initialize the array to all false; then just go through your list/stack one by one and set the corresponding boolean array index to true, checking first whether it's already true - which would indicate that there's a duplicate.

Originally Posted by Ackbeet
I would be interested in seeing your implementation - in psuedo-code. I'm not particularly interested in the innards of Java, I'm afraid.
Also very understandable! I'm thinking then of writing something in Python because it tends to be very compact and readable. (In fact, it can look a lot like pseudo-code.) I'm not as familiar with it, but it shouldn't be too hard as it won't be very specialized code.

Originally Posted by Ackbeet
Thanks again!
You're welcome! Thanks for the interesting question.

5. Originally Posted by undefined
Understandable points regarding stacks. But with limited experience I can't say for sure how appropriate it is to use a stack here.

I thought the exact same thing at first, but I think we can do much much better by using a boolean array, where each item in your inventory has a unique id that gets mapped to an index in the array; initialize the array to all false; then just go through your list/stack one by one and set the corresponding boolean array index to true, checking first whether it's already true - which would indicate that there's a duplicate.
I haven't followed your discussion of the code in detail, but it seems to me that what you are proposing here amounts to a marking algorithm to detect cycles during a depth-first traversal of the directed graph: Every node on the stack needs to have its mark bit set true, and upon return from a node one needs to set its bit to false again.
So, for efficiency reasons, you'd better not use a mark bit to just check the stack for duplicates over and over again, but, instead, just make this setting/clearing of the mark bit an integral part of the depth-first traversal (set the mark bit when you push a node, clear it when you pop a node), because then you never have to traverse the stack just in order to search for duplicates.
A cycle is detected if and only if you try to push a node onto the stack that has its mark bit already set.

6. I thought the exact same thing at first, but I think we can do much much better by using a boolean array, where each item in your inventory has a unique id that gets mapped to an index in the array; initialize the array to all false; then just go through your list/stack one by one and set the corresponding boolean array index to true, checking first whether it's already true - which would indicate that there's a duplicate.
For this to be efficient, the mapping from id to index would have to be efficient. I suppose everything in sight here could be sorted, implying that you could use binary search with O(log(n)) operations every time you indexed. That's not too bad, I suppose. Of course, you'd have to sort everything. So you're looking at an O(n log(n)) efficiency hit there, unless it's already sorted. Given the usage of the database, an insertion sort might be the way to go, depending on the internal representation. On the other hand, you'd only need to do that once every time you want to change the assemblies in the database.

7. Originally Posted by Failure
I haven't followed your discussion of the code in detail, but it seems to me that what you are proposing here amounts to a marking algorithm to detect cycles during a depth-first traversal of the directed graph: Every node on the stack needs to have its mark bit set true, and upon return from a node one needs to set its bit to false again.
So, for efficiency reasons, you'd better not use a mark bit to just check the stack for duplicates over and over again, but, instead, just make this setting/clearing of the mark bit an integral part of the depth-first traversal (set the mark bit when you push a node, clear it when you pop a node), because then you never have to traverse the stack just in order to search for duplicates.
A cycle is detected if and only if you try to push a node onto the stack that has its mark bit already set.
That sounds right, I haven't actually reviewed the algorithm of post #31 enough to see whether the recursion scheme etc. makes sense, I was just commenting on how best to search for duplicates in an arbitrary list.

8. Reply to failure at post #35:

I haven't followed your discussion of the code in detail, but it seems to me that what you are proposing here amounts to a marking algorithm to detect cycles during a depth-first traversal of the directed graph: Every node on the stack needs to have its mark bit set true, and upon return from a node one needs to set its bit to false again.
So, for efficiency reasons, you'd better not use a mark bit to just check the stack for duplicates over and over again, but, instead, just make this setting/clearing of the mark bit an integral part of the depth-first traversal (set the mark bit when you push a node, clear it when you pop a node), because then you never have to traverse the stack just in order to search for duplicates.
A cycle is detected if and only if you try to push a node onto the stack that has its mark bit already set.
Does the fact that a part can be used in multiple different assemblies (even immediate parents) change this idea at all?

9. Originally Posted by Ackbeet
For this to be efficient, the mapping from id to index would have to be efficient. I suppose everything in sight here could be sorted, implying that you could use binary search with O(log(n)) operations every time you indexed. That's not too bad, I suppose. Of course, you'd have to sort everything. So you're looking at an O(n log(n)) efficiency hit there, unless it's already sorted. Given the usage of the database, an insertion sort might be the way to go, depending on the internal representation. On the other hand, you'd only need to do that once every time you want to change the assemblies in the database.
If memory is not a concern then there's no need to make a special mapping, just use the id as your index, and let the size of the array be 1 greater than the highest possible id.

10. Originally Posted by Ackbeet
For this to be efficient, the mapping from id to index would have to be efficient.
Isn't perhaps your id a small sequential number anyway so that the id itself could be the index into the array of mark bits?
Or, another idea, could you perhaps extend the database record of every node with a mark bit field? In that case, a mapping would not even be necessary.

11. Originally Posted by Ackbeet
Reply to failure at post #35:

Does the fact that a part can be used in multiple different assemblies (even immediate parents) change this idea at all?
Well, it is just using the idea that you are doing a depth-first traversal of a directed graph and you want to detect whether this traversal loops back on itself at some point. So no, I don't think it has anything to do with the question of what the in-degree and/or out-degre of a node happens do be. The only problem is that you need to do this starting with every node in the database: thus this doesn't seem particularly efficient to me. But it would be great for what I proposed in the case of a database that one already knows to be cycle free: in that case, only in the case of modifying an existing assembly by adding an already existing assembly as a component would one have to execute this marking depth-first serach for a cycle beginning at the node that is about to be modified, because: if this modification introduces a cycle in the formerly cycle-free database, this assembly that is about to be modified must be part of the cycle.

Actually, you don't even need an explicit stack: you can just use the invocation stack of recursive calls if you like (but, of course, you don't have to: using an explicit stack might perhaps be a little more efficient).

12. Hmm. At some point, hardware efficiency ceases to be a virtue, and programmer efficiency becomes paramount. The most efficient algorithm in the world won't help me if the programmer who needs to code it up (which is not me, by the way) can't do it. The programmer in question does have reasonable chops; but extensive senior-level CS-type code is probably not suitable.

So, I guess what I'm looking for is a fairly simple, straight-forward algorithm with reasonable run-time efficiency that is also straight-forward to code. The coding language is likely going to be Visual Basic for Applications, since the database is in Microsoft Access 2010 (I know this isn't the most robust database - we'll migrate over to SQL Server, probably, when we run into scaling issues; the advantage of Access is that the learning curve is easy.) Incidentally, I'm definitely NOT interested in getting into language-specific concerns in this thread. I want the pseudo-code version of an algorithm that fits the above criteria.

So whenever you get a chance, an evaluation of the correctness of my algorithm in post # 31 would be terrific, and any suggestions for blatant inefficiencies would also be appreciated. Thanks!

13. Reply to failure at post # 41:

Well, I wouldn't count on the knowing the database is already cycle-free. We may want to import a whole bunch of data wholesale into the database, and check that out. On the other hand, we have an entire table devoted to listing all the assemblies. So we don't have to guess which parts are assemblies and which are not.

14. Regarding code language: I plan on using a real language (in this case Python) because then I'll be able to test to make sure there are no bugs. But if it comes out not looking pseudo-cody enough, then I can rewrite. Won't be able to look at algorithm in post #31 in depth for a little while.

15. Originally Posted by Ackbeet
So whenever you get a chance, an evaluation of the correctness of my algorithm in post # 31 would be terrific, and any suggestions for blatant inefficiencies would also be appreciated.
Your algorithm #31 seems correct to me. The main inefficiencies are, in my opinion, twofold:
1. The inefficiency of repeatedly checking the stack for duplicates (i.e. for the case that the depth-first traversal loops back on itself). We have talked about that and you know what I suggest to get rid of this problem (almost) entirely.
2. The inefficiency of using depth-first search for all assemblies in the database: because this will make your algorithm traverse the same subtrees repeatedly (because it cannot remember that those subtrees have already been traversed earlier). For imagine a rather deeply nested chain of assemblies of assemblies: for every assembly in that chain, your program is going to be executed, although if you had known in advance that this was a chain, you could have just done a single depth first traversal beginning at the root node of that chain, and dispensed width doing traversals for the other assemblies lower down in the chain.

Page 3 of 6 First 123456 Last