# Thread: Set Membership Cycles: The Axiom of Regularity in ZF Set Theory.

1. Originally Posted by Ackbeet
Ok, thanks for that.

Matt, undefined, and Failure: you've been great and very helpful. I greatly appreciate all the help!

I really don't see much point in spending more time on this question; I think we have a workable solution. So, unless you're dying to come up with the "ideal" algorithm, please don't spend any more time on this problem on my account.

Regards.
Excellent, this is the sort of programming I get up in the morning for. Nice one.

2. Well I found a way to do those zero-filled lists without NumPy. Also, I had an awkward bit with nextParents because I wasn't used to python list concatenation. So this is cleaner

Code:
data = [[1,5,8],[3,6],[],[3],[],[],[],[],[17],[], \
[4],[22,23,24],[],[],[],[4],[],[20],[],[5], \
[8],[29],[],[],[12],[],[5],[],[4,22],[27]]

visited = [0] * len(data)

def main():
for i in range(len(data)):
if not visited[i]:
recurse(i, [i])

def recurse(node, parents):
visited[node] = 1
if not data[node]:
return
# represent elements of current node as boolean array
curr = [0] * len(data)
for i in data[node]:
curr[i] = 1
# check for membership chain
for i in range(len(parents)):
if curr[parents[i]]:
printerr(i, parents)
# recurse
for i in data[node]:
if not visited[i]:
recurse(i, parents + [i])

def printerr(start, L):
s = ""
for i in L[start:len(L)]:
s += str(i) + " contains "
s += str(L[start])
print s

main()

3. Originally Posted by undefined
Optimised the checking for membership chain:

Code:
    # check for membership chain
for i in range(len(parents)):
if curr[parents[i]]:
printerr(i, parents)
That's the loop for which I used the "marked" bit to flag an assembly as a member of the current thread of the depth-first search: if the tip of the depth-first search encounters a node whose "marked" bit is set it has looped back on itself. That's what absolves us from the need to repeatedly loop through the chain of parents of the current state of the depth-first search. - It is interesting that you didn't think it worthwhile to take over that idea.

Of course, your visited bits can be used to improve the performance of what I had suggested in post #52 as well, to reduce the number of assemblies that have to be checked in the main-loop of checkDb() like this:

Code:
def checkAssembly(assembly, stack):
visited[assembly]  = True
if assembly.marked:
raise CycleException(assembly, stack)

assembly.marked = True
stack.append(assembly)
for component in assembly.components:
checkAssembly(component, stack)
stack.pop()
assembly.marked = False

def checkDb(db):
try:
for assembly in db:
if not visited[assembly]:
checkAssembly(assembly, [])

clear all visited bits here... (or just delete that bitarray)

except CycleException, cycle:
# if we get here, a cycle has been detected - dump it
print "cycle %s" % cycle.tip.name,
y = cycle.tail.pop()
try:
while cycle.tip != y:
print "<-%s" % y.name,
y = cycle.tail.pop()
finally:
# ignore empty stack
print "<-%s detected!" % y.name
But this only is useful if one want's to check the whole database for cycles, not so if one just want's to check wether the addition of some assemblies to a single assembly has introduced a cycle into a formerly cycle-free database.

Also: instead of raising an exception and thus abandoning any search for more cycles, one could print out the cycle found and then simply return from the checkAssembly() function, as your algorithm does.
How the information that at least one cycle has been detected gets returned to the outermost level is another question (well, maybe we can use the return value of checkAssembly() for that purpose).

4. Originally Posted by Failure
... It is interesting that you didn't think it worthwhile to take over that idea.
I didn't deliberately ignore your suggestion, it was more a result of my not being fluent in python, not accustomed to stacks, and lacking time to read as thoroughly as I would like to. Thanks for the comment. It's possible that I'm still missing what you're saying, but here's my new version (it now fits in the code box without a vertical scroll bar! EDIT: Only on the preview page..)

Code:
data = [[1,5,8],[3,6],[],[3],[],[],[],[],[17],[], \
[4],[22,23,24],[],[],[],[4],[],[20],[],[5], \
[8],[29],[],[],[12],[],[5],[],[4,22],[27]]

visited = [0]*len(data)

def main():
for i in range(len(data)):
if not visited[i]:
recurse(i, [i], [0]*i+[1]+[0]*(len(data)-i-1))

def recurse(node, parents, marked):
visited[node] = 1
for i in data[node]:
# check for membership chain
if marked[i]:
printerr(i, parents)
# recurse
if not visited[i]:
recurse(i, parents+[i], marked[0:i]+[1]+marked[i+1:len(data)])

def printerr(node, L):
s = str(node)
i = len(L)-1
while L[i] != node:
s += " is in " + str(L[i])
i -= 1
print s + " is in " + str(node)

main()
I'm understanding better how the stack implementation works; this thread continues to be very educational.

5. Originally Posted by undefined
Code:
def recurse(node, parents, marked):
visited[node] = 1
for i in data[node]:
# check for membership chain
if marked[i]:
printerr(i, parents)
# recurse
if not visited[i]:
recurse(i, parents+[i], marked[0:i]+[1]+marked[i+1:len(data)])
I'm surprised that you do actually carry marked[] along with the recursive calls. Surprised, because that's not what you do with your visited[] bits, although both bitarrays serve the same basic purpose: to remember, in an easily accessible form, what one already knows: namely, whether a node has been visited already (visited[]), and whether a node is on the depth-first search stack (marked[]).
Surprised also because, surely, the operation marked[0:i]+[1]+marked[i+1:len(data)]) must be very costly by way of computation (as opposed to just setting or clearing a bit in an existing bitarray).

- So why not treat both bitarrays the same way?

6. Originally Posted by undefined
Code:
visited = [0]*len(data)

def main():
for i in range(len(data)):
if not visited[i]:
recurse(i, [i], [0]*i+[1]+[0]*(len(data)-i-1))

def recurse(node, parents, marked):
visited[node] = 1
for i in data[node]:
# check for membership chain
if marked[i]:
printerr(i, parents)
# recurse
if not visited[i]:
recurse(i, parents+[i], marked[0:i]+[1]+marked[i+1:len(data)])
I have just noted that you did not declare visited as global in the functions that main() and recurse() that try to access it. Does this at all work? - I think it does not work (as expected), because using visited in a function without declaring it as global right at the beginning of that function automatically creates it as a local variable: that's certainly not what you intended.

7. Originally Posted by Failure
I'm surprised that you do actually carry marked[] along with the recursive calls. Surprised, because that's not what you do with your visited[] bits, although both bitarrays serve the same basic purpose: to remember, in an easily accessible form, what one already knows: namely, whether a node has been visited already (visited[]), and whether a node is on the depth-first search stack (marked[]).
Surprised also because, surely, the operation marked[0:i]+[1]+marked[i+1:len(data)]) must be very costly by way of computation (as opposed to just setting or clearing a bit in an existing bitarray).

- So why not treat both bitarrays the same way?
Hmm, I wrote the line with marked[0:i]+[1]+marked[i+1:len(data)] because it's more compact (in terms of number of lines, at least) than

nextMarked = marked[:]
nextMarked[i] = 1
recurse(i, parents+[i], nextMarked)

and hadn't thought of computational time. Being unaccustomed to python, I don't have much intuition on which operations are computationally expensive (or memory expensive), and what the best way is to accomplish various manipulations. As a result, I'm liable to write "optimisations" that slow down the code significantly on a larger scale.

Come to think of it, I should probably be using arrays with explicit booleans rather than numbers; I chose numbers also because of compactness! I tend to forget things with python's type system and don't know the ins and outs; for example, I know in Java a boolean array actually allocates 8 bits to each element even though only one bit is needed, for the sake of speed; for memory efficiency, one can import BitSet. But for python I'd have to research to see what's what.

I don't really understand your surprise that one bit array would be global and the other recursively passed along. My thought was that we need visited[] global because we're looping through all nodes and need to know globally whether a node has been visited, so as to process each node exactly once. And we need marked[] recursively passed because it holds data specific to a particular search path.

Originally Posted by Failure
I have just noted that you did not declare visited as global in the functions that main() and recurse() that try to access it. Does this at all work? - I think it does not work (as expected), because using visited in a function without declaring it as global right at the beginning of that function automatically creates it as a local variable: that's certainly not what you intended.
I did not think of this, but actually it does work as expected; here's a reference stating

'But note "x[0]=3" starts with search for "x", will use to global "x" if no local "x".'

Presumably a more official-looking reference could be found too, but empirically speaking the program behaved the same whether or not I added a global declaration.

8. Originally Posted by undefined
Hmm, I wrote the line with marked[0:i]+[1]+marked[i+1:len(data)] because it's more compact (in terms of number of lines, at least) than

nextMarked = marked[:]
nextMarked[i] = 1
recurse(i, parents+[i], nextMarked)

and hadn't thought of computational time.
But that, too, is very expensive, computationally: you do not have to create a copy nextMarked[] of the entire array marked[] in this way - just as you do not produce a copy of the entire visited[] array in order to set or clear a bit, and quite sensibly so, since those arrays may be rather large.
Therefore
Code:
marked[i] = 1
recurse(i, parents+[i], marked)
marked[i] = 0
would do just fine, and run much faster, too.

9. Originally Posted by Failure
But that, too, is very expensive, computationally: you do not have to create a copy nextMarked[] of the entire array marked[] in this way - just as you do not produce a copy of the entire visited[] array in order to set or clear a bit, and quite sensibly so, since those arrays may be rather large.
Therefore
Code:
marked[i] = 1
recurse(i, parents+[i], marked)
marked[i] = 0
would do just fine, and run much faster, too.
Now I see why you were surprised! That is a much smarter way of doing it. Thanks!

If we're going to get rid of the marked parameter, might as well get rid of parents parameter in the same manner.. then it's just a stack after all. For whatever it's worth, here's some new code

Code:
data = [[1,5,8],[3,6],[],[3],[],[],[],[],[17],[], \
[4],[22,23,24],[],[],[],[4],[],[20],[],[5], \
[8],[29],[],[],[12],[],[5],[],[4,22],[27]]

visited = [False]*len(data)
marked = [False]*len(data)
parents = []

def main():
for i in range(len(data)):
recurse(i)

def recurse(node):
if not visited[node]:
visited[node] = True
parents.append(node)
marked[node] = True
for i in data[node]:
if marked[i]:
printerr(i)
recurse(i)
parents.pop()
marked[node] = False

def printerr(node):
s = str(node)
i = len(parents)-1
while parents[i] != node:
s += " is in " + str(parents[i])
i -= 1
print s + " is in " + str(node)

main()

10. I have adapted my example code of post #52 to include undefined's idea of using a visited bit to avoid revisiting nodes that have already been checked in an earlier depth-first traversal. The cycle-checking functions checkAssembly() and checkDb() now return a boolean to tell the caller whether a cycle has been found or not. (See attachment).

Having separate arrays visited[] and marked[], as undefined's solution does, is very likely much preferable to my using permanent members visited and marked of Assembly objects (nodes), but I just wanted to sidestep the issue of how to map assemblies to bitarray indices.

P.S: This version 2 wants the path to the database specification as a commandline argument. So if you run it from IDLE it will not work as expected.
If you don't like that you must replace checkDb(argv[1]) by checkDb("example.txt") and also remove the if-statement that precedes this call of checkDb().

11. A thought: I'm not so sure that ruling out already-visited nodes is a good idea.

Try it on the following DB's (which, except for the cycle present, are certainly possibilities in our part system):

DB 1:
A={B,C}
D={E,C}
C={F,G}
F={H,A}
G={I,J}

DB 2:
A={B,C}
D={E,C}
C={F,G}
F={H,I}
G={J,A}

DB 3:
A={B,C}
D={E,C}
C={F,G}
F={H,A}
G={I,A}

12. Note: with these databases I'm presenting, it would be nice if the code flagged every cycle. But it's probably not necessary.

13. I've edited Post # 71. Please note the changes.

14. Originally Posted by Ackbeet
A thought: I'm not so sure that ruling out already-visited nodes is a good idea.

Try it on the following DB's (which, except for the cycle present, are certainly possibilities in our part system):

DB 1:
A={B,C}
D={E,C}
C={F,G}
F={H,A}
G={I,J}

DB 2:
A={B,C}
D={E,C}
C={F,G}
F={H,I}
G={J,A}

DB 3:
A={B,C}
D={E,C}
C={F,G}
F={H,A}
G={I,A}

DB 1 I translated into:

data = [[1,2],[],[5,6],[4,2],[],[7,0],[8,9],[],[],[]]

with output:

0 is in 5 is in 2 is in 0

DB 2 I translated into:

data = [[1,2],[],[5,6],[4,2],[],[7,8],[9,0],[],[],[]]

with output:

0 is in 6 is in 2 is in 0

DB 3 I translated into:

data = [[1,2],[],[5,6],[4,2],[],[7,0],[8,0],[],[]]

with output:

0 is in 5 is in 2 is in 0
0 is in 6 is in 2 is in 0

15. I'm sure you showed how the DB translation worked. I'm not interested in reviewing that right now. I'm impressed that the algorithm still flags all membership cycles. Good stuff!

Page 5 of 6 First 123456 Last