# Thread: Set Membership Cycles: The Axiom of Regularity in ZF Set Theory.

1. Originally Posted by Ackbeet
Reply to failure at post # 41:

Well, I wouldn't count on the knowing the database is already cycle-free. We may want to import a whole bunch of data wholesale into the database, and check that out.
As I wrote earlier: that's a basic problem of database maintenance. Ordinarily, you would not want to throw a bunch of possibly inconsistent garbage into an otherwise consistent database and then work very, very hard to figure out whether the entire database is still consistent (in your case: cycle free) or not.
It is usually much simpler and significantly more efficient to just prevent inconsistent modifications of a database to be made in the first place. That's what transaction handling is all about: to make sure that no application ever sees the database in an inconsistent state.

2. Reply to Failure at post # 46:

So you're saying it would be better, if I have some batch processing, to toss in one record at a time, checking the data at each step, rather than put it all in and then check once?

3. Originally Posted by Ackbeet
Reply to Failure at post # 46:

So you're saying it would be better, if I have some batch processing, to toss in one record at a time, checking the data at each step, rather than put it all in and then check once?
Well, certainly if the overall database is much, much larger than the new data that you are inserting. And I suppose, databases have a way of getting larger and larger (at least in my experience).
One could perhaps consider finding a kind of "merging" algorithm: say what you are inserting is, in itself, already known to be cycle free, just like the main database. The question is, can the addition of that cycle free addition to the existing database introduce a cycle? Certainly not, if that inseration does not lead to adding an existing assembly to an already existing assembly (which is the critical operation for cycle building).

4. Reply to Failure at post # 48:

I have a strong feeling that the sheer scale of the database at present is much too small to worry about optimizing the initialization. We have on the order of 1000 parts, and at most 30 assemblies, probably more like 15 assemblies; the maximum depth right now is 3 levels (A is part of B is part of C). We might get up to 5 levels, but I don't see more levels than that for the foreseeable future.

5. Originally Posted by Ackbeet
Reply to Failure at post # 46:

So you're saying it would be better, if I have some batch processing, to toss in one record at a time, checking the data at each step, rather than put it all in and then check once?
Another thought: I just don't quite understand your application. Of course, if building the database once and then hold it fixed for all eternity is your goal then doing the check once would perhaps be ok.

But usually, databases grow. Do you want to do that kind of check every time the database is modified?

Also: having the database in a consistent state is a good thing for all kinds of application programs that want to work with the data contained in that database. For example, if a program knows in advance, that the graph is cycle free, it need not protect itself against the possibility of a cycle occuring (by implementing a marking scheme of its own, perhaps), since such a cycle may well make it loop forever.

6. Reply to Failure at post # 50:

By "check once" in that context, I meant an initial check after dumping in the legacy data. Naturally, every time we edit the assembly list, we're going to want to check for consistency (or whenever we decide we need to check for consistency, as per previous suggestions in this thread.)

Also: having the database in a consistent state is a good thing for all kinds of application programs that want to work with the data contained in that database. For example, if a program knows in advance, that the graph is cycle free, it need not protect itself against the possibility of a cycle occuring (by implementing a marking scheme of its own, perhaps), since such a cycle may well make it loop forever.
True. For future scalability and maintainability, that might well be a concern.

7. Originally Posted by Ackbeet
Hmm. At some point, hardware efficiency ceases to be a virtue, and programmer efficiency becomes paramount. The most efficient algorithm in the world won't help me if the programmer who needs to code it up (which is not me, by the way) can't do it.
When I tried to code what I had suggested, I found that in order to easily dump the actual cycle that has been detected it seems preferable to keep an explicit stack. - Besides, the code is not really complicated at all. I attach a file that reads a database of assemblies from a text file, dumps the database to stdout and then checks it for cycles.

The cycle detection logic is contained in the following two functions:

Code:
def checkAssembly(assembly, stack):
if assembly.marked:
raise CycleException(assembly, stack)

assembly.marked = True
stack.append(assembly)
for component in assembly.components:
checkAssembly(component, stack)
stack.pop()
assembly.marked = False

def checkDb(db):
try:
for assembly in db:
checkAssembly(assembly, [])

except CycleException, cycle:
# if we get here, a cycle has been detected - dump it
print "cycle %s" % cycle.tip.name,
y = cycle.tail.pop()
try:
while cycle.tip != y:
print "<-%s" % y.name,
y = cycle.tail.pop()
finally:
# ignore empty stack
print "<-%s detected!" % y.name
# end of cycle checking
If you run this program (checkcycles.txt - should be checkcycles.py) with python and the example database (example.txt) you get the output

"cycle A <-H <-B <-A detected!"

If one were allowed to assume that the database is cycle-free and wanted to modify a certain assembly by adding some components, one could just run the much faster checkAssembly(assembly, []) on it: If no cycle is detected (i.e. CycleException is not raised), then the overall database is still cycle free after this modification.

8. I get a syntax error on line 44, right after

Code:
except CycleException
It doesn't seem to like that comma.

Also, could you please give me the exact syntax in IDLE in order to run the file? Thanks!

9. Originally Posted by Ackbeet
I get a syntax error on line 44, right after

Code:
except CycleException
It doesn't seem to like that comma.

Also, could you please give me the exact syntax in IDLE in order to run the file? Thanks!
I am sorry about that. Maybe you are running a different version of Python. I am running version 2.6.4. But the latest version of Python is in the 3.x range and, I am told, there have been significant changes as regards syntax when they moved from 2.6 to 3.x.

It is no accident that they have left version 2.7 for download on the net: that's because too many libraries and old scripts are broken by the new syntax of 3.x.
That's the reason I have not (yet) switched to 3.x: I need certain libraries that are not yet compatible with 3.x (or at least have not been compatible with it the last time I checked).

In 7. Compound statements &mdash; Python v3.1.2 documentation I read that the exception clause would have to have the following form in python 3.x:
Code:
except CycleException as cycle:
So, maybe, simply replacing the offending comma with "as" might do the trick. (Since I do not have 3.x installed on my machine, and would prefer not to at the moment, I cannot try it out.)

As to running the script from within IDLE: if the default directory contains the file example.txt, then you should be able to just select "Run Module" from the "Run" menu of the window that is showing the source code of the script (maybe you should rename the file so that it gets a .py instead of a .txt extension).

I could also give you a version of the program that takes the path to the database file from the command line if you like - but then you would have to run the program from the command shell.

Maybe another remark is in order: in the script I have uploaded, I have not cleared the "marked" bits of the assemblies that are on the stack when a cycle is detected. Thus, one might want to do that in the exception handling part.

10. Ah. I thought it might have been a different version of Python causing the problem. I uninstalled the version 3.something that I had and installed version 2.6.5. And after opening the module in Python, I could run the module on the example.txt file just fine. It flags the cycle as expected. It also does not appear to flag consistent db's, at least not with some simple experimentation. Thanks a bunch! I really appreciate all that work. I'm going to mark the thread as solved now.

11. I guess there's no need to review the algorithm in post #31 right? I'll admit I still haven't taken the time to look in detail. Here's a way without a stack:

Code:
import numpy as np

data = [[1,5,8],[3,6],[],[3],[],[],[],[],[17],[], \
[4],[22,23,24],[],[],[],[4],[],[20],[],[5], \
[8],[29],[],[],[12],[],[5],[],[4,22],[27]]

visited = np.zeros(len(data))

def main():
for i in range(len(data)):
if not visited[i]:
recurse(i, [i])

def recurse(node, parents):
visited[node] = 1
if not data[node]:
return
# check for membership chain
for i in range(len(parents)):
for j in data[node]:
if parents[i] == j:
printerr(i, parents)
# recurse
for i in data[node]:
if not visited[i]:
nextParents = parents[:]
nextParents.append(i)
recurse(i, nextParents)

def printerr(start, L):
s = ""
for i in L[start:len(L)]:
s += str(i) + " contains "
s += str(L[start])
print s

main()
Output:
Code:
3 contains 3
8 contains 17 contains 20 contains 8
Note that the chain is written in the other direction, so from left to right it's "contains" rather than "is a member of".

I'm unsure of what other comments to add, because I don't know if it will be easy for others to read the code as-is. It's the same set of data I made up for the program in post #18.

12. Optimised the checking for membership chain:

Code:
import numpy as np

data = [[1,5,8],[3,6],[],[3],[],[],[],[],[17],[], \
[4],[22,23,24],[],[],[],[4],[],[20],[],[5], \
[8],[29],[],[],[12],[],[5],[],[4,22],[27]]

visited = np.zeros(len(data))

def main():
for i in range(len(data)):
if not visited[i]:
recurse(i, [i])

def recurse(node, parents):
visited[node] = 1
if not data[node]:
return
# represent elements of current node as boolean array
curr = np.zeros(len(data))
for i in data[node]:
curr[i] = 1
# check for membership chain
for i in range(len(parents)):
if curr[parents[i]]:
printerr(i, parents)
# recurse
for i in data[node]:
if not visited[i]:
nextParents = parents[:]
nextParents.append(i)
recurse(i, nextParents)

def printerr(start, L):
s = ""
for i in L[start:len(L)]:
s += str(i) + " contains "
s += str(L[start])
print s

main()

13. What language did you use here? It looks like Python. If so, which version?

14. Originally Posted by Ackbeet
What language did you use here? It looks like Python. If so, which version?
Oh right, sorry for not mentioning. It's Python 2.6.5 with NumPy, but I only used NumPy so I could have nice one-liners for an array filled with zeros. Perhaps there's a way to do this without NumPy but I don't have enough Python experience to know it. It took me a while to install NumPy incidentally, because at first I tried with 2.7 (not compatible) and further with 64-bit (something was wrong, had to switch to 32-bit).

It would be easy to write a custom zeros() function not using NumPy.

15. Ok, thanks for that.

Matt, undefined, and Failure: you've been great and very helpful. I greatly appreciate all the help!

I really don't see much point in spending more time on this question; I think we have a workable solution. So, unless you're dying to come up with the "ideal" algorithm, please don't spend any more time on this problem on my account.

Regards.

Page 4 of 6 First 123456 Last