# Thread: Set Membership Cycles: The Axiom of Regularity in ZF Set Theory.

1. So, just to reiterate, and to pull everything together, here is the new and improved algorithm:

Code:
% Construct set S:
S = 0;
For all parts P[i]
    if Assembly[P[i]] == TRUE
        S = S union P[i];
    end if
end for

% Check Axiom of Regularity
checked_out = TRUE;
For all assemblies S[i]
    For all sub-assemblies SS[i] of S[i]
        If SS[i] in S Then
            checked_out = FALSE;
        end if
    end for
end for
return checked_out;
I think this will do it. Any thoughts?

2. Originally Posted by Ackbeet
Reply to Failure at Post #14:

That would be a fantastically easy way to check on circularity, I agree. However, it's not realistic for our database.

Parts are not going to be entered in anything like a linear fashion.
But, surely, they are being entered one after another. At the very least, the database itself will enforce that, will it not?

What's more, that sort of linearity is actually undesirable for our database
Well, that restriction does not force any particular linear order on the assemblies, but it does force some linear order. However, if your database is at all non-circular with respect to the "is a component of" relationship, then such a linearization must be possible anyway (that's actually what the so-called "topological sorting" algorithm is all about: it produces a linear ordering that is compatible with a given, possibly non-linear, partial ordering).
Also, there is no problem with multiple users adding new assemblies consisting only of parts that are already in the database, provided the assemblies they define are named differently. Is it at all possible in your application that multiple users try to define the same assembly (possibly with different components)? And if so, who wins? That is: whose definition is made to stick?

P.S: Well, I think I get it now. The problem is not the definition of new assemblies but the modification of existing ones. The question is whether such modification still makes the database (in principle) linearizable (i.e. cycle free).

P.P.S: If an assembly is modified, one only needs to check whether any new component is a direct or indirect parent of the modified assembly. If so, the modified assembly is invalid (i.e. leads to circularity in the database). - Seems easy enough to me. -
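For reference, the topological-sorting check I mentioned could be sketched like this in Java, using Kahn's algorithm. This is only a sketch with made-up sample data: the `children` table is an assumption standing in for your actual parts/assemblies schema.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class TopoSortCheck {
    // Hypothetical component table: children.get(x) lists the direct
    // components of assembly x. Made-up data for illustration only.
    static List<List<Integer>> children = Arrays.asList(
            Arrays.asList(1, 2),  // 0 is built from 1 and 2
            Arrays.asList(3),     // 1 is built from 3
            Arrays.asList(3),     // 2 is built from 3
            Arrays.asList()       // 3 is a stand-alone part
    );

    // Kahn's algorithm: repeatedly remove nodes with no remaining incoming
    // edges. Every node gets removed if and only if the "is a component of"
    // relation is cycle-free, i.e. a linearization exists.
    static boolean isCycleFree() {
        int n = children.size();
        int[] indegree = new int[n];
        for (List<Integer> cs : children)
            for (int c : cs)
                indegree[c]++;
        Deque<Integer> ready = new ArrayDeque<>();
        for (int i = 0; i < n; i++)
            if (indegree[i] == 0)
                ready.add(i);
        int removed = 0;
        while (!ready.isEmpty()) {
            int x = ready.remove();
            removed++;
            for (int c : children.get(x))
                if (--indegree[c] == 0)
                    ready.add(c);
        }
        return removed == n;
    }

    public static void main(String[] args) {
        System.out.println(isCycleFree()); // true: this sample data has no cycle
    }
}
```

If the method returns false, some subset of assemblies cannot be linearized, which is exactly a membership cycle.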

3. Originally Posted by Ackbeet
You're thinking of a theoretical realm of possible assemblies. However, in our little company here, we may accumulate maybe 10 new kinds of assemblies per year. We might have as many as 30 right now. I think with that real-world constraint, which I would agree is important, my new algorithm would be fairly efficient. What do you think?
Hmm, well, 2^30 is greater than 10^9, which can be time-expensive depending on what it is you're doing that many times. I probably spent too much time on this, but I didn't want to give an ill-informed guess.

Java program that uses depth-first search to produce subsets:

Code:
import java.util.ArrayList;

public class AoRAlgo30 {
    static int[] S = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,
                      20,21,22,23,24,25,26,27,28,29};
    static ArrayList<ArrayList<Integer>> C;
    static boolean foundError = false;

    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        /* ~~~~~~~ test 1 ~~~~~~~ */

        System.out.println("Test 1: Do nothing inside the loop.");
        long t = time();
        int i, stop = 1 << 30;
        for(i = 0; i < stop; i++)
            ;
        System.out.println();
        System.out.println("Elapsed: " + (time() - t) / 1000.0 + " s");

        /* ~~~~~~~ test 2 ~~~~~~~ */

        System.out.println();
        System.out.println("Test 2: Implement AoR algo with 30 components.");
        System.out.println();
        t = time();
        ArrayList<Integer> a0 = make(1,5,8);
        ArrayList<Integer> a1 = make(3,6);
        ArrayList<Integer> a2 = make();
        ArrayList<Integer> a3 = make(3);
        ArrayList<Integer> a4 = make();
        ArrayList<Integer> a5 = make();
        ArrayList<Integer> a6 = make();
        ArrayList<Integer> a7 = make();
        ArrayList<Integer> a8 = make(17);
        ArrayList<Integer> a9 = make();
        ArrayList<Integer> a10 = make(4);
        ArrayList<Integer> a11 = make(22,23,24);
        ArrayList<Integer> a12 = make();
        ArrayList<Integer> a13 = make();
        ArrayList<Integer> a14 = make();
        ArrayList<Integer> a15 = make(4);
        ArrayList<Integer> a16 = make();
        ArrayList<Integer> a17 = make(20);
        ArrayList<Integer> a18 = make();
        ArrayList<Integer> a19 = make(5);
        ArrayList<Integer> a20 = make(8);
        ArrayList<Integer> a21 = make(29);
        ArrayList<Integer> a22 = make();
        ArrayList<Integer> a23 = make();
        ArrayList<Integer> a24 = make(12);
        ArrayList<Integer> a25 = make();
        ArrayList<Integer> a26 = make(5);
        ArrayList<Integer> a27 = make();
        ArrayList<Integer> a28 = make(4,22);
        ArrayList<Integer> a29 = make(27);
        C = make2(a0,a1,a2,a3,a4,a5,a6,a7,a8,a9,
                  a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,
                  a20,a21,a22,a23,a24,a25,a26,a27,a28,a29);
        parseSubsets(0, new ArrayList<Integer>());
        if(foundError) System.out.println();
        System.out.println("Elapsed: " + (time() - t) / 1000.0 + " s");
    }

    static void parseSubsets(int x, ArrayList<Integer> list) {
        if(x == S.length) {
            if(!list.isEmpty())
                checkAoR(list);
            return;
        }
        ArrayList<Integer> option1 = new ArrayList<Integer>(list);
        option1.add(S[x]);          // branch 1: include element S[x]
        parseSubsets(x+1, option1);
        ArrayList<Integer> option2 = new ArrayList<Integer>(list);
        parseSubsets(x+1, option2); // branch 2: exclude element S[x]
    }

    static void checkAoR(ArrayList<Integer> list) {
        int i, j;
        for(i = 0; i < list.size(); i++) {
            ArrayList<Integer> sublist = C.get(list.get(i));
            boolean disjoint = true;
            for(j = 0; j < sublist.size(); j++)
                if(list.contains(sublist.get(j))) {
                    disjoint = false;
                    break;
                }
            if(disjoint) return;
        }
        foundError = true;
        System.out.println("Problematic set of components: " + list);
    }

    static ArrayList<Integer> make(int... nums) {
        ArrayList<Integer> ret = new ArrayList<Integer>();
        for(int i : nums)
            ret.add(i);
        return ret;
    }

    static ArrayList<ArrayList<Integer>> make2(ArrayList<Integer>... lists) {
        ArrayList<ArrayList<Integer>> ret = new ArrayList<ArrayList<Integer>>();
        for(ArrayList<Integer> i : lists)
            ret.add(i);
        return ret;
    }

    static long time() {
        return System.currentTimeMillis();
    }
}
Output:

Code:
Test 1: Do nothing inside the loop.

Elapsed: 0.0040 s

Test 2: Implement AoR algo with 30 components.

Problematic set of components: [8, 17, 20]
Problematic set of components: [3]
Problematic set of components: [3, 8, 17, 20]
Problematic set of components: [1, 3]
Problematic set of components: [1, 3, 8, 17, 20]
Problematic set of components: [0, 8, 17, 20]
Problematic set of components: [0, 3, 8, 17, 20]
Problematic set of components: [0, 1, 3]
Problematic set of components: [0, 1, 3, 8, 17, 20]

Elapsed: 253.699 s
Another algorithm to generate the subsets would be to rely on the bijection between binary numbers and the elements of the power set.
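That bitmask version might look something like this. It is a standalone sketch: each generated subset would be handed to checkAoR() in place of the recursive calls.

```java
import java.util.ArrayList;

public class BitmaskSubsets {
    // Each integer mask in [1, 2^n) encodes one non-empty subset of
    // {0, ..., n-1}: bit i set means element i is in the subset.
    static ArrayList<ArrayList<Integer>> subsets(int n) {
        ArrayList<ArrayList<Integer>> all = new ArrayList<ArrayList<Integer>>();
        for (int mask = 1; mask < (1 << n); mask++) {
            ArrayList<Integer> subset = new ArrayList<Integer>();
            for (int i = 0; i < n; i++)
                if ((mask & (1 << i)) != 0)
                    subset.add(i);
            all.add(subset);
        }
        return all;
    }

    public static void main(String[] args) {
        // 2^4 - 1 = 15 non-empty subsets of a 4-element set
        for (ArrayList<Integer> s : subsets(4))
            System.out.println(s); // each would be passed to checkAoR instead
    }
}
```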

A lot's been going on in this thread while I wrote this! Is this similar to what you had in mind?

4. Reply to Failure at Post #17: The way databases usually work, the first person to enter a part with a given name wins. That name is the primary key - it must be unique. So when the next guy comes along and tries to add a new part with the same name, the database won't allow that.

Also, there is no problem with multiple users adding new assemblies consisting only of parts that are already in the database, provided the assemblies they define are named differently.
But users will need to enter completely new parts, and there's simply no guarantee what order they'll do that in. Entering new parts per se is ok, come to think of it; it's when we want to create assemblies in the DB that we need to be careful. Creating assemblies is something that happens in a different table from the parts themselves, so I suppose you could say that all assemblies are always created out of existing parts or existing assemblies.

The check I want to run is something we would run any time somebody changes the list-of-assemblies table.

5. Reply to undefined at Post #18:

Wow. That's a lot of code. I'm afraid I'm not up on my Java - I've been doing loads more LabVIEW recently. I have taken C++, but not C++-- (as they call Java).

Can you please give me a pseudo-code summary of what your program is doing?

One comment: I am really not seeing the reason why we have to check all possible assemblies - why can't we just check the ones we have? For $\displaystyle 30$ assemblies, that reduces the search space from $\displaystyle 2^{30}$ down to the total number of constituent parts in all the assemblies, which is going to be a whole lot less than $\displaystyle 2^{30}$. That's a nice step!

6. Originally Posted by Ackbeet
Reply to Failure at Post #17: The way databases usually work, the first person to enter a part with a given name wins. That name is the primary key - it must be unique. So when the next guy comes along and tries to add a new part with the same name, the database won't allow that.

But users will need to enter completely new parts, and there's simply no guarantee what order they'll do that in.
Entering new parts per se is ok, come to think of it; it's when we want to create assemblies in the DB that we need to be careful.
Defining new assemblies from existing ones is never a problem. The only problem is when someone decides to modify an existing assembly by adding some existing assembly as a new component to it: this, and only this, operation might create a cycle in the database.

The check I want to run is something we would run any time somebody changes the list-of-assemblies table.
That's a general problem with databases: do you see to it that all (and I mean all) transactions maintain the consistency of the database or do you run a wholesale consistency check for the whole database every time somebody changes something?
I think, for efficiency reasons, it is not a good idea to run a wholesale check on the entire database so frequently.
Instead, what you need to do is just this: every time someone tries to modify an existing assembly by adding another assembly to it as a component, you need to check whether that new component of the modified assembly is a direct or indirect parent of the assembly about to be modified. If it is, the modification must be rejected, because it would create a cycle in the database - otherwise it can be allowed, because it will leave the database cycle-free.
P.S: Well, no: I think one must also check whether some direct or indirect child of that new component is a direct or indirect parent of the assembly about to be modified. - Requires some work...

Of course, you already have a database, and this means that before you can implement what I suggest you need to make sure that the existing database is cycle-free. That's a separate problem (and I suggested using topological sorting for that purpose).
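A sketch of that modification-time check in Java. The data here is hypothetical: `components` stands in for your list-of-assemblies table, and the cycle test is a single downward reachability search.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class AncestorCheck {
    // Hypothetical component table: components.get(x) lists the direct
    // components of assembly x (made-up data, not your actual schema).
    static Map<Integer, List<Integer>> components = new HashMap<>();
    static {
        components.put(0, List.of(1)); // 0 contains 1
        components.put(1, List.of(2)); // 1 contains 2
        components.put(2, List.of());  // 2 is stand-alone
    }

    // True if 'target' is reachable from 'start' by following component
    // links downward, i.e. start directly or indirectly contains target.
    static boolean reaches(int start, int target, Set<Integer> seen) {
        if (!seen.add(start)) return false; // already explored this branch
        for (int c : components.getOrDefault(start, List.of())) {
            if (c == target || reaches(c, target, seen))
                return true;
        }
        return false;
    }

    // Adding 'newComponent' to 'assembly' creates a cycle exactly when the
    // assembly is already reachable downward from the new component. One
    // downward search covers both the direct-parent case and the P.S case
    // (a descendant of the new component leading back to the assembly).
    static boolean wouldCreateCycle(int assembly, int newComponent) {
        return assembly == newComponent
                || reaches(newComponent, assembly, new HashSet<>());
    }

    public static void main(String[] args) {
        System.out.println(wouldCreateCycle(2, 0)); // true: 0 -> 1 -> 2 already
        System.out.println(wouldCreateCycle(0, 2)); // false: safe to add
    }
}
```

The `seen` set guards against revisiting shared sub-assemblies, so the check is linear in the number of parts and links below the new component.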

7. That's a good way of thinking about when we need to check the consistency. Thanks!

We're still in development for this database, so vee haf lots of pover!

8. Originally Posted by Ackbeet
That's a good way of thinking about when we need to check the consistency. Thanks!

We're still in development for this database, so vee haf lots of pover!
I hope you have seen that I have added a P.S to my suggestion. The problem is that it is not good enough to check whether assemblies that are to be added to an existing assembly are direct or indirect parents of the assembly about to be modified, but you also need to check, for every direct or indirect child of such a new component, whether it is a direct or indirect parent of the assembly about to be modified. So this will require some work. However, as I argued, that work need only be expended whenever an existing assembly is added to an existing assembly. Defining new assemblies and deleting components of an existing assembly is never a problem (as regards circularity of the database).

9. Originally Posted by Ackbeet
Reply to undefined at Post #18:

Wow. That's a lot of code. I'm afraid I'm not up on my Java - I've been doing loads more LabVIEW recently. I have taken C++, but not C++-- (as they call Java).

Can you please give me a pseudo-code summary of what your program is doing?

One comment: I am really not seeing the reason why we have to check all possible assemblies - why can't we just check the ones we have? For $\displaystyle 30$ assemblies, that reduces the search space from $\displaystyle 2^{30}$ down to the total number of constituent parts in all the assemblies, which is going to be a whole lot less than $\displaystyle 2^{30}$. That's a nice step!
Regarding the last part: I'm not sure I follow. Partly maybe because of some confusion between the words assembly, part, component. I've used the word component to mean: either a stand-alone item, or an item that has one or more components under it. Then I use assembly as: a set of components. I haven't really used the word part. If this is not desired usage I can adapt.

(EDIT: Actually come to think of it, my definition of assembly is identical to that of component; I just have been using the terms in different contexts.)

Anyway there is a possible reduction if you have empty (stand-alone) components, because if your assembly contains such a component then you immediately have your disjoint set. However, I think we still need to check 2^k assemblies (for this particular algorithm) where k is the number of non-empty components. Say for example we have

A = {B, C}
B = {}
C = {D}
D = {A}

Then I think we should be checking 2^3 - 1 = 7 assemblies (all the non-empty subsets), because how can we know in advance that any of these assemblies won't find an error in the database? Just for completeness and to clear any possible ambiguity of terms, the 7 assemblies I mean are:

{D}
{C}
{C,D}
{A}
{A,D}
{A,C}
{A,C,D}

I didn't know whether the reduction from 2^30 to 2^k where k < 30 was in general possible because I couldn't tell if you might have 30 non-empty components in the database at any given time.
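To make that concrete, here is a quick Java check of these subsets against the A, B, C, D example (indexing A=0, B=1, C=2, D=3; satisfiesAoR is a boolean variant of checkAoR above):

```java
import java.util.Arrays;
import java.util.List;

public class SmallExample {
    // A = {B, C}, B = {}, C = {D}, D = {A}, indexed A=0, B=1, C=2, D=3
    static List<List<Integer>> C = Arrays.asList(
            Arrays.asList(1, 2),  // A
            Arrays.asList(),      // B
            Arrays.asList(3),     // C
            Arrays.asList(0)      // D
    );

    // AoR holds for 'set' if some member of 'set' is disjoint from 'set'.
    static boolean satisfiesAoR(List<Integer> set) {
        for (int member : set) {
            boolean disjoint = true;
            for (int elem : C.get(member))
                if (set.contains(elem)) { disjoint = false; break; }
            if (disjoint) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Only {A, C, D} fails: A hits C, C hits D, D hits A.
        System.out.println(satisfiesAoR(Arrays.asList(0, 2, 3))); // false
        System.out.println(satisfiesAoR(Arrays.asList(0, 2)));    // true: C's
        // only element, D, is not in the set, so C is disjoint from it
    }
}
```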

lol @ C++--, never heard that one before!

As to pseudo-code, I'll do a mix of explaining and pseudo-code.

test 1 is simply doing a loop from 1 to 2^30 and doing nothing inside the loop, no need for pseudo-code.

test 2 has the main stuff. First, initialize some static data, where a0 is component 0, up through a29, which is component 29. You can see, for example, that a3 is a problem because it contains itself. I wrote the helper method make() just to make data initialization more readable, since Java can be clunky.

parseSubsets() is recursive depth-first search for getting all subsets, pseudo-code:

Code:
define a global list S that you want to get subsets of
function f (index, List):
if index = size of S:
process List
exit f
end if
call f (index + 1, List.append(S[index]) ) // add current element to list
call f (index + 1, List) // don't add current element to list
end f
So you start by calling f(0, <empty list>) and at each function call, you branch out by calling the function twice, once with adding S[index] to your list and once without.

Then process List in this case is calling checkAoR() which checks to make sure there is an element of our List that is disjoint with List. Pseudo-code:

Code:
function g (List):
for each List[i] in List:
set disjoint <- true
for each element e in List[i]:
if e has an element in common with List:
Set disjoint <- false
exit for
end if
end for
if disjoint = true:
exit g
end if
end for
end g
The logic here is careful to ensure that we are checking that there exists a disjoint element, not that all elements are disjoint. So what you wrote

Code:
% Check Axiom of Regularity
checked_out = TRUE;
For all assemblies S[i]
    For all sub-assemblies SS[i] of S[i]
        If SS[i] in S Then
            checked_out = FALSE;
        end if
    end for
end for
return checked_out;
isn't checking the right thing.

10. Reply to failure at Post #23: That makes sense. Thanks!

11. Reply to undefined at Post #24:

Ok, so help me out here. Let's say we've got

A = {B, C}
B = {D, E}
C = {A, F}.

Moving along the algorithm I posted in post #16, we construct our set S = {A, B, C}. If now I only check to see if there exists a set whose elements are disjoint with S, then I could produce B as such a set. Its members are disjoint from S. And yet, this database is not consistent. How is it that I can get away with only checking for existence, and not also checking that the disjointness holds for all of A, B, and C? This seems to me a deep question, going right to the heart of the AoR. I suppose I'm unconvinced that it should be an existential quantifier and not the universal, even though I think I've proved the non-existence of 3-cycles (at least, I think I have, in this post). What gives?

12. Originally Posted by Ackbeet
Reply to undefined at Post # 24:

Ok, so help me out here. Let's say we've got

A = {B, C}
B = {D, E}
C = {A, F}.
The set that would fail in the example you gave just now is S = {A, C}.

Originally Posted by Ackbeet
Its members are disjoint from S.
This is not what it means for B to be disjoint from S. B is disjoint from S means: none of B's members are in S. Or, B's members are distinct from S's members. Or, B intersect S = 0.

Originally Posted by Ackbeet
How is it that I can get away with only checking for existence, and not also checking that the disjointness holds for all of A, B, and C?
I don't know what you mean by the disjointness holding for all of A, B, and C, but I think this situation illustrates why we need to consider all 2^3 sets (minus the empty set).

In addition to the other thread you linked to, you can see my post #10 of this thread which addresses 3-cycles. There are two proofs given: the first is to show it's a special case of the infinite membership chain, and the second is to construct a set containing those 3 elements and proceed in the style of Wikipedia (which is the same as what Suppes does but with a different format).

13. This is not what it means for B to be disjoint from S. B is disjoint from S means: none of B's members are in S. Or, B's members are distinct from S's members. Or, B intersect S = 0.
Oh, I see what you mean. I meant to say "Its members are not in S." Simpler. More direct.

The set that would fail in the example you gave just now is S = {A, C}.
I think I see what you mean now. But I disagree that creating the power set of the set S of all assemblies is necessary. I take the set S of all assemblies. It has member sets A, B, C, etc. I just take the singleton subsets {A}, {B}, {C}, etc., and check regularity on them. I claim that checking the singleton subsets of S against S will flag the error just as well as if I looked at all the possible subsets. Justification: Suppose we have an inconsistent set of assemblies S. That is, there is a set A in S that has an element B also in S. So, S must look like the following: S = {A,B,...}, and A must look like A = {B,...}. Your algorithm will flag the error when looking at the subset {A,B}. My algorithm will flag the error when looking at A: it will see that B is an element of A and the original all-inclusive S as well.

I think these two approaches are equivalent. I think they will both flag a membership cycle of any length whatever. But I think mine's more efficient.

For flagging a membership cycle of length n, here's where my algorithm would flag it: S = {A,...}, where A in B in C in ... in N in A. Because S contains all assemblies, all the A, B, ... , N are in S. In checking the singleton set N, it would have element A which is also in S. Flagged.

I really think my algorithm in post #16 will work. Have I convinced you?

14. Originally Posted by Ackbeet
... Suppose we have an inconsistent set of assemblies S. That is, there is a set A in S that has an element B also in S.
But this is not inconsistent. There's nothing wrong with

A = {B, C}
B = {D}
C = {}
D = {}
S = {A, B}

Perhaps it will help to think this way.

AoR: Given a non-empty set S, there exists an element x in S that is disjoint from S.

Negation of AoR (for a particular non-empty set S): there does not exist an element x in S that is disjoint from S. In other words, for every element $\displaystyle x_i$ of S, $\displaystyle x_i$ is not disjoint from S. In other words, for every element $\displaystyle x_i$ of S, $\displaystyle x_i$ intersect S is non-empty.

So the algorithm I gave for checking AoR, appearing as checkAoR(), checks the latter condition. It exhaustively checks all elements of S and exits the method whenever it finds an element of S that is disjoint from S. If it goes through all elements without exiting, that means AoR has failed, and an error is printed.

So, what I'm seeing is this: Suppose we have a finite membership chain

$\displaystyle A_1 \in A_2 \in \dots \in A_n \in A_1$ for any n, including n=1 in which case it's just $\displaystyle A_1 \in A_1$, and require that all the $\displaystyle A_i$ be distinct.

Then the set $\displaystyle S = \{ A_1, \dots , A_n\}$ will fail AoR. But choosing a subset or superset of S will not in general cause AoR to fail.

So say we write T = the set of all assemblies in the database. We can't rule out 1-subsets of T because we might have n = 1. We can't rule out 2-subsets of T because we might have n = 2. Etc. For any given n, some particular combination might produce an error while all the other combinations produce none. Therefore, using this algorithm, I don't think there's anything more efficient than exhaustive search.

So, I don't see how this algorithm can compete with the efficiency of just checking for membership cycles and flagging an error whenever one is found, by parsing all trees, as suggested earlier in the thread by Matt Westwood and me. Although with a small enough number of assemblies, efficiency is not needed. Note that what I've been addressing is a database-wide sweep for all inconsistencies. If we know that the database is currently consistent but are checking whether adding or modifying an assembly will break it, there is less to check, as has been mentioned.
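For completeness, that tree-parsing sweep could look like the following depth-first search with three node states. The `components` table is made-up sample data, deliberately containing a cycle.

```java
import java.util.Arrays;
import java.util.List;

public class CycleSweep {
    // Hypothetical component table (made-up data): assembly 2 refers back
    // to assembly 0, so this sample database contains a membership cycle.
    static List<List<Integer>> components = Arrays.asList(
            Arrays.asList(1, 2),  // 0
            Arrays.asList(3),     // 1
            Arrays.asList(0, 5),  // 2 -> 0 closes the cycle
            Arrays.asList(),      // 3
            Arrays.asList(3),     // 4
            Arrays.asList()       // 5
    );
    static int[] state; // 0 = unvisited, 1 = on current path, 2 = finished

    // Depth-first search: reaching a node that is already on the current
    // path means we just followed a membership cycle.
    static boolean hasCycleFrom(int x) {
        if (state[x] == 1) return true;  // back edge found
        if (state[x] == 2) return false; // subtree already cleared
        state[x] = 1;
        for (int c : components.get(x))
            if (hasCycleFrom(c))
                return true;
        state[x] = 2;
        return false;
    }

    // Sweep every assembly once; when the database is cycle-free this runs
    // in time linear in the number of parts plus component links.
    static boolean databaseHasCycle() {
        state = new int[components.size()];
        for (int i = 0; i < components.size(); i++)
            if (hasCycleFrom(i))
                return true;
        return false;
    }

    public static void main(String[] args) {
        System.out.println(databaseHasCycle()); // true for this sample data
    }
}
```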

Originally Posted by Ackbeet
I really think my algorithm in post #16 will work. Have I convinced you?
Based on the above, I still think what you wrote won't work.

15. Reply to undefined at Post #29.

*Light bulb moment here*

I think I finally see what's wrong with my algorithm. The problem with my algorithm isn't that it wouldn't flag an inconsistency. It would. The problem is that it would also flag any assembly that has an assembly as one of its parts! I think this is what you were getting at with your comment about "But this is not inconsistent." It could just be me, but I'm thinking that a 100% false positive rate = useless algorithm. I can be pretty dense at times. I'm no genius!

*sigh* And I'm thinking also that checking AoR for $\displaystyle 2^{(\text{\# assemblies})}$ subsets is going to get rather slow after a while, as you've already mentioned.

So perhaps AoR isn't the most efficient method, after all, for checking consistency. Some sort of tree algorithm does seem the most efficient method. I'll revisit Matt Westwood's ideas and poke around a bit more to see what would be a good, fast algorithm. Stay tuned, if you would, please; I might post a future algorithm and ask whether it will do what I need!
