1. ## unique substrings

Hi,

well my question is: Is the following problem trivial enough not to be proved and if not then would you suggest proveing it by cases or induction.

Problem

let $\displaystyle$S$$be a string and \displaystyle |S| its size. Let \displaystyle S[i,i+k-1]$$ be a substring of S. Let $\displaystyle$\Sigma$$be the alphabet and \displaystyle |\Sigma|$$ its size. Let $\displaystyle$k \in Z^{+}$$. Then \displaystyle \forall |S| \in Z{+}, |\{S[i,i+k-1] : S[i,i+k-1] \neq S[j,j+k-1], \wedge i,j\in[1,|S|-k+1] \wedge i\neq j\}| \leq |\Sigma|^{k}$$

so waht i want to prove is that no matter how large the string is the number of different k size substrings cannot be larger then the number of possible k size substrings. To me this looks obvious enough to be assumed. but if think it is not then would you proceede by proving cases <,>,= or ....?

thnx

b

2. ## Re: unique substrings

If there were more distinct $k$-size substrings than possible $k$-size substrings, then at least one of these strings is impossible.

Impossibru!

YES! You can assume that!

I'm assuming by "different" you mean "distinct", so for example if S = AABAB, we have two strings S[2,3] and S[4,5] which are "different substrings" but not "distinct" since both are AB.

I think you'll be stuck with that as a rather coarse upper bound, because how strings might be duplicated (via letter repetition) is going to get very computationally difficult as the size of your alphabet goes up.

3. ## Re: unique substrings

The total number of size-$\displaystyle k$ substrings of string $\displaystyle S$ is $\displaystyle \max\{|S|-k+1,0\}$. So,

$\displaystyle \forall |S|\in \Bbb{Z}^+, |\{S[i,i+k-1]: 1\le i\le |S|-k+1 \text{ and }\forall j\in \Bbb{Z}^+ \text{ s.t. }1\le i\le j \le |S|-k+1, S[i,i+k-1] = S[j,j+k-1]\text{ iff }i=j\}| \le \max\{|S|-k+1,0\}$

(This is another way to write the set of all distinct size-$\displaystyle k$ substrings of the string $\displaystyle S$).

The point being, if $\displaystyle |S|-k+1$ is small in comparison to $\displaystyle |\Sigma|^k$, then $\displaystyle |S|-k+1$ is a far better upper bound.

4. ## Re: unique substrings

Originally Posted by SlipEternal
The total number of size-$\displaystyle k$ substrings of string $\displaystyle S$ is $\displaystyle \max\{|S|-k+1,0\}$. So,

$\displaystyle \forall |S|\in \Bbb{Z}^+, |\{S[i,i+k-1]: 1\le i\le |S|-k+1 \text{ and }\forall j\in \Bbb{Z}^+ \text{ s.t. }1\le i\le j \le |S|-k+1, S[i,i+k-1] = S[j,j+k-1]\text{ iff }i=j\}| \le \max\{|S|-k+1,0\}$

(This is another way to write the set of all distinct size-$\displaystyle k$ substrings of the string $\displaystyle S$).

The point being, if $\displaystyle |S|-k+1$ is small in comparison to $\displaystyle |\Sigma|^k$, then $\displaystyle |S|-k+1$ is a far better upper bound.
Especially if by "substring" we mean that sequence and adjacency are preserved (no scrambling or skipping of letters). Indeed, |S| is going to be the controlling factor (longer strings yield more substrings, if $k < |S|$ is fixed).

I do want to point out that counting the "distinct" substrings, as opposed to "any" substrings, is still quite complicated.