Three New Algorithms for Regular Language Enumeration Margareta Ackerman Erkki Makinen University of Waterloo Waterloo, ON University of Tempere Tempere, Finland 0 B 0 A D 1 1 1

C 1 0 E What kind of words does this NFA accepts? 0 B 0 A D 1 1 1 C

1 0 E 0 00 11 000 110 0000 1100 1110 00000 .... Cross-section problem: enumerate all words of length n accepted by the NFA in lexicographic order. 0 B 0 A D 1 1 1 C

1 0 E 0 00 11 000 110 0000 1100 1110 00000 .... Enumeration problem: enumerate the first m words accepted by the NFA in length-lexicographic order. 0 B 0 A D 1 1 1 C

1 0 E 0 00 11 000 110 0000 1100 1110 00000 .... Min-word problem: find the first word of length n accepted by the NFA. Applications Correctness testing, provides evidence that an NFA generates the expected language. An enumeration algorithm can be used to verify whether two NFAs accept the same language (Conway, 1971). A cross-section algorithm can be used to determine whether every word accepted by a given NFA is a power - a string of the from wn for n>1, |w|>0. (Anderson, Rampersad, Santean, and Shallit, 2007) A cross-section algorithm can be used to solve the ksubset of an n-set problem: Enumerate all k-subset of a set in alphabetical order. (Ackerman & Shallit, 2007) Objectives Find algorithms for the three problems that are

Asymptotically efficient in Size of the NFA (s states and d transitions) Output size (t) The length of the words in the cross-section (n) Efficient in practice Previous Work A cross-section algorithm, where finding each consecutive word is super-exponential in the size of the cross-section (Domosi, 1998). A cross-section algorithm that is exponential in n (length of words in the cross-section) is found in the Grail computation package. Breast-First-Search approach Trace all paths of length n in the NFA, storing the paths that end at a final state. O(dn+1), where d is the number of transitions in the NFA and is the alphabet size. Previous Polynomial Algorithms: Makinen, 1997 Dynamic programming solution Min-word O(dn2) Cross-section O(dn2+dt)

Enumeration O(d(e+t)) Quadratic in n e: the number of empty cross-section encountered d: the number of transitions in the NFA n: the length of words in the cross-section t: the number of characters in the output Previous Polynomial Algorithms: Ackerman and Shallit, 2007 Linear in the length of words in the cross-section Min-word: O(s2.376n) Cross-section: O(s2.376n+dt) Enumeration: O(s2.376c+dt) Linear in n c: the number of cross-section encountered d: the number of transitions in the NFA n: the length of words in the cross-section t: the number of characters in the output Previous Polynomial Algorithms: Ackerman

and Shallit, 2007 The algorithm uses smart breadth first search, following only those paths that lead to a final state. Main idea: compute a look-ahead matrix, used to determine whether there is a path of length i starting at state s and ending at a final state. In practice, Makinens algorithm (slightly modified) is usually more efficient, except on some boundary cases. Contributions Present 3 algorithms for each of the enumeration problems, including: O(dn) algorithm for min-word O(dn+dt) algorithm for cross-section Algorithms with improved practical performance for each of the enumeration problems Contributions: Detailed We present three sets of algorithms 1. AMSorted: - An efficient min-word algorithm, based on Makinens original algorithm.

- A cross-section and enumeration algorithms based on this min-word algorithm. 2. AMBoolean: - A more efficient min-word algorithm, based on minWordAMSorted. - A cross-section and enumeration algorithms based on this min-word algorithm. 3. Intersection-based: - An elegant min-word algorithm. - A cross-section algorithm based on this min-word algorithm. Key ideas behind our first two algorithms - Makinens algorithm uses simple dynamic programming, which is efficient in practice on most NFAs. - The algorithm by Ackerman & Shallit uses smart breadth first search, following only those paths that lead to a final state. - We build on these ideas to yield algorithms that

are more efficient both asymptotically and in practice. Makinens original min-word algorithm A B C 1 2 3 - (3,C) (3,C) 0 1 (2,B)

(1,B) (0,A) (1,B) 2 B 0 A 1 3 C 1 S[i] stores a representation of the minimal word w of length i that appears on a path from S to a final state. Makinens original min-word algorithm A B

C 1 2 3 - (3,C) (3,C) 0 1 (2,B) (1,B) (0,A) (1,B) 2

B 0 A 1 3 C 1 The minimal word of length n can be found by tracing back from the last column of the start state. Makinens original min-word algorithm Initialize the first column For columns i = 2...n For each state S Find S[i] by comparing all words of length i appearing on paths from S to a final state. 2 1

2 3 A - (3,C) (3,C) B C 0 1 (2,B) (0,A) (1,B) (1,B) B 0 A

1 3 1 C Makinens original min-word algorithm Initialize the first column For columns i = 2...n i operations For each state S Find S[i] by comparing all words of length i appearing on paths from S to a final state. 2 1 2 3 A

- (3,C) (3,C) B 0 (2,B) (0,A) C 1 (1,B) (1,B) B 0 A 1 3

1 C Makinens original min-word algorithm Initialize the first column For columns i = 2...n i operations For each state S Find S[i] by comparing all words of length i appearing on paths from S to a final state. Theorem: Makinens original min-word algorithm is O(dn2). New min-word algorithm: MinWordAMSorted Idea: Sort every columns by the words that the entries represent. 1

2 3 A - (3,C) B 0 (2,B) (3,C) 321 (0,A) C 1 (1,B)

(1,B) B 0 A 1 031 3 120 1 C 2 New min-word algorithm: MinWordAMSorted

We define an order on {S[i] : S a state in N}. If A[1]=a and B[1]=b, where a** 1, A[i] = (a, A) and B[i] = (b, B) If a**** B[i]. New min-word algorithm: MinWordAMSorted Initialize the first column For columns i = 2...n For each state S Find S[i] using only column i-1 and the edges leaving S. Sort column i 2 1 2 **

3 A - (3,C) (3,C) B C 0 1 (2,B) (0,A) (1,B) (1,B) B 0 A 1

3 1 C New min-word algorithm: MinWordAMSorted Initialize the first column For columns i = 2...n d operations For each state S Find S[i] using only column i-1 and the edges leaving S. Sort column i s log s operations Theorem: The algorithm minWordAMSorted is O((s log s +d) n). New cross-section algorithm: crossSectionAMSorted

A state S is i-complete if there exists a path of length i from state S to a final state. To enumerate all words of length n: 1. Call minWordAMSorted (create a table) O((s log s +d) n). 2. Perform a smart BFS: O(dt) - Begin at the start state. - Follow only those paths of length n that end at a final state, by using the table to identify i-complete states. Theorem: The algorithm crossSectionAMSorted is O(n (s log s + d) + dn). New enumeration algorithm: enumAMSorted Run the cross-section algorithm until the required number of words are listed, while reusing the table. Theorem: The algorithm enumAMSorted is O(c (s log s + d)+ dt). c: the number of cross-section encountered d: the number of transitions in the NFA t: the number of characters in the output What have we got so far?

New Algorithms Sorted Previous Algorithms Makinen Ackerman & Shallit min-word O((s log s + d)n) O(dn2) O(s2.376n) cross-section O(n (s log s + d)+dt) O(dn2+dt)

O(s2.376n+dt) enumeration O(c (s log s +d) + dt) O(de + dt) O(s2.376c+dt) c: the number of cross-section encountered e: the number of empty cross-section encountered d: the number of transitions in the NFA n: the length of words in the cross-section t: the number of characters in the output New min-word algorithm: minWordAMBoolean Idea: instead of using a table to find the minimal word, construct a table whose only purpose is to determine i-complete states. Can be done using a similar algorithm to minWordAMSorted, but more efficiently, since there is no need to sort.

New min-word algorithm: minWordAMBoolean A B C 1 2 3 F T T T T T

T T F B 0 A 1 3 C New min-word algorithm: minWordAMBoolean Fill in the first column For i=2 ... n For every state S Determine whether S is i-complete using only the transitions leaving S and column i-1 Starting at the start state, follow minimal transitions to paths that can complete a word of length n (using the table).

1 2 3 A F T T B C T T T T T

F B 0 A 1 3 C New min-word algorithm: minWordAMBoolean d operations Fill in the first column For i=2 ... n For every state S Determine whether S is i-complete using only the transitions leaving S and column i-1 Starting at the start state, follow minimal transitions to paths that can complete a word of length n (using the table).

1 2 3 A F T T B C T T T T T

F B 0 A 1 3 C New min-word algorithm: minWordAMBoolean Fill in the first column For i=2 ... n For every state S d operations Determine whether S is i-complete using only the transitions leaving S and column i-1 Starting at the start state, follow minimal transitions to paths that can complete a word of length n (using the table).

Theorem: The algorithm minWordAMBoolean is O(dn). New cross-section algorithm: crossSectionAMBoolean Extend to a cross-section algorithm using the same approach as the Sorted algorithm. To enumerate all words of length n: Call minWordAMBoolean (create a table) O(dn). Perform a smart BFS: O(dt) - Begin at the start state. - Follow only those paths of length n that end at a final state, by using the table to identify i-complete states. Theorem: The algorithm crossSectionAMBoolean is O(dn+dt). New enumeration algorithm: enumAMBoolean Run the cross-section algorithm until the required number of words are listed, while reusing the table.

Theorem: The algorithm enumAMBoolean is O(de+ dn). e: the number of empty cross-section encountered d: the number of transitions in the NFA n: the length of words in the cross-section t: the number of characters in the output What have we got so far? New Algorithms Previous Algorithms Sorted Boolean Makinen min-word O((s logs+d)n) O(dn)

O(dn2) O(s2.376n) cross-section O(n (s log s+d)+dt) O(dn+dt) O(dn2+dt) O(s2.376n+dt) enumeration O(c (s log s +d) + dt) O(de+dt) O(de+dt) O(s2.376c+dt)

c: the number of cross-section encountered e: the number of empty cross-section encountered d: the number of transitions in the NFA n: the length of words in the cross-section t: the number of characters in the output Ackerman & Shallit Intersection-Based Algorithms We present surprisingly elegant min-word and cross-section algorithms that have the asymptotic efficiency of the Boolean-based algorithms. However, these algorithms are not as efficient in practice as the Boolean-based and Sortedbased algorithms. New min-word algorithm: minWordIntersection Let N be the input NFA, and A be the NFA that accepts the language of all words of length n. 1. Let C = N x A 2. Remove all states of C that cannot be

reached from the final states of C using reversed transitions. 3. Starting at the start state, follow the minimal n consecutive transitions to a final state. New min-word algorithm: minWordIntersection Let N be the input NFA, and A be the NFA that accepts the language of all words of length n. 1. Let C = N x A Let n = 2 Automaton A 2. Remove all states of C that cannot be Automaton N reached from the final states of C using 1 reversed transitions. 0 1 B

1 3. Starting at the start state, follow the A minimal n consecutive transitions to a final 0 1 0 1 state. 0 C 0 New min-word algorithm: minWordIntersection Let N be the input NFA, and A be the NFA that accepts the language of all words of length n. 1. Let C = N x A Automaton C 2. Remove all states of C that cannot be Automaton N

reached from the final states of C using 1 reversed transitions. 0 1 B 1 3. Starting at the start state, follow the A minimal n consecutive transitions to a final 0 1 state. 0 C 0 New min-word algorithm: minWordIntersection Let N be the input NFA, and A be the NFA that accepts the language of all words of length n.

1. Let C = N x A 2. Remove all states of C that cannot be reached from the final states of C using reversed transitions. 3. Starting at the start state, follow the 1 minimal n consecutive transitions to a final state. 1 New min-word algorithm: minWordIntersection Let N be the input NFA, and A be the NFA that accepts the language of all words of length n. 1. Let C = N x A 2. Remove all states of C that cannot be1 reached from the final states of C using 1 reverse transitions. 3. Starting at the start state, follow the minimal n consecutive transitions to a final state.

Thus the minimal word of length 2 accepted by N is 11 Asymptotic running time of minWordIntersection 1. Let C = N x A Concatenate n copies of N. 2. Remove all states of C that cannot be reached from the final states of C using reverse transitions. 3. Starting at the start state, Follow the minimal n consecutive transitions to final. Each step is proportional to size of C, which is O(nd). Theorem: The algorithm minWordIntersection is O(dn). New cross-section algorithm: crossSectionIntersection To enumerate all words of length n, perform BFS on C = N x A, and remove all states not reachable from final state removed (using reverse transitions). Since all paths of length n starting at the start state lead to a final state, there is no need to check for i-completness.

Theorem: The algorithm crossSectionIntersection is O(dn+dt). Practical Performance We compared Makinens, Ackerman-Shallit, AMSorted, and AMBoolean, and Intersection-based algorithms. Tested the algorithms on a variety of NFAs: dense, sparse, few and many final states, different alphabet size, worst case for Makinens algorithm, ect Here are the best performing algorithms: Min-word: AMSorted Cross-section: AMBoolean Enumeration: AMBoolean Summary New Algorithms Previous Algorithms min-word O((s logs+d)n) O(dn)

O(dn) O(dn2) Ackerman & Shallit O(s2.376n) cross-section O(n (s log s +d)+dt) O(dn+dt) O(dn+dt) O(dn2+dt) O(s2.376n+dt) enumeration O(c (s log s +d) +

dt) O(de+dt) - O(de+dt) O(s2.376c+dt) Sorted Boolean Intersection c: the number of cross-section encountered e: the number of empty cross-section encountered d: the number of transitions in the NFA n: the length of words in the cross-section t: the number of characters in the output Makinen

: most efficient in practice Open problems Extending the intersection-based cross-section algorithm to an enumeration algorithm. Lower bounds. Can better results be obtained using a different order? Restricting attention to a smaller family of NFAs.