Enumeration Complexity and Regularization Algorithms

Florent Capelli
Université d’Artois
Yann Strozecki
Université de Versailles (UVSQ)

December 14, 2023

Complexity Days 2023

Enumeration and Complexity

Enumeration Problem

For every input x, let A(x) be a set of solutions.

EnumA is the problem of outputting A(x) given x.

Examples:

  • Given a database D and a query Q, output Q(D).
  • Given a graph G, output all vertex covers of G.
  • Given a CNF formula F, output every solution of F.

Reasonable problems

In this talk, A is an NP-predicate, that is:

  • y ∈ A(x) can be tested in polynomial time.
  • y is of size polynomial in the size of x.


Previous examples have this property.
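
As an illustration, membership for the vertex-cover example can be checked in polynomial time; the helper below is a hypothetical sketch, not part of the talk.

# y ∈ A(G): check that the vertex set y covers every edge of G.
# Runs in time polynomial in |G|, and y has size at most the number of vertices.
def is_vertex_cover(edges, y):
    cover = set(y)
    return all(u in cover or v in cover for (u, v) in edges)

# Example: {1} covers both edges of the path 0 - 1 - 2.
print(is_vertex_cover([(0, 1), (1, 2)], {1}))   # True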

Complexity

How to measure the complexity of an algorithm solving an enumeration problem?

Total Time

Total time needed to output every solution.

There can be exponentially many.

EnumA is in OutputP if it can be solved in time polynomial in:

  • |x|
  • and |A(x)|

Delay

Total time is not always satisfactory. One may want to:

  • process solutions in a stream,
  • only peek at a part of the solution set,
  • avoid repetitions.


Delay: the longest time one has to wait between the output of two consecutive solutions.

Example:

Enumerate (a+b)^n, i.e., all words of length n over the alphabet {a, b}:

  • Method 1: inductively generate all words of length k, for k up to n, and output the words of length n at the end.
  • Method 2: start from a^n, output it, and move to the next word (using a Gray code, for example).


Both have total time linear in the size of the output, but the delay of Method 1 is exponential (a sketch of Method 2 is given below).
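
A minimal Python sketch of Method 2 (the function name and the Gray-code bookkeeping are ours, not from the talk): consecutive words differ in a single letter, so the work between two outputs is polynomial in n instead of exponential.

# Enumerate {a, b}^n starting from a^n, following the binary reflected
# Gray code: exactly one letter changes between two consecutive words.
def gray_words(n):
    word = ["a"] * n                                    # start from a^n
    for i in range(2 ** n):
        yield "".join(word)                             # output the current word
        flip = (i ^ (i >> 1)) ^ ((i + 1) ^ ((i + 1) >> 1))
        pos = flip.bit_length() - 1                     # unique position that changes
        if pos < n:
            word[pos] = "b" if word[pos] == "a" else "a"

# Example: gray_words(2) outputs aa, ba, bb, ab.
print(list(gray_words(2)))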

Holy DelayP Grail

One focus in enumeration complexity has been to design algorithms with polynomial delay.

  • Maximal Independent Set of a graph [@tiernan1970efficient]
  • Constant delay (in data) enumeration of acyclic conjunctive queries [@Bagan09]

(Unpopular?) Opinion

Bounding the delay is not necessarily what we want.

We want guarantees of the form: on an input of size n, the algorithm has output at least t solutions after computing for time f(t, n).

Polynomial Incremental Time Hierarchy

IncP: Problems solved by an algorithm producing t solutions in time p(t, n), with p a polynomial.

TFNP ≠ FP ⇔ IncP ⊊ OutputP


IncP_a: Problems solved by an algorithm producing t solutions in time t^a · p(n), with p a polynomial.

If ETH holds, then IncP_a ⊊ IncP_b for all a < b.

Linear Incremental Time

For every t, after t ⋅ p(n) steps, the algorithm has output at least t solutions.

We say that p(n) is the amortized delay.

When p(n) is a polynomial, the problem is in IncP_1.

DelayP vs IncP_1

Clearly DelayP ⊆ IncP_1: after delay × t steps, at least t solutions have been output.

For the other direction, it is not immediate: an enumerator may have delay 2^n but amortized delay only 2 (see the sketch below).
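
A hedged sketch of such an enumerator, in a step-wise model of our own (each yielded item is one computation step, either a solution or None): all 2^n solutions appear back to back, followed by about 2^n silent steps, so after 2t steps at least t solutions are out, yet the last gap is exponential.

from itertools import product

# Amortized delay ~2, worst-case delay ~2^n.
def bursty_enumerator(n):
    for word in product("ab", repeat=n):
        yield "".join(word)      # burst: one solution per step, 2^n solutions
    for _ in range(2 ** n):
        yield None               # silent phase: about 2^n steps with no output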

Buffer Based Regularization

Naive Regularization

Given an enumerator E with amortized delay d, we obtain an algorithm with delay O(d) using a queue to delay the output:

step = 0
q = Queue()
while E is not finished:
    move(E)                 # simulate one step of E
    step += 1
    if E outputs y:
        q.add(y)
    if step == d:
        # q is pulled every d steps; with amortized delay d,
        # this ensures the queue is not empty
        output(q.pull())
        step = 0
# flush the solutions still stored in the queue
while q is not empty:
    output(q.pull())
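
Assuming the same step-wise model as the bursty enumerator above (iterating E yields one simulated step, either a solution or None), a runnable version of this buffer-based regularizer could look as follows; it is a sketch, not the authors' implementation.

from collections import deque

def regularize(E, d):
    q, step = deque(), 0
    for y in E:                   # one iteration = one simulated step of E
        step += 1
        if y is not None:
            q.append(y)
        if step == d:             # pull the buffer every d steps; with amortized
            yield q.popleft()     # delay d, the queue is guaranteed to be non-empty
            step = 0
    while q:                      # flush the solutions still buffered
        yield q.popleft()

# Gaps of about 2^n steps become gaps of exactly d = 2 steps.
for s in regularize(bursty_enumerator(3), d=2):
    print(s)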

IncP_1 = DelayP

… but we need to know the amortized delay.

Given:

  • an enumerator E with amortized delay d(n), given as an oracle
  • an input x of size n

There is no RAM which enumerates the elements of E(x) in delay O(d(n)).

Unknown Delay

From an unknown amortized delay d to delay O(d · log(d)²):

step = 0, t = 0, S = 0, d = 4
q = Queue()
while E is not finished:
    move(E)                    # simulate one step of E
    step += 1
    t += 1                     # t: total number of simulated steps
    if S > 0:
        d = max(d, t/S + 1)    # d: current estimate of the amortized delay
    if E outputs y:
        q.add(y)
        S += 1                 # S: number of solutions found so far
    if step >= d * log(d)^2:
        output(q.pull())
        step = 0
# flush the solutions still stored in the queue
while q is not empty:
    output(q.pull())
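
The same step-wise sketch can be adapted to this unknown-delay variant; the running estimate d_est and the defensive non-emptiness check are our additions, and the O(d·log(d)²) delay bound is the result stated here, not something this sketch proves.

from collections import deque
from math import log2

def regularize_unknown(E):
    q, step, t, S, d_est = deque(), 0, 0, 0, 4
    for y in E:                               # one simulated step of E
        step += 1
        t += 1
        if S > 0:
            d_est = max(d_est, t / S + 1)     # running estimate of the amortized delay
        if y is not None:
            q.append(y)
            S += 1                            # number of solutions found so far
        if step >= d_est * log2(d_est) ** 2 and q:
            yield q.popleft()                 # pull once the threshold is reached
            step = 0
    while q:                                  # flush the remaining solutions
        yield q.popleft()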

IncP_1 = DelayP

… but the memory may blow up.

On the earlier example (delay 2^n, amortized delay 2), when the naive algorithm reaches the 2^n-th solution, it has output only 2^n / 2 solutions but has stored 2^n of them in the queue.

Geometric Regularization

Main contribution

IncP_1 = DelayP

with a polynomial memory blow-up.

Details

For every enumerator E with N solutions:

  • amortized delay d,
  • space s,
  • total time T

there exists an enumerator with

  • delay O(d · log(N))
  • space O(s · log(N))
  • total time O(T)

Geometric Regularization

  • Maintain ℓ + 1 = ⌈log(N)⌉ simulations E_0, …, E_ℓ of E.
  • E_i outputs the solutions found at steps in [2^i · d; 2^{i+1} · d[.
  • E_{i+1} “moves faster” than E_i.

Faster how?

  • E_i moves by at most 2d steps.
  • If E_i finds a solution in its zone, output it and proceed back with E_ℓ.
  • Otherwise, proceed with E_{i−1}.
  • Stop when E_0 is out of its zone (see the sketch below).
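
To make the scheduling concrete, here is a schematic Python sketch of geometric regularization as described above. The interface (make_E, E.move, E.steps, E.finished) is assumed, E_0 is additionally given the first d steps so that early solutions are not missed, and the stopping test is the simplified one stated above; the exact zones and bookkeeping of the paper may differ.

from math import ceil, log2

def geometric_regularization(make_E, d, N):
    ell = ceil(log2(max(N, 2)))                     # one zone more than strictly needed
    sims = [make_E() for _ in range(ell + 1)]       # independent simulations E_0, ..., E_ell
    def zone(i):                                    # E_i outputs the solutions of its zone
        return (0 if i == 0 else 2 ** i * d, 2 ** (i + 1) * d)
    i = ell
    while True:
        E, (lo, hi) = sims[i], zone(i)
        moved, found = 0, None
        # E_i moves by at most 2d steps, stopping early on a solution in its zone
        while moved < 2 * d and E.steps < hi and not E.finished and found is None:
            y = E.move()
            moved += 1
            if y is not None and lo <= E.steps < hi:
                found = y
        if found is not None:
            yield found
            i = ell                                 # proceed back with E_ell
        elif i > 0:
            i -= 1                                  # otherwise, proceed with E_{i-1}
        else:
            break                                   # E_0 is out of its zone: stop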

Key Lemmas

  1. If E_0, …, E_i have output k solutions, then E_{i+1} has moved by at least 2dk steps.
  2. There are at least 2^i solutions in [0; 2^i · d].

When E_0 is finished, E_i has moved by at least 2^{i+1} · d steps: it has explored its whole zone.

Delay?

Between two outputs: at most ℓ × 2d simulated steps.

Cost of simulation: depends on the model of computation (here RAM with uniform cost model).

  • “Easy” case: known bounds on the delay, the space and the number of solutions used by the algorithm; the main task is keeping track of the number of steps simulated.
  • It also works without prior knowledge of the space and the number of solutions (slight overhead); the extra task is allocating the memory for the new simulations.

Additional Limit

Two hard limits remain:

  • Prior knowledge of the amortized delay is required.
  • The order of the solutions is changed.

Given:

  • an enumerator E with polynomial amortized delay, given as an oracle
  • an input x


There is no RAM that enumerates the elements of E(x) in the same order as E with both polynomial delay and polynomial space.

Take away

Space is as important as time.


The right notion is amortized delay, not delay.


⚠ Do not use geometric regularization!

Thank you for your attention!