Enumeration Complexity and Regularization Algorithms

Florent Capelli
Université d’Artois
Yann Strozecki
Université de Versailles (UVSQ)

December 14, 2023

Complexity Days 2023

Enumeration and Complexity

Enumeration Problem

For every input x, let A(x) be a set of solutions.

EnumA is the problem of outputting A(x) given x.

Examples:

  • Given a database D and a query Q, output Q(D).
  • Given a graph G, output all vertex covers of G.
  • Given a CNF formula F, output every solution of F.

Reasonable problems

In this talk, A is an NP-predicate, that is:

  • y ∈ A(x) can be tested in polynomial time.
  • y is of size polynomial in the size of x.


Previous examples have this property.
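
As an illustration, membership for the vertex-cover example can be checked in polynomial time; the helper below is a hypothetical sketch, not part of the talk.

# y ∈ A(G): check that the vertex set y covers every edge of G.
# Runs in time polynomial in |G|, and y has size at most the number of vertices.
def is_vertex_cover(edges, y):
    cover = set(y)
    return all(u in cover or v in cover for (u, v) in edges)

# Example: {1} covers both edges of the path 0 - 1 - 2.
print(is_vertex_cover([(0, 1), (1, 2)], {1}))   # True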

Complexity

How to measure the complexity of an algorithm solving an enumeration problem?

Total Time

Total time needed to output every solution.

There can be exponentially many.

EnumA is in OutputP if it can be solved in time polynomial in:

  • |x|
  • and |A(x)|

Delay

Total time is not always satisfactory. One may want to:

  • process solutions in a stream,
  • only peek at a part of the solution set,
  • avoid repetitions.


Delay: the longest time one has to wait between the output of two consecutive solutions.

Example:

Enumerate (a+b)^n, i.e., all words of length n over the alphabet {a, b}:

  • Method 1: inductively generate all words of length k, for k up to n, and output the words of length n at the end.
  • Method 2: start from a^n, output it, and move to the next word (using a Gray code, for example).


Both have total time linear in the size of the output, but the delay of Method 1 is exponential (a sketch of Method 2 is given below).
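
A minimal Python sketch of Method 2 (the function name and the Gray-code bookkeeping are ours, not from the talk): consecutive words differ in a single letter, so the work between two outputs is polynomial in n instead of exponential.

# Enumerate {a, b}^n starting from a^n, following the binary reflected
# Gray code: exactly one letter changes between two consecutive words.
def gray_words(n):
    word = ["a"] * n                                    # start from a^n
    for i in range(2 ** n):
        yield "".join(word)                             # output the current word
        flip = (i ^ (i >> 1)) ^ ((i + 1) ^ ((i + 1) >> 1))
        pos = flip.bit_length() - 1                     # unique position that changes
        if pos < n:
            word[pos] = "b" if word[pos] == "a" else "a"

# Example: gray_words(2) outputs aa, ba, bb, ab.
print(list(gray_words(2)))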

Holy DelayP Grail

One focus in enumeration complexity has been to design algorithms with polynomial delay.

  • Maximal Independent Set of a graph [@tiernan1970efficient]
  • Constant delay (in data) enumeration of acyclic conjunctive queries [@Bagan09]

(Unpopular?) Opinion

Bounding the delay is not necessarily what we want.

We want guarantees of the form: on an input of size n, the algorithm has output at least t solutions after computing for time f(t, n).

Polynomial Incremental Time Hierarchy

IncP: Problems solved by an algorithm producing t solutions in time p(t, n), with p a polynomial.

TFNP ≠ FP ⇔ IncP ⊊ OutputP


IncP_a: Problems solved by an algorithm producing t solutions in time t^a · p(n), with p a polynomial.

If ETH holds, then IncP_a ⊊ IncP_b for all a < b.

Linear Incremental Time

For every t, after t ⋅ p(n) steps, the algorithm has output at least t solutions.

We say that p(n) is the amortized delay.

When p(n) is a polynomial, the problem is in IncP_1.

DelayP vs IncP_1

Clearly DelayP ⊆ IncP_1: after delay × t steps, at least t solutions have been output.

For the other direction, it is not immediate: an enumerator may have delay 2^n but amortized delay only 2 (see the sketch below).
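
A hedged sketch of such an enumerator, in a step-wise model of our own (each yielded item is one computation step, either a solution or None): all 2^n solutions appear back to back, followed by about 2^n silent steps, so after 2t steps at least t solutions are out, yet the last gap is exponential.

from itertools import product

# Amortized delay ~2, worst-case delay ~2^n.
def bursty_enumerator(n):
    for word in product("ab", repeat=n):
        yield "".join(word)      # burst: one solution per step, 2^n solutions
    for _ in range(2 ** n):
        yield None               # silent phase: about 2^n steps with no output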

Buffer Based Regularization

Naive Regularization

Given an enumerator E with amortized delay d, we obtain an algorithm with delay O(d) using a queue to delay the output:

step = 0
q = Queue()
while E is not finished:
    move(E)                 # simulate one step of E
    step += 1
    if E outputs y:
        q.add(y)
    if step == d:
        # q is pulled every d steps; with amortized delay d,
        # this ensures the queue is not empty
        output(q.pull())
        step = 0
# flush the solutions still stored in the queue
while q is not empty:
    output(q.pull())
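
Assuming the same step-wise model as the bursty enumerator above (iterating E yields one simulated step, either a solution or None), a runnable version of this buffer-based regularizer could look as follows; it is a sketch, not the authors' implementation.

from collections import deque

def regularize(E, d):
    q, step = deque(), 0
    for y in E:                   # one iteration = one simulated step of E
        step += 1
        if y is not None:
            q.append(y)
        if step == d:             # pull the buffer every d steps; with amortized
            yield q.popleft()     # delay d, the queue is guaranteed to be non-empty
            step = 0
    while q:                      # flush the solutions still buffered
        yield q.popleft()

# Gaps of about 2^n steps become gaps of exactly d = 2 steps.
for s in regularize(bursty_enumerator(3), d=2):
    print(s)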

IncP_1 = DelayP

… but we need to know the amortized delay.

Given:

  • an enumerator E with amortized delay d(n), given as an oracle
  • an input x of size n

There is no RAM which enumerates the elements of E(x) in delay O(d(n)).

Unknown Delay

From an unknown amortized delay d to delay O(d · log(d)²):

step = 0, t = 0, S = 0, d = 4
q = Queue()
while E is not finished:
    move(E)                    # simulate one step of E
    step += 1
    t += 1                     # t: total number of simulated steps
    if S > 0:
        d = max(d, t/S + 1)    # d: current estimate of the amortized delay
    if E outputs y:
        q.add(y)
        S += 1                 # S: number of solutions found so far
    if step >= d * log(d)^2:
        output(q.pull())
        step = 0
# flush the solutions still stored in the queue
while q is not empty:
    output(q.pull())
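
The same step-wise sketch can be adapted to this unknown-delay variant; the running estimate d_est and the defensive non-emptiness check are our additions, and the O(d·log(d)²) delay bound is the result stated here, not something this sketch proves.

from collections import deque
from math import log2

def regularize_unknown(E):
    q, step, t, S, d_est = deque(), 0, 0, 0, 4
    for y in E:                               # one simulated step of E
        step += 1
        t += 1
        if S > 0:
            d_est = max(d_est, t / S + 1)     # running estimate of the amortized delay
        if y is not None:
            q.append(y)
            S += 1                            # number of solutions found so far
        if step >= d_est * log2(d_est) ** 2 and q:
            yield q.popleft()                 # pull once the threshold is reached
            step = 0
    while q:                                  # flush the remaining solutions
        yield q.popleft()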

IncP_1 = DelayP

… but the memory may blow up.

On the earlier example (delay 2^n, amortized delay 2), when the naive algorithm reaches the 2^n-th solution, it has output only 2^n / 2 solutions but has stored 2^n of them in the queue.

Geometric Regularization

Main contribution

IncP_1 = DelayP

with a polynomial memory blow-up.

Details

For every enumerator E with N solutions:

  • amortized delay d,
  • space s,
  • total time T

there exists an enumerator with

  • delay O(d · log(N))
  • space O(s · log(N))
  • total time O(T)

Geometric Regularization

  • Maintain ℓ + 1 = ⌈log(N)⌉ simulations E_0, …, E_ℓ of E.
  • E_i outputs the solutions found at steps in [2^i · d; 2^{i+1} · d[.
  • E_{i+1} “moves faster” than E_i.

Faster how?

  • E_i moves by at most 2d steps.
  • If E_i finds a solution in its zone, output it and proceed back with E_ℓ.
  • Otherwise, proceed with E_{i−1}.
  • Stop when E_0 is out of its zone (see the sketch below).
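
To make the scheduling concrete, here is a schematic Python sketch of geometric regularization as described above. The interface (make_E, E.move, E.steps, E.finished) is assumed, E_0 is additionally given the first d steps so that early solutions are not missed, and the stopping test is the simplified one stated above; the exact zones and bookkeeping of the paper may differ.

from math import ceil, log2

def geometric_regularization(make_E, d, N):
    ell = ceil(log2(max(N, 2)))                     # one zone more than strictly needed
    sims = [make_E() for _ in range(ell + 1)]       # independent simulations E_0, ..., E_ell
    def zone(i):                                    # E_i outputs the solutions of its zone
        return (0 if i == 0 else 2 ** i * d, 2 ** (i + 1) * d)
    i = ell
    while True:
        E, (lo, hi) = sims[i], zone(i)
        moved, found = 0, None
        # E_i moves by at most 2d steps, stopping early on a solution in its zone
        while moved < 2 * d and E.steps < hi and not E.finished and found is None:
            y = E.move()
            moved += 1
            if y is not None and lo <= E.steps < hi:
                found = y
        if found is not None:
            yield found
            i = ell                                 # proceed back with E_ell
        elif i > 0:
            i -= 1                                  # otherwise, proceed with E_{i-1}
        else:
            break                                   # E_0 is out of its zone: stop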

Key Lemmas

  1. If E_0, …, E_i have output k solutions, then E_{i+1} has moved by at least 2dk steps.
  2. There are at least 2^i solutions in [0; 2^i · d].

When E_0 is finished, E_i has moved by at least 2^{i+1} · d steps: it has explored its whole zone.

Delay?

Between two outputs: at most ℓ × 2d simulated steps.

Cost of simulation: depends on the model of computation (here RAM with uniform cost model).

  • “Easy” case: known bounds on the delay, the space and the number of solutions used by the algorithm; the main task is keeping track of the number of steps simulated.
  • It also works without prior knowledge of the space and the number of solutions (slight overhead); the extra task is allocating the memory for the new simulations.

Additional Limit

Two hard limits remain:

  • Prior knowledge of the amortized delay is required.
  • The order of the solutions is changed.

Given:

  • an enumerator E with polynomial amortized delay, given as an oracle
  • an input x


There is no RAM that enumerates the elements of E(x) in the same order as E with both polynomial delay and polynomial space.

Take away

Space is as important as time.


The right notion is amortized delay, not delay.


⚠ Do not use geometric regularization!

Thank you for your attention!