Enumeration Complexity and Regularization Algorithms

February 14, 2023

Complexity Days 2023

Enumeration and Complexity

Enumeration Problem

Let A be a function mapping every input x to a set A(x) of solutions.

EnumA is the problem of outputting all elements of A(x), given x.

Examples:

  • Given a database D and a query Q, output Q(D).
  • Given a graph G, output all vertex covers of G.
  • Given a CNF formula F, output all solutions of F.
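For concreteness, the last example can be phrased in a few lines of Python. This is a brute-force sketch; the DIMACS-style clause encoding is an assumption of this illustration, not part of the talk:

```python
from itertools import product

def enum_cnf_models(clauses, n):
    """Enumerate all satisfying assignments of a CNF formula.

    clauses: list of clauses, each a list of non-zero ints where the
    literal k stands for variable |k|, negated when k < 0 (DIMACS-style).
    n: number of variables."""
    for assignment in product([False, True], repeat=n):
        # a clause is satisfied when at least one of its literals is
        if all(any((lit > 0) == assignment[abs(lit) - 1] for lit in clause)
               for clause in clauses):
            yield assignment

# (x1 ∨ x2) ∧ (¬x1 ∨ x2): the models are exactly those with x2 = True
models = list(enum_cnf_models([[1, 2], [-1, 2]], 2))
# → [(False, True), (True, True)]
```

Note that this sketch always takes time 2^n, however few models exist; the complexity measures below are designed to rule out exactly this behaviour.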

Reasonable problems

In this talk, A is an NP-predicate, that is:

  • y ∈ A(x) can be tested in polynomial time.
  • y is of size polynomial in the size of x.

Previous examples have this property.

Complexity

How to measure the complexity of an algorithm solving an enumeration problem?

Total Time

Total time needed to output every solution.

There can be exponentially many.

EnumA is in OutputP if it can be solved in time polynomial in:

  • |x|
  • and |A(x)|

Delay

Total time is not always satisfactory:

  • Process solutions in a stream.
  • Only peek at the solution set.
  • No repetition.

Delay: the longest time one has to wait between the output of two solutions.

Example:

Enumerate (a+b)^n, that is, all words of length n over the alphabet {a, b}:

  • Method 1: inductively generate every word of length k, for k up to n, then output the words of length n.
  • Method 2: start from a^n, output it, and move on to the next word (using a Gray code, for example).

Both run in total time linear in the size of the output, but the delay of Method 1 is exponential: it outputs nothing until all shorter words have been generated.
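A sketch of Method 2, assuming words are represented as Python strings and using the standard binary reflected Gray code (the helper name gray_words is ours):

```python
def gray_words(n):
    """Enumerate the words of (a+b)^n in binary reflected Gray code
    order, starting from a^n: consecutive words differ in exactly one
    letter, so the work between two outputs is a single flip (plus the
    O(n) cost of materializing the word)."""
    word = ["a"] * n
    yield "".join(word)
    for k in range(1, 2 ** n):
        # flip the letter at the position of the lowest set bit of k
        i = (k & -k).bit_length() - 1
        word[i] = "b" if word[i] == "a" else "a"
        yield "".join(word)

# gray_words(2) yields: aa, ba, bb, ab
```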

Holy DelayP Grail

One focus in enumeration complexity has been to design algorithms with polynomial delay.

  • Maximal Independent Set of a graph [@tiernan1970efficient]
  • Constant delay (in data) enumeration of acyclic conjunctive queries [@Bagan09]

(Unpopular?) Opinion

Bounding the delay is not necessarily what we want.

We want guarantees of the form: after waiting f(t, n) steps, at least t solutions have been output.

Linear Incremental Time

IncP1

For every t, after t ⋅ p(n) steps, the algorithm has output at least t solutions.

We say that p(n) is the incremental delay.

DelayP vs IncP1

Clearly DelayP ⊆ IncP1: after t × delay steps, at least t solutions have been output.

For the other direction, the delay itself can be bad: there are enumerators with delay 2^n but incremental delay 2.
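A toy illustration of this gap, modeling an enumerator as a generator that yields one value per step (a solution, or None). The trace below front-loads its solutions, so its delay is exponential while its incremental delay is 2; the checker is a hypothetical helper of this illustration, not part of the talk:

```python
def front_loaded(n):
    """A stylized enumerator trace: it outputs a solution at each of
    the first 2^(n-1) steps, then stalls for 2^(n-1) steps. Its delay
    is exponential, but its incremental delay is 2."""
    half = 2 ** (n - 1)
    for k in range(half):
        yield k        # one solution per step in the first half
    for _ in range(half):
        yield None     # then a long stall with no output

def incremental_delay_ok(trace, d):
    """Check the IncP1 guarantee: after t * d steps, >= t solutions."""
    seen = 0
    for step, y in enumerate(trace, start=1):
        if y is not None:
            seen += 1
        if step % d == 0 and seen < step // d:
            return False
    return True
```

Here incremental_delay_ok(front_loaded(5), 2) holds, while the same trace already violates the guarantee for d = 1.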

Naive Regularization

Given an IncP1-enumerator E with incremental delay d, one can regularize the delay using a queue that buffers the solutions and releases one every d steps:

step = 0
q = Queue()
while E is not finished:
    move(E)                 # advance E by one computation step
    step += 1
    if E outputs y:
        q.add(y)
    if step == d:
        # q is pulled every d steps; since E has incremental delay d,
        # it is guaranteed to be non-empty here
        output(q.pull())
        step = 0
while q is not empty:       # flush the solutions still buffered
    output(q.pull())

IncP1 = DelayP

… but one needs to know the incremental delay in advance.

Given:

  • an IncP1-enumerator E with incremental delay p(n), given as an oracle
  • an input x of size n

There is no RAM which enumerates the elements of E(x) in delay (|E(x)|−2)p(n).

Unknown Delay

From an unknown incremental delay d to delay O(d^{1+ϵ}):

step, t, S = 0, 0, 0
q = Queue()
C = 1 / (1 - 2 ** (-epsilon))
while E is not finished:
    move(E)
    step += 1
    t += 1                  # total number of simulated steps
    if E outputs y:
        q.add(y)
        S += 1              # number of solutions found so far
    # t / S is the incremental delay observed so far
    if S > 0 and step >= C * (t / S) ** (1 + epsilon):
        output(q.pull())
        step = 0
while q is not empty:       # flush the solutions still buffered
    output(q.pull())

IncP1 = DelayP

… but the memory may blow up.

When the simulated enumerator produces its 2^n-th solution, the naive algorithm has output only 2^n/2 solutions and still stores the other 2^n/2 in its queue.
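The blow-up can be observed on a toy instance, again modeling the enumerator as a generator yielding one value per step (the code below is an illustration of the naive queue-based regularization, not the paper's exact pseudocode):

```python
from collections import deque

def naive_regularize(E, d):
    """Naive queue-based regularization of an enumerator E, modeled
    as a generator yielding one value per step (a solution or None).
    Returns the output sequence and the peak size of the queue."""
    q, out, step, peak = deque(), [], 0, 0
    for y in E:
        step += 1
        if y is not None:
            q.append(y)
            peak = max(peak, len(q))
        if step == d:
            out.append(q.popleft())
            step = 0
    out.extend(q)  # flush the solutions still buffered at the end
    return out, peak

# An enumerator producing one solution per step for 2^10 steps: the
# regularizer (d = 2) outputs one solution every other step, so about
# half of the solutions pile up in the queue.
out, peak = naive_regularize((k for k in range(2 ** 10)), 2)
```

On this run, peak is about 2^10 / 2: memory grows linearly in the number of solutions, which is exponential in n.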

Geometric Amortization

Main contribution

IncP1 = DelayP

with polynomial memory blow up.

Details

For every IncP1-enumerator E with N solutions:

  • incremental delay d,
  • space s,
  • total time T

there exists a DelayP-enumerator with

  • delay O(d log N),
  • space O(s log N),
  • total time O(T).

Geometric Amortization

  • Maintain l + 1 = ⌈log(N)⌉ simulations E_0, …, E_l of E.
  • E_i outputs the solutions that E produces during steps in [2^i d; 2^{i+1} d[.
  • E_{i+1} “moves faster” than E_i.

Faster how?

  • E_i moves by at most 2d steps per turn.
  • If E_i finds a solution in its zone, it outputs it and the schedule resumes with E_l.
  • Otherwise, the schedule proceeds with E_{i−1}.
  • Stop when E_0 has explored all of its zone.
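The scheduling loop above can be sketched in runnable form, under the assumption that the enumerator is modeled as a deterministic generator yielding one value per simulated step (a solution or None); the dict-based bookkeeping and the treatment of E_0's zone are choices of this illustration, not the paper's exact construction:

```python
import math

def geometric_amortization(make_enum, d, N):
    """Geometric amortization of an IncP1-enumerator.

    make_enum() must return a fresh, deterministic generator yielding
    one value per simulated step: a solution, or None when that step
    outputs nothing. d is the incremental delay, N the number of
    solutions. Yields every solution exactly once, in a possibly
    different order, with O(d log N) simulated steps between outputs."""
    l = max(0, math.ceil(math.log2(max(N, 1))))
    sims = [{"gen": make_enum(), "pos": 0, "done": False}
            for _ in range(l + 1)]
    i = l
    while True:
        E = sims[i]
        # zone of E_i: steps [2^i d, 2^(i+1) d[; in this sketch E_0
        # also covers the initial segment of steps before d
        lo = 0 if i == 0 else (1 << i) * d
        hi = (1 << (i + 1)) * d
        emitted = False
        for _ in range(2 * d):          # E_i moves by at most 2d steps
            if E["done"] or E["pos"] >= hi:
                break
            try:
                y = next(E["gen"])
            except StopIteration:
                E["done"] = True
                break
            E["pos"] += 1
            if y is not None and lo <= E["pos"] - 1 < hi:
                yield y                 # a solution in E_i's zone
                emitted = True
                break
        if emitted:
            i = l                       # resume with the fastest copy
        elif i > 0:
            i -= 1                      # let the next slower copy move
        else:
            return  # E_0 has explored all of its zone: we are done
```

As expected, the solutions may come out in a different order than E produces them.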

Key Lemmas

  1. If E_0, …, E_i have output k solutions, then E_{i+1} has moved by at least 2dk steps.
  2. There are at least 2^i solutions in [0; 2^i d].

When E_0 is finished, each E_i has moved by at least 2^{i+1} d steps: it has explored all of its zone.

Delay?

Between two outputs: at most l × 2d simulated steps.

Cost of simulation: depends on the model of computation (here RAM with uniform cost model).

  • “Easy”: known bounds on delay, space and number of solutions used by the algorithm.

Keeping track of the number of steps simulated.

  • Works without prior knowledge of space and number of solutions (slight overhead).

Allocating the memory for new simulations.

Additional Limit

Two hard limits:

  • Prior knowledge of the incremental delay
  • The order of the solutions is changed

Given:

  • an IncP1-enumerator E as oracle
  • an input x

There is no RAM which enumerates the elements of E(x) in the same order as E in polynomial delay and polynomial space.

Generalizations and Applications

Incremental time

The approach generalizes to collapse the following classes:

  • Usual k-incremental time: the delay between solutions i and i + 1 is O(poly(n) i^k).
  • Relaxed (k+1)-incremental time: for every i, at least i solutions have been output after time poly(n) i^{k+1}.

Easier proofs for DelayP

… in theory.

IncP1 requirements are weaker than DelayP.

In practice, we have few examples where it helps. Any ideas?

Applications

Branch-and-bound based enumeration algorithms (flashlight search) with average delay d can be modified:

  • to have incremental delay d (naive amortization),
  • to have delay O(d log N) (geometric amortization).
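As an illustration of the flashlight scheme, here is a sketch that enumerates all independent sets of a graph; for this problem the extension test is trivially true (a partial independent set always extends, by adding nothing), so every branch of the binary partition reaches a solution and the delay is polynomial. The encoding and helper name are ours:

```python
def independent_sets(adj):
    """Flashlight (binary partition) enumeration of all independent
    sets of a graph, given as a dict mapping each vertex to the set
    of its neighbours."""
    vertices = sorted(adj)

    def extend(i, current):
        if i == len(vertices):
            yield frozenset(current)
            return
        v = vertices[i]
        # branch 1: leave v out (always extensible)
        yield from extend(i + 1, current)
        # branch 2: put v in, allowed only if no chosen neighbour
        if not (adj[v] & current):
            current.add(v)
            yield from extend(i + 1, current)
            current.remove(v)

    yield from extend(0, set())

# On the path 0 - 1 - 2 there are 5 independent sets:
# {}, {0}, {1}, {2}, {0, 2}
sets_ = list(independent_sets({0: {1}, 1: {0, 2}, 2: {1}}))
```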

Enumeration of the models of a DNF in delay n² m^{1−log₃(2)}.

Enumeration of k-colorable graphs in polynomial time and polynomial space.

Take away

Space is as important as time.

The right notion is incremental delay, not delay.

⚠Do not use geometric amortization!

Thank you for your attention!

Bibliography