Random insomniac thoughts about sudoku, CSPs and SAT

(Writing in English because for some reason (probably the fact that my courses on the subject are in English) my inner monologue is in English on this subject, so…)

This post will probably be fairly obscure to… err… I’d say anyone but me, but that would be kind of pretentious 😉 Let’s just say that if you haven’t heard the acronyms SAT or CSP before, you can probably skip this post, unless you’re very curious. If you have questions, remarks, don’t hesitate. I have no idea what I’m doing either.

Last night I did some Project Euler problems before going to bed, and I inadvertantly clicked on this one: Problem 96 – Devise an algorithm for solving Su Doku puzzles. And while I know that I shouldn’t go to bed with a problem I’m able to easily formulate in my head, I did anyway (because it was late), and an hour or two of mild insomnia followed, because I was thinking of said problem. Typical. Or, as I put it a while ago, « don’t derive and try to sleep ».

This is a collection of random thoughts on the subject; I obviously didn’t give it A LOT of thought, it’s more of a general interrogation post, because I think the exercise in itself isn’t bad. Also, I haven’t hit the litterature at all – so there’s a fair chance I’m only saying stuff that’s either trivial, stupid or wrong 😉

The problem of regular 9×9 sudoku, per se, is « easy ». Well, it’s not « easy » in that I’d probably need some hours of thought and a some hours of sweat implementing it, but I’m pretty confident I could get there, at least for a decent approximation of a decent algorithm. But anything I’m saying from now on doesn’t have anything to do with the question of solving a given 9×9 sudoku in a « reasonable » amount of time – I’m only interested in the « theoretical » problem.

What’s not obvious to me is the « best » way to model and solve the associated CSP or SAT instance. Again, for a « regular » sudoku, one could argue that the size of the problem is constant and that the « best » way to model and solve doesn’t matter much since it will also be constant. However, since I’m (originally) interested in solving the Project Euler question, the size of the constant actually matters to me – if it’s « gazillions », it’s not good enough for me. Moreover, the Sudoku problem for nxn problems is known to be NP-complete (haven’t read that paper yet, intend to). Also note that I’m not that interested in the decision problem (« is there a solution to this instance of sudoku ») but in finding an assignment that actually solves said puzzle.

There are two straightforward ways to model sudoku:

  • as a regular CSP problem, where each variable represents a cell and can take a value between 1 and n.
  • as a SAT problem where we define n boolean variables for each cell, each of which saying « does this cell take this value? »

A priori, both models have their merits; the CSP has the advantage of compactness and (human) readability, and the SAT has the advantage of being boolean (duh). Note that in any case, any nxn problem can be modelled in same way (by expressing the constraints of the grid, which is very regular), and the particular instance of the puzzle can be seen as an initial partial assignment on the problem (which will remove some constraints and some variables in some constraints). I’m not entirely clear on the number of constraints in both cases, I’d need some time with a pen and paper for that. Also, considering a SAT problem, the exact number of variables in a « reasonable » 3SAT reduction would need some work also. Probably not « hard » work, but « careful » work (and I’m not that good at careful 😉 ).

Now if I consider that I have a model, what are my options? (Again, this is a worst-case analysis – there’s a fair chance that most « real-world » puzzles won’t need that kind of machinery, but some instances ARE hard (the problem IS NP-complete). But by the way, what makes an « easy » sudoku puzzle, and is it possible to decide whether a given instance is easy or not? Anyway, for the general case, the three possibilities that I’m thinking of right now are randomized algorithms, so any runtime I’m talking about is expected runtime.

  • Apply PPSZ (beware, it bites) to the (3-)SAT problem. A bound on the runtime here would be 1.308^k, where k is the number of variables that I haven’t computed yet.
  • Use a PPZ-like approach on the CSP instance – consider a random permutation of the variables and treat them in order; pick, for each variable, a value uniformly at random in those that are not forbidden by the current partial assignment.
  • Use a « random downsampling » approach (for each cell, keep two values uniformly at random) and apply PPZ to the resulting instance.

Those two last approaches were part of an exercise in the latest SAT graded sheet I (tried to) solve(d), so I have some idea on how the analysis would go through in that case. It would still need some work, obviously (especially since the problem is slightly different).

Another question is, since the problem is very structured, can I use this structure to my advantage to make the algorithm faster, prove that the general algorithm is faster in this particular setting, or to get an easier analysis?

Still in the « random questions » – sudoku can be seen as the problem of « rainbow coloring » a n-regular hypergraph where each vertex is part of exactly 3 edges. How would previous algorithms behave on a more general case? (non n-regular and/or with an arbitrary number of edges per vertex). This kind of reflexion gives me the intuition that indeed, the structure of the problem can be used to get better algorithms and/or bounds. But I may obviously be wrong.

Last one, and then I’m leaving you 😉 – this is a more general question regarding SAT vs CSP. SAT domains and constraints and stuff can be represented in a hypercube where every possible assignment is a vertex and two vertices are connected iff there differ by exactly one coordinate. The hypercube model is a nice way to look at the SAT problem from a different perspective, and definitely has its uses. Is there any geometrical structure that would be useful in the same way (or in some way) for CSPs and variables whose domain is not reduced to 2 boolean values?