P vs NP, Practical Guide for Software Engineers

12 min readFeb 27, 2024

One of the key questions in computational complexity is P=NP or P!=NP. This topic was covered in Algorithm MicroMasters. I found it really interesting and so different to anything else that I have encountered.

There was a clear gap between my software engineering experience and computer science theory. In this article I will try to close this gap to aid comprehension for my and hopefully your benefit. I will lead with a story and use language that software engineers are familiar with.

Research

You are a researcher back in the day and you are trying to figure out if the Hamiltonian Path (Ham Path) has a polynomial time algorithm. You have been stuck on this for weeks, so you figure it is time to see if the Ham Path is in something called NP and NP-Hard.

Showing that Ham Path is NP-Hard will show that your problem might not have a polynomial time algorithm if P!=NP. You don’t fully understand why this is the case as your mind is still digesting the research papers that you have read, but you decide to give it a go as everything else leads to a dead end.

Problem

Given a connected graph, find a possible path where each vertex of the graph is visited exactly once.

Brute Force

Your eyes are tired from staring at the Ham Path brute force Python algorithm. You were waiting to be inspired, for something clever to pop into your head, but apart from cleaning up syntax and making it look better, nothing, no big idea.

Your simple algorithm runs depth-first search and if it cannot find a solution it backtracks one vertex at the time and tries other paths from that vertex:

def hamiltonian_bf(adj_graph, source=None, sink=None):
    v_map, e_map = {}, {}
    i, j = 0, 0

    for u, u_edges in adj_graph.items():
        if u not in v_map:
            v_map[u] = i
            i += 1
        for v in u_edges:
            e_map[(u, v)] = j
            j += 1
            if v not in v_map:
                v_map[v] = i
                i += 1

    v_first = source if source is not None else next(iter(adj_graph))
    v_len, e_len = len(v_map), len(e_map)
    start, path = v_first, [v_first]
    u, solved, t = start, False, None
    v_visited = [False] * v_len
    e_visited = [False] * e_len
    counter = 0

    while not solved:
        v_visited[v_map[u]] = True
        u_edges_count = 0

        if sink is not None and u == sink and v_visited[v_map[u]] and counter + 1 == v_len:
            break
        elif sink is not None and u == sink and v_visited[v_map[u]] and counter + 1 != v_len:
            v_visited[v_map[u]] = False
            path.pop()
            u, t = t, path[-2]
            counter -= 1

        for _, v in enumerate(adj_graph[u]):
            u_edges_count += 1

            if not v_visited[v_map[v]] and not e_visited[e_map[(u, v)]] and counter + 1 == v_len:
                path.append(v)
                solved = True
                break
            elif v_first == v and not e_visited[e_map[(u, v)]] and counter + 1 == v_len:
                solved = True
                break
            elif not v_visited[v_map[v]] and not e_visited[e_map[(u, v)]]:
                e_visited[e_map[(u, v)]] = True
                t, u = u, v
                counter += 1
                path.append(v)
                break

            if u_edges_count == len(adj_graph[u]):
                for _v in adj_graph[u]:
                    e_visited[e_map[(u, _v)]] = False
                v_visited[v_map[u]] = False
                path.pop()
                if len(path) > 1:
                    u, t = t, path[-2]
                elif len(path) == 1:
                    u, t = t, None
                counter -= 1
                break

    return path

#problem instance represents figure 1
problem_instance = {
    0: [1, 2, 3],
    1: [0, 2, 5],
    2: [0, 1, 3, 4],
    3: [0, 2, 4, 5],
    4: [1, 2, 3, 5],
    5: [3, 4]
}
solution = hamiltonian_bf(problem_instance)
print(solution)

#output
[0, 1, 2, 4, 5, 3]

Non-deterministic Polynomial (NP)

Most of us are used to writing deterministic algorithms, “if this then that, loop around, if this then that again” and a lot of our algorithms run in polynomial time, this is what makes them practically useful. Let’s call these type of algorithms deterministic polynomial or DP for short, you will see why this is important shortly.

Why is it called NP?

Well, what if there was a “magic machine” or “lucky machine” that could always guess the right way towards a solution? It would solve certain problems that seem to be solvable only in superpolynomial time in polynomial time. Imagine you had this machine, if it was to do a jigsaw puzzle it would find the piece it needed next for placement immediately. If this machine was to traverse the graph in figure 1, it would know exactly which edge to take at each step, even if there were millions of edges, it would not get lost.

This theoretical magic machine is called a non-deterministic machine. This machine would solve superpolynomial “hard” problems in polynomial time, making them “easy”. This is the place of origin for the non-deterministic polynomial, NP, and why it has such a strange name. It would feel more intuitive if P was called DP and the problem was stated as DP vs NP instead.

Most scientists conjecture that “magic/lucky machine” does not exist and therefore P!=NP.

How is NP used in practice?

In practice NP is used to show how hard your problem is and it has nothing to do with the “magic machine”. This “magic machine” became a demarcation. If you can show that your problem lives in NP, then only a “magic machine” can solve it easily.

Here is the criteria to demonstrate that your problem is in NP space:

Answers can be checked in deterministic polynomial time.
Output is TRUE or FALSE. Your problem is a decision problem.
SAT (or other NP-Complete problems) can be reduced to your problem in deterministic polynomial time.

For your problem to be in NP it needs to fulfil 1 and 2. For it to be in NP-Hard it needs fulfil 3 only. To be NP-Complete it needs to be both NP and NP-Hard.

In the next few sections of this article you will continue your quest as a researcher to show (informally) that Ham Path is NP, NP-Hard and therefore NP-Complete.

NP

You quickly realise that you can write code that can check Ham Path answer in polynomial time, this would fulfil criteria 1 and 2:

def hamiltonian_verify_in_poly_time(graph, path):
    if not path:
        return False  # Empty path is not valid

    # Check if the path is a permutation of the vertices
    num_vertices = len(graph)
    if set(path) != set(range(num_vertices)):
        return False  # Path does not include all vertices

    # Check if the edges in the path are valid in the graph
    for i in range(len(path) - 1):
        current_vertex = path[i]
        next_vertex = path[i + 1]
        if next_vertex not in graph[current_vertex]:
            return False  # Invalid edge in the path

    return True  # Valid Hamiltonian path

solution = [0, 1, 2, 4, 5, 3]
problem_instance = {
    0: [1, 2, 3],
    1: [0, 2, 5],
    2: [0, 1, 3, 4],
    3: [0, 2, 4, 5],
    4: [1, 2, 3, 5],
    5: [3, 4]
}
verified = hamiltonian_verify_in_poly_time(problem_instance, solution)
print(verified)

True

Great, you have just shown that your problem is in NP!

NP-Hard

Now you want to see if your problem is NP-Hard. Before you can do this you need to understand: 1. What does it mean to reduce 3SAT to Ham Path and 2. Why doing this shows that your problem is at least as hard as the problem you have reduced it from.

Reductions

At this point, you are gently spinning on your chair with your head resting on the back, your eyes staring at the ceiling, you are thinking about reductions. At that moment you understand what reduction really is for your particular problem:

“Take a satisfiability problem 3SAT and find a way to construct a graph for your Hamiltonian Path so that a valid path through this graph would represent a valid satisfiability (answer) for the 3SAT expression”

This “construct” verb is called a “gadget” ChatGPT 3.5 defines it as:

In the reduction from one problem to another, gadgets are used to encode the elements or constraints of the original problem instance in such a way that a solution to the modified problem (usually represented by the constructed graph or structure) can be translated back into a solution for the original problem.
Gadgets are often carefully designed to ensure that the reduction preserves the computational complexity properties of the original problems, allowing us to prove that if one problem is computationally difficult (e.g., NP-complete), then so is the other.

A bit of time goes by and you have a big idea, you finally figure out how to construct a gadget in the graph that would encode 3SAT complexity and allow Ham Path to solve it.

Here is how a Ham Path encoding for expression (x1 v x2 v ¬x3)^(¬x2 v x3 v x4)^(x1 v ¬x2 v x4) would look like:

Figure 2. 3SAT reduction to Hamiltonian Path. Taken from here. Ignore t to s edge, this is not required.

To understand above check out this slide deck or watch this video.

You are sitting there, amazed by this concept. If Ham Path algorithm will find a path in that graph that means 3SAT expression (x1 v x2 v ¬x3)^(¬x2 v x3 v x4)^(x1 v ¬x2 v x4) can be satisfied.

What this proves

As a researcher you now have something to show your supervisor. By creating this reduction in polynomial time you shown that if Ham Path could be solved in polynomial time you would then solve 3SAT in polynomial time. Meaning that Ham Path is at least as hard as 3SAT.

At the back of your mind you are wondering, is the reverse true? Is it possible to reduce Ham Path to 3SAT? So if I had solved 3SAT I would have solved Ham Path? You make a note and move on to something more within your immediate grasp.

3SAT to Hamiltonian Path

Before you show above gadget to your supervisor you decide to write a reduction algorithm based on the figure 2:

def SAT_to_HamPath(expression):
    vars = list(set([abs(item) for sub_list in expression for item in sub_list]))
    n_vars = len(vars)
    n_clauses = len(expression)
    n_clauses_gadget = n_clauses * 2
    adj = {}

    #initial setup
    for var in range(0, n_vars):
        for clause_indx in range(0, n_clauses_gadget):
            idx = (var, clause_indx)
            adj[idx] = []
            #add an edge to the next vertex in the line
            if clause_indx+1 < n_clauses_gadget:
              adj[idx].append((var, clause_indx+1))
            #add an edge to the previous vertex in the line
            if clause_indx-1 >= 0:
               adj[idx].append((var, clause_indx-1)) 
        if var+1 < n_vars:
           #add edge to the first vertex and the last vertex below including first and the last
           adj[(var, 0)].append((var+1, 0)) 
           adj[(var, 0)].append((var+1, n_clauses_gadget-1)) 
           adj[(var, n_clauses_gadget-1)].append((var+1, 0)) 
           adj[(var, n_clauses_gadget-1)].append((var+1, n_clauses_gadget-1)) 

    adj['s'] = [(0,0),(0, n_clauses_gadget-1)]
    adj[(n_vars-1,0)].append('t')
    adj[(n_vars-1, n_clauses_gadget-1)].append('t')

    #adding gadgets 
    for i, clause in enumerate(expression):
       adj[(i)] = []
       for literal in clause:
          var = abs(literal)-1
          #for positive literal move in positive direction
          if literal > 0:
             adj[(var, i*2)].append((i))
             adj[(i)].append((var, i*2+1))
          #for negative literal move in negative direction
          else:
            adj[(var, i*2+1)].append((i)) 
            adj[(i)].append((var, i*2))
    
    return adj

#(x1 v x2 v ¬x3)^(¬x2 v x3 v x4)^(x1 v ¬x2 v x4)
expression = [[1, 2, -3], [-2, 3, 4], [1, -2, 4]]
graph = SAT_to_HamPath(expression)
print(graph)

{(0, 0): [(0, 1), (1, 0), (1, 5), 0], (0, 1): [(0, 2), (0, 0)], (0, 2): [(0, 3), (0, 1)], (0, 3): 
.... (Redacted for brevity)

3SAT to Ham Path reduction converts the satisfiability expression to a graph. You then take the output and give it to your hamiltonian_bf algorithm (see above) to solve:

path = hamiltonian_bf(graph, 's', 't')
print(path)

['s', 
  (0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), #Going forward = True
  (1, 0), (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), #Going forward = True
  (2, 5), (2, 4), (2, 3), (2, 2), (2, 1), 0, (2, 0), #Going forward = False
  (3, 0), (3, 1), (3, 2), 1, (3, 3), (3, 4), 2, (3, 5), #Going forward = True
 't']

You take the output and check to see if expression is satisfiable:

x1=True, x2=True, x3=False, x4=True
(x1 v x2 v ¬x3)^(¬x2 v x3 v x4)^(x1 v ¬x2 v x4) 
(True v True v True)^(False v False v True)^(True v False v True)

Expression is satisfiable, you have shown that Ham Path is in NP and it is NP-Hard, meaning it is NP-Complete.

Reverse Reduction

You have decided to reduce the Hamiltonian Path to SAT (CNF-SAT), as you just could not let this go.

Gadget construction

After some mental gymnastics, approach you have gone for is to represent your graph as a vertex matrix where:

Each column represents the same vertex. Only one vertex can be picked here.
Each row represents an edge path with a selected vertex. Only one vertex can be picked here.
Red connections represent illegal selections as there are no edges between these vertices.

Figure 3. Vertex matrix with illegal edges.

Check out full formal explanation here.

Hamiltonian Path to CNF-SAT

Now that the reduction method is known, it will be possible to take advantage of SAT Solvers. SAT Solvers are great in practice, they can solve problems with a lot of variables and complexity.

You write the following algorithm to reduce your graph to a satisfiability expression:

import numpy as np
import functools
from itertools import combinations

def HamPath_to_SAT(n, edges):
    edge_set = set()
    for u, v in edges:
        if (v,u) not in edge_set and (u,v) not in edge_set:
            edge_set.add((v,u))
            edge_set.add((u,v))

    #0. Create matrix that will represent different possible permutations 
    # Order does not matter
    Xij = np.arange(1,(n**2)+1).reshape((n,n))
    clauses = []

    for i in range(n):
        #1. At least one vertex must be selected per row
        clauses.append(Xij[i])

        #2. Only one vertex can be selected per row
        for combination in combinations(Xij[i, :], 2):
            clauses.append([-combination[0], -combination[1]])

    for i in range(n):
        #3. At least one vertex must be selected in the column 
        clauses.append(Xij[:, i])

        #4. Only one vertex must be selected in the column
        for combination in combinations(Xij[:, i], 2):
            clauses.append([-combination[0], -combination[1]])

    #5. Nonadjacent i and j cannot be adjacent in the path.
    for u, v in combinations(np.arange(1,n+1), 2):
        if (u,v) not in edge_set:
            for i in range(n-1):
                clauses.append([-((i*n)+u), -(((i+1)*n)+v)])
                clauses.append([-(((i+1)*n)+u), -((i*n)+v)])

    SAT = []
    for clause in clauses:
        for term in clause:
            SAT.append(str(term))
        SAT.append('\n')
    return functools.reduce(lambda sum, term: sum+term+' ' if term != '\n' else sum+term, SAT, '')

#problem instance represents figure 1, this time it is using list 
#of edges instead of adjacency list
problem_instance_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 5), (2, 3), (2, 4), (3, 4), (3, 5), (4, 1), (4, 5)]
number_of_vertices = 5
CNFSAT = HamPath_to_SAT(number_of_vertices, problem_instance_edges)
print(CNFSAT)

1 2 3 4 5 
-1 -2 
-6 -8 
-6 -9 
.... (Redacted for brevity)

Running above full CNF-SAT output through the SAT Solver can provide the following response:

SATISFIABLE 
1 7 13 19 25 -2 -3 -4 -5 -6 -11 -16 -21 -8 -9 -10 -12 -17 -22 -15 -14 -18 -23 -20 -24

This means that if 1, 7, 13, 19 and 25 are set to 1 and everything else is set to 0 path will look like this:

Which translates to 0, 1, 2, 3, 4, 5. By cross-referencing this against figure 1 you will see that this is a valid path.

What this proves

As far as I know, reducing from problem A to NP-Complete problem does not prove anything concrete, please let me know if this is incorrect.

From a practical perspective SAT solvers can solve real-world problems faster than traditional brute force algorithms as they use clever heuristics, you could think of them as almost a “magic machine”.

If I was to speculate, I would say that a SAT Solver would solve a long real-world SAT expression “significantly” faster than my hamiltonian_bf algorithm. I need to run some experiments to show that, but that is a whole different article.

Conclusion

There are important problems that as far as we know (if P!=NP) can only be solved in superpolynomial time.

If you are finding it difficult to find a polynomial time algorithm for your problem, you can check that it is not your cognitive limitation by reducing (constructing gadgets) NP-Complete problem to your problem. By doing so you will show that your problem is at least as hard as NP-Complete problem and you can label your problem as NP-Hard.

If answers for your problem can be checked in polynomial time, your problem is in NP. If your problem is in NP and NP-Hard, your problem is NP-Complete.

Reducing NP-Complete problem to your problem is mostly useful to show that it is hard, apart from that I am not sure how practical that reduction is.

On the other hand, it can be more practical to reduce your problem to SAT instead as you will be able to use SAT Solvers, but remember that this reduction does not show that your problem is NP-Hard.

P vs NP, Practical Guide for Software Engineers

Research

Problem

Brute Force

Non-deterministic Polynomial (NP)

Why is it called NP?

How is NP used in practice?

NP

NP-Hard

Reductions

What this proves

3SAT to Hamiltonian Path

Reverse Reduction

Gadget construction

Hamiltonian Path to CNF-SAT

What this proves

Conclusion

Written by Zan Kavtaskin