Euler #3: Largest Prime Factor

The prime factors of 13195 are 5, 7, 13 and 29.

What is the largest prime factor of the number 600851475143?

https://projecteuler.net/problem=3

In the last two problems, I was able to use some neat properties of summation to avoid programming. However, when it comes to prime numbers, all bets are off. Otherwise, I could break RSA and become filthy rich. Why are prime numbers so intractable? The simplest answer is that mathematicians haven’t found any pattern to them. Our prime-finding algorithms, such as the Sieve of Eratosthenes, have to trawl through \N one by one.1Not really one by one, but there will be always a N+1 number to check.

The problem touches on one of the most important theorems in number theory: the Fundamental Theorem of Arithmetic. It says that every composite natural number has a unique prime factorization. In other words, take any old whole number. It is always the product of prime numbers, unless that number itself is prime.

Let’s do the example in the problem statement, 13195. We start with our first candidate divisor: 2. Not a clean divisor, so we move on to 3. Nope. Since we’ve ruled out 2, all other even divisors are ruled out, so we can skip 4. Trying 5 yields 2639. Success! We add 5 to our list of prime factors, and start over. 2? Nope, not even. 3? Nope. 4? Nah. 5? No way. 6? Even, so we don’t even try. 7? Yes! We get 377 and add 7 to our list of prime factors. Rinse and repeat.

So what should we do regarding that humongous number in the problem? We need to harness the blistering speed of computers. Generally, we divide by increasing candidate divisors till we happen on a clean divisor. The first clean divisor we encounter will always be prime, since composite divisors are necessarily larger than its constituent primes.

Implementation

The problem is asking for the largest prime factor, not all prime factors, but I suspect we will be doing a lot of prime factorizations in future Project Euler problems. It’s handy to make a reusable function factorize that takes our number n as an argument. We also need a list to store the prime factors as we go. Let’s call this factors. Since we’re starting with 2 as our first candidate divisor, let’s store 2 in divisor. When the function finishes, it’ll return the list of prime factors.

def factorize(n):
    factors = []
    divisor = 2
    return factors

Right now, factorize doesn’t do anything except return an empty list. We need to add logic that checks if divisor divides n cleanly. We do that with the modulo operator, which returns the remainder not the quotient of a division. If it’s a clean division, the modulo will return 0.

if n % divisor == 0:
    factors.append(divisor)
    n = n // divisor
    divisor = 2
elif divisor % 2 == 0:
    divisor += 1
else:
    divisor += 2

A clean division means divisor is prime and it should be added to our factors. If we don’t get a clean division, we increment divisor to the next candidate depending on if it’s odd or even. Now, we just need to add a loop to ensure we try all divisors up to \sqrt{n}. Trying divisors past that point is wasteful. A quick mental example: 9 divides 45 cleanly, giving 5. But to get to 9, we would have tried 5. Thus, it’s only necessary to try divisors up to the square root.

import math

def factorize(n):
    factors = []
    divisor = 2
    while divisor <= math.sqrt(n):
        if n % divisor == 0:
            factors.append(divisor)
            n = n // divisor
            divisor = 2
        elif divisor % 2 == 0:
            divisor += 1
        else:
            divisor += 2
    factors.append(n)
    return factors

If we have exhausted all candidate divisors up to \sqrt{n}, that means the pared down n is prime, so it is added to factors. Since factorize returns a sorted list of prime factors of the original n, we can query just the last element for the solution to Problem 3.

>>> factorize(600851475143)[-1]
6857

Complexity

When writing algorithms, it is paramount to consider its time and space costs. We’re mortal after all. We can’t be waiting till the heat death of the universe for an algorithm to finish. It’s best to be pessimistic and assume the worst case. What if we tried to factorize a very large prime? The algorithm would have to go through \sqrt{n} divisors before it realized n is prime. In computer science, this case is said to have a time complexity of \mathcal{O}(\sqrt{n}), i.e. operating time is proportional to \sqrt{n}.

Now, what if we tried to factorize a very large composite with only two prime factors? In the search for the first prime factor, it’ll take roughly \sqrt{n} operations. In the second round, n will have been pared down to \sqrt{n}. The algorithm will try all divisors up to \sqrt{\sqrt{n}} before realizing the new n is prime. This case’s time complexity is \mathcal{O}(\sqrt{n} + \sqrt{\sqrt{n}}) In big-O notation, we are concerned with the most dominant term. This means we can just drop \sqrt{\sqrt{n}}.

To conclude, this algorithm runs in \mathcal{O}(\sqrt{n}) time, also called polynomial time. There are several hacks to improve the algorithm, but that’ll be explored next time we need a prime factorization algorithm.