I enjoy using Hackerrank to sharpen my understanding of fundamental computer science concepts.
I recently solved their “Big Sorting” problem and would like to share my thoughts on it.
I will assume you know understand the problem already. Here is the problem statement in case you’re unfamiliar with it..
The problem seems to be fairly straightforward: sort a list of numbers represented as strings.
It’s easy to write a one liner in Python to accomplish this:
numbers_array = ['9', '1', '3'] sorted(list(map(lambda number: int(number), numbers_array)))
It’s just three easy steps:
- Convert each numeric string to an actual int object
- Create a list with the newly created ints
- Sort the list
This approach seems too easy since the problem only has a ~63% solve rate on Hackerrank. It can’t really be this easy, right?
Unfortunately, this naive approach doesn’t pass several of Hackerrank’s test cases because it’s too slow.
In what cases do you think this approach would be too slow?
It turns out that very large numbers make this approach too slow. I bought one of Hackerrank’s test cases that was failing, and realized that one of the numeric strings in the list of inputs was 988k characters long. That’s a really big number! Thankfully, Python 3’s
int class can handle ints that size, but there’s a big catch: it takes a while to instantiate them. To be precise, it took 5.96 seconds to instantiate the 988k digit long number as an
The script below takes 6.16 seconds to execute, so about 96.8% of the execution time was taken up instantiating that massive
numbers_array = ['9', '1', '3', '9'*998000] sorted(list(map(lambda number: int(number), numbers_array)))
int class just wasn’t created to handle numbers of that length. But the
Decimal class handles it much faster. To be precise,
Decimal('<insert 988k digit long number here>') only takes 0.06 seconds. This is ~98.7% faster than using
So, to solve the “Big Sorting” problem just use
Decimal instead of
from decimal import Decimal numbers_array = ['9', '1', '3', '9'*998000] sorted(list(map(lambda number: Decimal(number), numbers_array)))
The smarter implementation takes 0.07 seconds to run on my CPU.