PortiBlog

A beginner's guide to binary search

5 november 2018

Searching. One of the most crucial activities performed on datasets. Think of a world with lots of data, but no possibility to filter relevant info for yourself. Sounds pretty stone-age right? Thankfully we have lots of options to perform the craziest search queries on huge datasets. These options are available to us for quite some time, and we got used to perceiving lots and lots of data. In this digital age we tend to thrive for the most efficient ways to gather data, but besides that: we want it fast. So how do we search through data in a quick way? In this technical blogpost I will shine my light on one of the most underrated searching methods: Binary searching. It is one of the quickest ways to retrieve data, but it has its downsides.

Binary searching - The basics

As the name gives away binary searching is a about 0 and 1: yes or no, true or false. This method doesn't aim to look for an array item by looping through an array and waiting for it to bounce back a result. Instead it uses something what looks like a "warmer, colder"-kind of approach. But then without the "colder". Imagine playing a game of hide and seek and you get to ask the person that is hiding to shout their name every 5 seconds. You'll get their location in no time!
Binary searching loops through an array by looking for an item and asking the array if it is nearing it. To summarize the method in one sentence:

"Search an array by splitting it in half - each iteration."

Let's say we have an array of 10 integers and we're looking for the number 2. Normally we would have to loop through the entire array until the iteration finds '2'. For this number it will take 3 iterations to get to number 2 (3rd position in the array) but it could also take 11 loops if the number is 10. Binary searching tries to optimize the efficiency of searching this array. Instead of looping through the entire array it splits the array in half each time. It does this by determining a minimum and a maximum. Both variables will change each iteration until the number is just between the two, at which the number has been found.

Binary searching - Code example

In the below example I have created a binary searching method which retrieves an array of integers and a number that it needs to find. It defines a leftIndex (the index of the minimum position it needs to look at) and a maxIndex (which is always the length of an array minus one). It will iterate with the condition that the minimum index plus one needs to be smaller than the maxIndex. This will prevent us from looping outside of the array (in case we're looking for number 11). Inside the loop we define between what positions we need to check. Then we split that up by dividing by 2. Finally we will define the current position we're looking at which is the minimum plus the (max) position of the divided variable (lookTill). We check its value with the given number and decide if it is bigger than or smaller than the current value. Inside the if else statement we basically tell our method to move to the left or to the right by determining the minimum and maximum look up positions.
However, if the current number is the number we're looking for there is no need to iterate any further!

private static bool FindNumberNormallyInArrayBinaryStyle(int number)
{
int leftIndex = -1;
int maxIndex = nums.Length-1;
while (leftIndex + 1 < maxIndex)
{
nonRecursiveBinaryAttempts++;
int lookBetween = maxIndex - leftIndex;
int lookTill = lookBetween / 2;
int lookAt = leftIndex + lookTill;
long currentValue = nums[lookAt];

if (currentValue > number)
{
maxIndex = lookAt;
}
else
{
leftIndex = lookAt;
}
if (currentValue == number)
{
return true;
}
}
return false;
}

Binary searching - To recurse or not to recurse

The above example demonstrates how to use binary searching without using recursion. But perhaps we could make this code cleaner, more efficient or just way more awesome by using recursion (another recursion example here)!  The below code shows how to make it recursive.

private static bool FindNumberRecursivelyInArrayBinaryStyle(int number, int leftIndex, int maxIndex)
{
if (leftIndex + 1 < maxIndex)
{
recursiveBinaryAttempts++;
int lookBetween = maxIndex - leftIndex;
int lookTill = lookBetween / 2;
int lookAt = leftIndex + lookTill;
long currentValue = nums[lookAt];

if (currentValue > number)
{
maxIndex = lookAt;
}
else
{
leftIndex = lookAt;
}
if (currentValue == number)
{
return true;
}
else
{
return FindNumberRecursivelyInArrayBinaryStyle(number, leftIndex, maxIndex);
}
}
else
{
return false;
}
}

Binary searching - But.. IS it faster?

So we now know how to make a binary searching method. But does it do the trick? Is it worth creating your own searching method instead of using general methods like .Where, .filter, .any or using brute force techniques like foreach and while(i<nums.Length). I have created a demo console application that shows the execution speed per techniques. I have tested 5 techniques:
- Binary search with recursion
- Binary search without recursion
- Old-school style (for looping)
- Searching with a lambda expression (general)
- Linq style searching (C# only)

The results? See below. Or try it out using the dotnetfiddle I have created here.

Finding number 90 in an array with numbers between 0 and 9999999. Recursive binary style takes 23 loops, 0.0026 milliseconds. Non-recursive binary style takes 23 loops, 0.0008 milliseconds. Old-school style takes 91 loops, 0.0008 milliseconds. Lambda style finished in 42.0569 milliseconds. Linq style finished in 33.912 milliseconds.

Here we have given the program the number 90 to find. This number is pretty close to the start of the array. No wonder that the old-school/brute-force method works pretty quick (It runs with O(91). The two binary searches are also quick. But looping an array 90 times apparently gives us no reason to pick binary search as iteration method.Finding number 5050505 in an array filled with numbers between 0 and 9999999. Recursive binary style loops 22 times, 0.0026 milliseconds. Non recursive binary style loops 22 times, 0.0017 milliseconds. Old-school style loops 5050506 times, 22.5992 milliseconds. Lambda style finished in 34.9058 milliseconds. Linq style finished in 35.2623 milliseconds.

The number 5050505 is further away from the start of the array than the previous test. And we can clearly see the difference: 22.5992 milliseconds for Old-School looping and 0.0017-0.0026 milliseconds for binary searching! Finding number 99999999 in an array filled with integers between 0 and 9999999. Recursive binary style loops 24 times, 0.0031 milliseconds. Non recursive binary style loops 24 times, 0.0017 milliseconds. Old-school style loops 10000000, 37.0759 milliseconds. Lambda style finished in 35.3435 milliseconds. Linq style finished in 35.149 milliseconds.

What about a number that does not exist in the array? This means the brute force loop will run through the array nums.Length-1 times. The array is filled with numbers between 0 and 9999999. Adding another 9 shows us that the binary searches are way quicker than the other three methods.

The reason why Lambda and Linq show slow results are simple. Methods like .find and .where or using data query structures with Linq pretty much do the same as brute force looping: iterating the entire list. The real benefit in using lambda and linq is readability and clean code. Linq contains lots of overheads. But sure; it's perfect if you want to retrieve data in complex structures. What Linq really does when using a data query like I used in my code is transforming it into a more complex lambda expression. That's why both lambda and linq approaches are quite similar in results.

Conclusion

I hope you have enjoyed this quick tour into binary searching. As I tried to emphasize in my last post about runtime growth, it's a requirement that every programmer should think about the why's and when's on using filter/search algorithms. As we have seen in the examples lambda en native linq are pretty much the same. And I can tell you; using linq with lambda will give us no significant performance boost. If we have a small array the brute force approach is just as efficient as using a binary search. But, if we have a large list of numbers its better to use a binary search. Recursive or non-recursive I hear you say? As we have seen it's pretty much the same. But if you're really interested in the tiniest of milliseconds I'd say go for a non-recursive method. Recursive methods are most fun when using nested data. Keep in mind however that binary searching methods only work in sorted arrays!

Questions? Or do you have suggestions? Be sure to mail me by using the mail button under my avatar!

Submit a comment