Searching. One of the most crucial activities performed on datasets. Think of a world with lots of data, but no possibility to filter relevant info for yourself. Sounds pretty stone-age right? Thankfully we have lots of options to perform the craziest search queries on huge datasets. These options are available to us for quite some time, and we got used to perceiving lots and lots of data. In this digital age we tend to thrive for the most efficient ways to gather data, but besides that: we want it fast. So how do we search through data in a quick way? In this technical blogpost I will shine my light on one of the most underrated searching methods: Binary searching. It is one of the quickest ways to retrieve data, but it has its downsides.
Binary searching - The basics
As the name gives away binary searching is a about 0 and 1: yes or no, true or false. This method doesn't aim to look for an array item by looping through an array and waiting for it to bounce back a result. Instead it uses something what looks like a "warmer, colder"-kind of approach. But then without the "colder". Imagine playing a game of hide and seek and you get to ask the person that is hiding to shout their name every 5 seconds. You'll get their location in no time!
Binary searching loops through an array by looking for an item and asking the array if it is nearing it. To summarize the method in one sentence:
"Search an array by splitting it in half - each iteration."
Let's say we have an array of 10 integers and we're looking for the number 2. Normally we would have to loop through the entire array until the iteration finds '2'. For this number it will take 3 iterations to get to number 2 (3rd position in the array) but it could also take 11 loops if the number is 10. Binary searching tries to optimize the efficiency of searching this array. Instead of looping through the entire array it splits the array in half each time. It does this by determining a minimum and a maximum. Both variables will change each iteration until the number is just between the two, at which the number has been found.
Binary searching - Code example
In the below example I have created a binary searching method which retrieves an array of integers and a number that it needs to find. It defines a leftIndex (the index of the minimum position it needs to look at) and a maxIndex (which is always the length of an array minus one). It will iterate with the condition that the minimum index plus one needs to be smaller than the maxIndex. This will prevent us from looping outside of the array (in case we're looking for number 11). Inside the loop we define between what positions we need to check. Then we split that up by dividing by 2. Finally we will define the current position we're looking at which is the minimum plus the (max) position of the divided variable (lookTill). We check its value with the given number and decide if it is bigger than or smaller than the current value. Inside the if else statement we basically tell our method to move to the left or to the right by determining the minimum and maximum look up positions.
However, if the current number is the number we're looking for there is no need to iterate any further!
Binary searching - To recurse or not to recurse
The above example demonstrates how to use binary searching without using recursion. But perhaps we could make this code cleaner, more efficient or just way more awesome by using recursion (another recursion example here)! The below code shows how to make it recursive.
Binary searching - But.. IS it faster?
So we now know how to make a binary searching method. But does it do the trick? Is it worth creating your own searching method instead of using general methods like .Where, .filter, .any or using brute force techniques like foreach and while(i<nums.Length). I have created a demo console application that shows the execution speed per techniques. I have tested 5 techniques:
- Binary search with recursion
- Binary search without recursion
- Old-school style (for looping)
- Searching with a lambda expression (general)
- Linq style searching (C# only)
The results? See below. Or try it out using the dotnetfiddle I have created here.
Here we have given the program the number 90 to find. This number is pretty close to the start of the array. No wonder that the old-school/brute-force method works pretty quick (It runs with O(91). The two binary searches are also quick. But looping an array 90 times apparently gives us no reason to pick binary search as iteration method.
The number 5050505 is further away from the start of the array than the previous test. And we can clearly see the difference: 22.5992 milliseconds for Old-School looping and 0.0017-0.0026 milliseconds for binary searching!
What about a number that does not exist in the array? This means the brute force loop will run through the array nums.Length-1 times. The array is filled with numbers between 0 and 9999999. Adding another 9 shows us that the binary searches are way quicker than the other three methods.
The reason why Lambda and Linq show slow results are simple. Methods like .find and .where or using data query structures with Linq pretty much do the same as brute force looping: iterating the entire list. The real benefit in using lambda and linq is readability and clean code. Linq contains lots of overheads. But sure; it's perfect if you want to retrieve data in complex structures. What Linq really does when using a data query like I used in my code is transforming it into a more complex lambda expression. That's why both lambda and linq approaches are quite similar in results.
I hope you have enjoyed this quick tour into binary searching. As I tried to emphasize in my last post about runtime growth, it's a requirement that every programmer should think about the why's and when's on using filter/search algorithms. As we have seen in the examples lambda en native linq are pretty much the same. And I can tell you; using linq with lambda will give us no significant performance boost. If we have a small array the brute force approach is just as efficient as using a binary search. But, if we have a large list of numbers its better to use a binary search. Recursive or non-recursive I hear you say? As we have seen it's pretty much the same. But if you're really interested in the tiniest of milliseconds I'd say go for a non-recursive method. Recursive methods are most fun when using nested data. Keep in mind however that binary searching methods only work in sorted arrays!
Questions? Or do you have suggestions? Be sure to mail me by using the mail button under my avatar!