# Binary search algorithm in PHP

Binary search is a search algorithm that is dramatically faster than PHP’s internal functions (such as array_search) when searching **ordered** data.

## How does it work?

PHP’s internal function **array_search **is an example of a linear search; it will iterate over the entire data set until it finds a match working from front to back. This is great if your data set is small and unordered, but is incredibly inefficient when working over large data sets, especially if your match is toward the back of the set, or doesn’t exist at all.

### A different approach – divide and conquer

Binary search approaches this problem in a different way. It divides the data set to find the match starting from the middle, to narrow the range that the match can be found within (hence the requirement for your data to be ordered).

## A PHP implementation

Below is a PHP example of how to implement a binary search.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
function binarySearch($needle, array $haystack, $compare, $high, $low = 0, $containsDuplicates = false) { $key = false; // Whilst we have a range. If not, then that match was not found. while ($high >= $low) { // Find the middle of the range. $mid = (int)floor(($high + $low) / 2); // Compare the middle of the range with the needle. This should return <0 if it's in the first part of the range, // or >0 if it's in the second part of the range. It will return 0 if there is a match. $cmp = call_user_func($compare, $needle, $haystack[$mid]); // Adjust the range based on the above logic, so the next loop iteration will use the narrowed range if ($cmp < 0) { $high = $mid - 1; } elseif ($cmp > 0) { $low = $mid + 1; } else { // We've found a match if ($containsDuplicates) { // Find the first item, if there is a possibility our data set contains duplicates by comparing the // previous item with the current item ($mid). while ($mid > 0 && call_user_func($compare, $haystack[($mid - 1)], $haystack[$mid]) === 0) { $mid--; } } $key = $mid; break; } } return $key; } |

We can utilise this function in the following example by searching an array of email addresses for a specific one. Any test data that appears here was generated by Faker.

1 2 3 |
$emails = [/* array of emails */]; $searchEmail = 'wkeeling@gmail.com'; $key = binarySearch($searchEmail, $emails, 'strcmp', count($emails) - 1, 0, true); |

## Benchmarks

So let’s benchmark the performance of binary search against PHP’s internal array_search function over a variety of data set sizes and match positions.

### Small data set – 100 items

**Item exists as the first entry in the data set:**

- PHP’s array_search: 0.02599999999997ms
- Binary search: 0.018999999999991ms
- Binary search is
**1.37**times faster than array_search

**Item exists around the middle of the data set:**

- PHP’s array_search: 0.029999999999974ms
- Binary search: 0.020999999999993ms
- Binary search is
**1.43**times faster than array_search

**Item does not exist in the data set:**

- PHP’s array_search: 0.03000000000003ms
- Binary search: 0.019000000000047ms
- Binary search is
**1.58**times faster than array_search

### Medium data set – 10,000 items

**Item exists as the first entry in the data set:**

- PHP’s array_search: 0.032000000000032ms
- Binary search: 0.023999999999968ms
- Binary search is
**1.33**times faster than array_search

**Item exists around the middle of the data set**

- PHP’s array_search: 0.12000000000001ms
- Binary search: 0.02000000000002ms
- Binary search is
**6**times faster than array_search

**Item does not exist in the data set:**

- PHP’s array_search: 0.19099999999994ms
- Binary search: 0.021999999999966ms
- Binary search is
**8.68**times faster than array_search

### Large data set – 1,000,000 items

**Item exists as the first entry in the data set:**

- PHP’s array_search: 0.037000000000009ms
- Binary search: 0.035000000000007ms
- Binary search is
**1.06**times faster than array_search

**Item exists around the middle of the data set**

- PHP’s array_search: 8.734ms
- Binary search: 0.026000000000082ms
- Binary search is
**335.92**times faster than array_search

**Item does not exist in the data set:**

- PHP’s array_search: 15.676ms
- Binary search: 0.031999999999921ms
- Binary search is
**489.87**times faster than array_search

## In summary

The results of the benchmarks show that binary search is slightly faster than array_search in most scenarios, but as the data set grows, the performance difference becomes huge. Binary search should be used when you know the data set is large and ordered.