Talk:Binary search
This is the talk page for discussing improvements to the Binary search article. This is not a forum for general discussion of the article's subject. |
Article policies
|
Find sources: Google (books · news · scholar · free images · WP refs) · FENS · JSTOR · TWL |
Archives: 1, 2 |
This article is written in American English, which has its own spelling conventions (color, defense, traveled) and some terms that are used in it may be different or absent from other varieties of English. According to the relevant style guide, this should not be changed without broad consensus. |
Binary search is a featured article; it (or a previous version of it) has been identified as one of the best articles produced by the Wikipedia community. Even so, if you can update or improve it, please do so. | ||||||||||||||||||||||
This article appeared on Wikipedia's Main Page as Today's featured article on October 29, 2018. | ||||||||||||||||||||||
|
This level-5 vital article is rated FA-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects: | ||||||||||||||||||
|
Other talk page banners | |||
|
On 1 June 2024, it was proposed that this article be moved from Binary search algorithm to Binary search. The result of the discussion was moved. |
Primary Procedure can Yield Position in "unsuccessful" case
[edit]I observe that if the search terminates in Step 2 as "unsuccessful", the target value 'T' would be located between AR and AL. That is, R is the index (possibly -1) of the array element less than T and L is the index of the array element (possibly n) of the array element greater than T.
I find this property of the procedure useful when searching ranges. It is, I believe, one step faster than using the right-most or left-most variants on average.
I'm not sure if a note on the Article page is justified, but having this information would be helpful to those doing "range" searches. It may be that a note on the Article page violates some policy WRT cited work vs. original work, other Talk topics make me think it might.
This is not a comment on the utility of the left-most or right-most variants. They are useful when the array can contain duplicates. Usually no duplicates are present when the problem is to determine which range a value is in.
Glennp17321 (talk) 18:52, 17 May 2020 (UTC)
Ranges should have exclusive upper bounds
[edit]I've taught Binary Search multiple times at the university level. I've also asked it as an interview question about 50 times from people with Masters in CS. I've determined that a major reason that people fail my interview question is because binary search is usually taught with inclusive upper bounds, rather than exclusive ones. This Wikipedia article continues that habit. I think the article would be substantially improved by switching to exclusive upper bounds.
By "exclusive upper bounds", I mean that the initial state for an element array would become and instead of having . Almost all good array code I've seen uses an exclusive upper bound. All of Java's libraries use an exclusive upper bound. One nice property of the exclusive upper bound is that is the number of items. A corollary is that is a test for an empty range and is a test for a non-empty one. Another good property of exclusive upper bounds is that for any where ≤≤, it is easy to split the range into two subranges and . These properties make reasoning about the code much easier.
When you use the exclusive upper bound, the code for binary search becomes pretty natural:
function binary_search_exclusive_upper_bound(A, n, T) is L := 0 R := n while R-L ≥ 2 do m := L + floor((R-L)/ 2) if T < A[m] then R := m else: L := m if R-L=1 and A[L] = T then return L return unsuccessful
Nota Bene: I am assuming the "and" operator uses short-circuit evaluation.
Please compare the code above to the code on the main page. I hope you'll see that this code is much easier to reason about. You can see that (1) the loop runs only when it can split the range, (2) that is clearly not equal to and not equal to , (3) that it doesn't matter if I used or to compute , (4) that range variables are correctly handled without off-by-one errors and (5) the code handles all cases, such as when is zero.
BTW, I believe that handling is zero, an empty range, is an important case. The paper by Pattis that is cited on the main page also considers it important. The main page's version of "binary_search_alternative" goes into an infinite loop if is zero. I see that as a consequence of using non-exclusive upper bounds, because the code on the main page is harder to reason about.
I believe binary search is usually taught with an inclusive upper bound because it is such an old algorithm. When it was created, we didn't have the habit (which I learned with C in the early 80's) of using an exclusive upper bound. But almost all array code now uses the exclusive upper bound. Using the exclusive upper bound is easier to reason with. I believe strongly that this article would be substantially improved by using exclusive upper bounds.
Michael Nahas, former Adjunct Faculty of University of Virginia, former Instructor at Johns Hopkins Center for Talented Youth
P.S. If you need an external reference, I wrote up my opinions on my website.
Mdnahas (talk) 23:07, 17 June 2021 (UTC)
- Thank you very much for sharing your input! I definitely agree that the exclusive lower bound is a better and more logical way of implementing and reasoning about this algorithm. However, Wikipedia is not a place to "right great wrongs"; there at least needs to be a reference that confirms that this is actually the proper way to implement the algorithm. We can't use the thing you put on your website as a source, because it's self-published. Also, as you also mentioned, the inclusive upper bound is also still commonly used. I quickly looked up the .NET source code and it seems that they also use the inclusive upper bound in their binary search implementation (source). So I think that, if we were to add your version, the current one at least also needs to be mentioned, maybe under the "alternative procedure" section. ―Jochem van Hees (talk) 00:07, 19 June 2021 (UTC)
- My quick impression is that having a half-open search range is likely to confuse more readers, when they start reading the article, than it helps later on with pseudocode simplifications. Also, my impression from other similar articles is that such a change is likely to lead to a nightmare of maintenance where we keep having to revert well-meaning freshmen who think they see an inconsistency in the inequalities and change it to look more consistent, without changing the rest of the code to match. So the situation is a little different from in a classroom, where you can control better the whole presentation. —David Eppstein (talk) 01:04, 19 June 2021 (UTC)
Using an exclusive upperbound is just good programming practice. I don't think I am trying to "right great wrongs" --- we're talking about adding 1 to a variable. I am just applying good practice to an algorithm. And, as I said, good practice is important to apply here. Many experienced programmers get this program wrong, because it is hard to reason about code that uses an inclusive upper bounds.
In fact, the exclusive upper bound is such good programming practice that C# version 8.0 has added a language feature where array ranges are specified with an exclusive upper bound.
Since Jochem looked up an implementation for the #5 programming language, let's look at what the other 4 do.
C's bsearch interface uses start + length. I couldn't find LLVM's implementation. The GCC implementation uses start + length.
The C++ STL function has an interface using exclusive upper bounds. The algorithm is implemented in std::lower_bound. The reference code works with start + length.
Python's bisect has an interface using an exclusive upper bound. It's implementation works using an exclusive upper bound.
Java's Arrays.binarySearch interface uses exclusive upper bounds. The OpenJDK implementation and Android implementation use an inclusive upper bound. The GNU implementation uses exclusive upper bounds.
I don't deny that the inclusive upper bound is used. It is how the literature first wrote about the algorithm. It is how most textbooks present it. It seems to be a leftover from before we knew that it was bad programming practice to use it. I'd like you to notice that none of these languages use the inclusive bounds for their interface. None. If there "needs to be a reference that confirms that this is actually the proper way to implement the algorithm", I would say that that is it.
I don't care if the Wikipedia page uses start+length or exclusive upper bounds. Either is fine programming practice. If David is worried about well-meaning freshman modifying code with the exclusive upper bounds, can we agree on start+length?
Mdnahas (talk) 19:09, 5 July 2021 (UTC)
- If the inclusive upper bound is "how most textbooks present it" then that is a strong argument for how we should present it here. In this sort of thing, Wikipedia should be a follower, not a leader. —David Eppstein (talk) 19:39, 5 July 2021 (UTC)
- Wikipedia "should be a follower" is an argument for Wikipedia not being a place for research or new scholarship. Changing a inclusive upper bound to start+length is not a new idea, nor is it significant to the algorithm itself. Moreover, when saying "Wikipedia should be a follower", I think our reference should be the code being run, and not what's in textbooks. That the code departs from the textbooks is an important detail, in my opinion.
- I think the prime goal of Wikipedia is to clearly communicating an idea. I am confident that using an inclusive upper bound does not clearly communicate this algorithm. Mdnahas (talk) 17:07, 28 July 2021 (UTC)
- Wikipedia is not the "place for research or new scholarship." Wikipedia:No original research covers that. Wikipedia reports what reliable sources report. ~Anachronist (talk) 17:22, 28 July 2021 (UTC)
Apart from agreeing with David about following sources, I disagree with a lot that is written here. At the moment, what I think is being called the "inclusive" method, the upper and lower positions of where the target may be are treated the same. The algorithm in concept is symmetrical with respect to upper and lower. The proposed "exclusive" method proposes that the upper limit and the lower limit be treated differently, and this is claimed to be better. Well, BS. Adding extra concepts to an algorithm makes it harder to understand, not easier. I'm not convinced by the experience argument either, as I taught binary search to undergraduates more than 30 times without a similar experience. McKay (talk) 04:04, 29 July 2021 (UTC)
Approximate matches doesn't belong in this article
[edit]First of all, someone needs to define which regex means "Approximate matches"
I believe that they are exclusively talking about "begins with..." / "starts with..."
but it has almost nothing to do with binary searching. Even though it's interesting, it's not related to the Binary Search Algorithm.
AlwaysAngry (talk) 18:46, 25 November 2022 (UTC)
Examples of binary search described in ancient texts
[edit]Here are two instances of ancient texts that describe finding an item (person) in a list (group) by halving the size of the list in each iteration and examining a predicate over each. These instances may be earlier than the currently listed reference to sorted lists in a Babylonian tablet.
1. The Losaka story in the Jataka tales:
So in time it came to pass that the people fell into a wretched plight. Reflecting that such had not been their lot in former days, but that now they were going to rack and ruin, they concluded that there must be some breeder of misfortune among them, and resolved to divide into two bands. This they did; and there were then two bands of five hundred families each. Thence-forward, ruin dogged the band which included the parents of the future Losaka, whilst the other five hundred families throve apace. So the former resolved to go on halving their numbers, and did so, until this one family was parted from all the rest.
src: https://www.sacred-texts.com/bud/j1/j1044.htm
2. 1 Samuel 14:41 in the Old Testament, as described in Urim and Thummim#Form_and_function.
src: https://en.wikisource.org/wiki/Bible_(King_James)/1_Samuel#14:41
- 152.44.156.175 (talk) 07:56, 4 July 2023 (UTC)
Section on galloping binary search?
[edit]Should we consider adding a section on galloping binary search, which operates differently, but it fundamentally the same algorithm? See for example Cormack's book on search systems. DMH43 (talk) 17:05, 29 December 2023 (UTC)
Requested move 1 June 2024
[edit]- The following is a closed discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. Editors desiring to contest the closing decision should consider a move review after discussing it on the closer's talk page. No further edits should be made to this discussion.
The result of the move request was: moved. Favonian (talk) 20:43, 8 June 2024 (UTC)
The page was moved in 2005. I believe this extra word is not necessary. Here are what sources say:
- CLRS vol. 1, 3rd edition[1]
- "binary search" 21 times (excluding 253 counts of "binary search tree")
- of which "binary search algorithm" appears 2 times (upon first mention)
- Programming Pearls[2]
- "binary search" 32 times (excluding 5 counts of "binary search tree")
- no mention of "binary search algorithm"
- TAOCP vol. 3 (searching and sorting), 2nd edition[3]
- "binary search" 62 times (excluding "binary search tree")
- of which "binary search algorithm" 4 times,
- and "binary search procedure" once.
References
- ^ Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009) [1990]. Introduction to Algorithms (3rd ed.). MIT Press and McGraw-Hill. ISBN 0-262-03384-4.
- ^ Alexandrescu, Andrei (2010). The D Programming Language. Upper Saddle River, New Jersey: Addison-Wesley Professional. ISBN 0-321-63536-1. Bentley, Jon (2000). Programming pearls (2nd ed.). Addison-Wesley. ISBN 978-0-201-65788-3.
- ^ Knuth, Donald (1998). Sorting and searching. The Art of Computer Programming. Vol. 3 (2nd ed.). Reading, MA: Addison-Wesley Professional. ISBN 978-0-201-89685-5.
IntGrah (talk) 20:42, 1 June 2024 (UTC)
- Support per nom. "Binary search" is clearly the WP:COMMONNAME and more WP:CONCISE. Malerisch (talk) 22:20, 1 June 2024 (UTC)
- Support. I don't have strong feelings either way and I'm not convinced that one or the other is the common name, but more concise is better. —David Eppstein (talk) 22:28, 1 June 2024 (UTC)
- Support: Commonly used, sufficiently clear, and more concise. — BarrelProof (talk) 18:24, 2 June 2024 (UTC)
- Support. Clearly "algorithm" is necessary sometimes, such as to make clear that Floyd–Warshall is a graph algorithm and not a pop band or a law firm. But it isn't as commonly used for algorithms such as binary search and long division. Quale (talk) 04:53, 3 June 2024 (UTC)
- Wikipedia articles that use American English
- Wikipedia featured articles
- Featured articles that have appeared on the main page
- Featured articles that have appeared on the main page once
- Old requests for peer review
- Wikipedia Did you know articles that are featured articles
- FA-Class level-5 vital articles
- Wikipedia level-5 vital articles in Mathematics
- FA-Class vital articles in Mathematics
- FA-Class Computer science articles
- High-importance Computer science articles
- WikiProject Computer science articles