"Bug" is a slang term for a software error. Any undocumented program behavior can be called an error, but the word "bug" usually denotes a particular type of error: one that is detected only during program execution rather than at the earlier stages of software design, coding, or debugging.
The term is commonly traced back to September 9, 1947, when the Mark II computer was undergoing tests. Operators traced an error in the Mark II to a moth trapped between the pins of an electromechanical relay. The insect was removed and taped into the log book with the note "First actual case of bug being found" written next to it.
Bugs arise from mistakes people make in a program's source code or design, and occasionally from compiler errors. An application's behavior may also be affected by bugs in the third-party libraries it uses, and improper use of libraries can lead to bugs as well. Compiler errors are usually caused by incorrect optimization of language constructs, but they are very rare. Bugs that programmers are inclined to blame on the compiler usually turn out, on closer inspection, to be their own fault (see the post "The compiler is to blame for everything").
Special tools called debuggers are used in most cases to locate the code fragments that contain bugs. They let you track variable values, processor register values, and other parameters that affect program execution. Other methods of bug detection also exist, namely static analysis and dynamic analysis.
Many software errors never reveal themselves during normal use but can still be exploited by a malicious user to break into a computer system. The bugs most commonly used for this purpose are buffer overflows and integer overflows. When such bugs occur in system software, an attacker can use them to obtain administrator rights and run programs of their choice. Additional validation of input data is required to avoid such errors.
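To make the integer overflow case more concrete, here is a minimal sketch in Python. Python integers do not overflow on their own, so the 32-bit wraparound is simulated with a bit mask. The function names `alloc_size_unchecked` and `alloc_size_checked` are hypothetical and chosen only for this illustration.

```python
# Simulated 32-bit unsigned arithmetic (an assumption made for this sketch;
# Python's own integers are arbitrary-precision and never wrap).
UINT32_MAX = 0xFFFFFFFF


def alloc_size_unchecked(count: int, item_size: int) -> int:
    """Multiply the way a 32-bit unsigned machine would: the result wraps silently."""
    return (count * item_size) & UINT32_MAX


def alloc_size_checked(count: int, item_size: int) -> int:
    """Validate the inputs first and reject any request that would overflow."""
    if count < 0 or item_size <= 0:
        raise ValueError("count must be >= 0 and item_size must be > 0")
    if count > UINT32_MAX // item_size:
        raise OverflowError("count * item_size does not fit in 32 bits")
    return count * item_size


if __name__ == "__main__":
    # 0x40000001 items of 4 bytes need 0x100000004 bytes, which wraps to 4.
    print(alloc_size_unchecked(0x40000001, 4))
    try:
        alloc_size_checked(0x40000001, 4)
    except OverflowError as exc:
        print("rejected:", exc)
```

In the unchecked version, a request for about a billion items yields a size of only 4 bytes. A buffer allocated with that size and then filled with the requested number of items would be overrun, which is how an integer overflow can turn into a buffer overflow. The checked version rejects the input before any arithmetic is trusted.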
It is a common myth that a device employing software (a digital device) is much safer than a device that performs the same functions without any software (an analog device). The examples below are meant to disprove this myth: it is by no means easier to develop a reliable digital system than an analog one.
Software bugs are sometimes directly responsible for human casualties or damages running into millions of dollars, and there are many examples of this. For instance, the Ariane 5 rocket was destroyed almost immediately after launch, for a combination of reasons. The rocket reused the inertial reference platform from the Ariane 4, but that platform had not been designed for the new rocket's greater horizontal acceleration. The platform used a software unit that converted floating-point data into a 16-bit signed integer, and this is where the greater horizontal acceleration caused an overflow. Range checks for this particular variable had been omitted for efficiency reasons, because the processor load was required to stay below 80%, so this factor also contributed to the failure. The most ironic thing is that the unit that caused the failure was no longer needed when the crash happened. Its output had been required only during the first 7 seconds of the flight, and it had been correct during that time. The rocket crashed in the 37th second.
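The conversion problem described above can be sketched in a few lines of Python. This is not the flight software and does not reproduce its behavior. It only contrasts an unchecked narrowing conversion with a range-checked one. The function names and the sample value are assumptions made for the illustration.

```python
import math

INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_wrapping(value: float) -> int:
    """Keep only the low 16 bits of the truncated value, with no range check.

    Expects a finite input; the result is reinterpreted as a signed 16-bit number.
    """
    bits = int(value) & 0xFFFF
    return bits - 0x10000 if bits >= 0x8000 else bits


def to_int16_checked(value: float) -> int:
    """Convert only if the truncated value fits in a signed 16-bit integer."""
    if not math.isfinite(value):
        raise OverflowError("value is not finite")
    truncated = int(value)  # int() truncates toward zero
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return truncated


if __name__ == "__main__":
    reading = 40000.0  # hypothetical sensor value, larger than INT16_MAX
    print(to_int16_wrapping(reading))  # a plausible-looking but wrong number
    try:
        to_int16_checked(reading)
    except OverflowError as exc:
        print("rejected:", exc)
```

The range check costs a couple of comparisons per conversion. Omitting it saves processor time, but any value outside the expected range then either becomes garbage or triggers an error that the rest of the system must be prepared to handle.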
Between 1985 and 1987, six patients received massive radiation overdoses because of bugs in the Therac-25 radiation therapy machine. Earlier models had bugs too, but they also had hardware interlocks that prevented overdosing, so no one had received a lethal overdose. One of the main bugs responsible for the accidents was a race condition that occurred while data for a therapy session was being entered.
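A race condition of this general kind, where one task edits settings while another reads them, can be sketched as follows. The example is not based on the Therac-25 code. The class `SessionSettings`, its fields, and the two valid configurations are hypothetical. The sketch shows one common remedy: a lock that makes a two-field update indivisible from the reader's point of view.

```python
import threading

# The only two combinations that are considered consistent in this sketch.
VALID = {("low", 1), ("high", 100)}


class SessionSettings:
    """Hypothetical pair of fields that must always change together."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._mode = "low"
        self._intensity = 1

    def update(self, mode: str, intensity: int) -> None:
        # Without the lock, a reader could run between these two assignments
        # and observe a mixed state such as ("high", 1).
        with self._lock:
            self._mode = mode
            self._intensity = intensity

    def snapshot(self) -> tuple:
        with self._lock:
            return (self._mode, self._intensity)


def run_demo(iterations: int = 10_000) -> set:
    """Edit and read the settings concurrently; return every state the reader saw."""
    settings = SessionSettings()
    seen = set()

    def editor() -> None:
        for i in range(iterations):
            if i % 2:
                settings.update("low", 1)
            else:
                settings.update("high", 100)

    def reader() -> None:
        for _ in range(iterations):
            seen.add(settings.snapshot())

    threads = [threading.Thread(target=editor), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


if __name__ == "__main__":
    observed = run_demo()
    print("only consistent states seen:", observed <= VALID)
```

Because both `update` and `snapshot` take the same lock, the reader can only ever see one of the two consistent pairs. Race conditions are hard to find by testing precisely because the faulty interleaving may occur only under rare timing, which is one reason an independent hardware interlock is valuable as a last line of defense.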
As we can see, both accidents involved software reused from previous models that contained bugs. Insufficient testing of the units' software was also a critical factor. This is simply unacceptable in safety-critical systems! The consequences of these accidents could have been avoided if all the bugs responsible for them had been found at the early stages of software development and testing.
However, the cherished dream of completely bug-free software is far from being fulfilled. One way to deal with this issue in safety-critical systems is to classify bugs as critical or non-critical. Protection against critical bugs can then be provided by hardware protection mechanisms, backup units, and other means.
We should also keep in mind that only comprehensive software testing and quality improvement can prevent most bugs. No one or two methods alone can ensure the highest software quality.
- Wikipedia. Software bug.
- Wikipedia. Cluster (spacecraft).
- Wikipedia. Therac-25.