By contrast, it's only more recently that I've started hearing about model checkers such as TLA+. The case studies in this blog post give good motivation for why such systems might be useful (while also honestly presenting drawbacks, something that advocates of e.g. PFP rarely do). I would love to use something like that one day.
As an aside, I still think it's a disgrace how irrelevant empirical research is to most aspects of today's software development. Another blog post by the author cites a paper claiming that very simple testing, especially of error conditions, could prevent a huge chunk of production bugs, yet things like that are rarely talked about. The whole profession is unfortunately amateurish, always obsessed with fads and trends instead of taking a more analytic and empirical approach, which makes me sad.
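To make that claim concrete: the finding is roughly that many catastrophic production failures trace back to error handlers that were empty, wrong, or simply never exercised. A minimal sketch of the kind of test meant, in Python (the service and function names here are hypothetical):

```python
class UpstreamError(Exception):
    pass

def fetch_balance(client, account_id):
    """Return the account balance, or None if the upstream service fails."""
    try:
        return client.get(f"/accounts/{account_id}/balance")
    except UpstreamError:
        # This is the kind of handler that is trivially wrong if never
        # tested: returning 0 here would silently corrupt downstream logic.
        return None

def test_fetch_balance_handles_upstream_failure():
    class FailingClient:
        def get(self, path):
            raise UpstreamError("service unavailable")

    # The entire test: force the dependency to fail, then check that we
    # degrade sanely instead of crashing or returning a bogus value.
    assert fetch_balance(FailingClient(), "acct-42") is None
```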
Somewhat old numbers from seL4: 
- 2,500 lines of C code
- 100,000 lines of proofs
- ~7.6 person-years of full-time work (with a lot of involvement from PhDs)
Interactive theorem proving definitely will not scale to significantly-sized C codebases anytime soon.
Formal Methods (for software assurance) is the name of a research discipline studying the following methods, as well as programming language semantics:
* Formal specification of either code or design
* Model checking
* Sound static analysis
* Machine-checked/assisted deductive proof systems and dependent type systems
* Concolic testing and automatic test generation (see the sketch after this list)
* Program synthesis
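To give a feel for the testing bullet above, here is roughly what automatic test generation looks like from the user's side, using Python's `hypothesis` library as a stand-in (property-based rather than concolic, but the workflow is similar: you state a property and the tool hunts for a counterexample):

```python
from hypothesis import given, strategies as st

def dedupe(xs):
    return list(set(xs))  # buggy: set iteration loses the original order

@given(st.lists(st.integers()))
def test_dedupe_preserves_first_seen_order(xs):
    # Property: deduplication should keep elements in first-seen order.
    seen, expected = set(), []
    for x in xs:
        if x not in seen:
            seen.add(x)
            expected.append(x)
    # Hypothesis generates inputs and shrinks any failure to a minimal
    # counterexample, e.g. [1, 0]; concolic tools hunt for such inputs by
    # tracking the program's branch conditions instead of random generation.
    assert dedupe(xs) == expected
```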
Some of the verification methods above (like some model checkers and sound static analysis) can be used without any explicit specification whatsoever (e.g., to verify the absence of undefined behavior in C programs). The name "Formal Methods" covers such a wide range of tools and techniques, with such widely varying goals, costs, and appropriate domains, that it doesn't really mean much to talk of "formal methods" as if it were a specific thing. Rather, it is a general approach used in constructing widely differing tools and techniques. That's why it's called formal methods, not a formal method.
One can of course object that requiring the use of a language where verification is "baked in" like this might be undesirable; but either way, it seems clear even to formal-methods proponents that well-chosen language semantics are going to be very important if code verification is to be feasible. So it's six of one, half a dozen of the other.
I would suspect that (subsets of) Ada, Rust, or maybe OCaml would be more fitting while staying on the industrial side. (Likely Haskell or Idris would be even easier targets, but their industrial adoption is quite limited.)
I can't remember any large company where Haskell would be front and center, but a number of smaller companies use it that way. There was a post about running Haskell in production recently here on HN.
Formally proving complex properties is just an inherently difficult and time-consuming task. That is the main reason it is not often undertaken, even in mathematics if I understand correctly.
But some use cases require safe client code that will nevertheless run on unsafe OS and hardware. Client-side and some server-side cryptocurrency apps are one example that comes to mind. Probably banking apps too. It's still useful to write them in strongly typed languages amenable to formal verification, even though they'll be running on unverified Windows, Linux, macOS, and smartphones.
Also, we really need OSes, firmware, etc. written in languages amenable to formal verification, but given the massive lock-in of the mainstream options, that's a pipe dream for now. But a worthy long-term goal.
You can also look at existing formally verified software and probably confirm my assumption: there is likely as much formally verified C code (a weakly typed, memory-unsafe language) as formally verified Haskell code.
And I do not agree at all that a formally verified banking app is a good idea. It would likely be so difficult to do that by the time it is finished it no longer runs on any current version of any OS.
Formal verification is just too hard to be worth it. There are better methods of ensuring software correctness that are far less costly. Maybe something will be discovered in the future that will make it easier, but for now formal verification of any piece of software more complex than a few hundred lines of code is a multi-year research-level project.
And then there are the hundreds of software libraries I've seen throughout my career that I never got to use because they were just too buggy.
From where I stand, C doesn't scale to significantly-sized codebases. Nobody I know uses it except when they need to glue to some legacy C code that's been tested in production for the past few decades.
It's easy to say something is a lot of work, but in this case every one of the alternatives looks worse. I don't have the time or resources to write 1000x my program in test code, to debug it in production for 10 years, or to throw it all away because I never achieve a minimum usable level of quality. Formal methods aren't looking so bad to me right now!
For most programs, an 'informal' proof would be a huge improvement already. The effort you have to put into informal proofs is much lower, but the main drawback is that they are not checked, so you might make mistakes in the proof (on the other hand, you can make mistakes in the specification as well).
The holy grail would be formal methods that work mostly like informal proofs. With today's methods you would only prove the smallest, most critical parts of the code correct.
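For a sense of what 'informal proof' means here, this is the style of argument in question: a loop invariant written down next to the code, perhaps with a runtime assertion as a cheap sanity check. A sketch in Python; nothing machine-checks the argument, which is both the appeal (low effort) and the risk:

```python
def binary_search(xs, target):
    """Return an index i with xs[i] == target, or -1. Requires xs sorted."""
    lo, hi = 0, len(xs)
    # Informal invariant: if target is in xs, its index lies in [lo, hi).
    # The 'proof' that each branch preserves this lives in the comments;
    # no tool checks it.
    while lo < hi:
        mid = (lo + hi) // 2
        assert lo <= mid < hi          # runtime spot-check of one consequence
        if xs[mid] < target:
            lo = mid + 1               # everything below mid+1 is < target
        elif xs[mid] > target:
            hi = mid                   # everything from mid up is > target
        else:
            return mid
    return -1                          # empty range + invariant => absent
```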
To paraphrase Edmund Clarke's Turing lecture, the problems of verifying software include "floats, strings, user-defined types, procedures, concurrency, generics, storage, libraries…" If you're not worried about verifying concurrency in your tool, you've got more resources to devote to verifying the other things.
That's also a space I don't have as much familiarity with, so I don't know what the best tools are there. My vague understanding is that at that stage the problem "simplifies" enough that it's a lot easier to verify the code itself. Like how CompCert isn't just a model of a compiler, it's an actual correct compiler. So you can apparently work much closer to the code when verifying serial things.
At one point I got a bunch of people in FM to write provably correct versions of Leftpad. Maybe one of those would work best for you? Note also that one of the examples proves Java code correct via inline JML annotations.
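For reference, the leftpad specification those proofs establish boils down to three properties. Here they are as executable checks in Python, with a straightforward implementation standing in for the verified ones; this is a test of the spec, not a proof of it:

```python
from hypothesis import given, strategies as st

def leftpad(c, n, s):
    """Pad s on the left with c up to length n (no-op if s is longer)."""
    return c * max(n - len(s), 0) + s

@given(st.characters(), st.integers(min_value=0, max_value=100), st.text())
def test_leftpad_spec(c, n, s):
    out = leftpad(c, n, s)
    # The three properties the verified versions prove:
    assert len(out) == max(n, len(s))            # 1. output length
    assert out.endswith(s)                       # 2. suffix is the input
    assert set(out[:len(out) - len(s)]) <= {c}   # 3. prefix is all pad chars
```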
 Or burritos. Mmm, burritos. /s
For something that relies on finding problematic inputs (like for compilers), SMT solvers like Z3 are a more natural fit.
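For instance, with Z3's Python bindings you hand the solver a claim and it searches for a counterexample. A toy illustration (the 'optimizer rewrite' framing is my own, just to show the workflow):

```python
from z3 import Int, Solver, sat

x = Int("x")
original   = (x / 2) * 2   # '/' on Z3 integers is integer division
simplified = x             # an unsound rewrite a buggy pass might apply

s = Solver()
s.add(original != simplified)   # ask: is there an input where they differ?
if s.check() == sat:
    print("counterexample:", s.model())   # e.g. any odd x
else:
    print("rewrite is valid for all integers")
```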
It doesn't prove that they provide a good ROI for a business looking to ship CRUD apps yesterday.
I could see formal methods for things that might kill people as possible, but even there, spending all of your money to deliver perfect code vs. still being in business with something good enough is easy math for most CEOs.
An ROI or IRR calculation might not have anything like the elegance of a proof, but it's a method and it's formalised.
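And that math really is short. A back-of-the-envelope version in Python, with entirely made-up numbers just to show the shape of the calculation:

```python
# All inputs are assumptions; the point is the structure, not the values.
spec_cost     = 40 * 150      # one engineer-week of specification at $150/hr
incident_cost = 50_000        # cleanup + lost revenue for one design bug
p_prevented   = 0.25          # assumed chance the spec work prevents it

expected_value = p_prevented * incident_cost - spec_cost
print(f"expected value of specifying: ${expected_value:,.0f}")  # $6,500 here
```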
This considerably derisks TLA+ because you cannot end up depending on TLA+; or, put another way, any deficiencies in TLA+ or its toolchain will not become production blockers for your team. You cannot lose more time than the time you spent writing your spec.
You can use TLA+ to specify as much or as little of your system as you want. You can upgrade your system independently of TLA+. You don't have to wait for TLA+ to support or integrate with your language.
I've said this elsewhere, but the closest competitor to TLA+ on most teams is decent high-level documentation, which is precisely what most teams lack.
Software tools, like many other things, suffer from an uncanny valley effect. If you're not tempted at all to depend on a tool to generate code for you, you know the limits of the tool and can be quite happy. If the tool integrates smoothly with the rest of your code, it's even better. However, if your tool integrates in a somewhat bumpy way with the rest of your code, then it becomes worse than the two previous alternatives.
The TLA+ community already has limited resources that I think can be better spent on making TLA+ an even better design language before trying to cross the much bigger and more arduous gap of trying to make TLA+ executable (a goal which I suspect most of the community would actually oppose anyways).
There is a better way:
We have some very interesting advances in formal verification, but that is probably not the way.
So now you move from debugging the production code to debugging the TLA+. That's still an improvement.
 Unless you take the TLA+ code to be the definition of "correct". That's a possible position, but I don't adhere to it. It seems to me more reasonable to be able to say that the spec is wrong if it doesn't correspond to what is actually needed, and the TLA+ is wrong if it doesn't accurately encode the spec. (For example, the MCAS software did exactly what the spec said. But the spec was wrong.)
So I wonder: instead of proving that some hand-written code corresponds to a specification, wouldn't it be better to think of the specification as a high-level program and then somehow translate it into some lower-level conventional programming language?
You don't. You follow the scientific method of building a model specification for "what is needed", throwing it into the world, and tweaking and improving it where you find that some of its properties are not a good fit for the problem it's intended to solve - primarily by gathering feedback from the people that are using it and finding its flaws.
What you're proposing is that all engineers now effectively learn a new language and translate requirements into it. This expects requirements to be fully fleshed out themselves (a luxury few have these days) and relatively static (or everything needs to be reworked).
Good luck with that...
You can't test what you don't have. With design-based formal methods you can "fuzz" the high-level design and spot very real problems in it, before writing a single line of ordinary code.
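That "fuzzing" is actually exhaustive rather than random: a model checker enumerates every reachable state of the design and checks properties in each one. The core idea fits in a few lines. Here's a toy explicit-state checker in Python (standing in for TLC) that finds the classic lost-update flaw in a design where two processes do read-then-write on a shared counter with no locking:

```python
from collections import deque

# State: (counter, (pc0, pc1), (tmp0, tmp1)); each process runs
# read -> write -> done, with no lock around the increment.
INIT = (0, ("read", "read"), (None, None))

def successors(state):
    counter, pcs, tmps = state
    for i in (0, 1):                   # either process may take the next step
        pc, tmp = list(pcs), list(tmps)
        if pcs[i] == "read":
            tmp[i] = counter           # read the shared counter
            pc[i] = "write"
            yield (counter, tuple(pc), tuple(tmp))
        elif pcs[i] == "write":
            pc[i] = "done"
            yield (tmps[i] + 1, tuple(pc), tuple(tmp))  # write stale tmp + 1

def invariant(state):
    counter, pcs, _ = state
    # Once both processes finish, both increments must be visible.
    return pcs != ("done", "done") or counter == 2

# Exhaustive breadth-first search over the full state space, like TLC.
seen, queue = {INIT}, deque([INIT])
while queue:
    s = queue.popleft()
    if not invariant(s):
        print("invariant violated in state:", s)   # the lost update
        break
    for t in successors(s):
        if t not in seen:
            seen.add(t)
            queue.append(t)
```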
The article makes the very strong claim that QA aren't going to catch the bug that arises from this design omission and it's going to cost your company money as everything goes wrong in the field.
I don't believe it would get that far IRL as any dev with a brain is going to spot it, and even so QA should certainly have test cases for this sort of situation if they're doing their job right.
You can argue this may be a better approach that catches potential problems earlier; have at it. You can argue that there are classes of error this will catch that you normally wouldn't (some of the preamble and larger-scale stuff does seem to show this). But the illustrated example doesn't hold up to scrutiny and is overstated. As such, I think it undermines the point the author is trying to make, because it looks to me like something we'd catch anyway.
The illustrated example is intended to be comparatively trivial. It's just clarifying what this kind of verification actually involves in practice. It makes no sense to describe it as "overstated", since the article actually mentions several case studies as demonstrating the usefulness of this approach; it's just that these cannot be examined in detail in an introductory article.
I think perhaps the example has just been simplified too far.
Check out the fallacies of distributed computing. If your testing system can simulate all of those edge cases, it probably looks a lot like TLA+.
I'm sure there are good cases for using TLA+, I'm sure there are situations where it's not only useful for catching errors before they even happen, but in which this more than offsets the upfront costs of the exercise.
I guess I just came away from the article not feeling that such had been demonstrated, in fact I came away with the feeling that the example was contrived to fit the agenda and didn't actually show much.
I'll go out on a limb and say it is nearly impossible to design and build a test environment that can simulate all network conditions, so that even in trivial cases where a dev might know for a fact that there is an issue, it'll be incredibly hard to reproduce it.
Maybe, put another way, formal methods give cover to dev/QA to avoid shipping code that is known, but hard to prove, to be buggy. Bugs they will ultimately be held responsible for.
Agree to disagree. If your system is distributed it requires you to share data and act concurrently. Otherwise it isn't a distributed system. No amount of retries will cover all of the screwed up things that can happen on an unreliable network.
I'm not sure how I can respond to that, other than pointing you to the Intro section (it's only a single, short paragraph). It seems clear to me that this is what the article is supposed to be about.
> The only symptom is credits mysteriously disappearing from people’s accounts. Your QA and monitoring won’t find this.
> It will cause your customers to lose money and stop trusting you.
Thread over I guess... :shrug:
Most people believe FMs are not resilient due to their coupling with the spec. What if the spec changes?
Saying that FMs are a debuggable form of specification, i.e. very close to debugging the design itself, turns that criticism on its head: it makes it trivially the case that you can iterate on or prototype with FMs.
You're right in a limited sense, except that it's not any language; it's particular languages that can count as a Formal Method in the sense of the OP (i.e. ones that can do model checking in some kind of temporal logic).
That goes for verifying code as well; however, that is even harder to sell, as it's so much more expensive, and even though I 'feel' it is a good idea, I cannot really estimate whether the TCO would be lower than just patching with bandaids over the years.