Fully homomorphic encryption (FHE) looks to me like the WET DREAM of every business that wants to control the software and content on your hardware without ever having to decrypt sensitive code or data residing on that hardware.

That's because FHE-encrypted software is locally unhackable as long as its implementation of FHE remains unbroken.

Think:

* FHE-encrypted mobile operating systems, applications, and data.

* FHE-encrypted desktop and server operating systems, applications, and data.

* FHE-encrypted "smart home" devices -- from light bulbs to dishwashers to fridges.

* FHE-encrypted transportation infrastructure -- from automobiles to trains to airplanes.

* FHE-encrypted industrial machinery of all kinds.

If FHE ever becomes practical, we're looking at a very unhackable future.

Remarkably, I've never seen this issue mentioned anywhere else before.

FHE gets brought up all the time. There are great discussions down thread about why this isn’t practical for most use cases.

The killer thing is the performance penalty for doing FHE on any dataset of meaningful size. This is because if you want to do an operation on one member of the set, you need to do an operation on all members of the set; otherwise you leak which member you care about. For sets of a very small size this penalty might be OK. For large sets, like the set of all users of a service, the performance penalty grows at least linearly with the size of the set.
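That "touch every member" cost can be sketched in plain Python (a hypothetical helper, with plain ints standing in for ciphertexts; under real FHE the selector bits would be encrypted, so the server cannot skip the zero terms):

```python
# Sketch (plaintext stand-in): to fetch data[k] without revealing k,
# an FHE server multiplies EVERY element by an (encrypted) 0/1
# selector and sums -- so the work is O(n) in the set size.

def oblivious_lookup(data, k):
    # Under FHE the selector bits and the arithmetic below would all
    # operate on ciphertexts; plain ints here just show the shape.
    selector = [1 if i == k else 0 for i in range(len(data))]
    return sum(s * d for s, d in zip(selector, data))

users = [42, 7, 99, 13]
assert oblivious_lookup(users, 2) == 99  # every element was touched
```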

This is the problem at the core of ZK-style proving systems. You can push the proof construction work into the client and have a sub-linear verification time relative to the proof, but the proof construction in the first place is going to be slow and expensive (and slower as the size of the set increases!).

The edge of research here is being done by cryptography labs all around the world. If someone does manage to solve this in a purely-software based manner with a performance penalty that is acceptable for a monotonically increasing set, that’s the holy grail. I remain unconvinced that this will be solved in a reasonable fashion for this use case in my lifetime (but I would be thrilled to be proven wrong!!). If you know of research which makes inroads on this problem, please point me at it.

FHE style systems are possible if you use a hardware-based encryption approach like a Secure Enclave, but then you have a trust narrative that includes a Secure Enclave. For most use cases this is unacceptable.

Note: I work on a system design, which I believe is acceptable, that uses Intel's SGX; we detail it in this tech talk entitled "why sgx": youtube.com/watch?v=Hwf_Q31woLo .

If you believe in a Secure Enclave then there is no need for Fully Homomorphic Encryption anymore, as all the computation can be carried out securely in the SE. The problem is that the SE is not trustworthy for many people.

It’s more complicated than “do you believe in SE”. There’s a difference between using a SE for storing durable secrets (a bad idea in most implementations I’ve seen) and using a SE for provably destroying a piece of information. The former means that a break is full jackpot, the latter means that a break compromises the system for a window in time and forward secrecy can be restored with a patch.

I'm not sure if FHE is valid for any of these use cases. As far as I can understand, FHE machines/evaluators can operate on encrypted data, but cannot derive any useful information from that data without the key.

In all of the use cases you have mentioned, an FHE machine on the user device cannot perform interactions (IO, etc.) with the outside world (can't connect to a network, can't display things on a screen, etc.) without the key. So for these devices to be useful, either the keys will have to be embedded in the IO paths (in which case the device would be hackable), or the FHE machine would have to be implemented as a subsystem that does things like DRM (I'm not sure if this would be possible either, but it could be).

I might be completely wrong here, but can someone sketch out how FHE can be used to make an unhackable consumer device?

The original - unencrypted - data from a user or sensor passes through an encryption layer before going to the FHE system.

I'm not sure I understand the question, but to me FHE is a purely computational thing - only CPU and RAM are involved, and the operations include the input and output data. To prevent the system from learning anything about the data, all operations are of the kind F(a) op F(b) = F(a op b), so all data in RAM gets there in encrypted form in the first place. Thus the FHE system never sees unencrypted data.

Coming back to an unhackable consumer device: if you measure a physical property, like temperature, or have a digital model of anything, you have to provide this data to the FHE system only in properly encrypted form. That encryption is on the consumer side, not on the FHE system, as is decryption of the results of FHE operations. It's only the processing of the data that FHE does.

The part where FHE requires 42X the CPU and 10-20X the RAM of non-FHE operations will initially keep the use cases relatively small, and we don't have Moore's progression to bail them out in a few years. Nor are FHE operations in these kinds of use cases heavily parallelizable.

I'm also seeing some questions about side-channel attacks: by throwing enough data at the polynomial transforms that make up the NAND gate you're hiding the FHE'd data in, some people claim there has to be a way to eventually derive the encrypted value of the data you're attacking. I don't understand any of the maths around FHE, so I can't tell how big a concern those types of attacks are. From my layman's interpretation of the comments the IBM'ers are responding to, I don't think a single operation under FHE is intrinsically vulnerable to side-channel attacks, but I bet the way FHE is used in real-world applications, with iterative back-and-forth operations, does potentially open specific poorly-designed usage patterns to classic inferential side-channel attacks.

If FHE becomes practical at the scale you imagine, it would dramatically raise the bar on hacking for sure, but the implementers still only need to make one mistake to give a wedge to the hackers to work with.

The tokenization market (tokenize/detokenize sensitive data) just turned up the "interesting" knob to 11, though. Would take a lot of work between vendors to make real and practical, but some real promise there.

Huh. Never thought about it from that point of view, thanks for bringing it up.

What I hoped for is that FHE would usher in an era of more end-user control over data, not less of it. That's because it would allow decoupling and isolating storage from compute, in such a way that I could invite a company's code to process my data, encrypted by me, at the location of my choosing, and the code would have no way to exfiltrate the data. At the same time, I would have no way to get at the service company's software. I somehow never realized it would be easier for software providers to just skip the "my data at my location" part, and force us to use their FHE software while also owning the data entered into it.

I agree this is a novel interpretation. Could you elaborate on what the computation performed involves?

I'm of the understanding that you send me ciphertext, I transform it with some operations you define, and I send back ciphertext-v2.

I can see how this could exfiltrate data without my being able to read it: they send in a bunch of zeros and I fill in a one for each minute I used this lightbulb, by dynamically creating the FHE circuit to reflect the data saved.

It's not clear to me how this could change my lightbulb. By 'change' I mean 'affect the behavior' or update the firmware or something. I say this with the expectation that the blobs I received and produced are ciphertext on my side, without keys to decrypt it (unlike, say, dvd drm).

I think the use cases of FHE are much more limited than you're thinking, specifically because it allows you to execute code on someone else's machine without them being able to see _anything_ about the data they're operating on, and send the results (which again, don't mean anything to that machine) back to you. So, for example, you could mine cryptocurrency on someone's computer, but the only benefit you get from running FHE-encrypted code on the client machine over running it on your own server is compute cost. It wouldn't be useful for e.g. DRM.

As I understand it, in order to interface with FHE-encrypted software, the data needs to be encrypted too. So every input needs to be sent to the mothership, encrypted there, downloaded, then computed locally; then the output is sent to the mothership, decrypted, then downloaded to present the results to the user -- which is no different from just doing the computation "in the cloud", which already is a thing. So I don't see how it brings anything new to the table.

Unless the key is kept locally, which would be hackable.

FHE operates under asymmetric encryption, and whoever runs the software needs the public key to perform the FHE operations on the data. The private key is known to the manufacturer (although there are some cool applications if, provably, no one knows the private key) and is part of the logical circuit of the FHE so that it can decrypt and freshen its state.

> Once one can do multiplication and addition all other operations can be bootstrapped from there so you can indeed do anything a Turing complete computer can do.

This is apparently repeated in many articles about FHE, but it seems false to me: to do Turing-complete stuff you also need comparisons, and the ability to change your control flow depending on comparisons. But clearly you cannot do that with FHE; otherwise you could extract the encrypted content, one bit at a time.

My understanding is that an FHE computer can only execute a fixed net of additions and multiplications (or whatever operations they've got). So you can emulate an "if" clause by computing both branches and then selecting one of the two multiplying by 0 and 1 appropriately; you can do bounded "for" loops always executing the maximal number of times, but you can't do unbounded loops, therefore bye bye Turing completeness.

Of course there are a lot of interesting algorithms that can be executed without being Turing complete, but the general statement is false. Am I missing anything?
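The "compute both branches and select with 0/1" trick described above can be sketched in plain Python (plain 0/1 ints standing in for encrypted bits; hypothetical helper names):

```python
# Sketch: an "if" under FHE becomes: compute BOTH branches, then
# select one with a 0/1 flag using only multiplication and addition.

def select(cond, if_true, if_false):
    # cond is 0 or 1 (under FHE, an encrypted bit)
    return cond * if_true + (1 - cond) * if_false

# Bounded "for": always run the maximal iteration count, and use a
# 0/1 flag per iteration to turn the extra iterations into no-ops.
def bounded_sum(xs, active_flags):
    total = 0
    for x, flag in zip(xs, active_flags):
        total = total + flag * x  # flag == 0 iterations change nothing
    return total

assert select(1, 10, 20) == 10
assert select(0, 10, 20) == 20
assert bounded_sum([1, 2, 3, 4], [1, 1, 0, 0]) == 3
```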

Imagine an FHE circuit that executes a single WebAssembly instruction. This is a finite circuit that takes as input the program, the machine state, and external input (all encrypted). As output it returns the (encrypted) new machine state and a single unencrypted bit indicating whether the program terminated.

Now you can run the FHE step as many times as the algorithm requires, giving you an FHE-VM that can execute any WebAssembly program from your favorite Turing-complete programming language.

The limitations are that you need to define a bounded machine state, which technically breaks the Turing completeness. In practice this is no different from your laptop having a finite amount of memory which disqualifies it as a Turing machine. Real Turing machines don't exist.

The other limitation is that the FHE executor can now see the total run time of the algorithm, which was secret before. This is similar to a timing side channel and similar mitigations apply.

As others have pointed out, if you have a concrete algorithm you can do something much simpler than implement a VM. The Collatz sequence is a good toy example of an algorithm with unknown loop bounds. You would make an FHE circuit that iterates a single step, using the same overall design.
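A plaintext sketch of that design, with a hypothetical Collatz single-step function standing in for the single-instruction circuit (plain ints standing in for ciphertexts; the `done` bit is the one value the executor gets to see, and the step count is exactly the timing leak mentioned above):

```python
# Sketch of the FHE-VM idea with the Collatz sequence as the toy
# program: a fixed "single step" circuit plus an outer driver loop
# that only sees one unencrypted bit -- "terminated or not".

def collatz_step(state):
    n = state
    # Branchless update: both branches computed, one selected by parity.
    is_odd = n % 2                        # an encrypted bit under FHE
    nxt = is_odd * (3 * n + 1) + (1 - is_odd) * (n // 2)
    done = 1 if n == 1 else 0             # the single leaked bit
    return (n if done else nxt), done

state, steps, done = 6, 0, 0
while not done:                           # the executor drives the loop
    state, done = collatz_step(state)
    steps += 1
# `steps` is visible to the executor: the run length leaks, nothing else.
assert state == 1
```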

Totally agree. Just let me point out that this is not, theoretically speaking, an FHE computation. It is an "FHE plus a termination oracle" computation. Maybe no big difference in practice, but totally a different thing in theory, which was the point of my comment.

But modern computers let you send them instructions entirely in a Turing-complete language, with no requirement that you first know how to unroll them into a static circuit (constant-bounding every loop). That seems like a bigger difference than "oh sometimes you get out-of-memory errors".

This depends on your exact definition of an FHE computation. You could imagine a definition under which you can consider the whole scheme a single Turing complete FHE. But this definition would have to accept that the FHE computation is unbounded and leaks computation length (as would any Turing complete scheme).

I don't like the name "termination oracle"; it sounds too much like "halting oracle", which is not what is going on here. All we do is add an outer while-loop, a simple algorithmic transformation. I wouldn't even consider that an oracle.

After putting some thought into this during a deeper comment chain somewhere, I thought about it this way:

FHE is equivalent to circuits, i.e., every circuit has a representation as an FHE computation. I feel like the idea is: circuits can perform arbitrary computations on a limited-size input, but not on arbitrary-size inputs. Loops are not a problem, as they are an implementation detail. Unbounded output is not possible on a Turing machine that halts for all inputs of size limited by some constant s. There exists a circuit that would compute the same thing, as you can take the (binary-encoded) results of the Turing machine and just create a truth table from that, which describes a circuit. Circuits are interesting because they are usually limited in size.

So you compile your program into a circuit of a fixed size and can then compute solutions of this size. If you are interested in larger problems, you compile your program into a larger circuit. E.g., a function returning a 32-bit integer and taking 20 32-bit integers as parameters can be represented this way (without infinite loops, but those aren't covered by Turing machines either), no matter the computation within that function.

edit: correction, obviously a circuit cannot represent all partial functions, only total functions (i.e., all inputs have an output).
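A toy sketch of the truth-table idea, with plain Python ints in place of ciphertexts: any total function on a bounded input becomes a fixed expression built only from + and * (the helper name is hypothetical):

```python
# Sketch: flatten a total function on n bits into a "circuit" built
# purely from + and * -- a sum over the truth table, with bit-matching
# indicator products selecting the right row.

from itertools import product

def as_arithmetic_circuit(f, nbits):
    table = {bits: f(bits) for bits in product([0, 1], repeat=nbits)}
    def circuit(bits):  # uses only +, *, and constants
        out = 0
        for row, value in table.items():
            match = 1
            for b, r in zip(bits, row):
                # indicator: b*r + (1-b)*(1-r) is 1 iff b == r
                match = match * (b * r + (1 - b) * (1 - r))
            out = out + value * match
        return out
    return circuit

xor3 = as_arithmetic_circuit(lambda bits: bits[0] ^ bits[1] ^ bits[2], 3)
assert xor3((1, 0, 1)) == 0
assert xor3((1, 1, 1)) == 1
```

The table has 2^n rows, which is why this is a statement about expressibility, not about efficiency.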

No, you cannot convert an arbitrary Turing machine to a circuit. Not even if you assume a bounded input size. Not even if you assume just one single allowed input, because you might not be able to know if that Turing machine ever terminates on that input.

And loops are not an implementation detail. Bounded loops are, one might argue, an implementation detail, but unbounded loops are precisely _the_ problem: in general it is impossible, given an arbitrary Turing machine (or an arbitrary C, Python, whatever Turing-complete language program), to give an upper bound on the number of iterations its loops will require.

ketzu isn't claiming that one can always describe a Turing machine using addition and multiplication only, only that one can describe a Turing machine's map on bounded inputs using addition and multiplication on the input only.

And that's precisely what I challenged. You can't even describe a Turing machine on one single input (using whatever operations you like) if you're not able to determine if that machine is going to terminate. And in general you are not (at least, I am not; if you are, I happen to have a few Turing machines for which I'd be happy to pay to know if they're going to terminate or not; good money, I promise).

The task is not that; instead we assume that we already have a table that correctly describes the TM's maps from bounded input to output when they terminate. His claim is that this table can be expressed as a function in addition and multiplication with the input as operands. To be more precise, with the input and constant numbers (arbitrarily choosable while designing the description of the function) as operands.

Incidentally,

> You can't even describe a Turing machine on one single input (using whatever operations you like)

is false, we can simply encode it as the encoded tuple of the TM and the input; this is a standard construct when we talk about simulating TMs with a UTM.

Alan Turing proved that a solution to the halting problem cannot exist. Per Wikipedia:[0]

> A key part of the proof was a mathematical definition of a computer and program, which became known as a Turing machine; the halting problem is undecidable over Turing machines.

It cannot be possible to “map” a Turing machine with a finite number of operations. Without true decision trees and loops, FHE isn’t Turing complete. At minimum, there needs to be a concept of a conditional jump—if X, jump to instruction Y.

You can unravel some programs into a finite set of instructions, but that doesn’t make FHE Turing complete.

Take the following code, for example:

    function f(x):
        while x == 1:
            do nothing
        return x

For a machine to be Turing complete, it must be able to run that function. FHE can’t do that, by definition; it would reveal information about the input.

Some of the original work around FHE implies the researchers were applying ML models. If you had a complicated enough ML model (e.g. of a person) you could simulate a computer in that model. The assertion that FHE with just multiplication and addition is Turing-complete is true, and this is one proof.

If you can think of something besides encoding a Turing machine's state as NN weights I am very interested. Perhaps encoding a quantum circuit somehow? A technically correct but intractable solution would be simulating the quantum state of every atom in a silicon circuit.

Would you say that this circuit was fully homomorphic if, for all inputs, the processor appeared to be doing the same work?

That is to say: if X = 1 or X = 2, the processor did not appear to "do nothing" but instead performed a constant-time operation on an encrypted string?

Seems like I didn't put enough thought into it and my computability class has been too long ago!

In hindsight it's kind of obvious: all functions computed by circuits must be total functions and cannot be partial functions, because a circuit can't output nothing.

Okay but that example is just one conditional branching. Turing-completeness requires that you be able to e.g. keep looping back until some condition is met.

If you want to build arbitrary functions like you've described, it reduces to the question of whether you can "unroll" the function, given its inputs[1], into a (stateless) circuit. But that would still IIUIC just get you primitive recursive functions (i.e. those where you can know the upper bound on the number of loop iterations for any loop before it starts), not the general recursive functions you need for Turing-completeness.

Anyone know how you pull off general recursive functions from just addition and multiplication?

[1] and in FHE you don't know the inputs either, which is another kink.

> that would still IIUIC just get you primitive recursive functions (i.e. those where you can know the upper bound on the number of loop iterations for any loop before it starts)

The primitive recursive functions cannot be expressed using circuits. Circuits by definition can only express polynomials, which have a much smaller growth rate than what primitive recursive functions are capable of.

[edit]

Actually, probably a better reason is that in primitive recursion, the running time can depend on the input. For a circuit, the running time is independent of the input.

FHE is just a math equation. Can you accept the requirement that you process the state more than once? If so you could do something like encode the atomic state of a silicon circuit somehow and step it forward in time.

That is probably what the researcher is referring to. You can encapsulate computation, but not in a single math equation.

The big question isn't branching, it's unbounded loops.

It seems to me the only option is unrolling those loops. But that has a massive performance penalty. Because you need to always run through however deep you unrolled.

You can't unroll an unbounded loop; you don't know how many times to unroll. To have a Turing machine you need to know from your program state whether you have reached the end of your loop or not (i.e., you need to evaluate the "while" condition), and by definition FHE shouldn't allow you to extract information from your program state.

You can unroll a sufficient number of times. There would be an upper limit on iterations and it would presumably be horrible inefficient, but I can't see why it wouldn't work.

If you know how many iterations are sufficient, the unbounded loop could have been written as a bounded loop in the first place. For truly unbounded loops, you’re still stuck.

I am not saying there are not practical ways to do useful computation with this stuff. I am commenting on the theoretical remark that any algorithm (as in any Turing machine) can be executed with FHE, which is false, as it can be proved that it is impossible to compute, given an abstract Turing machine, an upper bound on the length of its execution.

I believe that's not the issue: the remote machine which performs the calculations on the encrypted data is TM complete and can perform unbounded loops. Comparison is the puzzler for me too: e.g., how do you take min(a, b) if you don't know how a and b compare? We know that F(a) + F(b) = F(a + b), so the remote machine can use "normal" addition operations, but still a > b and F(a) < F(b) can both be true, so it can't use "normal" comparisons, unless there's some implication of the properties of FHE I didn't see.
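The F(a) + F(b) = F(a + b) property can be demonstrated with textbook Paillier, which is additively homomorphic only (not fully homomorphic); note that in Paillier the ciphertext-side operation that adds the plaintexts is multiplication. The tiny primes below make this completely insecure, demo purposes only:

```python
# Toy Paillier: multiplying ciphertexts adds the plaintexts.
# NOT fully homomorphic, NOT secure with these parameters.

import math
import random

p, q = 17, 19                         # toy primes -- insecure on purpose
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)
# L(x) = (x - 1) // n; mu is the modular inverse of L(g^lam mod n^2).
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def enc(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

a, b = 12, 30
assert dec((enc(a) * enc(b)) % n2) == a + b   # F(a) "op" F(b) = F(a + b)
```

The server combining the ciphertexts learns nothing about a, b, or the sum; only the private-key holder (lam, mu) can decrypt.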

In theory we could compute the full table of values of a function like less(x: F, y: F) -> {0, 1}, interpolate a bivariate polynomial containing all those values, and build a circuit to evaluate that polynomial. Its degree would be around 2|F| though, so it would be completely impractical.

In SNARK programming, we would usually decompose field elements into bits, then apply a binary comparison circuit. Binary decomposition can be done non-deterministically, by having the prover give the purported bits as "advice" inputs, then having the circuit accumulate those bits and check the sum.

I haven't done any FHE circuit programming, but I imagine that approach wouldn't work, since the circuit must be deterministic. I imagine any values that might need to be compared would just be encoded as binary from the start. (Or is there a better approach?)
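For what it's worth, a deterministic bitwise comparator really can be built from + and * alone once values are encoded as bits. A plaintext sketch (msb-first encoding; plain 0/1 ints standing in for encrypted bits; helper names hypothetical):

```python
# Sketch: "less-than" from + and * on bits only -- the kind of binary
# comparison circuit usable once values are encoded bitwise.

def bit_lt(a_bits, b_bits):
    # Returns 1 iff a < b, scanning from the most significant bit.
    result, still_equal = 0, 1
    for a, b in zip(a_bits, b_bits):
        a_lt_b = (1 - a) * b              # this bit decides: a=0, b=1
        eq = 1 - (a + b - 2 * a * b)      # 1 - XOR(a, b)
        result = result + still_equal * a_lt_b
        still_equal = still_equal * eq
    return result

def to_bits(x, width=4):                  # msb first
    return [(x >> i) & 1 for i in reversed(range(width))]

assert bit_lt(to_bits(3), to_bits(5)) == 1
assert bit_lt(to_bits(7), to_bits(2)) == 0
assert bit_lt(to_bits(4), to_bits(4)) == 0
```

Fixed control flow throughout: every bit is processed the same way regardless of the values, so nothing about the comparison leaks.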

Interesting idea, but then it's not TM complete, since the problem space is by definition finite, isn't it? So it seems to me the importance of comparison for TM-completeness is still unsolved.

Then again, it just occurred to me that perhaps fully homomorphic implies that F(0) = 0 and F(1) = 1, so a > b could become F(a - b) ρ 0, and you only need to determine beforehand if ρ is > or <.

This is generally solved by unrolling the program into a single loop. Essentially having a function state(N) => state(N+1), shouldContinue.

This exposes the number of iterations needed to finish the computation. That could further be limited by underlying code rounding the iterations to mask the real count.

TLDR: either the run time is fixed or you leak some info via the run time.

Part of what's going on here is conflicting definitions of what "Turing complete" means in a formal context and how it's often used among practitioners.

A specific circuit has a fixed running time, but it can do arbitrary computations in that runtime, including emulating a fixed number of steps of an arbitrary Turing machine.

Our physical computers also have limitations (particularly space) on what Turing machines they can simulate, yet they are often called "Turing complete" in general parlance. Obviously a laptop has limitations that are on completely different orders of magnitude, and is much more efficient.

Many "surprisingly Turing complete"[1] systems have similar limitations, but the more interesting question is "can you use this system to simulate a bounded number of steps of a Universal Turing machine?" and the answer for those systems, and FHE is yes.

Like a physics simulation you iterate the cryptographic math over time. The physical laws themselves are incapable of expressing computation, it is the effect they have on a system evolving over time that produces what we call computation.

FHE just means that you can do addition and multiplication on ciphertexts. The ability to do that implies that you can build logical circuits. Those logical circuits are highly expressive, but not Turing complete.

Arbitrary computation can still be bounded. [1] for example cites the definition of FHE (Def. 9) as a scheme that can compute all circuits, which are fairly powerful [2].

Arbitrary computation is also, afaik, not well defined.

edit: After putting some thought into this, I feel like the idea is: circuits can perform arbitrary computations on a limited-size input, but not on an arbitrary-size input. Loops are not a problem, as they are an implementation detail. Unbounded output is not possible on a Turing machine that halts for all inputs of size limited by some constant s. There exists a circuit that would compute the same thing, as you can take the (binary-encoded) results of the Turing machine and just create a truth table from that, which describes a circuit. Circuits are interesting because they are usually limited in size.

With addition and multiplication you can do a lot. I think there is a term for circuits that can be "arithmetized". A loop whose iteration count is based on the value of the encrypted input seems to be out of scope.

The issue with such a system that would allow arbitrary computation is that it would certainly allow extracting the value, which means it would be simpler not to encrypt it in the first place :)

* Result has no bit set = both terms equal

* Result's sign bit is set = second term bigger

* Result's sign bit isn't set = first term bigger

Multiplying by the result of comparisons (aka 0 and 1, as you state) is actually a common performance optimization. Does not work for everything, but when it works it works well.

Yes, but you can't branch on the result, because you don't know if the result is zero or not. So your program needs to have fixed control flow, i.e., you cannot implement arbitrary programs.

Sure, that was already mentioned on another comment, but that's not enough for proper branching. You also need while loops. See also comment https://news.ycombinator.com/item?id=24018284.

If you constrain your computer to always take the same amount of time to execute any while loop (which a homomorphic computer needs to do not to leak any data), your computer would also have unbounded circuitry complexity and thus it’s not any better.

Also unbounded loops can’t practically finish even on a Turing machine :)

Personal pet peeve: whenever someone brings up the movfuscator, all the responses seem to be of the "well akshhhhuallly..." variety, and I've never understood it. The responses here are generally incorrect; x86 mov does not have conditionals (that would be cmov), cannot load the program counter, and can't do general arithmetic beyond very standard indirect addressing (the same forms allowed by many RISC architectures). movfuscator actually includes a version that uses only two forms of mov (a load form and a store form) that would work on nearly any common RISC architecture. So the assertions that "this only works because x86 mov is ___" are generally incorrect, and needlessly dismissive of an interesting idea; the addressing used by movfuscator is just addition/subtraction. Certainly, at a broader level, the idea that it may be desirable to reduce general computation to some extremely simple form is interesting.

Movfuscator only works because x86's MOV is like the Swiss Army knife of move instructions. It doesn't just copy bits from one location to another, but has parameters for conditionality and arithmetic, which is what allows it to be so general. That still doesn't address the parent's confusion (that I share) of how you get Turing-completeness just from addition and multiplication.

> That still doesn't address the parent's confusion (that I share) of how you get Turing-completeness just from addition and multiplication.

I wouldn't say arithmetic circuits are Turing complete since they can only perform bounded computations, but we could say that they're functionally complete. It is known that boolean AND and OR gates are functionally complete, and we can reduce them to multiplication and addition in any finite field:

x and y = x * y
x or y = ¬(¬x and ¬y) = 1 - (1 - x)(1 - y)
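A quick sanity check of those reductions over {0, 1} (plain ints, not ciphertexts):

```python
# Boolean gates reduced to + and * (and constants) over {0, 1}.

AND = lambda x, y: x * y
NOT = lambda x: 1 - x
OR  = lambda x, y: 1 - (1 - x) * (1 - y)   # de Morgan via NOT/AND
XOR = lambda x, y: x + y - 2 * x * y        # one more, for free

for x in (0, 1):
    for y in (0, 1):
        assert AND(x, y) == (x and y)
        assert OR(x, y)  == (x or y)
        assert XOR(x, y) == (x ^ y)
    assert NOT(x) == 1 - x
```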

Informally, I mean reducible from time-bounded Turing machines. Given a Turing machine TM and a time bound T, a (binary or arithmetic) circuit can be constructed to simulate TM for T steps. Or in other words, a circuit can be constructed to solve any particular instance of FNTIME(T).

The reason why Intel mov is so powerful is because the mnemonic actually maps to tons of different things. In particular conditionals are accomplished using indirect memory accesses.

Any program can be compiled to a single `while` loop with a `switch`. I can think of two ways to "drive" this `while` loop:

- If the program is meant to run forever, or as long as possible, then the system performing the computations can drive the loop. This might be useful for anytime algorithms (e.g. an AI/optimisation problem): e.g. "keep iterating this computation (unless my billing account runs out of credit), I might ask for the current state at some point in the future".

- Programs which eventually halt could be checked by the key holder: the result gets decrypted, and if it isn't in the 'halt' state then it's sent back for another 'step'. In practice this could be made more efficient by e.g. making the computation idempotent, so it can be iterated many times without having to ask for confirmation every time; by having the encrypted computation perform many iterations, so each 'step' does more work; etc. If the job is kept on the encrypted system, with the key holder sending it step/stop instructions, that leaks details of the computation (i.e. an upper-bound on how many steps it takes). If we're willing to round-trip the data instead, then the key-holder can re-encrypt it each time (and jitter its timings), so the server couldn't tell the difference between stepping the same job and computing multiple jobs. This could be further obfuscated by throwing our jobs into a pool of others, e.g. on a shared proxy.

This model of having some external entity "driving" a computation's loop appears in a few other places too:

- Smart contracts, like Ethereum, give programs "fuel" that decreases with each step, and calling another program requires us to give it some of our fuel.

- Optimisation algorithms can also use "fuel" to avoid infinite loops. Some compilers use this to avoid transformations getting stuck in a cycle (rewriting the code between one form and another).

- Total (functional) programming languages, like Agda and Coq, don't allow general recursion; yet they do allow co-recursion. This lets us wrap each recursive call (i.e. step) in a constructor, then we can "drive" the program by having an external program unwrap these constructors (AKA a main loop).

- Coq's Mtac language lets us use general recursion in our proofs, but the results get wrapped up (just like the co-recursive example above). The compiler will run these proofs to completion: if they don't halt then neither does the compiler.

I agree. These are totally legitimate ways to still use FHE even if you need to compute a Turing-complete thing. It doesn't thwart my theoretical point, because you're not doing pure FHE anymore: you're using FHE for the "Turing-incomplete" part of your algorithm, plus some external oracle to check for termination.

Practically this might still be an acceptable way to tackle the problem. I mean, "practically" as much as FHE itself is.

But just 2 months ago there was an article on HN that stated:

> The database is a key value store prepopulated with the English names of countries and their capital cities from the continent of Europe. Selecting the country will perform a search of the matching capital. On a 2019 MacBook Pro laptop, the example searches take under 80 seconds.

I think the 40x overhead is a throughput comparison (from what I know, FHE-based secure inference protocols have poor latency, but can process many predictions in parallel, improving throughput).

Doing an exact string match on 200-ish rows in 80 seconds on a modern computer is so inefficient that I have a hard time seeing any less complex but still useful operation being practical. Perhaps I'm just not clever enough, but for now homomorphic encryption seems like it isn't useful for common, real-world use cases to me.

Someone please correct me if I'm wrong, but wouldn't this still be vulnerable to side-channel or pattern-analysis attacks?
Also, how can all operations on a DB, like projection and equality, be represented by addition and multiplication? Are we in Peano-arithmetic space here? Can somebody break it down for me?

You answered your own question accidentally: side-channel attacks are not possible in this model of computation because everything is represented as addition and multiplication. The homomorphic ciphertext doesn't tell you when it's done, you sort of just do all the operations you're told and hand back the answer gaining no knowledge about the computation you just did.

That's not to say that a badly-designed usage of homomorphic computation can't introduce a timing channel, though. If there's a point in time where the result is handed back to the client, they inspect it, and then conditionally send more computations, you might be able to infer something with either the client's timing or if they happened to send back more computations or not. However, that assumes that we'd be able to trace multiple API requests associated with the same data set (as opposed to unrelated computations), and that we know enough about the computations we're being told to do to infer a timing channel from them. That's why I use the qualifying term "badly-designed", which in practice will probably be most uses of this technology.

So basically I tell you to do a bunch of stuff and give it back, and you have no idea if the computation is a partial application/full application etc. Neat. Next question: if my previous statement is correct it seems that it may push a lot of the work back onto the client, otherwise I don't get how a server couldn't figure out relationships from access patterns on blocks. This would limit the utility of the server right?

You can get a lot of the benefit of a linear scan using a technique like oblivious RAM, but I'm not convinced it's possible to build such a system without using a Secure Enclave. There are some papers that purport to implement this, but once you're dancing in the realm of Secure Enclaves you're out on a different limb.

I don’t think it’s possible for a circuit to operate over a large set quickly and use a fully homomorphic pure software design.

With some minor interaction between client and server, you can use ORAM in conjunction with FHE to enable sublinear database scans. It's not practical for production deployments, but is probably feasible to implement.

There can always be side channels. Side channel means some information channel that your model doesn't account for. By definition, it's not possible to be sure an implementation of a model doesn't have any side channels. The best known class of side channels is timing attacks, where the amount of time that an operation takes leaks some information about its content. This is a well-known area of research, and most modern implementations of cryptographical algorithms do take it into account; I expect this one does, too. If you can model the time an operation takes, you can prove that a given implementation has no timing channel. But any number of other side channels could potentially exist in this (or any other) implementation.

That's a lot of questions, but AFAIK it is not vulnerable to side-channel analysis. That's why the author spent time on the database search use case: you must visit every row because you don't want an attacker to be able to segment your data at all. It would be possible to make trade-offs in that specific case, but cryptographic engineers are loath to give the common programmer such a foot-gun.
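The visit-every-row pattern can be illustrated with plain integers standing in for ciphertexts. The equality test below is pure field arithmetic (via Fermat's little theorem), the kind of branch-free circuit an FHE scheme could evaluate:

```python
# Oblivious key-value lookup: touch every row and combine with an
# arithmetic selector, so the access pattern reveals nothing about
# which key matched.

P = 2_147_483_647  # a prime larger than any key (Mersenne prime 2^31 - 1)

def eq_indicator(a, b):
    # 1 if a == b else 0, computed purely with field arithmetic:
    # by Fermat's little theorem, (a-b)^(P-1) ≡ 1 (mod P) unless a == b.
    return (1 - pow(a - b, P - 1, P)) % P

def oblivious_lookup(db, query_key):
    result = 0
    for key, value in db:                      # every row is visited, always
        sel = eq_indicator(key, query_key)     # 1 on the matching row, else 0
        result = (result + sel * value) % P    # adds 0 for non-matching rows
    return result

db = [(1, 100), (2, 200), (3, 300)]
oblivious_lookup(db, 2)  # → 200, after touching all three rows
```

Every lookup costs one comparison circuit per row, which is exactly the linear overhead discussed up-thread.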

Yeah, this stuff is cool so I'm trying to wrap my head around it. It sounds from what you're saying that you're trading a lot of efficiency for protection, but I can't see any way around it.

> Above, we can see charts indicating the additional compute power and memory resources required to operate on FHE-encrypted machine-learning models—roughly 40 to 50 times the compute and 10 to 20 times the RAM that would be required to do the same work on unencrypted models

Are those multiplicative factors specific to machine learning tasks, or is it the same for general purpose computation?

In other words, is there something about the structure of machine learning tasks which makes them more suited for FHE than other computing tasks?

> In other words, is there something about the structure of machine learning tasks which makes them more suited for FHE than other computing tasks?

Yes, FHE is not (currently) good at general-purpose computing. Complex conditional logic for example is a non-starter because every branch has to be evaluated. But tensor multiplication is well supported with existing FHE schemes, and many kinds of machine learning tasks consist of non-conditional tensor multiplications.

No, it's pretty fundamental. If the executor were able to know which side of the branch to execute it would mean the executor knows the value of the conditional.
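Concretely, the standard workaround is to evaluate both branches and combine them with an arithmetic multiplexer. Plain integers stand in for ciphertexts here, but the same multiply-and-add works homomorphically:

```python
def oblivious_if(cond, then_val, else_val):
    # cond is a 0/1 value (encrypted, in real FHE). Both branch values are
    # always computed; the select happens arithmetically, so the executor
    # never learns which branch was "taken".
    return cond * then_val + (1 - cond) * else_val

oblivious_if(1, 10, 20)  # → 10
oblivious_if(0, 10, 20)  # → 20
```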

So [1] claims "A variant of fully homomorphic encryption scheme for Turing machines, where one can evaluate a Turing machine M on an encrypted input x in time that is dependent on the running time of M on input x as opposed to the worst-case runtime of M".

I admit understanding this paper is a bit beyond me. But, does their construction evaluate Turing machine conditionals by evaluating both branches? If it did, how could the running time be "dependent on the running time of M on input x as opposed to the worst-case runtime of M"? It seems to me that having to evaluate every branch irrespective of the input would result in the running-time always being M's worst case, not varying runtime depending on the input.

I don't think there's much that can be done, except trying to identify any circuitry shared among branches and moving it out. Like if we have "if (cond) { hash(x) } else { hash(y) }", we can move the hash circuit outside of the conditional.
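A plaintext sketch of that hoisting, with `hashlib` standing in for an expensive hash circuit (the arithmetic select is the part an FHE circuit could evaluate directly):

```python
import hashlib

def hash_int(n):
    # Stand-in for an expensive hash circuit.
    return hashlib.sha256(n.to_bytes(8, "big")).hexdigest()

def branch_naive(cond, x, y):
    # Under FHE, both hash(x) and hash(y) would have to be evaluated,
    # i.e. the hash circuit gets instantiated twice.
    return hash_int(x) if cond else hash_int(y)

def branch_hoisted(cond, x, y):
    # Select the *input* obliviously (cheap), then run one hash circuit.
    selected = cond * x + (1 - cond) * y
    return hash_int(selected)
```

The naive form instantiates the hash circuit once per branch; the hoisted form selects the input arithmetically and instantiates it once, halving the expensive part.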

There are some circuit compilers that try to do optimizations like that automatically, like xJsnark, though it's not meant for FHE.

Re your last question, I don't think so. It's just an attractive application; hiding neural network weights is valuable in some settings, for instance.

It's less about structure and more about use cases. In my opinion there are only two major uses of data collection by organizations: machine-learning training data and targeted advertisements. A federated fully homomorphic system would allow these companies to train their models without invading personal privacy.

It would also allow organizations to train models collaboratively in a privacy-preserving way, which, if you think about it, has a lot of consequences.

I found the comment at the bottom of the page from the "Ibmresearch" user helpful in understanding FHE.

To quote his TLDR:

> TLDR Huge polynomials are what are operated on instead of plaintext values by hiding the real data in the polynomials with intentional noise to make it infeasible to extract the real data without the decryption key that knows how to remove it. All operators are bootstrapped from addition and multiplication and elegant modulus trickery. Noise is managed at each operation because the noise compounds with every logic operation you do. The result of an operation chain or circuit be it one operation or infinitely many, therefore, has some noise but can be decrypted by the person with the encryption key. The decryption effectively drops all the noise and hones in on the bits that matter. The person doing the computation is doing a lot of adds and multiplies on big polynomials so they cant understand what you are really trying to do.
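The TLDR above can be made concrete with a toy LWE-style scheme for single bits. The parameters here are illustrative only and offer no real security:

```python
import secrets

Q = 256          # ciphertext modulus
N = 8            # secret dimension
DELTA = Q // 2   # a 1-bit is scaled up to the "top half" of the modulus

secret = [secrets.randbelow(Q) for _ in range(N)]

def encrypt(bit):
    a = [secrets.randbelow(Q) for _ in range(N)]
    e = secrets.randbelow(7) - 3                      # small noise in [-3, 3]
    b = (sum(x * s for x, s in zip(a, secret)) + bit * DELTA + e) % Q
    return (a, b)

def add(ct1, ct2):
    # Homomorphic addition: the server adds componentwise, learning nothing.
    a = [(x + y) % Q for x, y in zip(ct1[0], ct2[0])]
    return (a, (ct1[1] + ct2[1]) % Q)

def decrypt(ct):
    a, b = ct
    phase = (b - sum(x * s for x, s in zip(a, secret))) % Q
    # Rounding to the nearest multiple of DELTA drops the noise.
    return 1 if Q // 4 <= phase < 3 * Q // 4 else 0
```

Adding two ciphertexts adds the hidden bits mod 2 while the noise terms add too; decryption's rounding step "drops all the noise and hones in on the bits that matter," exactly as the quoted TLDR says. Once accumulated noise exceeds Q/4 decryption fails, which is why real schemes must manage noise at each operation.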

I think this is exciting stuff. We're actively building prototypes around FHE concepts. Basically asking the question: "Can we make meaningful care and hospitality predictions, without ever seeing sensitive data?"

We've banged out some stuff with HElib, and some other interesting implementations, but are just beginning.

And we're hiring for both backend and data science positions. :) jobs@theembassies.com

> With SEV enabled, an operator who has root privilege on a host system can't inspect or meaningfully alter the contents of RAM in use by a virtual machine running on that system.

Is that true? I would like to know more about what kind of guarantees SEV gives and how it works at a high level; any resources you can recommend?
I assume that at least while the VM is being launched, the sysadmin can mess with the VM?

It is true if you assume SEV has no side-channel vulnerabilities and that no one can decap your CPU and read out the cryptographic material with an electron microscope.

The article claims that oblivious query, set intersection, and machine learning on private data are not possible without FHE. However, aren't they all possible either with secure MPC or hardware based enclaves e.g. AMD SEV?

I would argue that secure enclaves do not exist in practice. You have to assume a physical device where the private key cannot be extracted and the operation cannot be observed. The threat model is much weaker than for FHE, and imo not really useful for operating on large sets of private data.

MPC introduces a trust assumption. MPC is only private if some threshold of the multiple parties are honest and destroy their secrets. Often that threshold is just one honest party, and efficient MPC schemes often have only 3 participants in total.

FHE gives you a better security model than either of these, as it neither relies on physics for safety nor needs to assume any level of honesty from the person running the computation.

I was thinking MPC might work for set intersection. For oblivious query, can't you do it by sending the query directly into the remote trusted execution environment, encrypted with the TEE's public key?

The extra computational power requirement is still pretty large... The math behind it doesn't seem to be anything new, there's a reason why these solutions aren't very commonplace yet

Even if it worked, I wouldn't want the government scanning through data for whatever they decided was subversive or suspicious through undefined filters and unaccountable agencies.

This still isn't making sense to me. Is there a simple math example of how this is even possible? If encrypt(1) = a, then how can encrypt(1) + 2 ever give me the correct answer?

You need encrypt(1) + encrypt(2). You would not be sent instructions to add 2. No information about what you are doing is sent, except the operations + and x.

As the comments in the article state, it's actually all done with operations on polynomials with encrypted coefficients. As + and x can be done on polynomials, it all works out.

The maths behind it is of course much more complicated.

As for a simple maths analogy, consider adding 1 + 2.

I'll encrypt your values by multiplying by 3 mod 7. So encrypt(1) = 3*1 mod 7 = 3 mod 7 and encrypt(2) = 3*2 mod 7 = 6 mod 7. Only the encrypted values are sent, along with the operations I want to perform on the values.

Now this scheme is homomorphic, as I can just add the encrypted values: encrypt(1) + encrypt(2) = 3 + 6 mod 7 = 2 mod 7. That is the only computation that would be done before sending back the result.

To decrypt it, I would have to divide 2 mod 7 by 3, which gives me 3 (you can easily check that encrypt(3) = 2 mod 7). And indeed 1 + 2 = 3. So the decrypted answer is correct.

Of course this scheme is too easy to crack. The FHE scheme is not.
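The worked example above can be checked in a few lines of Python (this is the commenter's deliberately weak multiply-by-3 toy, not a real FHE scheme):

```python
M = 7                     # modulus
K = 3                     # secret multiplier
K_INV = pow(K, -1, M)     # modular inverse of 3 mod 7, i.e. 5

def encrypt(x):
    return (K * x) % M

def decrypt(c):
    # "Dividing by 3 mod 7" means multiplying by 3's modular inverse.
    return (c * K_INV) % M

c = (encrypt(1) + encrypt(2)) % M   # 3 + 6 = 9 ≡ 2 (mod 7)
decrypt(c)                          # → 3, matching 1 + 2
```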

> roughly 40 to 50 times the compute and 10 to 20 times the RAM that would be required to do the same work on unencrypted models

Too bad Moore's Law is dead, otherwise it would have been possible to run everything we run now in 10-12 years using Fully Homomorphic Encryption for the same cost!

To me the practical problem with FHE is that, if it costs 20 times doing the equivalent non-encrypted computations, it is cheaper to do it on your premises rather than rent 20x the same computational power from a cloud host, unless the cloud host is able to provide computational power at 1/20 of the cost it has for you (which seems too much to me).

IMO a more practical approach would be to use ordinary encryption inside a CPU, so it wouldn't be possible to extract any data without extremely advanced methods.

Rough description: the CPU has secure memory which contains a private key. The CPU also has a certificate with the corresponding public key. That certificate is signed by Intel.

The hosting provider publishes an API which allows a remote user to communicate with that secure CPU, basically just transmitting an encrypted stream to and from the CPU.

When you start your communication, you establish an encrypted link between your machine and the target CPU, located in a data center. You can check the authenticity of that CPU by checking the certificate signature. Then you send any code you would like to execute (encrypted with the session key). The CPU executes that code. It encrypts RAM in the process, so it's not possible to read its contents. And it sends back whatever information (again, encrypted) you're interested in.

So you have computing resources which are protected from anyone except the very few who could extract the private key from the secure chip. I think this kind of protection should be enough for many uses. And it would not compromise on performance. The only real-world issue would be side-channel attacks.

You're essentially describing AMD's SEV[1] as mentioned in the article. It piggybacks on the memory encryption implemented in the Zen architecture by giving a separate key to each VM. Ostensibly, the host can't interfere with or snoop on the VMs, assuming you trust AMD. I'm surprised it hasn't been more widely adopted.

> protected from anyone except very few who could extract private key from secure chip

The way I understand chip manufacturing it would be hard to diffuse a separate key into each chip. This means it'd only take one of those very few people to extract and leak the key (or cut out the middleman and leak it from inside AMD) to break it for everyone.

> The only real world issue would be side channel attacks.

This is probably a much bigger problem in practice. Spectre & related attacks have been effective against Intel SGX[2], another "trusted environment" inside a larger system.

> That certificate is signed by Intel.
> So you have computing resources which are protected from anyone except very few who could extract private key from secure chip.

…and Intel. This has obvious problems in who the root of trust should be, as well as putting all the eggs in the same basket.

Also, what you propose is I think equivalent to the SEV technology mentioned in the article.
