GCC-Rust: Rust frontend for GCC (github.com)
288 points by pcr910303 9 months ago | 138 comments

I have talked to GCC developers about integrating a Rust frontend and they said the main blocker is that the language specification isn't stable and is a fast-moving target.

As long as the Rust language specification isn't stable, GCC's Rust implementation would just fall behind all the time.

As an alternative, it has been suggested to provide a GCC frontend for the LLVM-IR language generated by the Rust compiler, since that specification is stable.

If Rust's specification becomes stable at some point, it would probably be a good idea to start a Bountysource campaign (like we recently did for the m68k [1] and AVR backends [2]) to help get the Rust frontend brushed up and ready to be merged into the GCC source tree.

> [1] https://www.bountysource.com/issues/80706251-m68k-convert-th...

> [2] https://www.bountysource.com/issues/84630749-avr-convert-the...

I do not think that LLVM-IR is stable either, so I do not think that is a good option. I think this project does the right thing by using MIR, which, while not stable either, should be more stable than Rust and no worse than LLVM-IR, while retaining more high-level information (e.g. generating LLVM-IR loses MIR's knowledge of restrict).

I think the instability the GCC devs are concerned about is the release schedule. Rust has 9 releases per year while GCC has 4-5 releases. Unless you update your compiler quickly, you won't be able to use much of the crates.io ecosystem as crates are quick at requiring a new compiler version. Often the implementation of a new language feature is still getting last-minute fixes up to 6 weeks before the release, and sometimes even after that beta period.

Compare that to C/C++, which has new releases every 3 years. A compiler that only supports C++14 is still perfectly usable for most C++ codebases out there. In fact, the Godot engine still hasn't migrated to C++11 yet, and it's far from an exception. Even if your compiler has parity with Rust 1.31, released one year ago, you'll have trouble with most projects: even if the project itself doesn't depend on newer compiler releases, one of the transitive dependencies will.

I could be wrong since I am not a Rust developer, but I do not think MIR changes 9 times per year. I would imagine most releases do not touch MIR these days.

You'd still have to do the releases in a separate schedule from the main gcc releases, something that the gccgo compiler doesn't do.

I think we might be talking past each other, because I never claimed otherwise. My point was that LLVM-IR is not a solution to anything, because LLVM-IR would have the same problem: LLVM has a separate release schedule from GCC. I think using MIR is the best alternative which does not require writing and maintaining a separate frontend from rustc.

Since this implementation shares the frontend with rustc, there is no reason it would fall behind.

Ah, interesting. But can this implementation actually be built for architectures not supported by LLVM? I looked at the build instructions and the first build step involves building parts with the standard Rust compiler.

I also just realized that it's not the same backend as the one written by "redbrain" here [1].

> [1] https://github.com/redbrain/gccrs

Yes, it's a different attempt from redbrain's.

gcc-rust currently needs LLVM to build because that's the only good way to build Rust code, and gcc-rust (unlike, say, mrustc) uses Rust code. Once gcc-rust is on track, Rust code used by gcc-rust can be built by gcc-rust, and LLVM becomes unnecessary.

Sounds very good. I'll keep an eye on the project. I would love to see such a frontend merged into GCC upstream and I would definitely shell out some money for a Bountysource campaign to support the effort.

Most likely yes. This is also the same approach taken by D, with the same frontend being shared across dmd, gdc and ldc.

The D comparison is very apt: Rust also would have one frontend with three backends: GCC, LLVM, and its own (Cranelift).

> its own (Cranelift)

Interesting. This is the first I've heard of Cranelift being used as anything other than a code generator for wasm runtimes. When would Cranelift be used instead of LLVM, and vice versa?

If I'm not mistaken, the pipeline is Rust -> MIR -> LLVM -> Binary. The GCC backend probably uses MIR, which is simpler than Rust itself.

Considering that it's more difficult to install upstream GCC than it is to install upstream rustc, there's ample opportunity for the distro snapshot that users would see to fall behind.

Were you able to get a sense of what counts as stable in this case? Is it about the rate at which new features are added, or more about keeping old code working?

I believe all old Rust code is kept 100% compatible with compiler updates. It's more about new features being added to the language at a rapid clip compared to languages like C++.

That will slow down I'm sure, but the Rust developers and users are aware of a number of things which are still holding the language back and need to be designed and implemented in the next few years. Big functionality stuff and usability stuff. But also a lot of work on the tooling and ecosystems which won't have anything to do with any language changes.

But I would expect in a few more years we'll see the rate of Rust language change will be significantly lower.

The title is somewhat misleading as to the scope of this project, as this looks to be a project for adding GCC backend support to the existing Rust compiler (though this isn't a criticism, that's exactly the approach that I would take!).

The (simplified) way that rustc works is that it translates Rust source code into a custom intermediate language called MIR, does a bunch of analysis and a few optimizations, and at the very end translates MIR to LLVM IR and passes that into LLVM. This project looks to add support for translating MIR to GCC's own IR instead of LLVM IR, allowing it to reuse all the other work that has been done writing rustc.

For a from-scratch reimplementation of (most of) a Rust frontend, see mrustc ( https://github.com/thepowersgang/mrustc ), which is written in C++, and served to demonstrate that (a prior version of) rustc did not feature a reflections-on-trusting-trust attack (via diverse double compilation).

Would be interesting to have a fully functional Rust compiler that's not based on llvm.

Sega Genesis development

And Amiga, Atari, 68k-Macintosh etc.

FWIW, there is an out-of-tree m68k backend for LLVM [1]. I have already done some experimental work on Rust for m68k based on this backend [2].

> [1] https://github.com/M680x0/M680x0-llvm/

> [2] https://github.com/glaubitz/rust/tree/m68k-linux

Also, GCC supports the Xtensa ISA used on the ESP8266/ESP32.

LLVM Xtensa port is in progress: https://github.com/espressif/llvm-xtensa

Lots of things are "in progress" (including rust support for gcc code generation, heh). But gcc has production-ready, mature backends for a ton of architectures that LLVM doesn't. It's a real feature.

Cranelift may become that in the future.

Although note the Cranelift backend has quite different goals from GCC and LLVM, being optimized for compiler speed over runtime performance, and is very unlikely to support many of the obscure backends that GCC can target.

Not likely, as the community is trying to get Rust ISO 26262 certified. If they changed the compiler, they would have a whole lot more work to get it certified.

Unrelated. gcc-rust can be developed while rustc is ISO 26262 certified.

Out of curiosity, why would the community maintain two compilers? Wouldn't that be double the work?

GCC supports several targets like alpha, hppa, ia64, m68k, sh, v850 and vax (and more) that the LLVM-based Rust does not support because LLVM lacks support for these targets.

Okay, but:

* Alpha is more than a decade past end of life.
* HPPA is EOL as of 6 years ago.
* IA-64 goes EOL next month.

...and the list goes on. These aren't going concerns outside of some hobbyist work, and GCC has had difficulty retaining maintainers for some of the hardware and has threatened deprecation of them (in fact IA-64 is facing this exact situation and is going to be marked deprecated in GCC 10, and vanish in 11: https://gcc.gnu.org/ml/gcc/2019-06/msg00125.html). With instruction set support in that state, the odds are reasonable that bugs will be exposed, and difficulty will be had trying to support and fix them (exacerbated by scarcity of hardware).

Adding support for obsolete hardware is not much of a good argument for taking on the work of supporting two compilers.

Having independent language implementations is a good way to flush out unwarranted assumptions and other warts in the spec.

This is not an independent implementation, though.

You mean like LLVM and GCC? Both compilers for C and C++ (and a lot more with various front ends). I can only imagine they're both maintained because people enjoy working on them and find them interesting projects to develop. When I find something interesting to work on, I certainly don't care that someone else is doing something similar somewhere else.

LLVM got where it is primarily because of GCC's licensing (and because GCC's design is deliberately compromised to support that licensing). It's unlikely that LLVM would have been maintained otherwise, and I would expect GCC to fall by the wayside in due course.

Is it the same folks for the two compilers?

I would welcome another rust compiler as it would give us choices.

No mention of mrust in README. How do those two compare?

mrustc has a new frontend independent of rustc. This one reuses rustc frontend, just substituting GCC backend.

As far as I remember, mrustc is not a full rust compiler/frontend, i.e. it is only usable with rust code which has been "proven" to be valid/correct rust by another compiler, so you can't really use it for development. It's still useful for bootstrapping rustc as it's a rust compiler not written in rust. It could also be useful to compile rust for some exotic hardware architecture which has no llvm support.

That's correct. Moreover, it's only attempting to compile rustc, which means it could be completely broken for some other existing rust code. It is not recommended to use it for anything other than bootstrapping rustc (but this is already an incredible achievement).

It seems like it would be easier to make a WebAssembly frontend rather than MIR. Just compile rust to WASM and use GCC to compile to unusual instruction sets. There might be a small performance hit, but the WASM frontend could likely be upstreamed into GCC.

Why was MIR chosen instead of WASM?

WASM is designed for a particular sandbox environment, which currently is 32-bit-only. AFAIK WASM has its own calling conventions and sections, so recompilation back to something that works with platform-specific C would involve difficult guesswork or glue code.

MIR is Rust's accurate low-level representation with much more type information, and it's aware of the native target's specification, so it can be optimized better and has better interoperability with native code.

I agree with you that there would be glue code and I don't doubt you're right that MIR can be optimized more.

But rust in the general case is agnostic about 32 vs 64 bit pointers and explicitly targets WASM. I'm not familiar with GCC's IR, but unsandboxed AOT WASM compiled through LLVM IR is astoundingly fast.

> This average benchmark has speed in microseconds and is compiled using GCC -O3 -march=native on WSL. "We usually see 75% native speed with sandboxing and 95% without. The C++ benchmark is actually run twice – we use the second run, after the cache has had time to warm up. Turning on fastmath for both inNative and GCC makes both go faster, but the relative speed stays the same", the official website reads.

> “The only reason we haven’t already gotten to 99% native speed is because WebAssembly’s 32-bit integer indexes break LLVM’s vectorization due to pointer aliasing”, the WebAssembly researcher mentions. Once fixed-width SIMD instructions are added, native WebAssembly will close the gap entirely, as this vectorization analysis will have happened before the WebAssembly compilation step.


I'm new to this, but why is this a thing, and why is it important?

For the same reason that clang/llvm is a thing and important?

curious why people want this? I was under the impression that llvm is better than gcc?

> curious why people want this?

It's important for core infrastructure to have multiple competing implementations. On a related note, does Rust have a standard yet or are they still doing the reference implementation thing?

> I was under the impression that llvm is better than gcc?

And I thought that tabs were better than spaces, BSD beat Linux, Emacs was the one true god... what were we arguing about again?

My impression is that gcc produces smaller code that is roughly comparable with clang in terms of runtime performance (with a slight advantage when compiling the codebase I care about the most). Gcc has better debug info, especially when optimizing. I don't know about compile speeds. Clang has better infrastructure for writing static analysis tools. Clang is a much more realistic alternative to msvc on windows than gcc is. I don't know about their development velocities. Clang seems to have the edge in mindshare.

I'd be curious to know whether this would provide cross-language inlining during LTO using gcc. I believe some form of this is possible with llvm?

My understanding is that LTO (and thus any cross-language inlining) takes place using a low level IR where language barriers aren't relevant. The GCC and LLVM backends both have full support as far as I know. Hypothetically it's simple to implement support in a given frontend, but apparently it proved to be a bit tricky in practice for Rust (http://blog.llvm.org/2019/09/closing-gap-cross-language-lto-...).

I don't think Rust is even defined by a reference implementation, given that they release a new compiler every six weeks.

For many practical purposes I think the closest thing to a language definition is the set of testsuites visible to Crater.

(That is: when the compiler people are considering a change, they don't say "we can't change this because we're past 1.0 and the change is technically backwards-incompatible", or "we can't change this because the Reference specifies the current behaviour"; they say "let's do a Crater run and see if anything breaks".)

This is not correct. We do use crater to help with questionable cases, but we often say “we can’t change this because we’re past 1.0 and the change is backwards incompatible.”

Which is still not the same as having a spec or even a reference implementation.

It’s a bit weird how laser-focused the Rust community is on backwards compatibility, not seeming to believe that forward compatibility is also important.

e.g., if I write code targeting C++17, I can be reasonably sure it compiles with an older version of the compiler, as long as that version also claims to support C++17, modulo bugs. Not the case if I write code targeting Rust 2015 as they’re still adding features to that from time to time. Let alone Rust 2018 which changes every 6 weeks.

Will there ever be a version of Rust that the community agrees “OK, this language definition is not changing unless we find some major soundness bug” ?

This is a big blocker for mainstream adoption in Linux distributions since the maintainer wants to be able to land one specific version of rustc in the repositories, not rely on people downloading new versions with rustup continuously. But old versions of rustc are effectively useless due to the lack of forward compatibility guarantees.

It's funny you cite C++, which has the best example of forward compatibility breakage in terms of impacting people.

g++ 4.4 implemented several key parts of C++11, including notably rvalue references, and adapted libstdc++ to use rvalue references in C++11 mode. However, the committee had to make major revisions to rvalue references subsequent to this implementation, to the point that you can't use libstdc++ 4.4's header files (such as <vector>) with a compliant C++11 compiler. So when you try to use newer clang (which prefers to use system libstdc++ for ABI reasons) on systems with 4.4 installed (because conservative IT departments), the result is lots and lots of pain.

Furthermore, it absolutely is the case that newer versions of compilers will interpret old language standards differently than older versions of the compiler. You don't notice it for the most part because the changes tend to revolve around pretty obscure language wordings involving combinations of features that most people won't hit. Compilers are going to try hard not to care about language versions past the frontend of the compiler--if the specification change lies entirely in the middle or backend, then that change is likely to be retroactively applied to past versions because otherwise the plumbing necessary is too difficult.

It's not that the Rust community ignores forward compatibility, it's just that right now is not the right time. The things recently landing in Rust are not new; most of them were designed in, like, 2017. It just took years to stabilize them.

I write standards-compliant C++ code that only works with the latest compiler versions, because older ones incorrectly claim support for a C++ standard but their implementation is too buggy.

Rust has no reasonable specification yet although it is being worked on.

Which is the main reason why no usable alternative implementations exist yet and why Rust hasn't found its way to more low-level software projects like the Linux plumberland yet.

Rust dearly needs a stable specification, it is the main blocker why the language hasn't been more widely adopted.

I agree. That, and no need to rush new features, stabilize old ones and fix their bugs. I am still waiting for RFC0066[1] to be fixed. It is from 2014-05-04! Here is its GitHub Issue: https://github.com/rust-lang/rust/issues/15023. Backstory: I started writing a relatively simple Rust program many years ago when I ran into this issue. It was my first attempt at writing Rust, and I did not like the workaround.

[1] https://github.com/rust-lang/rfcs/blob/master/text/0066-bett...

Rust has LTS-like releases based on years, but the last one didn't have async/await so everyone just kept tracking. I think the next LTS-type release may snag some people and slow stuff down. Cargo should handle that fine, but I'm not sure about maintainers. I can't get cargo in my day job, just the rustc 1.3x shipped with RHEL, so I'm already standing still. RHEL's version might make a good de facto LTS in the absence of cargo, but the thin std lib makes that hard, and random old rustc versions sometimes don't play nice with every crate.

Rust doesn't have LTS releases. Editions are both a way to market the new features introduced in the past years as a "bundle" to outside users, and a way for us to introduce some breaking changes for the users opting into the new edition. The release an edition is stabilized in (1.31 for Rust 2018) does not get any special treatment from the Rust team though, and that includes no LTS support (for example we won't backport bug fixes, even security ones).

As I pointed out in another comment, the definition of the 2015 edition is still changing (i.e., features from 2018 are getting backported to 2015), severely limiting the usefulness of the "edition" concept.

E.g., if someone thinks "I'm going to target 2015 because I want my code to run on the rustc shipped with various slow-moving Linux distros", it doesn't help: their code might still not build on an old rustc unless they specifically target an older version of rustc, which nobody does.

I would say that rust doesn’t have a formal specification, what it does have is close to or better than many languages “specs” with the “Rust Reference”: https://doc.rust-lang.org/stable/reference/

It really depends on how strictly you define the term specification. The Rust Reference is not required to be accurate. Though many other language compilers/implementations don’t fully implement their respective specs so, :shrug:.

The Rust Reference is very far from being complete, or even correct in what it does cover.

If I have a question whose answer isn't obvious, it's far more likely that I have to go trawling around in RFCs than that there's an answer in the reference.

I think most languages of a similar age (eg Go, Swift) are doing better.

Rust is younger than Go (released in 2015 vs 2012) and way more ambitious, especially Rust 1.0 was released as kind of an MVP and many things have changed since then, which made the maintenance of such a reference an issue. The pace of change is slowing nowadays (that's especially visible if you look new and accepted RFCs), so I hope the reference will catch up eventually.

There are some people studying Rust with formal verification. For example in this paper https://plv.mpi-sws.org/rustbelt/rbrlx/paper.pdf However I do not know if the whole language is covered or only a core.

The big gap with RustBelt is that it does not cover traits. This actually does matter: some bugs were missed because they interact with traits.

Better how? There are plenty of benchmarks showing that gcc's optimizations are more performant than llvm's.

And here are plenty that support the converse ;) I don't think there's anything that points to one being definitively better than the other in performance.

>And here are plenty that support the converse

Could you cite sources please?

Like the one where they turned off null checks?


Every performant compiler on the SPECcpu benchmarks does this.

Clang has this too.

That’s a bug, not an optimization.

Llvm is better than gcc at modularity and extensibility (or at least it was when llvm was released, I haven't followed gcc evolutions in a while). People who work on new languages typically use llvm because it's designed to make such things simple.

Now, in terms of end results, llvm and gcc each have their qualities. When llvm was released, gcc typically produced faster binaries but llvm optimizations were easier to understand. Since then, both have evolved and I haven't tried to catch up.

Bottom line: having two back-ends for rust or any other language is good. Among other things, it's a good way to check for bugs within an implementation of the compiler, it can be used to increase the chances of finding subtle bugs in safety-critical code, etc.

One thing GCC excels over LLVM is quality of debug information. If you switch from Clang to GCC, you will see less "optimized out" in GDB. This is pretty much guaranteed.


GCC does some things better than LLVM. It supports more architectures, has a non-broken implementation of restrict (which should be useful for Rust), and optimizes some code better. They both have their own pros and cons.

> has a not broken implementation of restrict (which should be useful for Rust)

The most recent restrict bug the rustc developers found (which made them disable restrict again) was found in both LLVM and GCC (they made a C reproducer, so they could test both). See: https://github.com/rust-lang/rust/issues/54878 (rustc) https://bugs.llvm.org/show_bug.cgi?id=39282 (LLVM) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87609 (GCC).

The inter-iteration issue with LLVM's noalias has been known for years:


I even filed a version of it myself:


I'm actually surprised that Rust enabled noalias usage with this known outstanding issue. When I worked on Rust years ago, it was definitely common knowledge on the compiler "team" that this was broken.

I'm equally surprised that GCC had that bug, since their pointer aliasing model is equipped to correctly handle this situation (and is why they were able to fix it quickly).

Note that GCC bug is now fixed. LLVM bug isn't.

Support for more CPU types. LLVM is limited to the mainstream architectures.

Eh, it’s not as one-sided as that. GCC has a larger number of targets, but LLVM supports several newer targets that GCC doesn’t, like WebAssembly and eBPF (although the latter is coming in GCC 10). But it would certainly be nice for Rust to support both sets of targets.

Compare the number of architectures by GCC:

> https://en.wikipedia.org/wiki/GNU_Compiler_Collection#Archit...

And the number of architectures by LLVM:

> https://en.wikipedia.org/wiki/LLVM#Back_ends

GCC supports vastly more targets.

That's the current state but LLVM is adding new targets (like the newly added AVR target), while the AVR target in GCC is under threat of removal: https://www.bountysource.com/issues/84630749-avr-convert-the...

GCC still supports more architectures even if AVR support is removed.

What part of my comment does that contradict?

Can the semantics of Rust handle some of the stranger architectures that GCC supports?

In theory, both GCC and LLVM take a front-end (in this case rust) and compile it down to an intermediate representation (IR). There will likely be some differences between the output from a front-end, but after successive optimisations have been applied this will likely disappear. By the time you get to generating assembly, you can't really tell the difference anymore so the semantics of the original language don't make an impact.

I'm sure there are a number of "reasonable" assumptions that aren't true–probably things like the number of bits in a byte, or the size of a particular integral type, or support for a particular platform behavior.

https://gankra.github.io/blah/rust-layouts-and-abis/ lists assumptions. As you suspected, one assumption is 8-bit bytes.

There are architectures on which sizeof(char)==sizeof(short)==sizeof(int) in C implementations, because it's the only way to produce efficient code.

It's a good sign of stability if a language is supported by multiple compilers. I'll probably never use gcc for Rust, but I'm still happy to see it.

I've been under the impression that GCC still has much better hardware optimizations than LLVM has.

> I've been under the impression that GCC still has much better hardware optimizations than LLVM has.

That is my experience too.

For code with a high level of nesting, meaning high potential for inlining (typically C++), GCC is close to unbeatable, even compared to highly optimised compilers like Intel ICC.

It has not been mine. IME gcc can do slightly better than clang, sometimes, but it's generally a wash.

Google found GCC to be faster than Clang for their code, that's why they are rewriting LLVM inliner.

LLVM developers are aware of this and they are working on a new inliner specifically to address this.

GCC has a reputation for having a confusing architecture. It is a very hard project to work on. LLVM is typically considered cleaner and more understandable. GCC is known to still have, in 2019, a rather slight performance benefit. LLVM also has a stable IR, named LLVM itself, while GCC has refused to provide one over the decades for political and strategic reasons.

> LLVM also has a stable IR named LLVM itself,

LLVM's IR is not stable by design. [1, 2]

[1] http://lists.llvm.org/pipermail/llvm-dev/2016-February/09487... [2] http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compa...

> while GCC refuses to do so over the decades for political and strategic reasons.

That was a long time ago. Since GCC 4.5 (released in 2010) GCC supports external plugins. [3,4] These plugins, like the rest of GCC, use GENERIC and GIMPLE as their IR.

[3] https://gcc.gnu.org/gcc-4.5/changes.html [4] https://gcc.gnu.org/onlinedocs/gccint/Plugins.html

Having worked with both, I don't know what you mean by "confusing architecture". Both are OK to work with, but both have some glaring holes in their documentation. LLVM's data structures are typically nicer to use than GCC's linked lists in a lot of places, that much is true.

LLVM IR is called “LLVM assembly language” in the documentation.

GCC is better than LLVM for some things and vice versa.

I was honestly curious, I will learn to ask better next time to protect my points :)
