ARM chips have an instruction with JavaScript in the name (stackoverflow.com)
406 points by kdeldycke 11 days ago | 297 comments

If anybody else was curious, the performance win from using this instruction appears to be about 1-2% on general JavaScript workloads: https://bugs.webkit.org/show_bug.cgi?id=184023#c24

I would argue a solid 1-2% can get you a promotion at HW companies. Put five of these improvements together and that’s a generational improvement.

1-2% for a single language at the cost of polluting an entire instruction set.

Would you say the same if the language was c or c++?

Yes, this is necessitated by some of JS's big warts, but the sheer amount of JavaScript in existence weighs heavily when considering the trade-offs. You cannot ignore HTML/JS if you're targeting UI applications - it is table stakes.

Yeah, but isn't the whole point of ARM to be a reduced instruction set? How reduced are we, really, if we're dedicating transistors to the quirks of a single language?

RISC is a misleading name. Its design concepts are not really based around a "Reduced Instruction Set" in the sense of "small" per se, nor do CISC machines necessarily have a large instruction set.

It is much more about the design of the instructions. RISC instructions generally take a small, fixed amount of time and are conceptually based on a sort of minimum unit of processing, with a weak to very weak memory model (delay slots, pipeline data hazards, required alignment of data, etc.), and the compiler/programmer combines them into usable higher-level operations.

CISC designs, on the other hand, happily encode large, arbitrarily complex operations that take unbounded amounts of time, and have very strong memory models (x86 in particular is infamous here: you can pretty much safely access memory, without any alignment, at any time; even though the result will often be slow, it won't crash).

As an example, the PDP-8 has fewer than 30 instructions but is still definitely a CISC architecture, while some ARM variants have over 1000 instructions but are still definitely RISC.

RISC is about making building processors simpler, not about making instruction sets and programming with them necessarily simpler.

Considering the number of phones and tablets in the world, ARM's primary purpose right now, by a large margin, is interpreting web pages.

Consumers and even manufacturers are both focused on performance as a primary metric, not the size of an instruction set.

I'd argue at this point ARM serves more value as an instruction set which isn't encumbered by the mass of x86 patents and historical legal baggage, thus meaning it's something that can reasonably be implemented by more than just two companies on the planet.

It being RISC is secondary to that.

> Yeah, but isn't the whole point of ARM to be a reduced instruction set?

It was the point at, say, the original ARMv1.

Yeah, the problem is not that it's javascript per se, it's that it's a quirk. FCVTZS would have been just fine.

I'd argue that the whole point of ARM (as in ARM holdings) is to make money for their shareholders! :-)

One of the most popular and fastest growing languages

What is the cost of this pollution?

I am genuinely curious, I don't know much about instruction set design.

This uses existing rounding modes with pre-set flags, so it costs 1 entry in the LUT and a small number of muxes, one per flag, assuming the worst case.

I guess it's however many bytes the instruction occupies in L1 cache.

The pollution would be in the chip complexity.

It's not a random esoteric programming language, it's JavaScript.

Is that significant enough to justify the instruction?

A recent AMD microarch was 10-20% faster than the previous one, despite running on the same physical process. No single component was responsible; there were changes to several components, each of which increased speed by only 1-4%.
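A back-of-the-envelope sketch of how small wins add up. It assumes the individual gains are independent and simply multiply, which real microarchitectural changes need not do; `combined_speedup` is a made-up helper name.

```python
from math import prod


def combined_speedup(gains):
    """Compound fractional speedups (0.02 means 2%) into one overall factor.

    Assumption: the gains are independent, so they multiply.
    """
    return prod(1.0 + g for g in gains)


# Five separate 2% wins compound to a bit more than 10% overall.
five_small_wins = combined_speedup([0.02] * 5)
print(f"{(five_small_wins - 1) * 100:.1f}%")  # 10.4%
```

Under the same assumption, a handful of 1-4% component gains lands in the 10-20% range quoted above.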

It is if it also increases battery life by a similar amount ;-)

It increases the die area, power consumption, and instruction set size by a minuscule amount, so probably.

A great question for the folks who actually do CPU design! As somebody who writes JS rather frequently I’m not complaining.

It depends on the cost (in gates and timing). Looks like this might be just a few gates.

Nothing justifies the prolonging of Javascript torture.

Nothing justifies the prolonging of C torture either, except for how widespread C is. Why do you think modern CPUs still expose mostly C-abstract-machine-like interface instead of their actual out-of-order, pipelined, heterogeneous-memory-hierarch-ied internal workings?

>> Why do you think modern CPUs still expose mostly C-abstract-machine-like interface instead of their actual out-of-order, pipelined, heterogeneous-memory-hierarch-ied internal workings?

Because exposing that would be a huge burden on the compiler writers. Intel tried to move in that direction with Itanium. It's bad enough with every new CPU having a few new instructions and different timings; the compiler guys would revolt if they had to care how many virtual registers existed and all the other stuff down there.

But why C? If you want languages to interface with each other it always comes down to C as a lowest common denominator. It's even hard to call C++ libraries from a lot of things. Until a new standard down at that level comes into widespread use hardware will be designed to run C code efficiently.

> Until a new standard down at that level comes into widespread use hardware will be designed to run C code efficiently.

Exactly this has hindered any substantial progress in computer architecture for at least 40 years now.

Any hardware today needs to simulate a PDP-7, more or less… Otherwise the hardware is doomed to be considered "slow" if it does not match the C abstract machine (which is mostly a PDP-7) closely enough. As there is no alternative hardware available, nobody invests in alternative software runtime models, which again makes investing in alternative hardware models unattractive, as no current software could profit from it. Here we've gone full circle.

It's a trap. Especially given that improvements in sequential computing speed are already difficult to achieve and it's known that this will become even more and more difficult in the future, but the computing model of C is inherently sequential and it's quite problematic to make proper use of increasingly more parallel machines.

What we would need to overcome this would be a computer that is built again like the last time many years ago, as a unit of hardware and software developed hand in hand from the ground up. Maybe this way we could finally overcome the "eternal PDP-7" and move on to some more modern computer architectures (embracing parallelism in the model from the ground up, for example).

> If you want languages to interface with each other it always comes down C as a lowest common denominator

Nope, "always" only applies to OSes written in C and usually following POSIX interfaces as their OS ABI.

C isn't the lowest common denominator on Android (JNI is), on Web or ChromeOS (WebAssembly / JS are), on IBM and Unisys mainframes (language environments are), on Fuchsia (FIDL is), just as a couple of examples.

CPUs expose a "mostly-C-abstract-machine-like" interface because this allows chip designers to change the internal workings of the processor to improve performance while maintaining compatibility with all of the existing software.

It has nothing to do with C, specifically, but with the fact that vast amounts of important software tend to be distributed in binary form. In a hypothetical world where everybody is using Gentoo, the tradeoffs would be different and CPUs would most likely expose many more micro-architectural details.

> Why do you think modern CPUs still expose mostly C-abstract-machine-like interface

I don’t think that, because they don’t. Your premise is hogwash.

Modern RISC-derived CPUs for the most part expose a load-store architecture driven by the historical evolution of that micro-arch style and, if they are SMP, a memory model that C and C++ have only recently adapted to with standards. Intel's ISA most assuredly was not influenced by C. SIMD isn’t reminiscent of anything in standard C either.

Also you might want to look into VLIW and the history of Itanium for an answer to your other question.

> Also you might want to look into VLIW and the history of Itanium for an answer to your other question.

There's only one question. What do you mean?

But Itanium wasn't out of order. How does that even come close to answering a question about exposed out-of-order machinery?

There is one CPU that exposes its out of order inner workings, the VIA C3 ("ALTINST"). The unintended effects are so bad that people that accidentally discovered it referred to it as a backdoor: https://en.wikipedia.org/wiki/Alternate_Instruction_Set

> [..] referred to it as a backdoor

Because it was implemented in a flawed way.

> "In 2018 Christopher Domas discovered that some Samuel 2 processors came with the Alternate Instruction Set enabled by default and that by executing AIS instructions from user space, it was possible to gain privilege escalation from Ring 3 to Ring 0.["

It's not like other languages are well-adapted to that either. That's a hard target to code for.

Why couldn’t Haskell compilers make good use of that?

Well it turned out that for running scalar code with branches and stack frames exposing too much to the compiler was not helpful, especially as transistor budgets increased. So as long as we program with usual functions and conditionals this is what we have.

I can swap out a cpu for one with better IPC and hardware scheduling in 10 minutes but re-installing binaries, runtime libraries, drivers, firmware to get newly optimized code -- no way. GPU drivers do this a bit and it's no fun.

For a long time I thought the JS hate was just a friendly pop jab. From working with backend folks I’ve realized it comes from at least a somewhat patronizing view that JS should feel more like backend languages; except its power, and real dev audience, is in browsers, where it was shaped and tortured by the browser wars, not to mention it was created in almost as many days as Genesis says the world was built.

Huh? Python and powershell are all over the backend, and it’s hard to argue against JavaScript while you’re using those. At least in my opinion.

I think it has more to do with the number of people who are bad at JavaScript but are still forced to sometimes work with it because it’s the most unavoidable programming language. But who knows, people tend to complain about everything.

So, there are excuses for Javascript's terribleness, but that doesn't stop it from being objectively terrible.

Sad to see the word "objective" become the next bullshit intensifier because people can't separate their own subjective opinions from the realm of verifiable fact.

"Terribleness" isn't an objective property.

And that's your opinion. I find javascript quite enjoyable and easy to use, without producing errors. YMMV.

That’s not what “objective” means.

So you're saying we should applaud reducing it by 1-2% right?

To answer the question of if it's worth it to add this specialized instruction, it really depends on how much die space it adds, but from the look of it, it's specialized handling of an existing operation to match an external spec; that can be not too hard to do and significantly reduce software complexity for tasks that do that operation. As a CE with no real hardware experience, it looks like a clear win to me.

Why whine about it? No matter your age, JavaScript will almost certainly still be in use until you retire.

This is like saying “nothing justifies the prolonging of capitalist torture”. On some level it’s correct, but it’s also being upset at something bordering a fundamental law of the universe.

There will always be a “lowest common denominator” platform that reaches 100% of customers.

By definition the lowest common denominator will be limited, inelegant, and suffer from weird compatibility problems.

If it wasn’t JavaScript it would be another language with very similar properties and a similar history of development.

Given that the instruction set already has a float to integer conversion it seems likely that the overhead of implementing this would be small and so given the performance (and presumably energy) win quoted elsewhere seems like a good move.

It would be interesting to know the back story on this: how did the idea feed back from JS implementation teams to ARM. Webkit via Apple or V8 via Google?

Correct, the overhead is minimal - it basically just makes the float->int conversion use a fixed set of rounding and clamping modes, irrespective of what the current mode flags are set to.

The problem is JS's double->int conversion was effectively defined as "what wintel does by default", so on arm, ppc, etc you need a follow on branch that checks for the clamping requirements and corrects the result value to what x86 does.
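A rough model of the "follow-on branch" described here, assuming the native conversion truncates toward zero and clamps out-of-range inputs (my understanding of the plain AArch64 FCVTZS behaviour, not something taken from the docs). All function names are made up for illustration, and this is not how any particular engine is written.

```python
import math

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def saturating_convert(x: float) -> int:
    """Assumed native convert: truncate toward zero, clamp to int32, NaN gives 0."""
    if math.isnan(x):
        return 0
    if x >= INT32_MAX:
        return INT32_MAX
    if x <= INT32_MIN:
        return INT32_MIN
    return math.trunc(x)


def slow_path_to_int32(x: float) -> int:
    """JS-style result: keep the low 32 bits of the truncated value."""
    if math.isnan(x) or math.isinf(x):
        return 0
    n = math.trunc(x) % 2**32
    return n - 2**32 if n >= 2**31 else n


def to_int32_with_fixup(x: float) -> int:
    fast = saturating_convert(x)
    # The extra branch: a clamped result may not be what JS expects,
    # so those cases are recomputed on the slow path.
    if fast in (INT32_MIN, INT32_MAX):
        return slow_path_to_int32(x)
    return fast
```

In this model `to_int32_with_fixup(2.0**32 + 5)` returns 5 where the clamped convert alone would return `INT32_MAX`. A dedicated instruction would fold the check and the recomputation into the conversion itself.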

Honestly it would not surprise me if the perf gains are due to removing the branch rather than the instruction itself.

Not quite. JS is round towards zero, ie. the same as C. If you look at the x86 instruction set then until SSE2 (when Intel specifically added an extra instruction to achieve this) this was extremely awkward to achieve. x86 always did round-to-nearest as the default.
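To make the two rounding behaviours concrete, here is a small sketch using Python's built-ins as stand-ins: `math.trunc` rounds toward zero, and `round` on floats rounds to nearest with ties going to even. It only illustrates the arithmetic, not any particular x86 instruction.

```python
import math

samples = [2.5, 3.5, -2.5, 2.7, -2.7]

# Round toward zero: simply drop the fractional part.
toward_zero = [math.trunc(v) for v in samples]

# Round to nearest, ties to even.
to_nearest_even = [round(v) for v in samples]

print(toward_zero)      # [2, 3, -2, 2, -2]
print(to_nearest_even)  # [2, 4, -2, 3, -3]
```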

The use of INT_MIN as the overflow value is an x86-ism, however; in C the exact value is undefined.

> The problem is JS's double->int conversion was effectively defined as "what wintel does by default",

No, JavaScript "double to int conversion" (which only happens implicitly in bitwise operations such as |, &, etc) is not like any hardware instruction at all. It is defined to be the selection of the low-order 32-bits of a floating point number as-if it were expanded to its full-width integer representation, dropping any fractional part.
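A minimal sketch of that definition, relying on Python's arbitrary-precision integers to play the role of the "full-width integer representation". The name `to_int32` is mine; treating NaN and the infinities as 0 is my reading of the JS spec rather than something stated in this thread.

```python
import math


def to_int32(x: float) -> int:
    """Drop the fraction, keep the low-order 32 bits, read them as signed."""
    if math.isnan(x) or math.isinf(x):
        return 0  # assumption: non-finite inputs map to 0
    low32 = math.trunc(x) & 0xFFFFFFFF
    # Reinterpret the 32-bit pattern as a two's-complement signed value.
    return low32 - 0x1_0000_0000 if low32 & 0x8000_0000 else low32


print(to_int32(2.0**32 + 5))  # 5
print(to_int32(2.0**31))      # -2147483648
print(to_int32(-1.9))         # -1
```

Note the wraparound for large values: no clamping happens anywhere, which is the part a plain saturating hardware convert would not reproduce.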

Interesting. So it's as much an x86 legacy issue as JS and presumably JS followed x86 because it was more efficient to do so (or maybe by default).

Sounds too like performance gains will depend on how often the branch is taken which seems highly dependent on the values that are being converted?

> Interesting. So it's as much an x86 legacy issue as JS and presumably JS followed x86 because it was more efficient to do so (or maybe by default).

Most languages don't start with a spec, so the semantics of a lot of these get later specced as "uhhhhh whatever the C compiler did by default on the systems we initially built this on".

Seeing as JavaScript was designed and implemented in two weeks, I'm betting this is the answer

Today's JavaScript is so divorced, so radically different from the original implementation that it could be considered a different language, though.

Your comment made me realize this is true now. Through the 80s it was the other way around. You counted as having a language with a spec, even if there was no implementation, but an implementation without a spec was a "toy"

> Through the 80s it was the other way around. You counted as having a language with a spec, even if there was no implementation, but an implementation without a spec was a "toy"

Not to my recollection. I don’t recall anyone at uni discussing a C standard until 1989, and even by 2000 few compilers were fully compliant with that C89 spec.

There were so many incompatible dialects of FORTRAN 77 that most code had to be modified at least a bit for a new compiler or hardware platform.

All of the BASIC and Pascal variants were incompatible with each other. They were defined by “what this implementation does” and not a formal specification.

It was 100% due to the default wintel behavior. All other architectures have to produce the same value for that conversion.

The branch always has to be taken if executing JavaScript, because otherwise how would you tell if the value was correct or not? You'd have to calculate it using this method regardless and then compare!

If you always take a branch, it's not a branch.

Or did you mean by "taken" that the branch instruction has to be executed regardless of whether the branch is taken or not?

JavaScript JITs always emit this instruction when ToInt32 is required, since checking would be more expensive in user code. And the instruction always used the JS rounding method, since that's cheaper in silicon. I used branch since the parent used "branch".

What are you defining as correct?

"correct" in this case being consistent with the JavaScript specification.

Unless I misread the current arm docs, I don't think this is still present in the ISA as of 2020?

The whole RISC/CISC thing is long dead anyway, so I don't really mind having something like this on my CPU.

Bring on the mill (I don't think it'll set the world on fire if they ever make it to real silicon but it's truly different)

To understand RISC, ignore the acronym it stands for, instead just think fixed-instruction, load-store architecture. That's what RISC really means today.

No variable-length instructions. No arithmetic instructions that can take memory operands, shift them, and update their address at the same time.

Someone once explained it like “not a reduced set, but a set of reduced instructions”. Not r(is)c, but (ri)sc.

Pretty much what you say, I just liked the way of describing it.

It seems to me people are ignoring that the C stands for Complexity. What's reduced is Complexity of the instruction set, not the size of it (or even the instructions themselves). In the context of the coinage of the term, they almost certainly could have called it "microcode-free ISA", but it wouldn't have sounded as cool.

Doesn't the C stand for Computer?

Oops I'm wrong about the name but not about the spirit. This is the original paper: https://dl.acm.org/doi/pdf/10.1145/641914.641917

Is ARM even microcode free these days?

I don't think so

> No variable-length instructions

AESE / AESMC (AES Encode, AES Mix-columns) are an instruction pair in modern ARM chips in which the pair runs as a singular fused macro-op.

That is to say, a modern ARM chip will see "AESE / AESMC", and then fuse the two instructions and execute them simultaneously for performance reasons. Almost every "AESE" encode instruction must be followed up with AESMC (mix columns), so this leads to a significant performance increase for ARM AES instructions.

> think fixed-instruction, l[...]. That's what RISC really means today.

> No variable-length instructions.

So, ARM is not a RISC instruction set, because T32 (Thumb-2) instruction can be 2 or 4 bytes long.

Similarly, RISC-V has variable-length instructions for extensibility. See p. 8ff of https://github.com/riscv/riscv-isa-manual/releases/download/... (section "Expanded Instruction-Length Encoding").

What is with variable-length instruction aversion? Why is it better to load a 32-bit immediate with two 4-byte instructions (oh, and splitting it in 12/20 bit parts is non-intuitive because of sign extension, thanks RISC V authors) than with one 5-byte instruction?
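For anyone who hasn't run into the sign-extension wrinkle, here is a sketch of the 20/12 split. It assumes the usual LUI + ADDI pairing on a 32-bit register, with ADDI sign-extending its 12-bit immediate; `split_imm32` and `rebuild` are made-up helper names.

```python
def split_imm32(value):
    """Split a 32-bit constant into (hi20, lo12) for a LUI + ADDI pair."""
    value &= 0xFFFFFFFF
    lo12 = value & 0xFFF
    if lo12 >= 0x800:
        # ADDI treats its immediate as signed, so the low part becomes negative...
        lo12 -= 0x1000
    # ...and the upper part has to absorb the difference (a +1 carry).
    hi20 = ((value - lo12) >> 12) & 0xFFFFF
    return hi20, lo12


def rebuild(hi20, lo12):
    """What LUI followed by ADDI would leave in a 32-bit register."""
    return ((hi20 << 12) + lo12) & 0xFFFFFFFF


hi, lo = split_imm32(0x12345FFF)
print(hex(hi), lo)  # 0x12346 -1
```

The upper part comes out as 0x12346 rather than the 0x12345 you would get by just slicing the bits, which is the non-intuitive bit.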

Fixed width instructions allow trivial parallel instruction decoding.

With variable length instructions one must decode a previous one to figure out where the next one will start.
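A toy illustration of the difference. The variable-length encoding here is invented (the low two bits of the first byte give the length minus one) and does not correspond to any real ISA.

```python
def fixed_width_starts(code: bytes, width: int = 4) -> list:
    """Every start offset is known up front, so slots can be decoded independently."""
    return list(range(0, len(code), width))


def toy_length(first_byte: int) -> int:
    """Hypothetical encoding: low two bits of the first byte hold length - 1."""
    return 1 + (first_byte & 0x3)


def variable_length_starts(code: bytes) -> list:
    """Each start depends on the length of the instruction before it."""
    starts, offset = [], 0
    while offset < len(code):
        starts.append(offset)
        offset += toy_length(code[offset])  # serial dependency on the previous decode
    return starts


stream = bytes([0x00, 0x01, 0xAA, 0x03, 1, 2, 3, 0x00])
print(fixed_width_starts(stream))      # [0, 4]
print(variable_length_starts(stream))  # [0, 1, 3, 7]
```

The `while` loop is the point: the fixed-width version is a plain `range`, while the variable-length version cannot know offset 3 until it has looked at offset 1.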

> With variable length instructions one must decode a previous one to figure out where the next one will start.

People said the same thing about text encodings. Then UTF-8 came along. Has anyone applied the same idea to instruction encoding?

That would eat precious bits in each instruction (one in each byte, if one only indicates ‘first’ or ‘last’ bytes of each instruction).
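A sketch of what such a UTF-8-like scheme could look like and what it costs. The encoding is hypothetical: the top bit of each byte is set only on the first byte of an instruction.

```python
def starts_from_marker_bits(code: bytes) -> list:
    """Each byte is classified on its own, with no dependency on earlier bytes."""
    return [i for i, b in enumerate(code) if b & 0x80]


# One bit per byte goes to the marker, leaving 7 of 8 bits for the instruction.
PAYLOAD_BITS_PER_BYTE = 7
overhead = 1 - PAYLOAD_BITS_PER_BYTE / 8

stream = bytes([0x81, 0x02, 0x83, 0x04, 0x05, 0x86])
print(starts_from_marker_bits(stream))  # [0, 2, 5]
print(overhead)                         # 0.125
```

So boundaries can be found in parallel, at the price of an eighth of the encoding space.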

It probably is better to keep the “how long is this instruction” logic harder and ‘waste’ logic on the decoder.

All sorts of boundary issues occur when you're near the end of a page or cache line. How many instruction bytes should you load per cycle? What happens when your preload of a byte that happens to be in a subsequent page trips a page fault?

By comparison, 4byte instructions that are always aligned have none of those problems. Alignment is a significant simplification for the design.

(Somewhere I have a napkin plan for fully Huffman coded instructions, but then jumps are a serious problem as they're no longer byte aligned!)

> What happens when your preload of a byte that happens to be in a subsequent page trips a page fault?


...and even ARM breaks instructions into uops, just like x86.

Of course it does. The level of abstraction required for modern pipelining and OOo scheduling is still beneath ARM. I'm not that familiar with the details of arm on paper but it's not that low level by research standards.

Arm v8 is the "current" 64-bit version of the ISA and you almost certainly have it inside your phone. You might have a version older than v8.3.

I think you are thinking of the old Java-optimizing instructions on the older ARM processors.

Those never really took off…

What does it mean for the RISC/CISC thing to be dead? The distinction between them is more blurred than it used to be?

RISC / CISC was basically IBM-marketing speak for "our processors are better", and never was defined in a precise manner. The marketing is dead, but the legend lives on years later.

IBM's CPU-advancements of pipelining, out-of-order execution, etc. etc. were all implemented into Intel's chips throughout the 90s. Whatever a RISC-machine did, Intel proved that the "CISC" architecture could follow suit.


From a technical perspective: all modern chips follow the same strategy. They are superscalar, deeply-pipelined, deeply branch predicted, micro-op / macro-op fused "emulated" machines using Tomasulo's algorithm across a far larger "reorder buffer register" set which is completely independent of the architectural specification. (aka: out-of-order execution).

Ex: Intel Skylake has 180 64-bit reorder buffer registers (despite having 16 architectural registers). ARM A72 has 128-ROB registers (despite having 32-architectural registers). The "true number" of registers of any CPU is independent of the instruction set.

Since RISC wasn't coined by IBM (but by Patterson and Ditzel) this is just plain nonsense. RISC was and is a philosophy that's basically about not adding transistors or complexity that doesn't help performance and accepting that we have to move some of that complexity to software instead.

Why wasn't it obvious previously? A few things had to happen: compilers had to evolve to be sophisticated enough, mindsets had to adapt to trusting these tools to do a good enough job (I actually know several people who in the '80s still insisted on assembler on the 390), and finally, VLSI had to evolve to the point where you could fit an entire RISC on a die. The last bit was a quantum leap as you couldn't do this with a "CISC" and the penalty for going off-chip was significant (and has only grown).

You can't understand the RISC/CISC "debate" until you spend a few minutes skimming through the IBM 360 mainframe instruction set.


Not even remotely. Nothing in the RISC philosophy says anything about pure data-path ALU operations. In fact this instruction is pretty banal compared to many other FP instructions.

It strikes me as ironic that an architecture that used to pride itself on being RISC and simple is heading in the same direction as intel-levels of masses of specialist instructions.

I don't mean this as a criticism, I just wonder if this is really the optimum direction for a practical ISA

Ironically this instruction only exists because JavaScript accidentally inherited an Intel quirk in its double-to-int conversion. This instruction should be "Floating Point Intel Convert" not "Floating Point Javascript Convert", but w/e, trademark laws and all that.

In practice the instruction made code a few percent faster, primarily because it allowed dropping a few branches at the end of every JavaScript int conversion on ARM. IMO, ARM has always been at the "edge" of RISC (at least until AArch64 threw out all of its quirks) and the exception proves the rule here. Instructions like this exist specifically because they accelerate execution time, rather than for assembler programmer convenience. That's the same underlying reason why RISC had such an advantage over CISC before micro-op decoders and register renaming became a thing. It's not so much that "having fewer instructions is better", but that things like memory operands or repeat prefixes or what have you primarily act as programmer convenience at the expense of code efficiency. Stuff like FJCVTZS is used by compiled code to gain a measurable speed benefit, ergo it stays in.

Maybe the threshold for being RISC rises over time as the tech advances?

Emery Berger argues that the systems community should be doing exactly this -- improving infrastructure to run JS and Python workloads:


We need to incorporate JavaScript and Python workloads into our evaluations. There are already standard benchmark suites for JavaScript performance in the browser, and we can include applications written in node.js (server-side JavaScript), Python web servers, and more. This is where cycles are being spent today, and we need evaluation that matches modern workloads. For example, we should care less if a proposed mobile chip or compiler optimization slows down SPEC, and care more if it speeds up Python or JavaScript!

Personally I think this would be unfortunate as I don't think JavaScript is the path forward, but computers have always existed to run software (d'oh), which means natural selection will obviously make this happen if there is a market advantage.

However, I see all high performance web computing moving to WASM, and JavaScript will exist just as the glue to tie it together. Adding hardware support for this is naive and has failed before (ie. Jazelle, picoJava, etc).

I partially agree. I wish TypeScript compiled directly into a JIT without compiling into JavaScript and I wish TypeScript had a strict mode that always actively requires type definitions.

> Adding hardware support for this is naive and has failed before (ie. Jazelle, picoJava, etc).

The hardware support being added here would work just as well for WASM (though it might be less critical).

Pray tell in which way this will help WASM?

WASM would have at least one f64 -> int32 conversion routine with the same semantics as JS, if possibly not the only one.

maybe this will lead to the revival of Transmeta-like architectures? I always had a soft spot for reprogrammable microcode.

WASM is _designed_ to be easily jitted, without the expensive machinery we had to put in place to do this for x86, so the whole point is to not require a new architecture.

Because of this I find WASM to be the best direction yet for a "universal ISA", as it's very feasible to translate it to most strange new radical architectures (like EDGE, Prodigy, etc). (Introducing a new ISA is almost impossible due to the cost of porting the world. RISC-V might be the last to succeed.)

"WASM is _designed_ to be easily jitted, without the expensive machinery we had to put in place to do this for x86"

I'm not sure this is actually true, given that the original intent was for WASM to be easy to compile using existing JS infrastructure, not in general. So given that, it would make sense to carry over JS fp->int semantics into WASM. WASM is in effect a successor to asm.js.

It's certainly also not too hard to compile/jit for new architectures, but that was not the initial intent or what guided the early/mid-stage design process.

If you examine the current WASM spec, it doesn't appear to specify semantics at all for trunc. I would expect it inherits the exact behavior from JS.

True, but that wasn't my point with "expensive machinery". By that I was referring to

  * discovering which instructions are executed (tracking control flow),
  * mapping x86 code addresses to translated addresses,
  * discovering the shape of the CFG and finding loops,
  * purging translations that get invalidated by overwrites (like self-modifying code),
  * sprinkling the translated code full of guards so we can make assumptions about the original code
etc. All this is unnecessary for WASM and you can in fact make a quick and dirty one-pass translation for the 1st gear. I don't know, but it seems reasonable that they would keep the same FP->Int semantics as JS, but you rarely do that in non-JS code.

EDIT: fail to make bullets, oh well.

Is there really reason to believe that "porting the world" has gotten that much harder in the decade since RISC-V was released?

It would mean JavaScript would have to compile to the same byte code less there are multiple instruction sets.

Unlike Java there's no official bytecode and all implementations do it differently. I don't think any high performance implementations use bytecodes; instead they use threaded code for their 1st tier, and native code for all others.

The quote isn't really saying that JS-specific instructions need be added to the ISA though.

In that sense it's not saying anything different from what we have been doing for the past 60 years.

The only significant thing that has changed is that power & cooling is no longer free, so perf/power is a major concern, especially for datacenter customers.

> In that sense it's not saying anything different from what we have been doing for the past 60 years.

Yes it is? The essay's point is that "standard" hardware benchmarks (C and SPEC and friends) don't match modern workloads, and should be devalued in favour of benchmarks that better match actual modern workloads.

You think Intel designs processors around SPEC? (Hint: they don't).

ADD: It was an issue a long time ago. Benchmarks like SPEC are actually much nicer than real server workloads. For example, running stuff like SAP would utterly trash the TLB. Curiously, AMD processors can address 0.25 TB without missing in the TLB, much better than Intel.

Yeah WASM killing JS is an outside bet for this decade. Could happen. (Please Lord).

Curious, have you actually written production software targeting WASM? Because I have, in Rust, the language with the best toolchain for this by an order of magnitude.

Despite all that, and me being very familiar with both Rust and JS, it was a big pain. WASM will remain a niche technology for those who really need it, as it should be. No one is going to write their business CRUD in it, it would be a terrible idea.

I don't dispute it, but could you elaborate on the painful parts?

I find the crossing in and out of JavaScript to be less than ideal. However, I don't see why WASM couldn't evolve to require less of that, ie. expose more of what you need JavaScript for today.

> However, I don't see why WASM couldn't evolve to require less of that, ie. expose more of what you need JavaScript for today

It can, and it is. Designers are already doing all they can to make it an appealing target for a variety of languages on multiple platforms.

We are no longer in 2005. Javascript, especially in its Typescript flavor, is a perfectly capable modern language.

It's not that it isn't capable, it's that it has more gotchas than most other languages of its size (and no, just because the gotcha is well defined doesn't mean that it doesn't trip programmers up).

It's also, despite a couple of decades of hard work by some very good compiler/JIT engineers, at a considerable disadvantage perf-wise to a lot of other languages.

Third, its most common runtime environment is a poorly thought-out collection of DP and UI paradigms that don't scale to even late-1980s levels, leading to lots of crutches. (AKA, just how far down your average infinite scrolling web page can you go before your browser either takes 10s of seconds to update, or crashes?).

Many of the gotchas wouldn't ever show up if only people committed to writing JS as sensibly as they write their programs in other languages when they're being forced to do things a certain way.

But when it comes to JS and even other dynamic languages, people for some reason absolutely lose their minds, like a teenager whose parents are going out of town and are leaving them home by themselves overnight for the first time. I've seen horrendous JS from Java programmers, for example, that has made me think "Just... why? Why didn't you write the program you would have written (or already did write!) were you writing it in Java?" Like, "Yes, there are working JS programmers who do these kinds of zany things and make a real mess like this, but you don't have to, you know?"

It's as if people are dead set on proving that they need the gutter bumpers and need to be sentenced to only playing rail shooters instead of the open world game, because they can't be trusted to behave responsibly otherwise.


Regarding gotchas, it's bearable. I only have a couple on my short list: == vs === and xs.includes(x) vs x in xs, and only the latter is not reducible to a trivial rule of thumb. TS is helpful in this regard, possibly there are more in plain JS.

Regarding performance, modern JS is plenty fast; it's not in the 'terrible' category. Its memory usage is, perhaps ;) https://benchmarksgame-team.pages.debian.net/benchmarksgame/.... For performance-critical code JS is not the answer, but it's good enough for most uses, including UIs.

Regarding UI paradigms, I'm not sure what the problem is, or what the significantly better alternatives are. I did MFC/C++ in the '90s and C++/Qt in the '00s, and both were vastly inferior to modern browser development. React+StyledComponents is a wonderful way to build UIs. There are some warts on the CSS side, but mostly because Stackoverflow is full of outdated advice.

It's obviously interesting to understand performance on real life JS and Python workloads and maybe to use this to inform ISA implementations.

I don't think that it's being suggested that ISAs should be designed to closely match the nature of these high level languages. This has been tried before (e.g. iAPX 432 which wasn't a resounding success!)

“We need more performance, should we fix the underlying performance problem in our software? No, we should design our CPUs to accommodate our slow, bottlenecked language!”

Asking for CPU features to speed up Python is like trying to strap a rocket to a horse and cart. Not the best way to go faster. We should focus on language design and tooling that makes it easier to write in “fast” languages, rather than bending over backwards to accommodate things like Python, which have so much more low-hanging fruit in terms of performance.

> “fast” languages

What are those?

C++, Rust, .Net would qualify, Go (doesn’t have a “release mode” as such, but achieves decent speed), Julia, etc.

Anything that makes an actual attempt at optimising for performance.

That's rather silly. He says:

"For example, we should care less if a proposed mobile chip or compiler optimization slows down SPEC, and care more if it speeds up Python or JavaScript!"

But anything that impacts SPEC benchmarks (and the others we use for C code) is also going to impact Python performance. If you could find a new instruction that offers a boost to the Python interpreter performance that'd be nice, but it's not going to change the bigger picture of where the language fits in.

> But anything that impacts SPEC benchmarks (and the others we use for C code) is also going to impact Python performance.

Say you work on an optimisation which improves SPEC by 0.1%, pretty good, it improves Python by 0.001%, not actually useful.

Meanwhile there might be an optimisation which does the reverse and may well be of higher actual value.

Because SPEC is a compute and parallelism benchmark, while Python is mostly about chasing pointers, locking, and updating counters.

Please No. Let the hardware do best what it's good at, being simple and running fast. Let the interpreter/compiler layer do its thing best, flexibility. There have been attempts to move the hardware 'upwards' to meet the software and it's not generally worked well. No special purpose language supporting hardware exists now that I'm aware of - lisp machines, smalltalk machines, Rekursiv, stretch, that 1980s object oriented car crash by intel whose name escapes me...

Edited to be a touch less strident.

You do understand that current hardware exists to support C, right?

What aspect of currently popular CPU instruction sets ‘exists to support C’?

Strong sequential consistency is a big one. Most architectures that have tried to diverge from this for performance reasons run into trouble with the way people like to write C code (but will not have trouble with languages actually built for concurrency).

Arguably the scalar focus of CPUs is also to make them more suited for C-like languages. Now, attempts to do radically different things (like Itanium) failed for various reasons, in Itanium's case at least partially because it was hard to write compilers good enough to exploit its VLIW design. It's up in the air whether a different high-level language would have made those compilers feasible.

It's not like current CPUs are completely crippled by having to mostly run C programs, and that we'd have 10x as many FLOPS if only most software was in Haskell, but there are certainly trade-offs that have been made.

It is interesting to look at DSPs and GPU architectures, for examples of performance-oriented machines that have not been constrained by mostly running legacy C code. My own experience is mostly with GPUs, and I wouldn't say the PTX-level CUDA architecture is too different from C. It's a scalar-oriented programming model, carefully designed so it can be transparently vectorised. This approach won over AMD's old explicitly VLIW-oriented architecture, and most GPU vendors are now also using the NVIDIA-style design (I think NVIDIA calls it SIMT). From a programming experience POV, the main difference between CUDA programming and C programming (apart from the massive parallelism) is manual control over the memory hierarchy instead of a deep cache hierarchy, and a really weak memory model.

Oh, and of course, when we say "CPUs are built for C", we really mean the huge family of shared-state imperative scalar languages that C belongs to. I don't think C has any really unique limitations or features that have to be catered to.

> Now, attempts to do radically different things (like Itanium) failed for various reasons, in Itanium's case at least partially because it was hard to write compilers good enough to exploit its VLIW design. It's up in the air whether a different high-level language would have made those compilers feasible.

My day job involves supporting systems on Itanium: the Intel C compiler on Itanium is actually pretty good... now. We'd all have a different opinion of Itanium if it had been released with something half as good as what we've got now.

I'm sure you can have a compiler for any language that really makes VLIW shine. But it would take a lot of work, and you'd have to do that work early. Really early. Honestly, if any chip maker decided to do a clean-sheet VLIW processor and did compiler work side-by-side while they were designing it, I'd bet it would perform really well.

> in Itanium's case at least partially because it was hard to write compilers good enough to exploit its VLIW design

This is half true. The other half is that OOO execution does all the pipelining a "good enough" compiler would do, except that it does so dynamically at runtime, benefiting from just-in-time profiling information. Way back in the day OOO was considered too expensive; nowadays everybody uses it.

A shocking amount, but in many cases it's also what doesn't exist, or isn't optimized. C is designed to be a lowest common denominator language.

So, the whole flat memory model, large register machines, single stack registers. When you look at all the things people think are "crufty" about x86, it's usually through the lens of "modern" computing. Things like BCD, fixed point, capabilities, segmentation, call gates, all the odd 68000 addressing modes, etc. Many of those things were well supported in other environments but ended up hindering or being unused by C compilers.

On the other side you have things like inc/dec, two instructions which influenced the idea of the unary ++ and -- rather than the longer, more generic forms. So while the latency of inc is possibly the same as add, it still has a single-byte encoding.

Address generation units basically do C array indexing and pointer arithmetic, e.g. a[i], p + i, where a is a pointer of a particular size.


In C, something like a[i] is more or less:

    (char*)(a) + (i * sizeof(*a))

And I think it will do a lot more than that for "free", e.g. a[i+1] or a[2k+1], though I don't know the details.
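For anyone who wants to poke at that base-plus-scaled-index arithmetic, here is a minimal sketch in Python using ctypes. The helper name `element_address` and the sample data are made up for illustration, and this only models the arithmetic; it says nothing about what any particular address generation unit does internally.

```python
import ctypes

def element_address(array, index):
    """Address of array[index]: base address plus index times element size."""
    base = ctypes.addressof(array)
    elem_size = ctypes.sizeof(array._type_)
    return base + index * elem_size

# Hypothetical data: five 32-bit integers laid out contiguously.
values = (ctypes.c_int32 * 5)(10, 20, 30, 40, 50)

addr = element_address(values, 3)
# Reading a 32-bit integer back from the computed address yields values[3].
loaded = ctypes.c_int32.from_address(addr).value
```

Consecutive elements end up `sizeof(element)` bytes apart, which is the scaling by `sizeof(*a)` in the C expression.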

By having address calculations handled by separate circuitry that operates in parallel with the rest of the CPU, the number of CPU cycles required for executing various machine instructions can be reduced, bringing performance improvements.

Here's a (doubly-indirected) example: https://news.ycombinator.com/item?id=24813376

And what about that has anything to do with C specifically? Every useful programming language requires that cause precede effect, and every architecture that allows load-store reordering has memory barrier instructions. Specifically, where would code written in C require the compiler to generate one of these instructions, where code hand-written for the processor's native instruction set would not?

Yep. They have a compiler to bring it down to the metal so IDK what you're saying.

--- EDIT ---

@saagarjha, as I'm being slowposted by HN, here's my response via edit:

OK, sure! You need some agreed semantics for that, at the low level. But the hardware guys aren't likely to add actors in the silicon. And they presumably don't intend to support eg. hardware level malloc, nor hardware level general expression evaluation[0], nor hardware level function calling complete with full argument handling, nor fopen, nor much more.

BTW "The metal which largely respects C's semantics?" C semantics were modelled after real machinery, which is why C has variables which can be assigned to, and arrays which follow very closely actual memory layout, and pointers which are for the hardware's address handling. If the C designers could follow theory rather than hardware, well, look at lisp.

[0] IIRC the PDPs had polynomial evaluation in hardware.

The metal which largely respects C's semantics? For example, here are some instructions that exist to match C's atomics model: https://developer.arm.com/documentation/den0024/a/Memory-Ord...

I've done work on a proprietary embedded RTOS that has had high level versions of those barriers at least a decade before the C atomics model was standardized (and compiles them to the closest barrier supported by the target architecture).

I suspect that the OS and Architecture communities have known about one-way barriers for a very long time, and they were only recently added to the Arm architecture because people only recently started making Arm CPUs that benefit from them. And that seems like a more likely explanation than them having been plucked from the C standard.

Moreover, one-way barriers are useful regardless of what language you're using.

[0] Close, it was the VAX-11 and is the poster child for CISC madness.

Let the hardware do best what it's good at, being simple and running fast. Let the interpreter/compiler layer do its thing best, flexibility.

Yeah, this is pretty much the opposite of what actually works in practice for general-purpose processors though – otherwise we'd all be using VLIW processors.

The complex processors like the PDPs were followed by RISC processors because they were simpler. The hardware has to run code, I get that, but VLIW didn't work. RISC did. The x86 decodes its opcodes into micro-ops which are load/store RISC-y simple things. Simplicity was always the way to go.

I do take your point about VLIW, but I'm kind of assuming that the CPU has to, you know, actually run real workloads. So move the complexity out of the languages. Or strongly, statically type them. Or just don't use JS for server-side work. Don't make the hardware guys pick up after bad software.

When RISC first appeared they really were simpler than competing designs.

But today I think it's hard to argue that modern pipelined, out of order processors with hundreds of millions of transistors are in any sense 'simple'.

If there is a general lesson to be learned it's that the processor is often best placed to optimise on the fly rather than have the compiler try to do it (VLIW) or trying to fit a complex ISA to match the high level language you're running.

I guess if the rule is "as simple as possible, but no simpler", then I'd agree.

> ...rather than [...] trying to fit a complex ISA to match the high level language you're running

Again agreed, that was the point I was making.

It seems like every 2 months I feel the burn of JS not having more standard primitive types and choices for numbers. I get this urge to learn Rust or Swift or Go which lasts about 15 minutes... until I realize how tied up I am with JS.

But I do think one day (might take a while) JS will no longer be the obvious choice for front-end browser development.

Funny, I have a small JavaScript app I have abandoned because I find developing JS so awful. Now that I have ramped up in Rust I am very tempted to rewrite it as Rust has first-class WASM support. Unfortunately I'd still need JavaScript for the UI bits.

IMO: Rust isn't the easiest language to learn, but the investment pays off handsomely and the ecosystem is just wonderful.

EDIT: I meant "to learn" which completely changes the statement :)

>But I do think one day (might take a while) JS will no longer be the obvious choice for front-end browser development.

I think that day might be sooner than anyone thinks- Chromium is dominant enough now that their including Dart as a first-class language (or more likely, a successor to Dart) will likely be a viable strategy soon.

Of course, the wildcard is Apple, but ultimately Dart can compile down to JS- being able to write in a far superior language that natively runs on 80% of the market and transpiles to the rest is suddenly much more of a winning proposition.

I feel like kotlin is a much better language than dart, has many more use cases and compiles down to javascript also.

Dart is weak for functional programming / functional way of thinking. Right there, dart lang loses me as a potential user.

If you want that, you can start with TypeScript and name your number types. Doesn’t do anything real at the moment, but subsections of code can then be compiled as assemblyscript to wasm.

I dunno how you folks do it, but I admire anyone that can stick with JS. It's just so...bonkers, every aspect about it, from the obvious "wat" moments, lack of integers, to the mental overhead of libs, modules, frameworks, templates, and packing. But I'm glad I have coworkers that grok it.

I like trying new languages and have played with dozens, and JS and Prolog are the only languages that have made me actually scream.

You should definitely try to branch out. At the very least it gives you new ways of thinking about things.

Go or dart is probably the best for "JS killer" in terms of maturity, tooling, and targeting the front end, followed by haxe, swift and/or rust (they may be better even, but frankly I'm not as familiar with them).

Honestly modern JS has far fewer "wat" moments than most languages, modern javascript is incredibly well designed. And JavaScript does have integers now.

Nowadays it feels like the opposite: the committee takes so long to dot the i's and cross the t's that features take multiple years to make it through the approval process and be ready to use (I'm looking at you, optional chaining).

You could split the difference and go C# or Java and use one of their web stacks.

With WebAssembly that day is nearing. However it is certainly many years out.

JS supports

    bools (Boolean)
    floats (Number)
    ints (x|0 for i31 and BigInt)
    arrays (Array, and 11-ish variants of TypedArray)
    linked lists (Array)
    sets (Set and WeakSet)
    maps (Map and WeakMap)
    structs (Object)
    genSym (Symbol)
    functions (Function)
    strings (String)

What are they missing that you are dying to have?

Are you saying you'd like to have specific int32, int64, float32, float64 types?

Strong typing and static typing - yes please.

In terms of adding types to deeply dynamic languages I think Julia had the best approach: sub-typing and union types.

It has the advantage that it is possible to give a reasonable type to most functions without a rewrite (even if the types would be terribly long to accommodate the weak underlying type system)

Yes this specifically.

What other numeric types would you like to see?

In addition to the existing doubles, ES2020 added BigInt, which gives you arbitrary-precision signed integers.

My guess is that this is what JIT people asked for.

The JSC JIT people seemed kind of surprised by this, which was weird. Maybe the V8 JIT people asked for it?

Is there a source for that? I was under the impression this initially shipped on iOS devices so it'd be weird for JSC to be surprised by it.

According to this bug it was shipped six months after the bug was created and they didn't seem super familiar with it.

It was public knowledge that Apple had a shipped implementation using this instruction before that patch was merged.


I also don't really see any indication that the project maintainers don't know about the instruction.

Looks like an example of JSC not being developed in the open more than anything else.

Are you referring to the tweet linked there? That tweet is wrong in a lot of ways, because that instruction gives you a 1-2% boost, not a 30% boost like they claimed.

Don't sufficiently advanced compilers infer what the real type of a variable is, in the most important cases?

First comment on question: “The JavaScript engine has to do this operation (which is called ToInt32 in the spec) whenever you apply a bitwise operator to a number and at various other times (unless the engine has been able to maintain the number as an integer as an optimization, but in many cases it cannot). – T.J. Crowder”

Edit: From https://www.ecma-international.org/ecma-262/5.1/#sec-9.5

  9.5 ToInt32: (Signed 32 Bit Integer)

  The abstract operation ToInt32 converts its argument to one of 2³² integer values in the range −2³¹ through 2³¹−1, inclusive. This abstract operation functions as follows:

  1. Let number be the result of calling ToNumber on the input argument.
  2. If number is NaN, +0, −0, +∞, or −∞, return +0.
  3. Let posInt be sign(number) * floor(abs(number)).
  4. Let int32bit be posInt modulo 2³²; that is, a finite integer value k of Number type with positive sign and less than 2³² in magnitude such that the mathematical difference of posInt and k is mathematically an integer multiple of 2³².
  5. If int32bit is greater than or equal to 2³¹, return int32bit − 2³², otherwise return int32bit.

  NOTE Given the above definition of ToInt32:

    The ToInt32 abstract operation is idempotent: if applied to a result that it produced, the second application leaves that value unchanged.

    ToInt32(ToUint32(x)) is equal to ToInt32(x) for all values of x. (It is to preserve this latter property that +∞ and −∞ are mapped to +0.)

    ToInt32 maps −0 to +0.
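
For anyone who finds the spec prose hard to follow, here is a rough Python sketch of the same operation. It assumes the input is already a Number (a Python float here), so the ToNumber call is skipped; the function name `to_int32` is mine, not the spec's.

```python
import math

def to_int32(number: float) -> int:
    """Sketch of ES5.1 ToInt32 for an input that is already a Number."""
    # NaN, +0, -0 and both infinities all map to +0.
    if math.isnan(number) or math.isinf(number) or number == 0:
        return 0
    # sign(number) * floor(abs(number)) is truncation toward zero.
    pos_int = math.trunc(number)
    # Reduce modulo 2**32; Python's % already gives a non-negative result.
    int32bit = pos_int % 2**32
    # The upper half of the unsigned range maps onto the negative values.
    return int32bit - 2**32 if int32bit >= 2**31 else int32bit
```

So 2³¹ wraps around to −2³¹ instead of saturating, and anything beyond 2³² keeps only its low 32 bits. That wrap is the part a plain hardware float-to-int conversion typically doesn't give you.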

Remember not to refer to outdated specs; the modern version is at https://tc39.es/ecma262/#sec-toint32 . The changes look like editorial modernizations (i.e., I don't think there have been any bugfixes to this low-level operation in the 9 years since ES 5.1 was published), but it's better to be safe than sorry, and build the right habits.

We do, but there are still times when a double -> int conversion is necessary, this is true in every other language as well.

The real problem is that JS inherited the x86 behavior, so everyone has to match that. The default ARM behavior is different. All this instruction does is perform a standard fpu operation, but instead of passing the current mode flags to the fpu, it passes a fixed set irrespective of the current processor mode.

As far as I can tell, any performance win comes from removing the branches after the ToInt conversion that are normally used to match x86 behavior.
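
To make the "branches after the conversion" point concrete, here is a sketch in Python. It assumes the plain hardware conversion truncates and saturates out-of-range values (my understanding of the default ARM FCVTZS behavior), while JS needs the result wrapped modulo 2³². All three function names are hypothetical, and this only models the arithmetic, not actual engine code.

```python
import math

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def saturating_convert(x: float) -> int:
    """Truncate toward zero, clamping out-of-range values (NaN becomes 0)."""
    if math.isnan(x):
        return 0
    if x >= INT32_MAX:
        return INT32_MAX
    if x <= INT32_MIN:
        return INT32_MIN
    return math.trunc(x)

def wrapping_convert(x: float) -> int:
    """Truncate toward zero, then wrap modulo 2**32 (JS semantics)."""
    if math.isnan(x) or math.isinf(x):
        return 0
    wrapped = math.trunc(x) % 2**32
    return wrapped - 2**32 if wrapped >= 2**31 else wrapped

def convert_with_fixup(x: float) -> int:
    """Fast conversion plus the extra branch needed to get JS semantics."""
    fast = saturating_convert(x)
    if INT32_MIN < fast < INT32_MAX:
        return fast              # in range: both conventions agree
    return wrapping_convert(x)   # may have saturated: redo it the slow way
```

A single instruction with the wrapping behavior built in would make the check and the slow path in `convert_with_fixup` unnecessary.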

Many important cases are polymorphic so have to be able to handle both.

A good follow up to "HTML5 Hardware Accelerator Card" from yesterday: https://news.ycombinator.com/item?id=24806089

SPARC processors have tagged add and subtract instructions to help dynamic languages.
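
A sketch of the idea in Python, assuming the usual scheme where the low two bits of a word are a type tag and 00 means "small integer". As I understand it, the tagged add flags overflow if either operand has a nonzero tag or if the sum doesn't fit. The helper names are made up for illustration.

```python
TAG_MASK = 0b11  # low two bits hold the tag; 00 is assumed to mean small integer

def box(n: int) -> int:
    """Encode a small integer as a tagged word (tag bits 00)."""
    return n << 2

def unbox(word: int) -> int:
    """Decode a tagged small integer."""
    return word >> 2

def tagged_add(a: int, b: int):
    """Return (sum, overflow) for two tagged 32-bit words."""
    tag_mismatch = (a & TAG_MASK) != 0 or (b & TAG_MASK) != 0
    total = a + b
    arithmetic_overflow = not (-2**31 <= total <= 2**31 - 1)
    return total, tag_mismatch or arithmetic_overflow
```

The runtime does the add optimistically and only falls back to the generic path (boxed numbers, strings, and so on) when the overflow flag comes back set.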

IBM has hardware acceleration for XML processing, of all things, so there is plenty of precedent for this.


... which were removed in SPARC V9 (64-bit) because nobody used them.

Anyone else remember the ARM Jazelle DBX extension? I wonder if they'll end up dumping this in the same way.

I don't remember very many phones supporting DBX, but IIRC the ones that did seemed to run J2ME apps much smoother.

It's quite different though, Jazelle literally implemented java bytecode in hardware.

This instruction "merely" performs a float -> int conversion with JS semantics, such that implementations don't have to reimplement those semantics in software on ARM. The JS semantics probably match x86 so x86 gets an "unfair" edge and this is a way for ARM to improve their position.

It takes a lot more silicon to implement Jazelle than a single well-tailored ALU instruction. Moreover, while the market for "JVM on small Arms" never really took off, the market for "JavaScript on Arms of all sizes" is flourishing and has been for at least a dozen years.

This one is at least documented and in use by major browsers, so I doubt it will go away anytime soon.

All CPUs have instructions with NSA in the name. Well, not in the name, but added at their behest at some point in history.

Nice. Also useful is a set of integer operations that raise an exception on overflow.
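
Roughly what that would buy you, sketched in Python for 32-bit signed addition. The name `checked_add_i32` is hypothetical, and a hardware version would trap rather than raise a Python exception.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def checked_add_i32(a: int, b: int) -> int:
    """Add two 32-bit signed integers, raising instead of silently wrapping."""
    result = a + b
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError(f"{a} + {b} does not fit in 32 bits")
    return result
```

Without hardware support, every such add needs an explicit compare-and-branch like the one above.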

The dream of the Lisp machine is long dead. Our computers are now JavaScript machines.

So it was easier to add an instruction in silicon to cater for an ill-designed programming language, than to change the language itself?

I mean, if float-to-integer performance is so critical, why was this not fixed a long time ago in the language? What am I missing?

> What am I missing?

ARM wants to be appropriate for more workloads. They don't want to have to wait for software to change. They want to sell processor designs now.

> What am I missing?

I guess the incredibly long tail of JavaScript deployments. You could change the language today and it would take years for it to percolate down to all the deployments out there. By comparison a silicon change requires zero code changes and is entirely backwards compatible. Plus there's a pretty simple marketing argument: "buy our CPU, it makes commonly-encountered workloads faster", vs "buy our CPU, when everyone upgrades the platform they use it'll be faster"

One is a technology change, the other is a culture change. The former is infinitely easier than the latter, in my opinion/experience.

Considering that Node.js is a top server-side runtime, and ARM for data centers is coming (to the extent it's not already here), this makes sense.

Technically someone can just make a better VM engine for JavaScript to execute inside of, but whatever I guess they decided this would be easier.

I'd say JS on mobile ARM devices is 10,000x more common, and thus more important, than Node.js on ARM servers.

Amazon has already rolled out their own ARM servers at a very cheap price. A lot of well-funded companies are headed in that same direction. Adding this instruction encourages those designs too (very good from a licensing standpoint).


Do you mean JS inside of browsers themselves? Or JS running in another manner?


> Which is odd, because you don't expect to see JavaScript so close to the bare metal.

This seems to ignore all the work done on server-side javascript with projects such as node.js and deno, as well as the fact that cloud service providers such as AWS have been developing their own ARM-based servers.

No it doesn’t, it’s still surprising to see a CPU instruction added specifically to cater for such a high level language like JS.

It does totally make sense though, given the importance of JS and its common use in mobiles.

Not surprising to me to see CPU instructions being added for widely used use cases. Basically, a CPU instruction set maker's job is to perform clustering of the tasks that the CPUs do and accelerate commonly used tasks. If people do a lot of AES, the instruction set maker adds AES extensions. If people do a lot of CRC, they add CRC instructions. If people convert doubles to integers all day in a very JS specific way, then that's the instruction they add.

With reduced instruction sets, though, the idea is to provide common subparts and make them fast rather than dedicated instructions. While it's not odd to see dedicated instructions in CPUs in general, it's jarring to see if you're familiar with ARM chips from back when they had 25 instructions total.

Yeah that's how RISC vs CISC is taught in class, I've heard that same thing. I think it's an outdated paradigm though, if it's not been wrong all along. A CISC chip can still officially support instructions, but deprecate them and implement them very slowly in microcode, as is done with some x86 instructions. And a RISC chip manufacturer might just have used this phrase as a marketing line because designing RISC chips is easier than starting with a monster that has tons of instructions, and they needed some catchy phrase to market their alternative. They then get into the gradual process of making their customers happier one by one by adding special, mostly cold, silicon targeted for special workflows, like in this instance.

Ultimately, the instruction set isn't that relevant anyways, what's more relevant is how the microcode can handle speculative execution for common workflows. There's a great podcast/interview from the x86 specification author: https://www.youtube.com/watch?v=Nb2tebYAaOA

Can you name any other instructions with the name of a programming language in the actual instruction name?

No? Then it seems way more specific than the other examples you listed. So specific that it’s only applicable to a single language and that language is in the instruction name. That’s surprising, like finding an instruction called “python GIL release”.

Not necessarily programming languages (in a similar vein to what est31 said), but cryptography has made its way into instructions, see SHA1RNDS4, SHA1NEXTE, SHA1MSG1 and more. They are not general cryptographic primitives but specific instructions for computing a specific cryptographic hash, just because SHA became popular. Also has SHA in its name :)

Yes, the examples I quoted, AES and CRC, all have dedicated instructions in x86 processors (at least the ones that implement the extensions).



As does ARM:



You could say that these are language-agnostic, and indeed they are, but in a certain way they are more specific than the JavaScript operation, because the JavaScript operation is likely used in many different algorithms. I'd argue that from the point of view of a chip manufacturer, the difference matters only little. Both are tasks that the CPUs do often and thus cost energy. By implementing them natively, the CPU manufacturer reduces the cost of those workflows for their clients.

In addition to what the sibling comments said, note that "Python GIL release" is a very high level concept and how the GIL is implemented can change over time. It interacts with how the OS implements locking. FJCVTZS on the other hand is a (mostly) pure function with well defined inputs and outputs, and will likely be useful for Javascript implementations for a long time. Will JS be around in 10 years? Very likely. So ARM adds an instruction for it.

The people at ARM have likely put a lot of thought behind this, trying to find the places where they, as instruction set vendor, can help making JS workflows easier.

And btw, I'm pretty sure that if python has some equally low hanging fruit to make execution faster, and python enjoys equally large use, ARM will likely add an instruction for it as well.

Based on other comments:

The JavaScript spec decided to standardize float operations and (in part) opted for whatever x86 did. ARM had different floating point profiles, so ARM processors had to emulate Intel's operations in software. Thus ARM decided to simply implement Intel's operation as an instruction and named it after the relevant use case.

Off the top of my head, and ignoring the Java bytecode instructions added at one point, no. I can however think of quite a lot of crypto related instructions, and many CPU and MMU features aimed at very specific language features such as garbage collection.

Well, if you go far enough back, it was not unusual to find special allowances for programming language constructs (like procedure linkage that supported display, useful for languages like Pascal with nested procedures, etc.).

The reason you don't find this is that all "modern" processors designed since ~ 1980 are machines to run ... C as the vast majority (up until ~ Java) of all software on desktops and below were written in C. This also has implications for security as catching things like out of bound access or integer overflow isn't part of C so doing it comes with an explicit cost even when it's cheap in hardware.

If they had named the instruction FCVTZSO, would you care? Would you even know?

It doesn't surprise me because we are supposed to expect commonly used programming languages to receive optimizations. If we didn't follow this logic there would be no reason to optimize C. Instead each processor company would optimize for their own proprietary language and expect everyone to use the proprietary language for high performance needs.

> No it doesn’t, it’s still surprising to see a CPU instruction added specifically to cater for such a high level language like JS.

It surprises you personally, but if you think about it it's easy to understand that widespread interpreters that use a specific number-crunching primitive implemented in software can and do benefit from significant performance improvements if they offload that to the hardware.

You only need to browse through the list of opcodes supported by modern processors to notice countless similar cases of instructions being added to support even higher-level operations.

I mean, are you aware that even Intel added instructions for signal processing, graphics, and even support for hash algorithms?

And you're surprised by a floating point rounding operation?

Considering that JS is JIT-compiled and doesn't have an integer type, this is not surprising to me at all... (but still interesting)

Just because a language is used on servers / at scale / in enterprise, doesn't make it any closer to the metal. I mean, look at Python, Java, or Ruby: All used on servers for "serious" applications, but certainly not any sort of "bare metal" languages still.

Some ARM chips used to have bare-metal Java bytecode support, called Jazelle.
