> Google will have the cloud TPU ... to handle training models for various machine learning-driven tasks, and then run the inference from that model on a specialized chip that runs a lighter version of TensorFlow that doesn’t consume as much power ... dramatically reduce the footprint required in a device that’s actually capturing the data ... Google will be releasing the chip on a kind of modular board not so dissimilar to the Raspberry Pi ... it’ll help entice developers who are already working with TensorFlow as their primary machine learning framework with the idea of a chip that’ll run those models even faster and more efficiently.
As a longer, elaborating point: Chisel is much closer to the LLVM compiler infrastructure project than a new hardware description language. Chisel is a front end targeting the FIRRTL circuit IR. There's a FIRRTL compiler that optimizes the IR with built-in and user-added transforms. A Verilog emitter then takes "lowered" FIRRTL and emits Verilog.
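To make the LLVM comparison concrete, here is a toy sketch of the same three-stage shape (front end, IR transforms, emitter) in plain Python. It is not FIRRTL and not how the real compiler is implemented; the tuple-based IR, `const_fold`, and `emit_verilog` are all made up for illustration.

```python
# Toy illustration only: NOT FIRRTL, just the same pipeline shape in miniature.

def frontend_adder_plus_const(width):
    """'Front end': build a tiny IR (nested tuples) for out = a + (1 + 2)."""
    return {
        "name": "AddConst",
        "width": width,
        "inputs": ["a"],
        "out": ("add", ("ref", "a"), ("add", ("lit", 1), ("lit", 2))),
    }

def const_fold(expr):
    """'Transform': fold additions whose operands are both literals."""
    if expr[0] == "add":
        lhs, rhs = const_fold(expr[1]), const_fold(expr[2])
        if lhs[0] == "lit" and rhs[0] == "lit":
            return ("lit", lhs[1] + rhs[1])
        return ("add", lhs, rhs)
    return expr

def run_transforms(module, transforms):
    """Apply each IR-to-IR pass in order, like a list of compiler transforms."""
    out = module["out"]
    for transform in transforms:
        out = transform(out)
    return {**module, "out": out}

def emit_expr(expr, width):
    kind = expr[0]
    if kind == "ref":
        return expr[1]
    if kind == "lit":
        return f"{width}'d{expr[1]}"
    if kind == "add":
        return f"({emit_expr(expr[1], width)} + {emit_expr(expr[2], width)})"
    raise ValueError(f"unknown IR node: {kind}")

def emit_verilog(module):
    """'Emitter': turn the lowered IR into Verilog text."""
    w = module["width"]
    ports = [f"  input  [{w - 1}:0] {name}," for name in module["inputs"]]
    ports.append(f"  output [{w - 1}:0] out")
    body = f"  assign out = {emit_expr(module['out'], w)};"
    return "\n".join([f"module {module['name']}(", *ports, ");", body, "endmodule"])

verilog = emit_verilog(run_transforms(frontend_adder_plus_const(8), [const_fold]))
print(verilog)
```

The point of the split is that user-added passes slot into the list given to `run_transforms` without touching the front end or the emitter.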
Consequently, Chisel is the tip of the iceberg on top of which the Edge TPU was built. The speakers in the video mention this explicitly when explaining the "Chisel Learning Curve" slide and doing automated CSR insertion.
As a further elaboration, Chisel is pedantically not High Level Synthesis (HLS). You write parameterized circuit generators, not an algorithm that is optimized to Verilog.
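A language-neutral way to picture "generator, not HLS": an ordinary program runs at elaboration time and writes out one fixed structural circuit per parameter set. The sketch below is plain Python that prints Verilog text; it is not Chisel or MyHDL, and `shift_register` with its `depth`/`width` parameters is a hypothetical example.

```python
def shift_register(depth, width):
    """Generator: return Verilog text for a depth-stage, width-bit shift register.

    The Python loops run at generation time and are unrolled into `depth`
    registers. Nothing here is an algorithm that a tool schedules for you.
    """
    if depth < 1 or width < 1:
        raise ValueError("depth and width must be positive")
    lines = [
        f"module ShiftReg_d{depth}_w{width}(",
        "  input clk,",
        f"  input [{width - 1}:0] din,",
        f"  output [{width - 1}:0] dout",
        ");",
    ]
    for i in range(depth):
        lines.append(f"  reg [{width - 1}:0] stage{i};")
    lines.append("  always @(posedge clk) begin")
    for i in range(depth):
        src = "din" if i == 0 else f"stage{i - 1}"
        lines.append(f"    stage{i} <= {src};")
    lines.append("  end")
    lines.append(f"  assign dout = stage{depth - 1};")
    lines.append("endmodule")
    return "\n".join(lines)

# One generator, many instances: each call yields a different concrete circuit.
small = shift_register(depth=2, width=8)
large = shift_register(depth=16, width=32)
print(small)
```

Every line of the output corresponds to a line the generator chose to emit, which is why there should be little surprise in the resulting logic.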
Unfortunately, Chisel is built on Scala, and I have no interest in learning Scala. Though I'm intrigued by the claim of using generators and not instances, and would be interested in a white paper that explains it in PL-agnostic terms (PL: programming language).
I also have MyHDL, a Python solution to the same problem, on my to-do list. (Has anyone tried it and found it to be better than VHDL/Verilog?)
That's a strange reason for not wanting to reap the benefits of Chisel. Care to explain your rationale?
There is another compile-to-HDL "language" called SpinalHDL, so I would actually argue that Scala's metaprogramming features seem to be a good fit for this use case.
There are reasons why VHDL/Verilog are still in use in the industry and why high-level synthesis hasn't taken off.
VHDL/Verilog for hardware design is not broken. I won't claim that there isn't space for improvement (because there is) but there isn't anything fundamentally broken in them. They are fit for the purpose and they fulfill all of the needs we have.
What could be massively improved is actually the functional verification languages we use, SystemVerilog for verification is in serious need of an overhaul.
1) the syntax is very finicky (slightly more so than C, I'd say). Most software languages (thanks to more experience with parsers and compilers) have moved on from things like requiring semicolons; Verilog has not.
2) writing tests is awful. Testbenches are crazy confusing. Much better would be some sort of unit testing system that does a better job of segregating what constitutes "testing code" from the "language of the gates". You would have a hard time doing something like, say, property testing using Verilog.
3) there isn't a consistent build/import story with Verilog. I once worked with an engineer who literally used Perl as a Verilog metaprogramming language. His codebase had a hard-to-find Perl frankenbug which sometimes inserted about 10k lines of nonsense (which somehow still assembled a correct netlist!) but caused gate timings to severely miss and the footprint to be bloated. It took the other hardware developers one week to track down the error.
None of these things have anything to do with the fundamental difference between software and hardware development.
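For what "property testing" with a clean split between testing code and gate-level code might look like, here is a minimal sketch in plain Python using only the standard library. The bit-level adder model and the checker are hypothetical examples; a real property-testing library (Hypothesis, for instance) would add input shrinking and smarter generation on top of this.

```python
import random

def ripple_carry_add(a, b, width):
    """Bit-level model of a ripple-carry adder (the 'language of the gates')."""
    carry = 0
    total = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        s = x ^ y ^ carry                      # full-adder sum bit
        carry = (x & y) | (carry & (x ^ y))    # full-adder carry out
        total |= s << i
    return total, carry

def check_adder_property(width, trials=1000, seed=0):
    """Testing code: random inputs checked against a simple reference model."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = rng.randrange(1 << width)
        b = rng.randrange(1 << width)
        total, carry = ripple_carry_add(a, b, width)
        expected = a + b
        assert total == expected % (1 << width), (a, b)
        assert carry == expected >> width, (a, b)
    return trials

checked = check_adder_property(width=8)
print(f"{checked} random cases passed")
```

The property is stated once ("matches integer addition modulo 2^width, with the overflow bit as carry") instead of as a list of hand-picked vectors.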
SystemVerilog makes this distinction between RTL (language of the gates) and verification environment code (wrt. testing, they are different things in my experience) very clearly. SystemVerilog inherits much of what people dislike about Verilog, but it makes writing large verification environments much easier. Again, not without lots of potential pain points, but you can do an awful lot that way.
On a slight aside: the different approaches to functional verification that come from the software world vs. the hardware world worry me (though perhaps unreasonably).
Software verification seems to (generally) be a much more continuous affair, while for hardware, there is an extremely intense period of verification before the product is delivered to a customer (as IP) or physically manufactured. This arises because fixing software bugs is cheap by comparison to fixing hardware (again, please accept my generalising!).
It makes me shiver a little to hear people applying software "testing" strategies and terms to verifying actual hardware. I don't know if this is reflected by their actual practice, of course. There is a lot of potential for the hardware community to make use of so many software development practices in their verification environments (big SystemVerilog testbenches are giant class hierarchies which are far more akin to straight-up software), but I'm yet to be convinced about hardware itself. The development constraints are so different, and the possibility for continuous development is hindered by the hard cut-off point (manufacture).
Maybe we should write in our native tongues without any kind of punctuation.
This feels like as shallow of a dismissal as "lisp uses too many parens"
2. Are we talking about Verilog or SystemVerilog? Verilog is not suitable for functional verification; people usually use SystemVerilog and methodologies like UVM for verification.
3. It's hard to tell what your colleague did exactly, but it sounds like he over-engineered something himself.
You are talking about the advantages of Chisel for functional verification, not for hardware design, which was exactly the point I was trying to make.
Maybe nitpicking, but languages like Chisel and MyHDL aren't really HLS. Here there is a straight-forward mapping between the written language and the rendered result, and there should be little surprise in what logic is actually generated.
I am convinced that some specimen of this class of languages will eventually overtake Verilog.
One feature I'm eagerly waiting for is an equivalent of Option/Maybe types, which makes it impossible to access some signals unless they are signaled as valid by a qualifier signal.
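A rough sketch of that idea in plain Python: the payload is bundled with its valid qualifier, and the only ways to reach it are combinators that consult the qualifier first. `Valid`, `qualify`, `map`, and `get_or` are hypothetical names, and Python only enforces this by convention at runtime, whereas an HDL with real Option/Maybe types would reject the bad access at elaboration time.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class Valid(Generic[T]):
    """A payload bundled with its valid qualifier (hypothetical helper)."""
    _valid: bool
    _bits: Optional[T] = None

    def map(self, fn):
        """Transform the payload only when it is valid; otherwise stay invalid."""
        return Valid(True, fn(self._bits)) if self._valid else Valid(False)

    def get_or(self, default):
        """Leave the wrapper by supplying a value for the invalid case."""
        return self._bits if self._valid else default

def qualify(valid_wire, data_wire):
    """Pair a raw data signal with its qualifier; drop the data when invalid."""
    return Valid(True, data_wire) if valid_wire else Valid(False)

incremented = qualify(1, 41).map(lambda x: x + 1).get_or(0)
ignored = qualify(0, 0xDEAD).map(lambda x: x + 1).get_or(0)
print(incremented, ignored)
```

Because the garbage value on an invalid bus never leaves `qualify`, downstream logic cannot accidentally consume it.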
I'm curious about what improvements you would like to see in SystemVerilog?
- Chisel: https://github.com/freechipsproject/chisel3/wiki/Frequently-...
- MyHDL: http://www.myhdl.org/start/why.html
- SpinalHDL: https://spinalhdl.github.io/SpinalDoc/regular_hdl
I haven't used Chisel personally, but from my experience with Clash -- it is better to think of them as structural RTLs that have vastly better abstraction capabilities than VHDL/Verilog have. And I don't mean whatever weird things hardware designers think up when they say "abstraction" and they chuckle about software programmers (before writing a shitload of tedious verification tests or using Perl to generate finite state machines or some weird shit but That's Cool And Good because most don't know the difference between a "macro" and a "preprocessor" and no I am not venting), I mean real abstraction capabilities -- for example, parametric types alone can drastically reduce the amount of boilerplate you need for many tedious tasks, and those parametric types inline and are statically elaborated much in the same way you expect "static elaboration" of RTL modules, etc to work. Types are far more powerful than module parameters and inherently higher order, so you get lots of code reuse. In Clash, it's pretty easy to get stateful 'behavioral' looking code that is statically elaborated to structural code, using things like State monads, etc, so there's a decent range of abstraction capabilities, but the language is generally very close to structural design. The languages are overall simply more concise and let you express things more clearly for a number of reasons, and often can compare favorably (IMO) even to more behavioral models (among others, functions are closer to the unit of modularity and are vastly briefer than Verilog modules, which are just crap, etc). Alternative RTLs like MyHDL are more behavioral, in contrast.
The biggest problem with these languages is that the netlists are harder to work with, in my experience. But the actual languages and tools are mostly pretty good. And yes, they do make verification quite nice -- Clash for example can be tested easily with Haskell and all Clash programs are valid Haskell programs that you can "simulate", so you have thousands of libraries, generators, frameworks etc to use to make all of those things really nice.
(This is all completely separate from what a lot of hardware designers do, which is stitch together working IP and verify it, as you note with the verification comment. That's another big problem, arguably the much more important one, and it is larger than the particular choice of RTL in question but isn't the focus here.)
Single biggest red flag when hiring engineers.
> You need to create a quantized TensorFlow Lite model and then compile the model for compatibility with the Edge TPU. We will provide a cloud-based compiler tool that accepts your .tflite file and returns a version that's compatible with the Edge TPU.
This seems like a new low in software freedom, and pretty risky to depend on as Google is known to shutter services pretty often and could just decide to turn off their cloud-based compiler at any time they feel like.
I went to their shindig and they were working their butts off to wow the developers who were invited. When I asked for hard numbers they were very mum and very evasive.
The timeline for the Nervana chip has always seemed to sit on some mystical horizon, never solidified into a real date but always over yonder.
Google is going to pull this crap? They have better software expertise than Intel, though, so they may be able to do it. But after that fiasco with Angular 1 to 2, I wouldn't trust Google with any early version number.
This is the problem with certain kinds of technology that are bumping up against the edge of innovation. They're too powerful and if these technologies get in the hands of the DIY set, governments will lose control so they have to DRM and regulate everything. Heck, it's a problem with old technology. Many weapons aren't that complicated technologically, but their production and use are tightly regulated.
Edit: I'm not saying this is a good thing, I'm just deconstructing their thought process for tight control over AI tech going forward.
For some reason drones are perceived to be completely different from all weapons that have existed before them. Those killer drones have existed for half a century. They are called missiles. Also the reason why UAV based fighter jets are not viable is because a cruise missile can be launched from 1000 miles away and for the cost of a global hawk you can send out more than a hundred of them.
If terrorists have access to explosives then it doesn't matter how they deliver them because most lucrative targets (= lots of people in a small area) are stationary or predictable. A simple backpack filled with explosives was more than enough to injure hundreds of people during the Boston Marathon.
The "right thing to do" is to open up these technologies, so that everyone can harness its power, not hide them under the wing and discretion of the (already too) powerful.
[UPDATE] I misread and assumed the previous case (where no cloud tool was required) was still true (I worked with previous versions of this device).
The way I read the quote, you use TF-Lite to produce a quantized TF-Lite model, and then use a cloud based compiler to compile it for the actual chip.
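For anyone unsure what "quantized" means in that quote: the common scheme is an affine mapping from floats to 8-bit integers via a scale and a zero point. Below is a minimal pure-Python sketch of that arithmetic only. It is not the TF-Lite converter and says nothing about what the Edge TPU compiler does; the function names and the example range are assumptions for illustration.

```python
def quant_params(rmin, rmax, qmin=-128, qmax=127):
    """Scale and zero point for an affine int8 mapping of [rmin, rmax]."""
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)  # range must contain 0.0
    if rmax == rmin:
        raise ValueError("range must have nonzero width")
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = round(qmin - rmin / scale)      # integer that represents 0.0
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Float -> int8, with saturation at the ends of the integer range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """int8 -> approximate float."""
    return scale * (q - zero_point)

# Hypothetical activation range, e.g. the output of a ReLU-like layer.
scale, zero_point = quant_params(0.0, 2.55)
q = quantize(1.0, scale, zero_point)
print(q, dequantize(q, scale, zero_point))
```

The round trip loses at most about half a step of `scale`, which is the accuracy traded for integer-only inference.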
This is why I asked "am I missing something." Do you have a reference for where the compiler exists in the open source TensorFlow project?
Mostly, what I'm interested in is learning what capabilities their TPU provides, to see if it would be useful for other similar kinds of kernels like DSP (which, like machine learning kernels, also involves a lot of convolution).
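The overlap is easy to see in code: an FIR filter and a convolution layer both boil down to nested multiply-accumulate loops. A minimal pure-Python sketch follows; `convolve` is a hypothetical helper, and whether the Edge TPU can actually be targeted with kernels like this is exactly the open question, not something this shows.

```python
def convolve(signal, kernel):
    """Full 1-D discrete convolution written as explicit multiply-accumulates."""
    out = [0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k   # one MAC per (sample, tap) pair
    return out

# A 2-tap summing filter applied to a short ramp.
filtered = convolve([1, 2, 3], [1, 1])
print(filtered)
```

An accelerator built around wide integer MAC arrays is attractive for DSP for the same reason it is attractive for inference, provided the toolchain lets you express the kernel.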
So I'm interested in looking at what the capabilities of the chip are, seeing what could be compiled to it. But I haven't found those docs, or found a compiler that could be studied. But maybe I'm not looking in the right place.
Here's an overview of the architecture of their Cloud TPUs, which has some good architectural details but doesn't document the instruction set:
Nvidia's embedded boards are EXPENSIVE. So expensive it limits the applications dramatically. They also require a different skillset in people to set up which drives up the cost.
We did an analysis for a security project that required visual inference. It turned out all the extra costs to setup with TX boards meant it actually made more sense to have mini desktops with consumer gtx cards.
I am excited to see the performance of the inference module. If it's decent at a good price, that opens up so many pi/beagle/arduino applications that were limited by both cost and form factor of existing options.
Currently the only real options for amateur off-the-shelf (accelerated) edge ML are the Nvidia boards (but small carrier boards for the TX2 cost more than the module itself) or the Intel NCS which inexplicably blocks every other USB port on the host device due to its poorly designed case. There is the Movidius chip itself, but Intel won't sell you one unless you're a volume customer. The NCS also does bizarre things: the setup script will clobber an existing installation of opencv with no warning, for example.
There are various optimised machine learning frameworks for ARM, but I'm only counting hardware accelerated boards here. I'm also not including the various kickstarter or indiegogo boards which might as well be vapour ware.
There are no good, cheap, embedded boards with USB3 that I can find. There are a few Chinese boards with USB3, but none of them have anywhere near the quality of support that the Pi has.
Then camera support. The Pi has a CSI port, but it's undocumented and only works with the Pi camera. The TX2 is pretty good, but you need to dig through the documentation to figure things out. USB is fine, but CSI is typically faster and frees up a valuable port.
Finally another issue is fast storage. It's difficult to capture raw video on the Pi because you can't store anything faster than about 20MB/s. There are almost no boards that support SATA or similar (the TX2 does), so the ability to use USB3 storage would be welcome too.
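To put the 20MB/s figure in perspective, here is the back-of-the-envelope arithmetic for uncompressed video. The resolutions, pixel formats, and frame rate are my own example choices, not measurements from any particular board.

```python
def raw_video_mb_per_s(width, height, bytes_per_pixel, fps):
    """Uncompressed video data rate in MB/s (1 MB = 10**6 bytes)."""
    return width * height * bytes_per_pixel * fps / 1e6

STORAGE_LIMIT_MB_S = 20  # the rough ceiling mentioned above

vga_gray = raw_video_mb_per_s(640, 480, 1, 30)    # 8-bit grayscale VGA
hd_rgb = raw_video_mb_per_s(1920, 1080, 3, 30)    # 24-bit RGB 1080p

for name, rate in [("VGA gray @30", vga_gray), ("1080p RGB @30", hd_rgb)]:
    verdict = "fits" if rate <= STORAGE_LIMIT_MB_S else "too fast"
    print(f"{name}: {rate:.1f} MB/s ({verdict})")
```

Grayscale VGA squeezes under the limit, while raw 1080p RGB overshoots it by roughly an order of magnitude, which is why USB3 or SATA storage matters for this kind of capture.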
If this is offered at a reasonable price point, it could be a really nice tool for hobbyists. It looks like they're trying to keep GPIO pin compatibility with the Pi too.
Hopefully it will, since the voice and vision kits on the same AIY page are sold for $50 at Target.
> There is the Movidius chip itself, but Intel won't sell you one unless you're a volume customer
Single units are listed on this page, e.g. mini-PCIe board with Movidius VPU for $79: https://up-shop.org/25-up-ai-edge
I was referring to the Movidius Dev Kit which exists, but seems impossible to buy as a consumer.
There's a review here though: https://fossbytes.com/rock960-review-affordable-six-core-arm...
I seriously hope that's not the only way they're expecting people to compile models for this particular TPU.
Well, this is just the beginning. I am sure they are going to expand its capabilities.
Of course, some would say machine learning is needless complexity if you just want to scan barcodes :)
 - https://www.indiegogo.com/projects/sipeed-maix-the-world-fir...
The same goes for big companies of course. Intel has a habit of releasing IoT platforms and then killing them. Let's hope the TPU lasts a bit longer.
As for a comparison, it's impossible to say until Google releases benchmark information on the edge TPU, or some kind of datasheet for the SOM.
I definitely have been shocked how fast Intel maker boards have come and gone though. It feels like Intel has written them off before anyone's tried to build a project using one. I have one sitting around here somewhere that's never so much as been powered on.
Intel made some nice little boards, but there wasn't much publicity and actually getting started with them wasn't easy at all because the docs were buried. They were usually modules designed for integration, not standalone devices.
With the Pi you can buy a kit, plug in the SD card and boot to desktop in minutes.
Seems like a nice way to gather ideas and data about new products.
Hardware and software are different things. We are all sad that Google Reader doesn't exist anymore, but every silicon product has basically been a flash in the pan. They make it, you buy it, and by the time it's shipped to you, it's announced as obsolete. That's the pace of that industry. Maybe with Google's attention span, they should have been a hardware company all along. They will fit right in.
I'll stick to the usual suspects before I get roped into some cloud based development system. Why does Google need access to my IP to begin with?