Hacker News new | comments | show | ask | jobs | submitlogin
H3: A Hexagonal Hierarchical Geospatial Indexing System (github.com)
140 points by omot a year ago | hide | past | web | 45 comments | favorite

For anyone else wondering, like I did, why you would use a hexagonal indexing system instead of a (roughly square) latitude longitude system, from https://eng.uber.com/elk/:

"Data size matters in Elasticsearch; a large index means too much data, caching churn, and poor response time. When query volumes are large, ELK nodes become overloaded, causing long garbage collection pauses or even system outages.

To address this, we switched to hexagonal queries, dividing our maps into hexagonal cells. Each hexagonal cell has a string ID determined by the hexagon resolution level. A geodistance query can be roughly translated to a ring of hexagon IDs; although a hexagonal ring is not a circular surface, it is close enough for our use case. Due to this adjustment, our system’s query capacity more than tripled."

Discrete Global Grid Systems (DGGS), which is what this touches on, are a deceptively complex topic. You are optimizing for many performance dimensions, and some things that intuitively seem like they should be optimized for don't really matter, and other things people tend to ignore (like congruency) matter quite a bit.

For example, "equal area" subdivision of the surface is not a particularly useful property. This seems like it is wrong on its face such that most people try to achieve it but you have to remember that equal area only matters if your data is uniformly and predictably distributed. Geospatial data models are neither in a pretty severe way as a rule, which means you'll have to deal with data load asymmetry regardless via some other mechanism. If you can ignore equal area because it is handled by some other mechanism, it opens up other possible surface decompositions that have much stronger properties in other ways.

Compactness of representation and cost of computing relationships on the representation has a large impact on real-world index performance that is frequently ignored. The poorness of lat/lon grids in this regard is often significantly underestimated. A singularity-free 3D DGGS can have an indexing structure that is an order of magnitude smaller than a basic 2D lat/long grid for addressing, and the storage on disk of records encoded in these DGGS are also significantly smaller. This all adds up to a lot of performance.

Hexagonal grids tend to work particularly well for visualization. However, they do have their own significant weaknesses e.g. it is typically not a good representation for join operations and they are relatively expensive to search at large scales relative to some other DGGS tessellation systems.

Hi Andrew - You went to school for chemistry and chemical engineering. I'm still experimenting with ideas for binary codes of succinct/implicit geometric representations for similarity graph embeddings of non-spatial property data like text and numbers. I keep bumping into the crystallography literature -- the traditional lattice structures and also quasicrystal / fibonacci chain representations. How much of your chemical engineering background have influenced your designs?

Almost no influence as related to spatial structure; I tend to view spatial structure as an information theoretic manifestation.

On the other hand, chemical engineering had a big influence on how I reason about distributed systems. That discipline is essentially about the design of complex, continuous flow, coordination-free distributed computation systems that are robustly stable in an efficient equilibrium. It maps directly to computer science but has a concept of the problem space that I think is much more refined than what you commonly see in computer science though it is never expressed in computer science terms. But it makes sense, it is chemical engineering’s One Job.

>robustly stable in an efficient equilibrium

I would like to read more about what this means and how it applies to distributed systems. I have a physics background, but I sense that chemists tend to have a much more intricate (and interesting) conception of "stability".

I found it interesting to discover that there's a significant amount of cutting-edge HPC development along these lines for meteorology and climate modeling: https://www.hpcwire.com/2017/06/22/gpus-power9-figure-promin...

For those interested in the variety of uses for hexagonal / hierarchical indexing, Dr. Kevin Sahr at Southern Oregon University has produced an open specification and a number of ancillary research around the topic: http://www.discreteglobalgrids.org/information/

There are several options for partitioning your grid, and the geometric consequences on your index. This overview in particular should be accessible even without a deep GIS background: http://webpages.sou.edu/~sahrk/sqspc/pubs/xuzhouKeynote13.pd...

That's interesting. It looks like the ideas are similar to a Haskell library I wrote awhile back to tile a sphere with hexagons (and a few pentagons):


My tiling had a fixed resolution that you specify up-front; I didn't really consider how to make my grid hierarchical.

S2, linked from the README[1], does the same hierarchical subdivisions in squares, with each square having a string ID.

Squares divide into sub-squares exactly, but hexagon's don't sub-divide into hexagon's neatly.[2]

Can someone explain the advantages of hexagons over squares in this use case?

[1]https://code.google.com/archive/p/s2-geometry-library/ [2]https://www.illustrativemathematics.org/content-standards/ta...

Gosper island shapes do subdivide into gosper island shapes, if you want something tricky: https://en.wikipedia.org/wiki/Gosper_curve#Properties

See sibling post for links, but the short answer: you can use 12 pentagons, strategically interspersed. Like a (soccer) football.

edit: Oops, that only explains the tiling. For explanations of how the hierarchy descends, Dr. Sahr produced some GIFs that give a little bit of an overview of how the resolutions overlay: http://webpages.sou.edu/~sahrk/dgg/images/topogif/topogif.ht...

Thanks for that link.

Also found this which goes into some more detailed comparison with rectangular grids:


Boundaries between all 6 neighbors are equal. With S2, four are on the sides and top are equal and the corner neighbors have only a single point as their boundary.

This means you can better simulate dynamic systems where there is flow between cells.

Of course a squareish grid can also be used to approximate a disc or ring. The approximation is not as pretty, but I don't see how it makes much difference for a query system.

A triangular grid (or if you like, a grid of hexagonal pixels) is more efficient and closer to isotropic than a square grid. Circles pack more closely in a hexagonal arrangement than a square one.

So there are a few reasons to favour hexagons (or triangles) over squares....

Squares don't pack as closely as hexes, meaning that you can get 18% more hexes into a space than squares for a given perimeter-size. Squares also degenerate badly over the surface of the sphere.... you can't keep a consistent size as you change latitude, and they turn into triangles when you reach the poles.

Google S2 (a square-based alternative to Uber's H3) works by projecting the surface of the sphere onto the surface of a cube, then subdividing each face of the cube. So there's still some distortion around the edges and corners of the cube, but nowhere near as bad as a naive longitude/latitude strategy.

I thought it was to make best use of all those hexagonal monitors ?

What's with the output of the hexagon's vertices? There technically isn't a 285th degree line of meridian, or are they saying, 285 degrees east from 0? In which case, why don't they just subtract 360 from all values greater than 180 (since the input is in decimal degrees)?

Looks like it's just a rough edge per this very insightful comment: https://news.ycombinator.com/item?id=16136912

OGC has a spec for Discrete Global Grids at http://docs.opengeospatial.org/as/15-104r5/15-104r5.html

It has a good overview of the implementation details, and pictures!

I used to work at Uber and worked on H3. They're working on a blog post to explain some more, but if you clone the repo and build the doxygen docs, you'll get more explanation.

The big difference between this and S2 is that hexagons have only one kind of neighbor, neighbors that border on an edge, instead of the two kinds of "checkerboard" neighbors that squares have. This lets you do some interesting analysis on data gathered with hexagons.

Movement between neighbors could be modelled as a current flow, with the edge being a cross-sectional area between them, and since the edges are all equal (unlike squares) you can consider it a constant factor on that analysis, and then drop it and simply use the directional flow counts (the H3 UniEdge index) directly.

H3 takes better care than S2 to keep both hexagon area and shape distortion to a minimum together (area distortion is about the same as S2, but shape distortion is much better than S2), so the hexagons look almost the same almost anywhere on the planet, and so data gathered in one city on the planet is directly comparable to data gathered in another city, which can let you do interesting things like classify "types" of hexagons (urban, suburban, etc) and then make predictions about cities you haven't even set up shop in, yet, based on similar cities.

Hexagons are also the most efficient way to pack spheres, and can best approximate a circular radius with a discrete grid, so they're also useful for doing fast approximations of field-effect calculations (like electromagnetic fields from discrete particles). You could count drivers as a negative charge and riders as a positive charge, for instance, and use EM equations to determine the biggest imbalances in supply and demand distribution, and this will let you do it very quickly with little error.

The hexagons themselves can get down to a granularity below GPS resolution error, so you could, without any effective losses, pack 3 doubles (lat, lng, error) into a single 64-bit integer (H3 index of an appropriate size based on the error) and reduce bandwidth usage on location data.

The H3 index for any particular resolution is ordered in something similar to a Gosper curve[1] so if you really need just a rough approximation of data to query from an area, you actually only need to store the two indexes at the beginning and end of the gosper-like curve you're interested in.

This C library wasn't meant to be used directly by most developers and that's why it has a few rough edges (like not centering the output around the meridian (-180, 180) instead of the current (0, 360) output). I can't wait until the bindings come out, too, probably with the blog post. :)

[1]: https://en.wikipedia.org/wiki/Gosper_curve

If you want to do “field-effect calculations” or care about preservation of shapes you probably want a conformal grid. If you want to trade off area vs. shape distortion, it’s probably better to pick a projection which tries to optimize distance errors. Using a gnomonic projection for each icosahedron triangle is probably not the best choice for pretty much any application, though it has the advantage of being less work to implement.

There are lots of reasonable choices for high-level grid shapes, e.g. http://www.lib.ncep.noaa.gov/ncepofficenotes/files/on467.pdf

For human comprehensibility of coordinates I would recommend instead starting with an octahedron as the basic geometric skeleton.

Thanks for sharing these details!

This looks like a completely new implementation with none of the original DGGrid source. (Unsurprising for some license prohibitions of parts of it that would prevent Uber from using it.)

I haven't had a ton of time to dig through the source, but haven't seen some of the utilities for things like bulk binning of coordinates. (Hopefully the bloggers will talk about this a little bit.) When you worked on it, was Dr. Sahr involved with any of the new API adaptations? [edit: Yes, I see!] He and I had chatted about feature wishlists, and iOS / mobile bindings was at the top of our list a few years ago, but neither of us had much time to work on it. :-)

Yep, Dr. Sahr came to the office a few times when we were first getting the original version done. He implemented the core of the algorithm and we took on many of the utility methods, most of them in https://github.com/uber/h3/blob/master/src/h3lib/lib/algos.c

Then we dug in on code formatting, performance tuning (we've removed almost all of the H3IndexFat struct representation usage and switched most things to bitwise operations), and testing coverage.

As for the bindings, I don't know which will be open sourced, so I can't say more, but we had asked Dr. Sahr to make sure the API itself never allocates memory, it must be passed in allocated memory, so it's not too hard to make bindings that work with both manual memory management and garbage-collected languages.

Making the emscripten-compiled front-end Javascript "binding" was a lot of fun, though. :)

Awesome! It has been awhile since we spoke, but it is immediately obvious how much cleaner this codebase is compared to the earlier research work. Even if I have to write my own bindings, this is a huge leg up.

A lot of the stuff I'm wanting is stuff I wouldn't expect Uber to care about, but it doesn't hurt to ask. Did you look at implementing pathfinding? (As I recall, Dr. Sahr said A* should be easily implementable in this scheme. Elsewhere in the thread someone mentioned that joins are not easy. Any other tidbits like that that came up?)

Spatial join performance and scalability is sensitive to tessellation geometry, particularly when dealing with polygons and other non-point geometries or if you have high precision requirements. The abstract join algorithms are similar but it is much more difficult to make them rigorously correct in the general case with hexagonal tessellations, and they'll never be particularly fast at scale.

One of the main reasons that equal-volume cubic tessellations have emerged as the default choice for high-scale analytical DGGS is that they are nearly optimal for scaling out spatial joins between arbitrary geospatial data models. And relatively optimal in most other regards as well, especially computationally; the primary "downside" is that they are 3-dimensional, which is slightly wasteful, though more and more geospatial analysis applications make good use of a direct 3-dimensional representation.

Could you point me at any references on equal-volume cubic tesselations? I've found Dr. Sahr's work on DGGS incredibly accessible given my rudimentary understanding of the math disciplines involved.

An unfortunate aspect of all this, is how few good implementations of these algorithms exist outside of big commercial GIS packages. I'm extremely grateful that a public university financed this particular research originally, or we might not have gotten a well funded open source library.

Shortest path was something we were thinking of implementing, but yes, we didn't have an immediate need for it so we didn't. The main complexity is switching between different IJK coordinate systems across different icosahedron faces.

If you only need to know the hex grid distance between the hexagons, but not the actual path, there's a quicker algorithm looking for a common parent between the hexagons (but can have unpredictable slower paths when the hexagons don't share the same base cell, and might have to fall back to an A* algo in that case, anyways).

I don't know what exactly they mean by "joins" elsewhere. We usually use it as a hash index, but the index has a gosper-like curve so b-tree indexes work on it, too, for particular use-cases, and you can also use the parent-child operations (just some bit twiddling) to get approximate area joining super cheap, too.

This was a very helpful explanation, thank you! Does using hexagons represent the start of a paradigm shift for mapping applications or is it something that has been used for a while? This is the first I have heard of it, but based on your explanation it makes a lot of sense.

There have been many uses of hex grids and maps for a while, such as Civilization V, and this awesome hex grid resource from a gaming company: https://www.redblobgames.com/grids/hexagons/

Others have pointed out various indexing systems that NASA uses that support hexagons, as well. H3's advantages over the others is, in my opinion, that we tried to marry as much of the awesome S2 library into a hexagonal grid (short 64-bit addresses for all hexagons at all resolutions, a parent-child tree with no shared parents, minimized area distortion, and a pretty simple API with some built-in utilities, like geofence polyfill, hexagon compaction, GeoJSON output, etc), where the hexagonal properties give you the other advantages I outlined over S2.

To be perfectly fair, there are some things that S2 will still do better than H3, most notably that the area coverage of a parent perfectly matches all of its children, where that's not the case with H3, though we minimized it as well as I think is possible.

Kevin Sahr (whom others have cited in here) worked with us on this library and came up with the parent-child orientation and scaling, and implemented the original version of the code.

Hexagonal pixel grids have been used for decades by many different people for various GIS/mapping applications. If you do a google scholar search you will find hundreds of relevant papers. They’re still uncommon compared to square grids, because they are less convenient in several ways, and less familiar – every part of our culture (textiles, architecture and construction, mechanical engineering, urban planning, writing and books, furniture, visual art, toys, cartography, circuit boards, computer displays, abstract mathematics, ...) is deeply rooted in rectangular grids.

Like most tools/conventions, there are trade-offs involved.

Is every face a hexagon?

I thought to cover a sphere you had to throw in at least a few things that weren't hexagons. Euler and all that.

There are pentagons:

> The first H3 resolution (resolution 0) consists of 122 cells (110 hexagons and 12 icosahedron vertex-centered pentagons), referred to as the base cells.

From: https://github.com/uber/h3/blob/9df3940473e2f0446cf38e22443c...

If you are willing to have some pairs of hexagons border each-other along two edges, you can squint and pretend to cover a sphere by hexagons alone. http://archive.bridgesmathart.org/2017/bridges2017-237.pdf

The way the discrete global grids people handle it (TFA is an implementation of that) is generally to stick 12 pentagonal pixels in their grid at the corners of an icosahedron.

See Buckminster Fuller's Dymaxion map also [0](https://en.wikipedia.org/wiki/Dymaxion_map)

Can anyone compare/contrast this with geohex.org? I've used geohex in a past life, just curious what, if anything, H3 might offer over it.

I have never heard of this library before, which is surprising, because I've been looking hard for such a hexagonal privacy-fencing system for a long time!

I can't find anything that explains how the partitioning in geohex works, but the Uber system makes it clear that they are using Dr. Sahr's partitioning scheme: https://github.com/uber/h3/blob/master/docs/doxyfiles/h3inde...

Also see NASA JPL's HEALPix v3...

"A multi-resolution HEALPix data structure for spherically mapped point data" (2017) http://www.heliyon.com/article/e00332

So MGRS[0] with hexagons? Pretty interesting.

[0] https://en.wikipedia.org/wiki/Military_Grid_Reference_System

Seems more like Hierarchical Triangular Meshing with hexagons. I had to use that for a project a few years back.



No, the military grid is based on (a whole bunch of distinct skinny slices of) transverse Mercator projections, which don’t really line up properly at the seams, plus stereographic projections at the poles. It was largely designed for use with paper maps.

Discrete global grid systems based on hexagons are an idea which differ in most respects from that, except for the part about hierarchical coordinates (which can also be done using any arbitrary other map projection).

Hexagonal? Is this inspired in any way by Borges's Library of Babel?

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact