Multi-node TimescaleDB is now free (blog.timescale.com)
482 points by manigandham 13 days ago | 215 comments





I really appreciate that the linked article uses the phrasing "source-available", the lower case "free", and doesn't use the phrase "open source". Terminology matters a lot.

For me, a lot of the value in Free software comes from being able to make modifications to the software (either yourself, or by hiring others), and generally being in control of your own "software destiny".

With that in mind, I think it's important to call attention to this license's prohibition of running modified versions in production. This prohibition applies regardless of your modifications being distributed (and in fact, later in the license, distribution of modifications is expressly prohibited as well):

Clause 2.1 (d): "A license to prepare, compile, and test Derivative Works of the TSL Licensed Software Source Code solely in a Non-Production Environment ..."

I've often pined for visibility into the source code of proprietary software that I use. I suppose this is a "win" for TimescaleDB in my mind over source-unavailable proprietary software. In the end, however, this license means it's still just proprietary software.


Thanks for drawing attention to Clause 2.1 (d).

The original intent of that clause was to avoid us needing to support modified versions that were deployed to production. (Note: We provide a lot of free support in our 4000+ member Slack channel [0].)

But that clause was written 1.5 years ago, and a lot has changed since then. There’s actually an internal debate right now on whether we need to keep it. So thank you and HN for spurring this discussion!

[0] https://slack.timescale.com/


If you intend to change it to allow running open-sourced changes, you might consider allowing changes submitted to you privately too, for vulnerability reports.

Thanks for the input, will bring it back to the team for discussion.

(Note: I really appreciate getting this kind of feedback openly from the community, so thank you :-)


One possible approach compatible with true software freedom and the usual definition of open source is not to restrict use of modified versions of the code, but instead to use naming to distinguish between the two, and only support the unmodified version.

For example, the code build system could have variables for the name and maybe the logo and other trademark/brand-ish things, and the public codebase could be configured by default to call itself Timescale Community DB or Timescale Custom DB or some other name instead of TimescaleDB. Your private build would simply substitute the json file with those data values and maybe point to logos that aren't in the repo instead of generic ones that are, or something similar to that.
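To make the branding-variable idea concrete, here is a minimal sketch, assuming a hypothetical `product.json` override file. The field names (`product_name`, `logo_path`, `supported`) and both helper functions are made up for illustration; they are not taken from TimescaleDB or VS Code.

```python
import json
from pathlib import Path

# Hypothetical defaults that would ship in the public repository.
DEFAULT_BRANDING = {
    "product_name": "Timescale Community DB",
    "logo_path": "assets/generic-logo.svg",
    "supported": False,
}


def load_branding(override_path=None):
    """Return branding values; an optional JSON file overrides the defaults."""
    branding = dict(DEFAULT_BRANDING)
    if override_path is not None and Path(override_path).exists():
        with open(override_path, encoding="utf-8") as fh:
            branding.update(json.load(fh))
    return branding


def version_banner(branding, version):
    """Build the name a binary reports, flagging builds the vendor does not support."""
    suffix = "" if branding["supported"] else " (unsupported community build)"
    return f"{branding['product_name']} {version}{suffix}"
```

With no override file, a build identifies itself as the community variant, so a support channel can tell from the banner alone whether it is looking at the vendor's own build.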

You'd also have the option to use any mixture of trademark law or copyright conditions to restrict the commercial version's name and branding assets.

All of the options I described above are used in reality by various projects out there. For example, the git repository for VS Code OSS has a product.json file with most of the customization points (not all) that MS changes in building their supported VS Code release, TeX and Red Hat apply naming restrictions, and Red Hat also has rules in their support contract.


It's an interesting idea. We'll consider it. Thanks!

Could you not just say "we don't support modified versions"? This seems like throwing the baby out with the bathwater a bit.

It's a double-edged sword: Not supporting modified versions means that there are potentially unhappy users.

But we also understand the motivation behind wanting to run a modified version.

This is something we are now actively discussing internally.

Thanks!


You could drop it, but require changes to be submitted back or themselves be source available.

I second this. I'm quite puzzled by the apparent dislike for open source ideals expressed elsewhere in the comments here.

Licensing is a complex problem and open source isn't some magical solution. It isn't the right model for every usecase out there and that's OK! Pay to play is sometimes the only fair and workable approach from a business perspective, but that doesn't make it open source. There's nothing wrong with that though!

Using terminology correctly is important. Timescale gets the terminology right and I appreciate that. (I also think it's awesome that they're releasing this product for free.)


This is an interesting window into their business model. This could be a purely altruistic decision, which businesses sometimes make, contrary to popular belief. More likely it's a bet that wider adoption from making the clustered version free will drive more revenue through their managed database-as-a-service offering. That shows their non-OSI open-source license is actually leading to more code and features being available free and (mostly) open-source, as opposed to gating features for paying customers.

I think we're too hung up on OSI open source licenses. The additional restriction in the timescaledb license that you can't run a paid database as a service offering affects hardly anyone negatively (AWS). It affects us all positively by providing a sustainable business model to support additional development and support of an open-source product we use. Win-win if ever there was one. I'd like to see more open-source and closed-source companies consider this model.


(Timescale CEO and post author)

You are spot on. Before the Timescale License, we were left with a tough decision: do we open-source a feature so that everyone can have it for free OR do we close a feature so that the mega-clouds don't have access to it?

We didn't like either of those options, which is why we created the Timescale License, which allows us to offer capabilities for free (and make the source code available) to everyone except the cloud providers (ie free for 99.9999% of all users).

We find that this has resulted in a mutually beneficial outcome for ourselves and our users.

  "I think we're too hung up on OSI open source licenses. The additional restriction in the timescaledb license
  that you can't run a paid database as a service offering affects hardly anyone negatively (AWS).
  It affects us all positively by providing a sustainable business model to support additional development
  and support of an open-source product we use. Win-win if ever there was one. I'd like to see more open-source
  and closed-source companies consider this model."
^ Really well put.

> and make the source code available

It's available to _see_, but not to "prepare, compile, and test Derivative Works of the TSL Licensed Software Source Code" in a production environment, as per your license's clause 2.1 (d). That's a pretty big departure from open-source; and a bit discouraging for use by non-mega-cloud business interests too. One of the important reasons I personally use and support open-source is the freedom to not only inspect (which the TSL provides) but to also not have to ask someone else and wait on them to make any changes I need to the software I use. Any chance the TSL can be modified to include this freedom too?


This. I don't care if I can see the source code if I can't actually _do_ anything with it. If I can't run my modifications in production, it doesn't guard me against vendor lock-in and it doesn't give me the right-to-repair. So what's the benefit?

Note that I am not arguing for OSS licences, but something like the Commons Clause (use freely, even for commercial use, repair as you wish, just don't sell) seems much more suitable for such cases imho. It protects the business from cloud providers, while still offering some basic protections to the users. This... doesn't.


I do.

I don't think Java would have gotten anywhere if not for the fact that most of the source code was available. Trying to divine information about Windows from the header files was excruciatingly painful.

Being able to step into the code and figure out why it doesn't like 0 in the third argument was a massive boost to my efficacy as a coder. I could add a guard and then file a very precise bug report to get the issue fixed.


Yep. I have had the same experience since the .NET ecosystem open-sourced.

We did consider the Commons Clause when investigating our own licensing approach, but ended up concluding that its definition of "Sell" was actually much vaguer than we felt comfortable with:

  “Sell” ...a product or service whose value derives, entirely or 
  substantially, from the functionality of the Software.
I realize that opinions might differ, though.

But then you have put a limitation on use in production instead of sell? Commons Clause might be (too?) vague, but the concept is imho more fair to end users.

Still, I for one applaud you for this step. We need better non-OSS licenses and it's good to be having this discussion.


This is a completely valid point. I also appreciate the "right-to-repair" comment made elsewhere.

The original intent of this clause was so that we wouldn’t have to support modified versions that were deployed to production. (As pointed out elsewhere, we provide a lot of free support in Slack [0].)

But that clause was written 1.5 years ago, and a lot has changed since then. There’s an internal debate right now on whether we should remove this restriction.

So thank you for asking this question!

[0] https://slack.timescale.com/


Is this really an issue in practice? For libraries (react, jquery) that can’t be used on their own as a product, a lot are adopting MIT. For a “service” - mongodb, redis, rabbitmq, Kafka, Postgres, etc. I have never run into an issue where I would be comfortable modifying something, rebuilding source and deploying into production.

> I have never ...

_You_ may not have, but plenty of us have. Although, it's not as important when or how many times one has _needed_ to exercise one's freedoms, as it is to have them. But yes, plenty of us open-source users and supporters have exercised this very freedom. In fact, quite a lot of open-source contributions happen _because_ of this freedom: someone has an itch, they scratch it, and _then_ they upstream it.


Is that not possible under this license, to upstream a change? It sounds like you can’t put the change into production without it first being accepted, but not that you couldn’t contribute in other ways. I get the spirit of your argument. However, the issue is that companies are not able to make open-source compatible, permissive licenses that allow commercial use due to the new reality that creating a service and supporting a product are the main moneymakers. The code is not itself valuable to them but it is valuable as a holistic system because it’s an already built and adopted and production ready standardization of an idea.

It's also a hedge against the company/project shutting down or pivoting in a radically different direction than you want.

All the things you listed are foundational pieces of technology that are incredibly risky+costly to swap out, so if you needed to, there's an option to continue with a fork. If the thing is popular enough, there's a good chance a community-driven effort will pop up, you can find consultants to work on code for you, or even a new company will form around it, letting you continue working on your core product (mostly) uninterrupted.


A slightly different perspective on our licensing/copyright approach: there is no confusion about ownership, so if we ever decided on a more permissive license, we would have the clean ability to adopt one.

Zero plans, but this also applies if a company like ours were to ever pivot/shut down: we as copyright holders can just relicense _all_ the code to be more permissive / dual licensed / etc, which is not the case for projects where individuals hold copyright over merged contributions. (Note the opposite is not true: we can't "unrelease" versions of the code already released under a more permissive license, such as the bulk of our code base, which is Apache 2.)

In short, I understand your point, but I think there are actually multiple sides to this issue as well.


Disclosure: I work on Google Cloud (but am glad to see you protecting your rights to your software).

The conversation down thread though raises an interesting point: why does the license say you can’t run modifications in production (under any circumstances) versus some sort of “for commercial purposes” clause? It seems to me like it’s infeasible to have actual contributions if someone isn’t allowed to have a patch, carry it forward, and attempt to upstream it over time.

I assume the intent / goal of your license is to prevent people (AWS, Azure, GCP) from taking your code and offering it as a service. I don’t disagree with that. I think it’s also fine to prevent even small companies from saying “and now we wish to be the TimescaleDB company!”. But it seems strange to also prevent non-commercial usage to run patched versions.

Lawyering is hard, but is there a clear reason against patched non-commercial?


The original intent of that clause was to help us avoid needing to support modified versions that were deployed to production.

And we tend to offer a lot of free support, e.g., via our 4000+ member Slack channel [0]. We like making sure TimescaleDB users are happy.

But that clause was written 1.5 years ago, and a lot has changed since then.

There’s actually an internal debate right now on whether we need to keep it. So thank you and HN for spurring this discussion!

[0] https://slack.timescale.com/


It's a good question. How do we classify non-commercial? Is a telecom company using timescaledb for internal time series storage non-commercial although it is directly supporting a commercial offering (maybe mobile traffic platforms)?

I do get the direct commercial inference. What about the indirect ones? Just about anything in production is directed towards supporting some sort of commercial offering.

Just genuinely curious...


We took great effort to try to draw a clear line within the actual Timescale License language [0].

Usage is permitted, as long as:

  the [end-]customer is prohibited, either contractually or technically, from
  defining, redefining, or modifying the database schema or other
  structural aspects of database objects, such as through use of the
  Timescale Data Definition Interfaces, in a Timescale Database utilized by
  such Value Added Products or Services.
In other words, if your service just provides DML access (read/write/modify), then that is permitted, while DDL access (modifying/creating schemas) is not permitted.
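As a rough illustration of that DML/DDL split, a service could screen statements coming from its end customers before they reach the database. This is a minimal sketch under stated assumptions: the keyword lists and the `classify` and `allow_for_end_customer` helpers are hypothetical, they look only at the leading keyword, and they are not the license's legal definition of the Timescale Data Definition Interfaces.

```python
import re

# Leading keywords treated as DDL or DML in this sketch (an assumption,
# not the license's own definition).
DDL_KEYWORDS = {"CREATE", "ALTER", "DROP", "TRUNCATE", "COMMENT", "RENAME"}
DML_KEYWORDS = {"SELECT", "INSERT", "UPDATE", "DELETE"}

# Matches "-- line" comments and "/* block */" comments.
_COMMENTS = re.compile(r"--[^\n]*|/\*.*?\*/", re.DOTALL)


def classify(statement):
    """Return 'ddl', 'dml', or 'other' based on the first keyword only."""
    stripped = _COMMENTS.sub(" ", statement).strip()
    if not stripped:
        return "other"
    first = stripped.split(None, 1)[0].upper().rstrip(";")
    if first in DDL_KEYWORDS:
        return "ddl"
    if first in DML_KEYWORDS:
        return "dml"
    return "other"


def allow_for_end_customer(statement):
    """Conservative gate: pass only statements recognized as DML."""
    return classify(statement) == "dml"
```

A keyword check like this is deliberately naive (for example, a `WITH ... SELECT` query is classified as "other" and rejected), so in practice database roles and privileges would be the more robust way to enforce the restriction technically.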

And in fact we already have thousands of companies building commercial applications on top of Timescale Licensed software (while adhering to the license).

[0] https://github.com/timescale/timescaledb/blob/master/tsl/LIC...


If I'm reading https://github.com/timescale/timescaledb/blob/master/tsl/LIC... right, then for a SaaS company -- not necessarily a database-as-a-service company -- section 3.11 states that SaaS company can't run Data Definition Language (DDL) commands like CREATE, DROP, ALTER, TRUNCATE, COMMENT, and RENAME.

So if I need to adjust my schema using ALTER TABLE, how would I do that and stay license compliant?

Or if I'm running out of disk and need to run DROP TABLE, is my only choice to simply get more disk space rather than dropping tables?

Some of our customers will need their own unique schema, and will need their own tables. So, how would we even run CREATE TABLE and stay compliant?

Maybe I'm missing something?


Hi there!

We use and love Timescale, so we've been paying attention to this feature. We currently use the open source version and it's very nice.

Would you mind clarifying a bit, because the blog post doesn't really explain: how much of TS do you expect to remain Open Source versus proprietary? Is the idea here that you will switch the entire project to this new license (i.e., this means you're killing the open source project), or is the idea that you'll continue to work on the open source version (but with the enterprise functionality now available under the new license)?

The omission of this really important information in the blog post makes me suspect that this is, in effect, the end of OSS Timescale. I'd love to be reading that wrong! If the core will remain open source, you should consider mentioning that in the post.


Hi JeremyNT: We've never "changed" any Apache-2 licensed code to TSL-licensed code. And in fact, we've recently basically eliminated most of our enterprise features (read: paid only) and converted them to community features (read: free under the TSL).

So I'm curious: Why do you use only the Apache-2 version of TimescaleDB rather than the Community version?

https://www.timescale.com/products/features

(I realize that I'm saying "TSL-licensed" versus proprietary, because I'm not sure what it means to think of the code as proprietary when it's all source available on github, people can contribute, and 99% of companies just use it for free.)


First, thanks for replying. But, I'll note you didn't answer my question, which is about the future of the current open source codebase. Knowing that you haven't changed the license on such code yet is great, but that doesn't speak to the future direction for that codebase.

EDIT: I just saw this reply to another child, which addresses this concern for the core timescale, I think! [0]

To be clear, I don't think the license change as described is a blocker for my org, given their use case. Indeed, it may be a near-term win, as they will likely be able to take advantage of the new features that you are placing under this license.

That said, I always prefer open source code that I can modify myself. Open source licenses guarantee that the code can't be "taken away" from me, that I can integrate a technology without concerns for the sands shifting under me. If a company goes away, I and others can keep on working on it, and it can live on even if the original authors decide to no longer maintain it.

So, as an example: say Timescale moves all its code to this new license tomorrow, then happily adds features and changes some fundamental things over the next few years, and then is bought by Oracle, who decides to take it fully proprietary under a new license that is more restrictive than the TSL license. This would make a fork unlikely or at least very difficult to get going!

[0] https://news.ycombinator.com/item?id=23276614


Just to clarify, the Timescale License was originally announced in December 2018. At that time, we didn't "relicense" any existing Apache-2 code, we just said some future features will be licensed under the TSL rather than Apache 2.

Many people over the past year+ knew that we were working on a distributed version of TimescaleDB; a common question was whether the distributed TimescaleDB would be paid-only (like some other time-series database alternatives) or whether it would also be free.

This announcement was meant to say: Yes, multi-node TimescaleDB is free, not paid.

So there wasn't any _new_ license announced today; just that multi-node TimescaleDB would be released under the TSL rather than as a paid-only option, which many of our users had assumed.


Proprietary usually means "not Open Source".

Not JeremyNT, but from my perspective, avoiding vendor lock-in is my number one reason to favor open source software. Avoiding vendor lock in is very important to me when I am evaluating options.

Just because your business model and my business model currently align, does not guarantee that they will always align. I don't want you to be stuck being my vendor if I am not a good customer for you- and I don't want to be stuck using you as my vendor if you are no longer able or willing to provide the product I need.


Just so that everything is 100% clear, most of our code base is still Apache 2 licensed (and we have no plans to change that).

All that this blog post is (trying) to say is that multi-node will not be under a paid license, but instead will be free under the Timescale License.

Hope this helps.


I admire the `except the cloud providers`. What GCP, Azure and AWS have done with paid Redis offerings makes me curious what @antirez (Salvatore) thinks about it. They're making billions off Redis while the core contributors get nothing. I guess they agreed to it by having their work as BSD licence.

I do think there is a place for royalty based software. Free for personal and development use. For production use, you pay a small royalty to have it on the cloud. It's a win win on both sides. User gets managed offering + support + ability to look at source code, db/service authors get sustainable revenue, cloud providers get their usual PaaS cut.


> makes me curious what @antirez (Salvatore) thinks about it. They're making billions off Redis while the core contributors get nothing. I guess they agreed to it by having their work as BSD licence.

Here's what Salvatore has to say (from http://antirez.com/news/120):

"About myself, I’ll keep writing BSD code for Redis. For Redis modules I’ll develop, such as Disque, I’ll pick AGPL instead, for similar reasons: we live in a “Cloud-poly”, so it’s a good idea to go forward with licenses that will force other SaaS companies to redistribute back their improvements. However this does not apply to Redis itself. Redis at this point is a 10 years collective effort, the base for many other things that we can do together, and this base must be as available as possible, that is, BSD licensed."


> while the core contributors get nothing

Running a half-billion dollar company after raising $150M in funding is not nothing.


On the other hand, it is unlikely Redis would be so popular if it had started as paid software.

I'm a huge fan of Splunk but always want to keep my eye open for alternatives. My use case is mostly security analytics against event content and patterns, and for that the Splunk Processing Language is very well suited.

That said, I find it's fairly tedious to do a lot of time-series analysis and pattern discovery/anomaly detection across rich event models (think AWS CloudTrail events).

Anything TimescaleDB can help with here? Are there case studies you can point us to? It feels like there is probably a home for both just in my domain and quite obviously in the broader context of large enterprise ops/security.


Yes, we hear Splunk complaints quite often. :)

Here is a doc on using TimescaleDB as a horizontally-scalable, easy-to-deploy, operationally-mature data store for Prometheus data (i.e., metrics), put together by another of our engineering teams:

Building an open-source analytical platform for Prometheus

https://tsdb.co/prom-design-doc

I'm also happy to discuss privately if you'd prefer - ajay (at) timescale.com.


A couple examples from Timescale users which might be relevant to your use case:

ShiftLeft - code analysis and security scanning to catch vulnerabilities [https://blog.shiftleft.io/time-series-at-shiftleft-e1f981969...]

k6 - a load testing tool that scales to 100k concurrent users, analyzes performance over time, etc. [https://www.timescale.com/case-studies/k6]

If you want to talk specific scenarios, you can reach out alex @ timescale or on Slack - slack.timescale.com.


I use TimescaleDB for mass storage and query of security events (up to 100s of millions) - the speed of queries and aggregate queries even on a single node is very impressive.

I haven't done anything with regards to anomaly/trend detection yet, but it's planned. Not really sure where you see a database (TimescaleDB) fitting into that though?


We're in that scale domain where everything is a pain in the ass but not obviously outside the scope of commercial solutions. I just checked and we're averaging ~500k events per second in the five areas I'm interested in.

I feel that we could probably use a time-series database to reflect our streams as 'last observed state' type collections as well as do the aggregations that we need to feed back into anomaly detection.

I'd like to also use something like that to create a 'heat map service' where you can feed a property/window/range and get back scalar for color coding and possibly a slice of values for sparkline type UI.

Without getting hands on, though, it's hard to say for sure.
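For what it's worth, the 'heat map service' idea above (window in, scalar plus sparkline slice out) can be sketched in a few lines. This is a hypothetical, in-memory illustration with integer timestamps in seconds; the function names are made up, and a real deployment at this volume would presumably push the bucketing into the database as an aggregate query rather than do it in application code.

```python
def bucket_counts(timestamps, start, end, bucket_seconds):
    """Count events per fixed-width bucket over the window [start, end)."""
    n_buckets = max(1, -(-(end - start) // bucket_seconds))  # ceiling division
    counts = [0] * n_buckets
    for ts in timestamps:
        if start <= ts < end:
            counts[(ts - start) // bucket_seconds] += 1
    return counts


def heat(counts):
    """Scale the latest bucket against the window's peak, giving 0.0 to 1.0."""
    peak = max(counts)
    return counts[-1] / peak if peak else 0.0


def heat_map_cell(timestamps, start, end, bucket_seconds):
    """Return the scalar for color coding plus the per-bucket slice for a sparkline."""
    counts = bucket_counts(timestamps, start, end, bucket_seconds)
    return {"heat": heat(counts), "sparkline": counts}
```

Normalizing against the window's own peak is just one choice; comparing against a longer-term baseline would be closer to what anomaly detection needs.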


@jcims I'm really interested to see if we can help. If you're open to discussing, please feel free to email me: ajay (at) timescale.com

Hi GordonS, would love to hear about your use case if you don't mind sharing! ajay (at) timescale.com

What makes it tedious?

Did you discuss the license with any free/open source software organization to get their input? Are other products adopting similar licenses, and if so, what are the differences?

I'm not fundamentally opposed to any deviation in licensing. But I am much less inclined to use software if the license hasn't been reviewed/endorsed/used by others widely.

Also, I'm curious if you considered just addressing the license to anyone excluding $COMPANY_YOU_ARE_WORRIED_ABOUT. Not sure what the implications of that would be, but it would be interesting. Realistically, a startup has to prove itself in a relatively short period of time, and in that short period of time there are generally very few competitors (typically zero or one) that threaten your business model itself. If you have 10 competitors that actually threaten your business model, your business has already proven itself ;-)


We launched the Timescale License about 1.5 years ago.

At that point we did engage with multiple folks in different organizations, but realized that we had a business to build and didn't have the bandwidth for all the politicking usually required to establish a standard.

However, if someone wants to propose a standard around the principles of the Timescale License, then I would completely support that discussion.

Also, so that we are 100% clear: most of our code base remains licensed under Apache 2. The main thing that this post is trying to convey is that multi-node will be free under the Timescale License.


Can this license be used to, say, create a Datadog-like SaaS powered by TimescaleDB?

In most cases, yes, the license will permit this.

(There is discussion elsewhere in this thread about whether you give DDL access to your users, i.e., they themselves define schemas, tables, indexes, etc. Otherwise, Datadog is primarily a Value Added Service over just the database; huge numbers of companies utilize Timescale for building their SaaS services.)


It depends on the amount of access to TimescaleDB the service provides.

If you're really curious, I'd be happy to chat: ajay (at) timescale.com


I imagine the model will be effective because it will help you to increase adoption, provide an easy path to transition to your cloud managed version and likely a lot of support/training opportunities.

Plus, it makes it easier to just start with Timescale even if you don’t need it because we all like to preoptimize.


I kind of agree.

The argument of go 100% FOSS or not reminds me of the argument of free market vs regulation. Yes a free market sounds ideal, but in practice, a few major players take advantage and everyone who isn't them is left out in the cold.

The same applies to FOSS vs Proprietary software. If everyone did FOSS to the fullest extent, and anyone building products on FOSS made their product 100% FOSS, then it wouldn't be nearly as big a deal if Amazon took TimescaleDB and sold it, because anything they did would be available to everyone else.

But that isn't how it works. They just wait until a market establishes itself and swoop in and do something someone else is already doing successfully, but better by using their massive resources behind it. Honestly it's a lot like embrace, extend, extinguish. I'm not saying this to shit on Amazon, I don't necessarily think it's intentionally malicious, it just often ends up hurting a lot of organizations building FOSS.

I think of it this way: A lot of people agree that tech giants are increasingly becoming more powerful as they expand their reach, often due to their ability to simply buy or kill the competition through undercutting with their massive cash stockpiles accumulated through their primary business.

For FOSS, it is the same strategy except the competition isn't being "bought", it's simply being taken (FOSS) and made into something "better" with their nearly unlimited resources, and then undercutting the original proprietors.


> I think we're too hung up on OSI open source licenses.

I disagree. The free software criteria were defined as they are for a reason. AWS and other cloud vendors are taking advantage, but that is not a good reason to give up on the ideals of the movement. I would be much more comfortable contributing to timescaledb if the license had a date at which it expired to AGPL or some other OSI/DFSG/fourfreedoms license.


Ideals for the sake of ideals doesn't resonate with me, in software or outside of it. Give me a practical reason.

The expiry clause is interesting, but I'm not sure it matters in practice. Not many people want to use code several years out of date instead of the current version just for the sake of practically no additional freedoms. Except maybe a potential competitor. I'd be happier to have a reversion to an OSI license if the product stops being maintained or gets acquired and shut down. That's always a risk with young companies.


> Ideals for the sake of ideals doesn't resonate with me, in software or outside of it. Give me a practical reason.

Fair enough.

I am mainly looking to prevent the timescale corp. from coasting off of their long past work. In such a scenario, the several years out of date code would not be that much different from the current version, because income for timescale would not have been spent on meaningful improvements.

Another benefit is as a check to the timescale corp. in case they start acting up. In such a case, a large contributor or user might pick up maintenance of an old version and start porting it to newer postgresql versions as leverage. Users of TimescaleDB could be reassured that timescale will not abuse them through the licensing situation because there is some backup plan.


> Give me a practical reason.

Your legal ability to apply patches to the software you run to better suit your needs, and in extreme cases to fork and continue development if the maintainers can't or won't accommodate your usecase. It's kind of the entire point of the open source movement.


In principle, yes I agree, but in practice: no! AGPL is a minefield, it's a shame it is, but it is. The spirit of GPL (and AGPL) was that you could do whatever you want but you have to release the source, but they have continually been handicapped by the evolution of software. Existing licenses are not appropriate for SaaS businesses, they are either too liberal or too restrictive. What you're seeing here is the right move: I can use this for free, for personal use or business. I can't sell it directly -- why should I be able to? It's sleazy and encourages closed ecosystems (AWS, GCP, etc.).

The spirit of free software is very much alive in this decision, methinks.


> The spirit of free software is very much alive in this decision, methinks.

From the text of the Timescale License, clause 2.1 (d): "A license to prepare, compile, and test Derivative Works of the TSL Licensed Software Source Code solely in a Non-Production Environment". Further along, in section 2.2, the following prohibition is laid out: "You agree not to, except as expressly permitted in Section 2.1(d), prepare Derivative Works of any TSL Licensed Software"

That removes the freedom to run your own modifications in production. Pretty incompatible with the spirit of free software.


This is only for the parts of the code that are licensed under the Timescale license (most code is not).

Personally, I don't have a problem with this, and think it encourages users to upstream their changes.


> ... it encourages users to upstream their changes.

Not really, and not as much. Not really because one cannot begin using their own changes unless and until the upstreaming process concludes successfully. Not as much because, unlike with open-source licenses, one does not get to keep their copyright.

One of the important reasons I personally use and support open-source is the freedom to not only inspect (which the TSL provides) but to also not have to ask someone else and wait on them to make any changes I need to the software I use. The restriction against production use prevents that.

One of the important reasons I personally don't mind contributing to open-source, is the fact that I get to retain my rights.

> This is only for the parts of the code that are licensed under the Timescale license (most code is not).

This is a moot point because without the parts that are TSL licensed, we'd not be having this discussion.


> The spirit of GPL (and AGPL) was that you could do whatever you want but you have to release the source, but they have continually been handicapped by the evolution of software.

This was largely my assessment, as well. And the motive behind the work that led to Parity (https://paritylicense.com).


> AWS and other cloud vendors are taking advantage, but that is not a good reason to give up on the ideals of the movement.

Why not?


> The additional restriction in the timescaledb license that you can't run a paid database as a service offering affects hardly anyone negatively (AWS).

That's not the only additional restriction. The Timescale License does not give you the freedom to run modified versions of the code in production. Pretty big difference compared to Open Source.


I think I agree with this sentiment with one caveat: (I didn't read their license in detail, but really for licenses like this which seems to be the broader topic)

Today Timescale offers Timescale-as-a-service, so this allows them a kind of soft-monopoly on being a paid provider for this, but do these licenses generally contain a provision such that if they no longer provide that service themselves, whether from going out of business or a pivot to another product, then someone else could step in and offer it in the future? Closed source products have often had a kind of source-code escrow arrangement so that if they go out of business, you're not stuck unable to fix your own bugs, but similarly, if part of the value in adopting it is that the paid service IS available, knowing that someone else can offer a compatible service if they disappear might be a nice reassurance for the license to offer.


They're not actually providing it themselves - Aiven actually runs the backend. Seems like that points to it having some longevity.

https://kb.timescale.cloud/en/articles/2752585-timescale-clo...


> I think we're too hung up on OSI open source licenses.

Where "we" is defined as an extremely vocal minority on HN. I don't think most people take issue with licenses like this Timescale license or the BSL.


I don't think (most) people take issue with source available licenses per se, but rather attempts to misapply the term open source to them. The Timescale license is great if it allows them to provide source code and a free product while still operating a successful business! That doesn't make it open source though. (And to their credit, Timescale doesn't make that claim.)

Yes, that's probably true. Still, vocal minorities tend to wield outsized power by virtue of being vocal.

I think you greatly misunderstand the implications of the Timescale License. Others have pointed out additional restrictions, but as to the restriction you mention, if it were the only additional restriction:

If I build my product on such software, host it on the official service, and later their business model changes to no longer be a good fit for my business model (or if my model changes), I am stuck hosting the fork myself; I can't pay someone else to host it anymore. Now let's say that I am not alone, and there are several other customers in this situation: we can't band together to create a fork with a thriving, healthy community, including hosting options.

Avoiding vendor lock-in is my number one reason for choosing Open Source software! If your non-open-source "Diet Open" license leads to vendor lock-in, it is not much better to me than any other proprietary license.


I think you are being a bit generous on the causes and effects. The #1 concern of a product like this is adoption. Not in some unsustainable subsidized taxi or food delivery type thing, just that data stores are naturally sticky and come with long term opportunity as usage grows. If they dominate the time series use case, and there is good reason to believe they will, earning revenues will fall out of that in a multitude of ways. I will believe the license proves prophetic if a major cloud or minor cloud with deep pockets does a SaaS license. Until then, this is a pretty standard FOSS+support business model that became popular in the past decade.

I don't think it's FOSS+support, more like a cloud database service offering. Time will tell I suppose as to which business model dominates. And one could make the argument the service is a form of support.

My own experience is the majority of people are using it on their own cloud instances, on prem, or embedded. It's not obvious the first-party aaS will catch on right now, just like the novel license. I don't mean any of this negatively, it's clearly a well run business by smart people that are experimenting with revenue models and trying to achieve the fair outcomes for customers and the business.

What this doesn't touch on is the reality of selling enterprise software as an early startup - being open source is a hard requirement for many buyers.

> It affects us all positively

Except for users who want the highest quality hosted Timescale possible and see this license as an attempt to prevent others from creating better offerings. The open source companies that compete with cloud offerings are not exactly struggling.

Don't get me wrong, I don't have a problem with them using their position to prevent competition, but when people sell it as 'best for all the users', it feels disingenuous.


Hi, I authored this post, but the credit really goes to the Timescale database team.

Multi-node TimescaleDB is the result of a massive amount of engineering effort over two years, as can be seen in this +67,000 line PR: https://github.com/timescale/timescaledb/pull/1923

We're thrilled to make this free so that more developers can use it.


Am I the only one who thinks it is really cool that this will be free but is hesitant to use it?

I have seen so many distributed data stores fail in a multitude of ways that I just don't trust anyone anymore. After 2-3 years they may have ironed out most bugs, and I can evaluate again whether I trust their implementation to store my data safely.


This is why we built this on top of Postgres. It allows us to inherit Postgres reliability.

While I can't guarantee there won't be bugs ;-), we have found that building on Postgres has enabled a much higher level of reliability than other time-series databases.


One thing to recognize is that the lowest-level storage guts of TimescaleDB are Postgres, which really provides a super-stable, reliable foundation. This obviously doesn't avoid all distributed bugs, but it's a huge benefit.

It's also the case that TimescaleDB provides real benefit and scale even in "single-node" form, which allows for traditional primary/replica replication (for fault tolerance / HA / read replicas and scaling), especially when coupled with our native compression.

So we have users storing 100s of billions of rows in hypertables in the non-distributed version of TimescaleDB as well, including in our fully managed cloud service.


Definitely excited as well!

Curious how we buy support?

Looking at https://www.timescale.com/support there doesn't seem to be any support plan for on-prem TimescaleDB.


I would add your GitHub org in the "community" link of your site/blog. I was trying to get to the code from there and couldn't find a link.

I'll be looking forward to playing around with this. thanks...


Thanks for the feedback!

that's a strange KPI to tout.

It's definitely not a KPI. But still representative of the amount of work that has gone into this!

I'm not sure a 67k line PR proves anything more than 'no one really reviewed this'.

Or that there was a huge development effort going in a private branch that had PRs and code review for each of the 226 commits :)

https://github.com/timescale/timescaledb/pull/1923/commits


I work with a lot of time series tables in Postgres, albeit not at the scale that this targets. (some millions of rows, distributed sparsely over time, on which the median insert/update size is <10, but with some tail-end inserts/updates touching >200k rows).

I like the concepts behind TimescaleDB, and understand the value it's adding to vanilla Postgres. We have our own implementation at my company and it's quite good for our purposes, but it would certainly struggle at TDB's targeted scale.

As I understand it (correct me if I'm wrong, this is my impression from the marketing page), TimescaleDB is "more than an Extension" to Postgres, because it rewrites some of the Postgres internals (query parser, etc)?

If this is true, I'm curious, was it not possible to package the same results into an extension? What was the decision process like? Could the concept not be upstreamed into Postgres? I'm relatively ignorant of this side of the community, so please forgive me if this question is naive.

Finally, if it is "more than an extension", does this imply that TimescaleDB is a fork of Postgres, with all the risks to adoption that entails?


TimescaleDB is packaged as a Postgres extension. The "more than an extension" is meant to highlight that TimescaleDB makes changes and adds capabilities far beyond what the typical extension does.

>All of these capabilities are being released under the Timescale License, our source-available license that permits broad usage, except for where organizations are providing TimescaleDB-as-a-service.

So it's not open-source because AWS hasn't been nice with ElasticSearch and they don't want to be in the same situation?


That sounds like open source to me, I bet they're just being really conservative about saying "open source" because there's been so much backlash at MongoDB/Cockroach/etc for similar restrictions.

It's restricted OSS because AWS takes things, runs them, and eats up all the potential revenue.


The Timescale License does not give you the freedom to run modified versions of the code in production. Pretty big difference compared to Open Source.

> That sounds like open source to me, I bet they're just being really conservative about saying "open source" because there's been so much backlash at MongoDB/Cockroach/etc for similar restrictions.

Open Source has a very defined meaning. Please read up on the history of open source and source available licenses before saying it is all the same.

We've been defending it against a number of attacks and we will probably do it again, so please don't get on the wrong side of history ;-)

Note: this is not a criticism of Timescale. I can see what they did, and they respectfully did not pretend it was Open Source. Compared to a proprietary license, their license opens a lot of possibilities.


> Please read up on the history of open source

Not this patronizing bullshit again.

The term "open source" was marketing to take advantage of Netscape releasing their source code. Since then, everyone seems keen on trying to usurp its definition for whatever personal perspective they have that week.

To the rest of the community outside of the OSI and FSF (which is 99%+ of the software community), this is a perfectly acceptable example of "open source" that we're all that much richer for having.

The Timescale license checks almost all the boxes of the OSI definition (and I'm not certain how denying cloud providers specifically violates any of the language):

https://opensource.org/docs/osd


> To the rest of the community outside of the OSI and FSF (which is 99%+ of the software community), this is a perfectly acceptable example of "open source"

Please review clause 2.1 (d) and section 2.2. The freedom to run your own modifications in production is not granted. This is a big deal, and rightly a deal-breaking omission for something to be acceptable as either open source or free (as in freedom).

> The Timescale license checks almost all the boxes ...

_Almost_ all, but not all. Some things work only when all of them work, like freedoms.


Just for clarification: This limitation only applies to code under the Timescale License, while most of our code is licensed under the Apache 2.

> The Timescale license checks almost all the boxes of the OSI definition

By my reading, it fails most of the interesting ones, particularly points 1, 3, 4, 6, and 9, due to the field-of-use restrictions and the prohibition on distributing modified versions.


> this is a perfectly acceptable example of "open source"

No! You cannot modify and give away the code or even run your own modifications in production. That is pretty far from both the letter and the spirit of open source..!

> that we're all that much richer for having.

Agree, thanks Timescale members for sharing it! Also I'm happy that you on the team have decided not to pretend it is Open Source.

My beef is only with people who want to pretend that it is OK to say that software that cannot be modified and used/distributed is open source.


Frankly the pedantry around the definition of Open Source, which I understand, is incredibly nauseating. Sure, this isn't Open Source by "the definition", but it's close enough if you squint. The difference doesn't impact almost anyone. Are you or someone you love impacted by this licensing decision?

Throw in an expiry date, dual licensing (pay to play seems more than fair) and I'm content. History be damned.

I'm so sick of it. Just because one group defines it a certain way doesn't make it gospel. The OSI has no power over me.


> Frankly the pedantry around the definition of Open Source, which I understand, is incredibly nauseating. Sure, this isn't Open Source by "the definition", but it's close enough if you squint. The difference doesn't impact almost anyone. Are you or someone you love impacted by this licensing decision?

If we had accepted this line of reasoning, Open Source would have been a synonym for source available by now.

For those who weren't there when it happened, you just have to believe us old-timers that some companies tried to pass off all kinds of almost-open-source-but-you-are-still-trapped deals almost since the term was coined.

Now even Microsoft has learned, but it seems the war against misinformation isn't over yet.

> Throw in an expiry date, dual licensing (pay to play seems more than fair) and I'm content. History be damned.

Fine. I'm not against everything except open source, and I'll happily use it, but why why why do you have to call it something that means something else?

Why not call it source available or something?


Yep. I've been putting code out for free (MIT) for over a decade. I'm an open source developer. But I'll decide what gatekeeping I participate in for myself thank you

Maybe we just need to have "little O" open source. Unless someone's saying "Open Source" don't get in on splitting these hairs


I'd rather have "big F" Free Software: something that ensures that the many efforts I put into development end up benefiting the end users.

Right now middlemen are making billions and end users get vendor lock-in.


"little o" open source is clever.

> pedantry around the definition of Open Source, which I understand ... it's close enough if you squint

Then I think that while you might have read and understood the definition, you seem to have missed the broader idea behind it.

> pay to play seems more than fair

That's irrelevant. Sure it's fair, but it's fundamentally _not open source_.

It's not about gospel or having power over you. It's about communication and well established meanings. Calling a fish a bird doesn't make it a bird, and resisting attempts by others to redefine the language I use is a Good Thing as far as I'm concerned.

(To their credit, Timescale uses the terminology correctly and I greatly appreciate that. I also think they picked the right licensing model given how things seem to work these days.)


> you seem to have missed the broader idea behind it.

Funny, I feel that you might have missed the point as well.

> It's about communication and well established meanings.

Yes, but it's fundamentally impossible to bucket various licenses into what they do and do not do, and what obligations or burdens they place upon the end user. What is Open Source? The OSI includes GPLv3, which is certainly not "free" for a ton of commercial uses.

Let's peek at the OSI's FAQ:

> This history has led to occasional confusion about the relationship between the two terms. Sometimes people mistakenly assume that users of the term "open source" do not intend to communicate a philosophical point of view via that term, even though many actually do use it that way. Another mistake, which has occasionally been seen since about 2008, is to assume that "free software" refers only to software licensed under copyleft licenses, since that is how the FSF typically releases software, while "open source" refers to software released under so-called permissive (i.e., non-copyleft) licenses. In fact, both terms refer to software released under both kinds of license.

> Neither term binds exclusively to one set of associations or another, however; it is always question of context and intended audience. When you sense a potential misunderstanding, you may wish to reassure your audience that the terms are essentially interchangeable, except when being used specifically to discuss the history or connotations of the terminological difference itself. Some people also prefer to use the term "free and open source software" (or FOSS, FLOSS [free, libre and open source software]) for this reason.

Okay so, let's recap: A) it's confusing. B) the terms are often interchangeable but context matters. C) not everyone agrees.

At this point, the value of any "Open Source Definition" is severely diluted for any considerable purpose. Just because a license meets OSI's definition doesn't mean I should make any assumptions about what I can or cannot do with it, so what value does this provide beyond adding confusion?

> Can I call my program "Open Source" even if I don't use an approved license?

> Please don't do that. If you call it "Open Source" without using an approved license, you will confuse people. This is not merely a theoretical concern — we have seen this confusion happen in the past, and it's part of the reason we have a formal license approval process. See also our page on license proliferation for why this is a problem.

I'd argue the confusion is already present. A license is a license is a license. OSI is an organization that says "please don't call your thing Open Source if it doesn't meet our standards" -- I don't care. Whether or not I comply with this polite request changes nothing, offers me no direct benefits.

To be clear: I see value in OSI, and everything they provide. They have definitely provided a net benefit to the world. I do not see value in pedantry around the term "open source", those words are so plain and ordinary that gluing any tertiary meaning to them is foolish. It's as subjective as "good code".


I am glad to hear there are more and more people who accept this new kind of open source, not the official definition of the OSI but more like the global idea behind it

> not the official definition of the OSI but more like the global idea behind it

The trouble is that it _isn't_ the "global idea behind it" - the idea behind open source is the unrestricted freedom to modify and reuse. Closed source, source available, and open source are quite distinct from one another. Terminology sometimes has well established meaning and that can be very important for effective communication.

I actually like proprietary source available software, but it isn't the same thing as open source and anyone claiming otherwise is simply ignorant of the very well established meaning of that term. Pedantry can be called for, particularly when a monetary incentive exists to confuse and deceive. Consider for example that the definitions of many food products are defined in law and regulated to protect the consumer from deceptive vendors.

(To their credit, Timescale gets the terminology right and I appreciate that. It's people in the HN comments section that are incorrectly throwing the term open source about and completely missing the point.)


They don't call it open source either, they call it source-available because open source software can't have restrictions based on commercial use

Yes. ElasticSearch, Confluent, Redis, and Influx learned this too late.

A few days ago I went to your site to check if the distributed stuff had arrived, good to see it's here! You're making incredible software.

Why should someone use TimescaleDB over ClickHouse for time-series/analytics workloads?

I've heard several points for not choosing ClickHouse and going to TimescaleDB as an extension of PostgreSQL:

1. As already mentioned, if metadata (data about time series) is already in PostgreSQL, then it is nice to stay in the same database engine and query with joins across both metadata and time-series data, so there is no need to integrate the two sources in the application layer.

2. Also related to the first item: the advantage of already knowing the PostgreSQL API. ClickHouse has a different management API that you would have to learn, whereas if you know PostgreSQL you only need to learn the time-series-specific API of TimescaleDB.

3. ClickHouse doesn't support updating and deleting existing data in the same way relational databases do.

Then the final decision still depends on your need.


The biggest reason is if you're using Postgres already as an operational database and want some timeseries/analytical capabilities.

Originally Timescale wasn't much more than automatic partitioning, but with the new compression and scale-out features, along with the automatic aggregations and other utilities, it can actually deliver pretty good overall performance. It still won't get you the raw speed of ClickHouse, but instead you get all the functionality of Postgres (extensions, full SQL support, JSON, etc.) and can avoid big ETL jobs.

Another PG extension is Citus, which does scale-out automatic sharding with distributed nodes but is more generalized than Timescale for handling non-timeseries use-cases. Microsoft offers Citus on Azure.


Microsoft also offers Timescale on Azure, but only the Apache-licensed parts.

Yes, along with Aiven and a few others. The free community license is great, but unfortunately it requires either using the Timescale Cloud or running it yourself.

That's a good question! Especially considering these overwhelming benchmarks [1] made via Timescale TSBS [2].

[1] https://www.altinity.com/blog/clickhouse-for-time-series

[2] https://github.com/timescale/tsbs


Those 2018 benchmarks pre-dated many of the features we released last year, including columnar compression, continuous/real-time aggregates, etc.

Then it would be great to post updated benchmark results on the TimescaleDB blog.

If you use PostgreSQL, then it feels natural to add the TimescaleDB extension and start storing time series or analytical data there alongside other relational data.

If you need to store trillions of rows efficiently and perform real-time OLAP queries over billions of rows, then it is better to use ClickHouse [1], since it requires 10x-100x less compute resources (mostly CPU, disk IO and storage space) than PostgreSQL for such workloads.

If you need to store and query large amounts of time series data efficiently, then take a look at VictoriaMetrics [2]. It is built on ideas from ClickHouse, but it is optimized solely for time series workloads. It has performance comparable to ClickHouse, while being easier to set up and manage. And it supports MetricsQL [3], a query language that is much easier to use than SQL when dealing with time series data. MetricsQL is based on PromQL [4] from Prometheus.

[1] https://clickhouse.tech/

[2] https://github.com/VictoriaMetrics/VictoriaMetrics

[3] https://github.com/VictoriaMetrics/VictoriaMetrics/wiki/Metr...

[4] https://medium.com/@valyala/promql-tutorial-for-beginners-9a...


We spent about 6 months looking at pretty much every database tech on the market; Cockroach, ClickHouse, Influx, VoltDB, MemSQL, etc. were top contenders. There was an outdated article on medium.com (by VictoriaMetrics) which slammed TimescaleDB for its disk usage. We did not realise it was biased, so TSDB dropped off the list. But then we saw an email about their compression, segmented by device_id, and gave it a shot. We implemented it, and 5 months after our production release we now have outstanding performance and compression (95x). We are planning to move the rest of our databases to TSDB now, as it ticks our boxes: our use case is HTAP, not solely OLAP or OLTP.

I'm super excited about this news, but TSDB, please work on allowing us to put data over 1 year old on separate slow-disk servers, so we can keep the hot stuff on the NVMe servers. Once you get this sorted it will be the perfect fit for us.
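
To make the request concrete, here is a minimal sketch of the age-based split being asked for: anything whose newest data is over a year old goes to a "cold" tier, everything else stays "hot". The function name, chunk names and tier labels are hypothetical, and this only illustrates the selection rule, not how TimescaleDB actually moves data.

```python
from datetime import datetime, timedelta


def assign_tiers(chunk_end_times, now, max_hot_age=timedelta(days=365)):
    """Map each chunk name to 'hot' or 'cold' based on the age of its newest data."""
    # Chunks whose newest row is older than the cutoff are candidates for slow disks.
    cutoff = now - max_hot_age
    return {
        name: ("cold" if end < cutoff else "hot")
        for name, end in chunk_end_times.items()
    }


# Hypothetical chunk names with the timestamp of the newest row in each.
now = datetime(2020, 11, 1)
chunks = {
    "chunk_2019_01": datetime(2019, 2, 1),
    "chunk_2020_10": datetime(2020, 10, 31),
}
tiers = assign_tiers(chunks, now)
print(tiers)
```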


> TSDB, please work on allowing us to put data over 1 year old on separate slow-disk servers, so we can keep the hot stuff on the NVMe servers. Once you get this sorted it will be the perfect fit for us.

ClickHouse recently added multi-volume storage for exactly the use case you describe. [1] It's a great feature.

[1] https://www.altinity.com/blog/2019/11/27/amplifying-clickhou...


Glad to hear it is working out for you! I'll relay the request re: old data. But please also feel free to email me directly at ajay (at) timescale.com (or email support (at) timescale.com) if you have any follow up questions / requests.

Good news: TimescaleDB already offers this feature. Feel free to ping us at support (at) timescale.com and we can walk you through it. Thanks!

It is customary on HN to disclaim when you are a member of/contributor to products you are suggesting.

Another thing to mention is that TimescaleDB has much stronger ACID guarantees than ClickHouse, which means you get clearer semantics for consistency.

I understand that the Timescale license can't be utilized by cloud providers, but what about others who need a timeseries database for their SaaS offering? Is this permitted as long as you aren't marketing a hosted TimescaleDB solution?

Edit: wording


That would be permitted, as long as the service isn't just a "TimescaleDB-as-a-service." [0]

For example, if the service only allows users to make DML changes (access / modify data) then it is OK, but allowing DDL changes (creating / modifying database schemas) is not permitted.

In fact, we already have 100s of SaaS companies using TimescaleDB as part of their offering.

[0] https://www.timescale.com/legal/licenses


More specifically, the text of the license says you can't offer any service that is "primarily a database storage or operations product", even one that doesn't allow schema modifications.

If that wasn't what you intended to prohibit, you should probably fix the wording of section 3.21(i).


Thanks akulkarni, this is great!

Correct me if I am wrong.

Timescale DB Core (if there is such a thing) is still available under Apache 2.0. So nothing has changed. You can use it just like any other open source project with no restriction.

Timescale DB multi-node, originally not free and only available in Timescale Cloud, is now freely available under the Timescale License, a source-available license.

The Timescale DB multi-node license only forbids you from providing TimescaleDB "multi-node" itself as a service, and does not allow running it with any changes that are not upstreamed. You can still resell any software or services built on top of Timescale DB multi-node.

Again, correct me if I am wrong.


Almost!

Yes, TimescaleDB "core" - still Apache 2.0

TimescaleDB multi-node - was never before released, is now released for free under the Timescale License, a source-available license

There are other capabilities (e.g., gap-filling) that are also under the Timescale License, in addition to multi-node.

The Timescale License prevents "TimescaleDB-as-a-service" usage.

You can still run software / services on top of Timescale Licensed software, as long as you are not offering "TimescaleDB-as-a-service".

The Timescale License currently prevents running any modifications in production, but we are actively debating removing that restriction (as I mention elsewhere).

Hope this helps.


Does TimescaleDB support automated downsampling using various functions (min/max/mean/avg) and then during querying automatically picking the correct downsampled data? This is the biggest issue that I and others have with InfluxDB, that it doesn't do that, so the only convenient way to use it is just to expire all data outside the retention policy. Ticket here: https://github.com/influxdata/influxdb/issues/7198

I think what you are referring to is the TimescaleDB real-time aggregates https://docs.timescale.com/latest/using-timescaledb/continuo...

It allows you to define aggregations that are automatically used when querying the raw table if the query matches, and it also allows you to drop the raw data with a retention policy but keep the aggregated form (https://docs.timescale.com/latest/using-timescaledb/continuo...)
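
For anyone unsure what "downsampling" means in this exchange, here is a rough sketch in plain Python: raw points are grouped into fixed-width time buckets and each bucket is reduced to min/max/mean. The function name and sample data are made up for illustration; this is the general idea only, not how TimescaleDB implements continuous aggregates.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, bucket_seconds):
    """Group (unix_ts, value) points into fixed-width buckets and summarize each."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of the bucket it falls into.
        bucket_start = ts - (ts % bucket_seconds)
        buckets[bucket_start].append(value)
    return {
        start: {"min": min(vals), "max": max(vals), "mean": mean(vals)}
        for start, vals in sorted(buckets.items())
    }


# Four raw points, rolled up into one-minute buckets.
raw = [(0, 1.0), (30, 3.0), (60, 10.0), (90, 20.0)]
rollup = downsample(raw, bucket_seconds=60)
print(rollup)
```

Once the rollup exists, the raw points can be dropped and the much smaller summary kept, which is the retention trade-off described above.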


OK, but it looks like I still have to define these aggregates manually. I was really more talking about the standard use-case that folks used to use Graphite / rrdtool for: Keep track of real-time high-fidelity metrics while still being able to query aggressively-downsampled historical data for comparison, and doing so without having to configure anything.

Hi @heipei -- one thing to observe is that Graphite & rrdtool are designed for a specific monitoring use case, while TimescaleDB is a more general-purpose time-series database.

So what that means is that TimescaleDB has mechanisms to make it really easy to define downsampling (continuous aggregates, data retention policies), and even have queries that transparently query across the historical aggregates and new raw data (real-time aggregates, which parent pointed to, which isn't supported by InfluxDB).
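
As a rough illustration of the "query across historical aggregates and new raw data" idea, the sketch below merges already-stored bucket averages with averages computed on the fly from raw points newer than a cut-off. The function name, the `watermark` parameter and the data are assumptions for the example; it shows the concept, not TimescaleDB's actual mechanism.

```python
def realtime_view(materialized, raw_points, watermark, bucket_seconds):
    """Merge stored bucket averages with averages computed from newer raw points."""
    # Stored rollups are trusted for every bucket that starts before the watermark.
    result = {start: avg for start, avg in materialized.items() if start < watermark}
    fresh = {}
    for ts, value in raw_points:
        # Only raw points at or after the watermark are aggregated at query time.
        if ts >= watermark:
            start = ts - (ts % bucket_seconds)
            fresh.setdefault(start, []).append(value)
    for start, vals in fresh.items():
        result[start] = sum(vals) / len(vals)
    return dict(sorted(result.items()))


materialized = {0: 2.0, 60: 15.0}                  # averages stored earlier
raw_points = [(120, 4.0), (150, 6.0), (30, 3.0)]   # the point at t=30 is already covered
view = realtime_view(materialized, raw_points, watermark=120, bucket_seconds=60)
print(view)
```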

What the database _by itself_ doesn't do is automatically create certain continuous aggregates on metrics immediately, because frankly, users' needs vary so much.

That said, we have built stacks/solutions that leverage TimescaleDB and do precisely that. For example, we just released a design doc and beta around our refreshed native integration with Prometheus, that addresses an extremely similar use case to Graphite / rrdtool. Because now this is automated, it defines many of these things out-of-the-box, so you don't need to configure anything. Check it out and input welcome!

https://tsdb.co/prom-design-doc


Thanks for the pointer. I truly understand that TimescaleDB is a general-purpose time-series DB and I understand that most use-cases are unique in that it makes sense to make these decisions about what and how to downsample consciously. However, I feel that there is a large audience of people who "just" want a database that they can point their system-metrics collector at (Telegraf), point their dashboard at (Grafana) and just hit "go", much like they would with something like Datadog, and have the confidence that they can still scale the database if it's ever necessary. Much like ElasticSearch provides default mappings (text/keyword/date/number), this would be a great 80-20 solution for the default use-case of "I want to collect system metrics from my hundreds of servers and have a few sensible defaults about granularity, downsampling and data-retention, and only then will I start to worry about whether that data will eventually exceed my one-server deployment."

Yep, that's exactly what the "Timescale Observability" stack is about. Type "helm install", and a full stack is spun-up and auto-configures to scrape information. You have graphs up in Grafana within 2 minutes, zero configuration.

- See https://github.com/timescale/timescale-observability

- Or join the #prometheus channel at https://slack.timescale.com


Very very interested in this too. Sometimes called automated roll ups. I know Elasticsearch does this

I replied to the parent comment. In short yes, it is supported (https://docs.timescale.com/latest/using-timescaledb/continuo...)

>67k line PR

Man I'm glad I don't have to review that


To be clear, it was developed in a branch so all the individual commits have been reviewed beforehand when landing on that branch. And the branch was rebased throughout the development cycle. This is just the final PR to merge that branch back into master :)

You don’t review it, just approve :)

How does the multi-node version work with data compression compared to the single-node version?

I like how on a single-node I can utilize data compression and get a 95% storage saving.


In the current version, you can execute `compress_chunks` on each of the data nodes and enjoy those same savings (and will work transparently with queries, as before).

In subsequent releases, we'll add full support of compression, e.g., just create a compression policy on the access node and you are off and running.


Sounds great. So I just manually execute this `compress_chunks` command once on each data node and then I have compression enabled forever on those nodes?

Not yet, I should have been clearer:

compress_chunk operates on a single chunk, so the way to define "compress all chunks older than 1 week" is:

   SELECT compress_chunk(i) FROM show_chunks('conditions', older_than => INTERVAL '1 week') i;
https://docs.timescale.com/latest/using-timescaledb/compress...

So you'd need to set up a cron job that runs that script every night or something...at least until we release compression policy support.
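A minimal sketch of what such a nightly job could look like in Python, assuming a DB-API style cursor (for example from psycopg2, which is my assumption and not something mentioned above). The helper name `compress_old_chunks` is made up for illustration, and the sketch does not deal with chunks that are already compressed, which may need to be skipped or tolerated depending on your TimescaleDB version. In a multi-node setup you would presumably run it against each data node.

```python
# SQL mirrors the statement above; table name and interval are parameters.
COMPRESS_SQL = (
    "SELECT compress_chunk(i) "
    "FROM show_chunks(%s::regclass, older_than => %s::interval) i"
)


def compress_old_chunks(cursor, hypertable="conditions", older_than="1 week"):
    """Compress every chunk of `hypertable` older than `older_than`.

    `cursor` is any DB-API cursor (hypothetical here, e.g. from psycopg2).
    Returns the number of rows (one per chunk) the statement returned.
    """
    cursor.execute(COMPRESS_SQL, (hypertable, older_than))
    return len(cursor.fetchall())
```

A cron entry would then just open a connection, call `compress_old_chunks(conn.cursor())`, and commit.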


A free multi-node TSDB solution sounds cool! I wonder if anyone has tried to use TimescaleDB as remote storage for some heavily loaded Prometheus [1] setups.

[1] https://prometheus.io/


Absolutely!

We wrote one of the first remote backends for Prometheus that supports both the remote_read and remote_write interfaces: https://prometheus.io/docs/operating/integrations/#remote-en...

Given how much interest we had using TimescaleDB for this, we recently built and released (in beta) a new "full-stack" of Prometheus + TimescaleDB + Grafana that comes fully configured and "just works" out-of-the-box:

- https://tsdb.co/prom-design-doc

- https://github.com/timescale/timescale-prometheus


It's so nice that the design doc is open for comments; there are so many good thoughts there. Thank you for sharing!

After reading it I have two questions:

1. While the integration with Prometheus sounds great, it still requires running a pretty complicated system behind it. Distributed TimescaleDB could require a lot of knowledge to operate, plus a connector that could become one more point of failure. Have you considered merging the connector into Timescale to make the setup simpler and more robust?

2. A significant part of my everyday work involves writing PromQL queries, and I often check week/month ranges while plotting timeseries. I've heard many complaints that remote-read can be very expensive when it touches a lot of data. Are you considering supporting PromQL in TimescaleDB to avoid the remote-read bottleneck?

Personally, I've had a good experience working with Thanos and VictoriaMetrics because of the seamless usage experience: same queries, same Grafana dashboards, same alert rules. Would love to see more products that support the same standards for timeseries data.

Edited: formatting.


1. Even though it is newer, distributed TimescaleDB is probably more robust and easier to operate (and already more operationally mature) than other local storage options for Prometheus metrics, in part thanks to the underlying maturity of Postgres.

2. Yes, supporting PromQL directly (i.e., not via remote_read) is already in internal testing. Coming very soon.

Would really appreciate feedback if/when you get to try it out yourself. Please feel free to ping me directly: ajay (at) timescale.com


I've had luck with https://thanos.io/ for a big (~1 billion timeseries across all our DCs) Prometheus scale out project. Horizontally sharded Prometheus that can be queried and alerted on in a unified view with object store backend.

I remember being very impressed with numbers from the following tweet https://twitter.com/this_is_tckb/status/1256649880434606080.

I'm wondering what your setup costs to handle billions of timeseries?


Do you consider TimescaleDB a replacement for Thanos? It would be nice to read some operational and performance comparisons for real-world cases.

You may also want to check out M3 (https://eng.uber.com/m3), a highly available RF=3 multi-node TSDB metrics backend that is used with heavy Prometheus workloads to ingest tens of millions of timeseries per second.

> All of these capabilities are being released under the Timescale License, our source-available license that permits broad usage, except for where organizations are providing TimescaleDB-as-a-service.

Maybe someone can give clarification on this, but the line between using TimescaleDB to build a product and providing TimescaleDB-as-a-service seems incredibly blurry. If I have a product that in some way lets you query time series data, and that product is powered by TimescaleDB, would that count as providing TimescaleDB-as-a-service?

I used to work for Heap which is an analytics tool. In a way you can view Heap as just a wrapper around Postgres. We stored event data in Postgres and provided a UI that allowed you to express queries (e.g. count the number of logins over the past month). We would take the query in the UI, compile it into a SQL query, and run the SQL against Postgres. If Heap was powered by TimescaleDB, would that violate the Timescale License? In fact, you could technically view any dashboarding product that queries TimescaleDB as providing "TimescaleDB-as-a-service".

I looked at the actual license[0] to see what it says, and it seems really unclear. The license gives you permission to use TimescaleDB to develop "Value Added Products or Services" which it defines as a product that uses TimescaleDB as part of a larger offering. One of the requirements for a product or service to be considered "Value Added" is:

> (ii) such value-added products or services add substantial value of a different nature to the time-series database storage and operations afforded by the Timescale Software and are the key functions upon which such products or services are offered and marketed

This seems incredibly vague. What exactly does "substantial value of a different nature" mean? In the end, tons of products are just wrappers around DBs. If products like Heap or Datadog were to be backed by TimescaleDB, would they add "substantial value of a different nature" on top of it? In the end, Heap and Datadog are products designed for querying time series data. I could definitely make a case that they don't provide value of a different nature from TimescaleDB. This vagueness seems like a huge risk and without further clarification, makes me want to stay far away from TimescaleDB.

[0] https://github.com/timescale/timescaledb/blob/master/tsl/LIC...


Hi @malisper, we totally appreciate concerns around potential uncertainty about what a "Value Added Service" means.

In fact, when we were looking at Timescale licensing, we took a careful look at what the licenses of a lot of similar companies did here (Confluent, Redis, etc.), and at what later became the Polyform License. Most of them left this definition pretty vague -- because frankly, legal language is never as precise (and perhaps shouldn't be) as an engineer may like.

We went a step further, and tried to define more precisely what it means to "offer" TimescaleDB:

      (iii) users of such Value Added Products or Services are prohibited,
      either contractually or technically, from defining, redefining, or
      modifying the database schema or other structural aspects of database
      objects, such as through use of the Timescale Data Definition Interfaces,
      in a Timescale Database utilized by such Value Added Products or
      Services.
[def] https://github.com/timescale/timescaledb/blob/master/tsl/LIC...

What that means is that if you've defined the Heap schema, you have built the indexes and tables, and then are offering a SaaS product on this, you're fine:

- You are offering a product/marketing SaaS service around usage/product analytics, not a time-series-database-as-a-service

- You are not approaching the market and saying, "Here's how to get TimescaleDB-as-a-service" (unlike, say, Managed TimescaleDB running on Rackspace or Digital Ocean), you are saying "Here's a full Product/Marketing Analytics Solution".

- You are not giving your users direct/psql access to the raw database to define their tables/schemas/indexes and otherwise just treat that service as a hosted TimescaleDB instance.

I hope that helps!


> We went a step further, and tried to define this more precisely about what it means to "offer" TimescaleDB

I don't understand how the bit you posted helped make things more concrete. Section 3.21, the section you referenced, lists three conditions, all of which have to be true for your product to be considered "Value Added". I agree the third condition, the one you quoted, is pretty clear. But the second condition, the one I quoted, seems really vague, so the definition of "Value Added" as a whole becomes really vague.

> What that means is that if you've defined the Heap schema, you have built the indexes and tables, and then are offering a SaaS product on this, you're fine.

FWIW, Heap would automatically create new tables for customers as they sign up and would also automatically create new indexes for customers as needed. For that reason alone, I'm pretty sure Heap would violate the Timescale license.

I agree that it's pretty difficult to be specific about what "value added" means. I'm not sure what the right solution is. I would still want to go over the Timescale License with an IP lawyer pretty thoroughly before I were to use TimescaleDB.


> FWIW, Heap would automatically create new tables for customers as they sign up and would also automatically create new indexes for customers as needed. For that reason alone, I'm pretty sure Heap would violate the Timescale license.

Nope! The user doesn't define or control that those tables and indexes are created. I.e., the user, through the Heap UI, doesn't say: I want a table with this schema and I want to create an index on (event_id, timestamp).


What's the difference between Timescale's license ("free to everyone except cloud providers") and GPLv3?

The differences are pretty substantial.

The GPL puts no restrictions whatsoever on how you can use software that falls under it. Timescale's license, on the other hand, gives you very limited usage rights. You can use unmodified versions of the software, but you can't allow clients to make schema changes, nor can you use it to provide any service that is "primarily [a] database storage or operations product or service".

In addition, Timescale's license is much more restrictive about allowing derivative works. The GPL lets you create modified versions and/or reuse code in other products, no matter how extensive your changes, as long as the results are also GPL-licensed. Timescale's license lets you create modified versions, but you're not allowed to:

* make any changes that bypass "usage restrictions"

* use your changes in production

* distribute your changes in any way, except for assigning all the rights back to Timescale


Changes you make to GPL software only have to be provided under the GPL if you redistribute the work to others. If you keep it to yourself, run it yourself, etc., you are not required to release it as GPL.

Aren't you required to release it even if you run it yourself with GPLv3? Or am I thinking of another version?

You are thinking of the AGPLv3. Not to be confused with the GPLv3 =)

https://www.gnu.org/licenses/agpl-3.0.en.html

"It has one added requirement: if you run a modified program on a server and let other users communicate with it there, your server must also allow them to download the source code corresponding to the modified version running there."


As the term you cited says, the AGPL only requires distribution if you are providing it to someone else, including as a service over a network. If you use it yourself, and don't distribute it (including over a network service) to anyone else, you still don't have to share your source.

Ahh, right, I thought I was misremembering but couldn't quite place how, thanks.

> In addition, Timescale's license is much more restrictive about allowing derivative works. The GPL lets you create modified versions and/or reuse code in other products, no matter how extensive your changes, as long as the results are also GPL-licensed

I mean, to be fair and add some balance here, a lot of people find that part of the GPL to be very restrictive.

There are many organisations who have banned the use of GPL code altogether because of this, and also because of ambiguity in the license (e.g. the never ending debate about static and dynamic linking etc).


> I mean, to be fair and add some balance here, a lot of people find that part of the GPL to be very restrictive.

I often see comments like this, but they make no sense. If you don't agree with the GPL, then don't use software licensed under it. The same as if there's proprietary software you don't want to license, don't use it. There's nothing to debate.

> There are many organisations who have banned the use of GPL code altogether because of this

Good, they read the license and don't want to follow it, so they don't use it. Exactly as intended.

You're confusing the GPL's viral nature, which is a central feature, with something different that you wished for, but isn't real.

And by the way, since Linux is GPL, those same companies almost always make an exception, don't they now?


I think this is a good move. But from an enforcement perspective, how realistic is it to prevent someone like Amazon from offering a clone service (at least for backend components) and claiming they wrote it from scratch? Is there any way to force them to reveal the source for a particular service?

I mean... you think any major provider like Amazon is going to just blatantly rip it off and sell it saying "We created this!"?

That's just not going to happen.


Are there any tools to migrate from elasticsearch to timescale? We are considering a switch from our es and are evaluating options. Timeseries is also one of the contenders. We are not looking for text search, just some nested queries on timeseries data.

Disclosure: I work for Timescale, previously worked for Elastic

Pretty much any ETL tool you like could do this, as long as it speaks to elasticsearch and postgres.

Logstash (if you're using the ELK stack) can write to CSV or other formats as well as do any processing, but it doesn't have a JDBC output plugin, so you'd have to ingest with something else. Conversely, fluentd for example can output to Postgres, but doesn't have an elasticsearch input (at least that I could find), so you'd have to export from es with something else.

So it might be a couple of steps, though there are rich clients for most major programming languages for both elasticsearch and postgres. If your schema is fairly simple, this might not be too bad to roll your own.

That said, the hardest part is likely massaging your data, if your elasticsearch schema is complex. Because you have to totally denormalize things for es (generally), you might have to unravel some of that going back into a relational database.
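To illustrate that "unraveling" step, here is a small Python sketch that splits one denormalized, Elasticsearch-style document into a parent row and child rows. The document shape, the field names (`host`, `ts`, `readings`) and the function name are all made up for illustration; a real mapping will look different.

```python
def unravel(doc, nested_key, parent_keys):
    """Split one denormalized document into a parent row and child rows.

    Each child row repeats the parent key columns so it can be inserted
    into a separate relational table and joined back later.
    """
    parent = {k: v for k, v in doc.items() if k != nested_key}
    key_cols = {k: doc[k] for k in parent_keys}
    children = [{**key_cols, **child} for child in doc.get(nested_key, [])]
    return parent, children


# Hypothetical document, as it might come out of an es export.
doc = {
    "host": "web-1",
    "ts": "2020-08-01T00:00:00Z",
    "readings": [
        {"metric": "cpu", "value": 0.42},
        {"metric": "mem", "value": 0.73},
    ],
}
parent, children = unravel(doc, nested_key="readings", parent_keys=["host", "ts"])
```

Here `parent` would go into one table and each entry of `children` into another, keyed on `(host, ts)`.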


Not sure, but if you post that question to the TimescaleDB Slack [0] you might get an answer there.

[0] https://slack.timescale.com/


Can someone please summarize what it does? I couldn't figure it out from the website. It says it's "on Postgres": is it a flavor of PG, or does it sit on top of multiple PG instances?

How's this:

TimescaleDB is a distributed time-series database that is packaged as a Postgres extension (a "mega-extension" to quote someone else on this thread).

TimescaleDB:

* Scales to over 10 million metrics per second [0]

* Supports native compression, using delta-delta, Gorilla, Simple-8B RLE, and other best-in-class compression algorithms (achieving a median 94% compression based on user data) [1]

* Offers native time-series capabilities, such as data retention policies, continuous aggregate views, real-time aggregates, downsampling, data gap-filling, and interpolation

* Handles high cardinality [2]

* Outperforms non-relational databases including InfluxDB [3], Mongo [4], and Cassandra [5] for time-series data

With TimescaleDB you also get all of the goodness that is built into Postgres: full SQL, a variety of data types (numerics, text, arrays, JSON, booleans), ACID semantics, and operationally mature capabilities including high-availability, streaming backups, upgrades over time, roles and permissions, and security.

[0] https://blog.timescale.com/blog/building-a-distributed-time-...

[1] https://blog.timescale.com/blog/building-columnar-compressio...

[2] https://blog.timescale.com/blog/what-is-high-cardinality-how...

[3] https://blog.timescale.com/blog/timescaledb-vs-influxdb-for-...

[4] https://blog.timescale.com/blog/how-to-store-time-series-dat...

[5] https://blog.timescale.com/blog/time-series-data-cassandra-v...
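As a rough illustration of why the delta-delta (delta-of-delta) encoding in the list above suits time-series data, here is a toy Python sketch. It is not TimescaleDB's implementation, just the basic idea: regularly spaced timestamps turn into runs of zeros and small numbers, which a run-length or bit-packing scheme can then store very compactly.

```python
def delta_of_delta_encode(values):
    """Return [first value, first delta, then the deltas of the deltas]."""
    if len(values) < 2:
        return list(values)
    deltas = [b - a for a, b in zip(values, values[1:])]
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    return [values[0], deltas[0]] + dods


def delta_of_delta_decode(encoded):
    """Invert delta_of_delta_encode."""
    if len(encoded) < 2:
        return list(encoded)
    values = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dod in encoded[2:]:
        delta += dod  # recover the next delta
        values.append(values[-1] + delta)
    return values


# Timestamps arriving roughly every 10 units, with one late sample.
timestamps = [1000, 1010, 1020, 1030, 1041, 1051]
encoded = delta_of_delta_encode(timestamps)
# encoded == [1000, 10, 0, 0, 1, -1]
```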


Here's the original intro to Timescale blog post - https://blog.timescale.com/blog/when-boring-is-awesome-build...

It's a PG extension.

How does TimescaleDB work as a traditional OLTP db? Can I run general analytical queries on it and leverage its distributed nature? Or is it better for single table append only workloads?

Hypertables and Distributed Hypertables can be used to store any kind of data, but work best when the data has a monotonically increasing partitioning key (e.g. time), a high ingest load, and few data modifications (preferably in bulk).

The beauty of TimescaleDB being built on Postgres is you can have your regular Postgres tables (OLTP schema) and time-series data (Hypertables) live side by side. Use 1 language (1 mindset) to query them, join them, work with them as you see fit. With Distributed Hypertables (what the post is about) you can now partition your data to live across multiple servers, and still use your 1 mindset to query all that data.

edit: With the preferred workload you get the most out of TimescaleDB's advanced features like compression, continuous aggregates and data retention policies. You can use the aggregates to build complex auto-updating materialized views that are automatically used even when you query the raw tables (https://docs.timescale.com/latest/using-timescaledb/continuo...)


This sounds like the perfect fit to a write only event log table we stored in postgres at a previous employer. I pushed to move it to BigQuery but this sounds like it would have been fine.

There is a more cost-effective alternative to BigQuery for storing and analyzing large amounts of logs - LogHouse [1], which is built on ClickHouse.

[1] https://github.com/flant/loghouse


Here is a community post on storing logs in TimescaleDB:

https://www.komu.engineer/blogs/timescaledb/timescaledb-for-...


Continuous aggregates look like a killer feature.

Thanks! You might also find this related feature, real-time aggregation, really powerful as well.

We just released it last month: https://blog.timescale.com/blog/achieving-the-best-of-both-w...

"With real-time aggregation, when you query a continuous aggregate view, rather than just getting the pre-computed aggregate from the materialized table, the query will transparently combine this pre-computed aggregate with raw data from the hypertable that’s yet to be materialized. And, by combining raw and materialized data in this way, you get accurate and up-to-date results, while still enjoying the speedups that come from pre-computing a large portion of the result."

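To make the quoted description a bit more concrete, here is a toy Python sketch of the idea: pre-computed buckets are kept for everything below a "watermark", and raw rows at or after the watermark are aggregated on the fly and merged in. This is only an illustration of the concept under my own assumptions (integer timestamps, averages stored as sum/count pairs, the names `watermark` and `real_time_aggregate`), not how TimescaleDB implements it.

```python
from collections import defaultdict


def bucket_of(ts, width):
    """Start of the time bucket that `ts` falls into."""
    return ts - ts % width


def real_time_aggregate(materialized, raw_rows, watermark, width):
    """Combine pre-computed buckets with raw rows not yet materialized.

    materialized: {bucket_start: (sum, count)} for buckets below `watermark`
    raw_rows:     iterable of (timestamp, value) from the raw table
    watermark:    first timestamp that has NOT been materialized yet
    Returns {bucket_start: average}.
    """
    combined = defaultdict(lambda: (0.0, 0))
    for bucket, (total, count) in materialized.items():
        combined[bucket] = (total, count)
    for ts, value in raw_rows:
        if ts >= watermark:  # older rows are already in `materialized`
            key = bucket_of(ts, width)
            total, count = combined[key]
            combined[key] = (total + value, count + 1)
    return {b: total / count for b, (total, count) in sorted(combined.items())}


materialized = {0: (30.0, 3), 60: (12.0, 2)}
raw_rows = [(10, 10.0), (125, 4.0), (130, 6.0)]
result = real_time_aggregate(materialized, raw_rows, watermark=120, width=60)
# result == {0: 10.0, 60: 6.0, 120: 5.0}
```

The first two buckets come straight from the pre-computed data; the last one is computed from raw rows only.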

I saw that TimescaleDB is mostly C, like other PG extensions. Have you all put any thought into using Rust? Just curious about why or why not.

TimescaleDB makes heavy use of the PostgreSQL API and hooks, which expose many data structures, macros and functions. My understanding is that using Rust or even C++ would require writing a large FFI layer and maintaining it across PG major versions, which are released every year. Also, just having an FFI is unlikely to be enough; it would also require writing wrappers on top of it to get the best of Rust, rather than just another syntax on top of C.

How does the multi-node version handle high availability and automatic failover?

Are those included or are they paid add-ons?


Short answer: the 2.0 release won't natively support automated failover, although you can build it yourself using PG tools like physical replication + Patroni. But these capabilities are certainly things we are working on.

Per the PR notes:

  The current implementation has many more limitations 
  that will be addressed over time:

  - HA and replication has to be managed node-by-node. 
    This will be improved with native replication.

You can utilise multinode data replication for high availability of data, however it is still necessary to use an external tool for HA of the access node, which distributes data and queries to data nodes.

Awesome. I just hope that one day Amazon supports it on RDS. I do know that Digital Ocean does!

An important clarification is that Azure, Digital Ocean, Rackspace (Object Rocket), Alibaba Cloud -- which all support managed TimescaleDB today -- only offer the Apache-2 version of TimescaleDB.

Many of the more advanced features of TimescaleDB, including these distributed options, are released under the Timescale License.

All code under the Timescale License is also source available, and people are free to use it, incorporate it into their commercial SaaS services, distribute it, etc., with the primary limitation being that you can't offer TimescaleDB as a hosted DBaaS (like RDS, Azure Postgres, etc.).

Instead, Timescale Cloud is the place to get TimescaleDB advanced features as a fully managed DBaaS.

https://www.timescale.com/products/features


I was hoping that too but I think Amazon is still working on their time series database[1]. We registered for their preview in 2018 and it's still in preview with no access.

[1]https://aws.amazon.com/timestream/


I would recommend looking at Aiven if you want to deploy Timescale on AWS (we use it to deploy on GCP, which is also missing the extension in their CloudSQL offering).

Very happy to see Timescale making more features available in the community edition.

When we first started evaluating time series databases a month or two ago, some features like continuous aggregation (rollups) were enterprise only. Perhaps their strategy is to drive adoption by letting people try their features out, hoping that some of these adopters will end up using their managed solution. I checked their pricing, and the delta between their pricing and the underlying AWS instance seems quite reasonable.

We ended up testing Influx first, because it seems to be a safe choice with wider adoption and extensive documentation.

With Influx, it was very easy to put together a prototype quickly. But once we started throwing some real workload at it, it would lose writes under load. It makes sense that it failed, because according to Influx's documentation (https://docs.influxdata.com/influxdb/v1.8/guides/hardware_si...), we would need a cluster to make it work. Influx is very transparent in their documentation that, without clustering, writes and queries will fail immediately when a server is unavailable.

This isn't to say that Influx wouldn't work for other use cases. But at least in our use case, their open source offering isn't suitable for us, and it's unclear how much better the cluster version is.

Timescale, on the other hand, was able to handle the same workload under stress. As we are unable to backfill some of the ingressing data, it's quite vital that the system can degrade more gracefully.

For my use case, one feature that still needs some work in Timescale is their real time aggregation. It is currently impossible to define a rollup on top of another rollup, which means that if you are ingesting a lot of data into the raw table and you downsample into a wide time bucket (e.g. a day or a week), queries against these wider buckets will potentially end up having to query a lot of data points, slowing the system down considerably. Granted, it is a new feature that just got released about a month ago. Hopefully, with multi-node nearing completion, continuous aggregation will get a bit more love.

I spoke with their engineers about this over Slack, and their suggestion was to manually modify the rollup materialized view to aggregate over a combination of the materialized buckets (currently handled by the continuous aggregation) + real time aggregation from a higher resolution bucket.
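The general principle behind building a coarse rollup from a finer one can be sketched in Python. This is only an illustration of the idea (keep composable partial state such as sum and count, rather than finished averages), not the view definition their engineers suggested; the function name and bucket layout are hypothetical.

```python
def rollup(buckets, coarse_width):
    """Re-aggregate fine buckets {start: (sum, count)} into coarser ones."""
    coarse = {}
    for start, (total, count) in buckets.items():
        key = start - start % coarse_width  # start of the coarse bucket
        prev_total, prev_count = coarse.get(key, (0.0, 0))
        coarse[key] = (prev_total + total, prev_count + count)
    return coarse


# Hourly (sum, count) pairs keyed by epoch seconds, rolled up to days.
hourly = {0: (10.0, 2), 3600: (30.0, 3), 86400: (8.0, 4)}
daily = rollup(hourly, coarse_width=86400)
day0_avg = daily[0][0] / daily[0][1]
# daily == {0: (40.0, 5), 86400: (8.0, 4)} and day0_avg == 8.0
```

Note that averaging the two hourly averages for day 0 (5.0 and 10.0) would give 7.5, which is why the sketch carries sums and counts instead.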

We are still testing out Timescale, of course. But so far, it's been holding up its end of the bargain. The fact that Timescale is "just an extension" built for Postgres also makes it a less risky choice and offers a lot of flexibility; if Timescale doesn't work out, we could still work with Postgres, and that IMHO is a very nice thing.


Thanks for the feedback. Really glad to hear that your experience is going well with TimescaleDB! Feel free to ping me directly ajay (at) timescale.com if there's anything I can do to help.

So does this mean anyone can join this giant snowball?

Like IPFS or MaidSAFE or Dat or Bittorrent?


If this is just postgresql, why would I use this implementation over postgresql?

It also seems like it scales linearly with some decrease in ROI after 12 nodes.


It's not "just" PG, it's like a mega-extension with data-types and tuple-layout and a tonne of magic for the data-domain.

You can, of course, make similar models with plain PG - in the same way one does GIS without PostGIS.


"mega-extension" <-- I like that!

Multi-node TimescaleDB is a great contribution to the open source world!

BTW, it would be great to compare multi-node TimescaleDB to the VictoriaMetrics cluster version [1], which is licensed under the vanilla Apache2 open source license [2].

[1] https://github.com/VictoriaMetrics/VictoriaMetrics/blob/clus...

[2] https://github.com/VictoriaMetrics/VictoriaMetrics/blob/clus...



