Centrifuge: a reliable system for delivering billions of events per day (segment.com)
156 points by bretthoerner on May 23, 2018 | 65 comments


I'm currently investigating a very similar problem (very high throughput webhooks, specified by customers, to unreliable endpoints) and was considering an architecture involving a set of queues partitioned by webhook response time and/or failure rate.

So if your webhook is bucketed into the 0-100ms queue and your responses start to exceed 100ms, you'd be bumped to the 100-500ms queue, which is more likely to have periodic queuing delays, and onwards from there depending on your response time / failure rate. If your API later recovered and started responding faster, you'd be moved back into the faster queue. That way we could offer different (soft) SLAs for different classes of response times, and scale the workers independently per queue.

I'm curious if there are known issues that people have run into with this approach? The main unknown was how many workers we'd be willing to throw at consistently slow endpoints to try to keep the slower queues from backing up too much, and possibly some flapping as endpoints could respond quickly under low throughput but slow down as soon as they move up to the faster queues.


I really like this approach. You could even automate it where the 0–100ms queue's workers could sever the connection after 100ms and re-queue the message on the next level up, while also incrementing the timeout counter. You could do really interesting things by incrementing and decrementing a timeout counter... e.g., it gets decremented when the transaction completes under the line for the queue it's in. That could help with flapping.


Ugh. This is only tangentially related, but: Segment has always looked great, yet it's so incredibly, ridiculously overpriced for web analytics.

If you want to slap it on your website for basic analytics, you better not get any traffic to speak of, because they charge you about $.01 per monthly tracked user, anonymous or identified.

If you get 100k visitors per month (not even THAT much), it's $1,125/mo. Their pricing estimator stops at $2,375 / month for 225k monthly tracked users.

God forbid you put it on your site or in your mobile app and then anything you do goes viral. It'll bankrupt you.

Who on earth is this aimed at? Is this only suitable for tracking logged in users? Or only suitable for enterprise companies?

Better yet, what about what they're doing is so difficult and expensive, when many of the places you'd be sending these analytics (which will be handling the same number of events) are free or cheap at this level of use?

Segment has always seemed cool, but overpriced for web analytics by a factor of 20-50x.

OK, rant over.


Segment isn't a web analytics tool, so much as a data dispatching service. Send data to Segment, which in turn sends it to the services you have configured.

Think of it as a central data store for your business that has built-in integrations to move that data to other providers such as Google Analytics, Salesforce, Mailchimp etc.


Yep, that’s what makes it so cool and useful.

But if one of those services is Google Analytics, for example, then you probably want to use it with all visitors, which is excruciatingly expensive unless you get hardly any traffic at all.


I've only worked with one account/client that used Segment specifically, so I can't speak authoritatively about what their primary market segment is.

But it fills a really valuable hole for anyone that spends a decent amount on paid media. Having conversion and tracking data funnel through Segment has two benefits:

1. Every ad network has its own conversion tracking system, and Segment makes it a whole lot easier to add in another one that marketing wants to test out.

2. It enforces a semblance of consistency. Implementation can have a very large impact on analysis. Whether intentional or incidental, it's really easy to implement analytics code inconsistently. Then the business/marketing user is evaluating results between campaigns/platforms/agencies/etc. as if it's apples to apples, when it was actually implemented like apples to oranges. Standardizing events with Segment, and then forcing tracking events to be based on those, ensures a level of consistency between services that's hard to achieve otherwise.

#2 is the biggest problem I've run into managing clients' web analytics. My go-to solution is to leverage a tag manager, work on mapping out significant actions and creating triggers for those, and then make sure as much as possible is driven by those standardized triggers and anything one-off is scrutinized appropriately.

But that process is easier said than done, since the friction to create new triggers is low and just requires appropriate GTM access. With Segment, making a completely new event involves an actual development release and is harder for a business user or agency to push through nonchalantly.

$0.01/visitor is not insignificant but in that environment (which is not all that rare), something like Segment is invaluable.

Plus, don't assume cost scales linearly with the same variable as the publicized pricing. I've never been privy to a Segment contract, but part of the "Call Us" component is to ensure the unit economics of the product make sense for your business. MTU is the unit they chose for the listed automated pricing tiers, but other variables like the number of sources/destinations, number of seats, etc. are negotiable.

High volume of MTUs but just a handful of sources/destinations? You can probably get a contract structured with a very low MTU price but a negotiated cap on active destinations, with a per-destination charge over that. Is your MTU count peaky? You can likely get those to roll over, or have the MTU measured as a per-month average over a quarter, allowing your peak periods to be offset by your down periods.

Don't view the Business custom pricing option as only for super huge spenders - a lot of companies are willing to work with you when you're smaller, in hopes of structuring a contract that'll grow annually as you grow. Custom pricing doesn't always mean super expensive; it also means "atypical unit economics". It also happens to be a market research tool that helps companies discover use cases they hadn't thought about (hence why their standard pricing differentiator didn't work).


>> To implement per source-destination queues with full isolation, we’d need hundreds of thousands of different queues. Across Kafka, RabbitMQ, NSQ, or Kinesis–we haven’t seen any queues which support that level of cardinality with simple scaling primitives.

I've been posting this a lot recently, but it keeps coming up as the relevant solution: Apache Pulsar supports millions of topics without much overhead and offers the log semantics of Kafka with better scaling and per-message acknowledgement: https://pulsar.incubator.apache.org


Hey HN, author of the post here. A number of Segment engineers are hanging around today, and we are happy to answer questions in the comments. Thanks in advance for any feedback and thoughtful discussion!


This is a very interesting blog post and some great engineering, congrats!

I fail to understand the following aspects though, maybe you can clarify them a bit:

* How does a director recover from a failure? From my understanding it would require fetching all job IDs from jobs that are in an active state via the job transactions table (which sounds expensive) and then loading the associated meta-data from the jobs table? Is that correct?

* Do you assume that the director will have archived all non-completed jobs when deleting the database? Do you try to gracefully shut down the director first then? From my understanding it seems you perform a "drop table" statement on a given database and then regenerate the tables, but this would require being sure that all the jobs have been processed or archived.


On your first point, you’re correct. Directors scan their databases on start to rebuild their cache and reschedule the jobs that need to be retried. The scan is usually quick for recently inserted data since it’s likely in cache, but it may take a couple of minutes to scan everything. Because we keep the database to a small size, we can cap how long this operation will take. We also scan the database in reverse order (using the primary key, thanks to the rough ordering of KSUIDs) to help reschedule the most recent jobs first, which is possible because the scan can happen concurrently with job processing.

On your second point, archiving actually happens in the “drainers”, not the director. The drainer is the component that picks up unused databases and flushes their data back into the set of running directors. We initially built archiving into the directors, but it turned out to steal too many resources away from job processing, so we moved it out into the drainers, which actually gives us opportunities to do it more efficiently (for example, creating larger archives, which helps with compression as well).

Sorry if we cut some details off of the post, there is plenty more to tell about this system but it’s a lot for a single blog post ;)


Thank you!

> How does a director recover from a failure? From my understanding it would require fetching all job IDs from jobs that are in an active state via the job transactions table (which sounds expensive) and then loading the associated meta-data from the jobs table? Is that correct?

This is correct, if a director crashes for any reason, it needs to scan the database on boot. It is a more expensive operation, which is part of the reason that we try and cap the number of total entries in a given database.

> Do you assume that the director will have archived all non-completed jobs when deleting the database? Do you try to gracefully shut down the director first then? From my understanding it seems you perform a "drop table" statement on a given database and then regenerate the tables, but this would require being sure that all the jobs have been processed or archived.

Great question, we glossed over this aspect a bit in the post itself.

Before a given database is transitioned to the 'spare' state, and its tables dropped, a single Drainer process is responsible for moving any non-completed jobs from that database to another active Director. The Drainer will not successfully exit and transition the database to 'spare' until it is certain it has processed all the non-completed jobs. We never drop any tables which have non-terminal jobs. Similar to the Directors, the Drainer will acquire a lock in consul to ensure only a single process is draining at a time.

We're hoping to go into a bit more depth on how the drainers work and these jobs move around in an upcoming post on Centrifuge's two-phase commit semantics. Ensuring that your data has moved to another system does require fairly complex transactional semantics, so we're hoping to go into depth about how this works.


How are you blocking multiple director processes from writing to the JobDB instances? I.e. node A gets partitioned away, a new node B picks up in its place, but then node A comes back and still thinks it owns the lock for the JobDB instance.

I would imagine you're taking the database version from Consul and using that as a fencing key for writes within a transaction in the database, but I didn't see that mentioned in the Go code snippet.


In short... atypically long timeouts on these locks. We can afford to inject a few minutes of delay on a small number of in-flight messages to preserve this safety property. I’d agree that we should probably have a second layer of fencing.


Is Centrifuge going to be open source?


We do want to open source the code. In order to get to production we took some shortcuts which tightly integrated some components of this system with Segment’s infrastructure and would make it hard for anyone to deploy. We discussed open sourcing the code along with this blog post but we felt like it would have more value once we’ve put a bit more work into it to make it easier to work with, and add a couple of features we have in the pipeline.


Would like to know this as well.

I have been on the lookout for a system similar to Sidekiq.

Unfortunately for Go, there isn't anything that matches it 100%.

I have looked into:

https://github.com/celrenheit/sandglass

https://github.com/RichardKnop/machinery

https://github.com/contribsys/faktory

In the end I went with machinery due to supporting:

- Batches (or Groups, tasks executed at the same time)

- Chains (tasks executed one by one)

- Callbacks (a task executed, on completion of a Batch)

- Cron Jobs

- Rescheduling failed jobs

- Long-running Jobs

- Distributed/Fault Tolerant DB (for Jobs)

- Distributed/Fault Tolerant Workers

+ other things I can't remember now.

It would be very interesting to know if this is going to be open sourced and whether it supports the list above.


I don't know if it exactly fits, but maybe check out some of the stuff Customer.io has been doing, such as https://github.com/customerio/fairway? It seems like they're solving similar problems and contribute a lot of Go projects.


Unfortunately it's not.

From their own ReadMe:

> Fairway isn't meant to be a robust system for processing queued messages/jobs. To more reliably process queued messages, we've integrated with Sidekiq.


I wonder if this would have been built on Apache Pulsar (https://pulsar.apache.org) if it had been open source and on the Segment team's radar at the time they started on Centrifuge. I work with one of the architects of Pulsar, and his first thought on seeing the blog was that Segment's scenario had a lot of similarities to what he and the team at Yahoo set out to do when they first built Pulsar there several years ago.


My spidey sense is telling me that this could have been achieved with a few Erlang VMs and a lot fewer moving parts.

It's a supervisor (Director) with a bunch of actors communicating with different 3rd party APIs. The state mechanism could ostensibly get abstracted away so the underlying DB is irrelevant.


One of the challenges for us is that our downstream API integrations are almost all Javascript (and often provided by the integration vendors themselves). Rewriting the hundreds of integrations we provide into Erlang wasn't a viable option for us.


Interesting. How does the Go code interface with that?


The integrations gateways are built into HTTP services, so the Director is an HTTP client to those gateways. (The image under "All Together Now" omitted our integrations gateways -- architecturally, they lie between the Director and the actual third-party API endpoints.)


i don't understand. presumably the go code just makes an http call? what makes you think you can't do the same in erlang?


I don't know much about Erlang - wouldn't it be able to interface with HTTP in the same way?


Erlang certainly can be used to implement an HTTP client, but once you step outside of Erlang's actor model and virtual machine, you lose those benefits that it provides in terms of message passing behavior.


(the state of http clients on Erlang is also pretty abysmal.)


Just because you read it in a HN comment doesn't mean it works in the real world.

It has to deal with lots of third party interfaces, websockets, webhooks, etc. Erlang is notoriously bad for modern web stuff, with elixir making the ecosystem more brittle and lower quality. If you're building a system that needs to do a lot of back-and-forth, go is a much more usable tool.

Have you ever tried to debug a giant erlang clusterfuck, calling out to a dozen different service gateways, varying tuple msging nonsense all over the place, jankedy ass comment annotations that aren't even accurate half the time, and all the while trying to fix things with hot swaps?

It all sounds very good when Joe writes those weird little lewis carroll love letters about how magical erlang is, but I've worked on a system like that and I'll never touch erlang again except for small single-shot services. It's completely cancerous as a language ideology.

Not that go is all that much a panacea, but it's definitely the perfect fit for this sort of thing.

Hope this doesn't come across as too harsh, but what you're describing is a rabbit hellhole I've already crawled through.

Also, with projects this size, abstracting db details isn't wise. This kind of scale necessitates more precision in what schemas you choose and what configuration your dbms has. Crap like Ecto will ultimately cost you in efficiency. Elixir can get away with it because BEAM works hard to do supervision thoroughly, but there are multiple performance penalties that BEAM imposes on itself by choosing abstraction instead of specialization, and again, this problem would demand a specialized approach to be efficient as it scales up.


"Erlang is notoriously bad for modern web stuff" Care to put some stats? Having used Erlang/OTP for several years, I find this very silly.


[flagged]


> Erlang is a language that doesn't have strings, for fucks sake.

It has binaries and those can be used as strings. Most JSON parsers turn JSON strings into binaries.

> But a person who says erlang is suitable for modern web development is not living in the real world.

Says person on the internet who is swearing and doesn't seem to know what they are talking about.

Those silly Erlang and Elixir users, should be glad they finally found a new HN account telling them the truth.

> web demands a different model of problem solving than let-it-crash 90s telecom hackery provides

Ok, I'll bite. What is the new rockstar model of problem solving we should all be using?


I can't imagine being fragile enough that swearing on the Internet is like a major negative experience.

I've read more erlang than most of the elixir fanclub hackers will ever read or write. (granted elixir is a bit more terse, w/e) and I've shipped a functioning exploit that composed into RCE in a system that over twenty developers spent more than a decade designing, developing, and defending. Most of it was erlang. Almost all of it, really, except for some deployment routines and all the sql, etc.

I don't know the best way to solve problems, and I also don't know the very best problems that erlang is suited for, but large-scale web-oriented erlang is a mistake.

It's not that I want to hurt HNers feelings, it's just that I've literally seen this before. This whole thing is a rerun.

Just watch and see.

If it makes your booboo hurt any less, I think most elixir hackers are decently practiced hackers in general. If they had better beans, they'd make better coffee. The team I worked with were definitely top notch.

As for your question, I have no idea. I'm more of a problem spotter. My """solutions""" are generally of the "don't do this dumb thing" variety, with an accompanying exploit or kit or injection or whatever it is for that problem.


Please don't cross into incivility, regardless of how much more Erlang you know than others.

https://news.ycombinator.com/newsguidelines.html


I was careful not to.

God forbid someone be told they're wrong on HN.


Interpretations of what's uncivil differ, of course, but we ban accounts that snark like this repeatedly:

> If it makes your booboo hurt any less

That's a long way from just telling someone they're wrong—which, btw, by itself isn't that substantive in the first place. Explaining how they're wrong is much better, because then the rest of us can learn something. But presumably this is what you meant. You've posted a bunch of substantive comments that are just fine for HN, so whatever the above issue is should be easy to fix.


The fact that you didn't even know about a widely popular library like jiffy until now, and you keep on ranting, "oh, my jsons..". It is just your ignorance, pal. You are ranting on about things you have only a vague idea about.

For modern web development, go check out Cowboy. You are welcome.


Did it exist ten years ago?

Of course not.

But new libraries don't solve the underlying language infrastructure.


Ah, so you were not able to work with json in Erlang a decade ago, and still hold a grudge about it! Jeez.

If your understanding of "underlying language infrastructure" is also based on your experience 10 years ago, then that is the same with all languages, pal. With every version, programming languages evolve, they become better. Erlang is no different.


Elixir is widely used for "modern web stuff." The OP said a few "Erlang VMs", not "the Erlang language"


Whoa, elixir is not even remotely widely used. Again, strong signs of HN-itis.

A few silicon valley corps experiment with it, maybe a couple major players have a service built with it. But in terms of actual mind share in the webdev sphere, it's a tip of the tip of the iceberg.

I already mentioned that BEAM is cool. I'm not making the point that BEAM can't be made to do things, only that it isn't suitable for some of them. In the sense of sending bytes from one place to another, with mechanisms for recovering from faults in the network where that communication is happening, Erlang is good. But in dealing with complex page requests, structured data from JSON or webpages or user input or third-party APIs, it's not a good tool. A modern web developer is thinking about scalability, interoperability with tools written in potentially dozens of different languages or variants of JS, etc.

Elixir is cashing in on disgruntled shops frustrated with Ruby but too incompetent to do Scala/F#/Clojure, or potentially they were misled by uninformed hackers reading too much HN about how wonderful Elixir is. They are going to find that while there are benefits to using a telephony system with decades of work put into it, they are inheriting decades of cruft as well. A luxury with Ruby was that the past few years practically abandoned all usage other than Rails, so while performance was lacking, ergonomics were excellent and developer availability was unmatched even by today's standards where JS is ubiquitous. You ever tried to use dialyzer for an app where the source lives on forty different machines? It's not fun. You ever had an actor that just randomly fails every now and then? You hunt and hunt, and in the end, locking it to a certain machine makes it go away, and you realize that a machine somewhere else is causing good code to fail... randomly.

BEAM for the web is a ticking time bomb.

I may seem bitter, but believe me, I'm not. In infosec, developer stupidity pays my bills. I used to think that I could do my job so well that someday stupid devs would all become defensive coders. But I've become a bit jaded, I suppose.

Btw, there is no such thing as an "Erlang VM". The BEAM VM is the product; Erlang composes OTP, which generates BEAM bytecode. At least, there's no official one. Maybe there is one somewhere.

I meant what he knew


> Elixir is cashing in on disgruntled shops frustrated with ruby but too incompetent to do scala/f#/clojure or potentially they were misled by uninformed hackers reading too much HN about how wonderful elixir is.

You’re a very disgruntled engineer and these are very subjective opinions.


I'm a jaded engineer, there's a difference.

If all the software bugs in the world disappeared tonight, I'd be out of a job tomorrow morning but I'd have one hell of a good morning.

The world sucks. I don't choose to hate it anyways, but in order to make good solutions, having a realistic grasp of how extensive the problem is facilitates better, more correct solutions/fixes.


Great post!

> To keep the ‘small working set’ even smaller, we cycle these JobDBs roughly every 30 minutes. The manager cycles JobDBs when their target of filled percentage data is about to exceed available RAM.

I'm confused - why does JobDB's memory trickle up over time? Isn't it a database? Are you using MySQL's memory storage engine or something?


There are no issues with memory utilization of the MySQL databases, other than that once the working data set no longer fits in memory, it has to be fetched from disk on cache misses, which slows down fetches. Disk utilization of the job databases increases over time because we don’t do any deletes, to avoid extra write operations (a delete is basically a write). Once a database’s disk utilization reaches a defined threshold, it is swapped for a spare, then drained, destroyed, and reinitialised to be reused later on.

I hope this gives some clarity, let me know if you have further questions.


Awesome write-up! Curious on the decision to choose MySQL since it's a write-heavy load with minimal querying. Would something like Redis be better suited? I'm not super familiar with Redis so just curious about what other DBs were considered.


The biggest problem with Redis compared to MySQL in this use case is persistence. You could technically run Redis using the different persistence options https://redis.io/topics/persistence, but it is not going to be as safe as storing things in a RDBMS.


Hi, thanks for the great write up. Can you provide some more details about how you handle the RDS side of the rotation process and maintaining the spares?


Thanks for the post! It was really a good read.

In a similar position I would have tried MQTT, which should handle a large number of topics quite well.

Did you guys try that protocol?


Correct me if I’m wrong, but MQTT is just a network protocol; it doesn’t solve storage, retries, failure resilience, etc. It’s well suited for pub/sub operations over TCP, not so much for ensuring “exactly once” delivery of messages.


Yes, you are correct. MQTT is a network protocol.

You build your infrastructure on top of it to support stuff like storage, retries, resiliency, etc...

However, it does give you some guarantees, namely that each message will obey its QoS level (at most once, at least once, exactly once).

On top of these guarantees, you build whatever you need.

Honestly, I am quite glad that you are writing these posts, it is a great service for the community and if you end up open sourcing it, the project will bring a lot of value to everybody.

Thanks for your posts :)


Most MQTT implementations support "exactly once delivery": https://en.wikipedia.org/wiki/Comparison_of_MQTT_Implementat...


By age 35 you should have written your own queuing system at least once.


Exactly.


Ok, I've read through most of this (really cool post btw!) and I still can't figure out if this project is using the Centrifuge stack or not:

https://github.com/centrifugal

It looks like not, but that's a hell of a naming overlap.

I looked at Centrifugo a few years ago to deliver live Hearthstone games (in game replay format) through the web. It's a pretty sweet project.


It doesn't look like Segment has released the codebase? They do have quite a few public repos[1]...

1: https://github.com/segmentio?utf8=%E2%9C%93&q=&type=&languag...


It is not. The naming resemblance is strictly coincidental :)


Well I for one think that it's not really "cool" to just copy the name for more or less the same thing.

BTW, it also appears there is an active US trademark (#78010053) for "centrifuge" in the computer database context. Isn't that an issue, too?


By coincidental, I mean that no copying was intended. It was strictly an accident.


Ah ok; I have made a habit of typing "<name> {software,github}" into google as well as the WIPO trademark search when picking a name for a new project.

I started doing that after having to go through two iterations of renaming a project after finding out the old name was somehow encumbered... of course after we had already used it in front of other people; it was very embarrassing.


Honestly, I don't think any of us had heard of this other project when we coined the name internally. If someone had raised it up the flagpole, we certainly would have considered it.

Still, do keep in mind that it's not the name of a product offering - it's solely an internal name for the technology.


That’s correct, we were multiple months into the project and already using it in production when I came across (the other) centrifuge. We chose to just go with it because it wasn’t worth changing all the code and everyone’s habit of referring to centrifuge with this name.


What is with the rash of bad naming decisions these days?

It started with "Cucumber" or "Celery" or something a few years ago didn't it?

I'll skip ranting about how "Go" was a terrible name for a PL (and only mention parenthetically how they gaffled that name from a different PL!) They have "Grumpy", and something called "Thanos" (good luck searching for that until the hype for that comic book movie dies down.) I feel like I've seen several other projects recently that have been named after other things.


Yes, fortunately Ruby, Python, Julia, Java, Delphi, Elixir, Lisp, Forth, Groovy, and Rust are all unique words with no other meaning.


Oh hey everyone, sarcasm! You don't see a lot of that every day. ;-P

And Elm, Swift, Logo, Pascal and Haskell and Ada, Icon and Self, Opal and Occam and Maple...

There's a lot of them: https://en.wikipedia.org/wiki/List_of_programming_languages

If all your friends name their projects after something and then jump off a bridge, it's still a dumb thing to do.


Yes, sarcasm but, much like jumping off bridges recreationally, naming programming things after existing words is an old practice. Lisp is 60 years old. Complaining about Celery and Centrifuge seems as disingenuous as sarcasm. We can agree on Go though. Some words are simply too common.


Well met. And you're right, the practice is older than I said (and not even confined to computer stuff for that matter.) I just feel like I've seen a rash of generically-named projects recently and that's why I went off when I saw this one. (I've also been on imgur this last week and I think it's affecting my communication style.)



