It is no news that our email has a history of not being entirely reliable. The problem has been twofold. One is that our mail servers got overloaded writing to disk, so emails got injected into qbufferd, which was rusty and old and didn't do things very well: it was single-threaded and tried each email one at a time, sending it on to our mail servers.
The other is that we had bad visibility into what Postfix was doing (if you are going to defend Postfix, don't bother; I don't care. It is a good server, but we needed something different).
So, the solution is to use our reliable job manager, TheSchwartz, with our own custom MTA. The end result is that emails will stay in our databases until the recipient's email server accepts them or we have retried enough times to give up. We are already trying this out on a few domains and will begin moving more traffic over to this system next week.
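For the curious, the "stays in the database until delivered or we give up" idea can be sketched roughly as below. This is only an illustration in Python, not TheSchwartz's actual API or our real schema: the table layout, the function names, the limit of 5 attempts, and the doubling delay are all made up for the example.

```python
import sqlite3
import time

MAX_ATTEMPTS = 5  # hypothetical retry limit, not our real setting


def open_queue(path=":memory:"):
    """Open (or create) the job table. The schema is an assumption."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS email_jobs ("
        " id INTEGER PRIMARY KEY,"
        " recipient TEXT NOT NULL,"
        " body TEXT NOT NULL,"
        " attempts INTEGER NOT NULL DEFAULT 0,"
        " run_after REAL NOT NULL DEFAULT 0,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    return db


def enqueue(db, recipient, body):
    """Store the email as a job; it lives in the database from here on."""
    cur = db.execute(
        "INSERT INTO email_jobs (recipient, body) VALUES (?, ?)",
        (recipient, body),
    )
    db.commit()
    return cur.lastrowid


def work_once(db, deliver, now=None):
    """Try the oldest due job once.

    `deliver(recipient, body)` is a caller-supplied function that returns
    True when the recipient's server accepted the message.
    Returns 'delivered', 'retry', 'failed', or None if nothing is due.
    """
    now = time.time() if now is None else now
    row = db.execute(
        "SELECT id, recipient, body, attempts FROM email_jobs"
        " WHERE status = 'pending' AND run_after <= ?"
        " ORDER BY id LIMIT 1",
        (now,),
    ).fetchone()
    if row is None:
        return None
    job_id, recipient, body, attempts = row
    try:
        accepted = deliver(recipient, body)
    except Exception:
        accepted = False  # treat any delivery error as a failed attempt
    if accepted:
        # Only now does the email leave the database.
        db.execute("DELETE FROM email_jobs WHERE id = ?", (job_id,))
        result = "delivered"
    elif attempts + 1 >= MAX_ATTEMPTS:
        # Out of retries: keep the row, marked failed, so it stays visible.
        db.execute(
            "UPDATE email_jobs SET status = 'failed', attempts = ? WHERE id = ?",
            (attempts + 1, job_id),
        )
        result = "failed"
    else:
        # Schedule another attempt later, doubling the wait each time.
        delay = 60 * 2 ** attempts
        db.execute(
            "UPDATE email_jobs SET attempts = ?, run_after = ? WHERE id = ?",
            (attempts + 1, now + delay, job_id),
        )
        result = "retry"
    db.commit()
    return result
```

The point of the sketch is that a job row is only deleted after a successful hand-off, and a job that runs out of retries stays behind with a `failed` status and an attempt count, which is the kind of visibility we were missing before.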
So hopefully, next time we run into email problems, we will know what is going on.