

Druid's an amazing engine, and we just wrapped up becoming HIPAA compliant (for handling sensitive health care data) hosted on Amazon Web Services. I haven't seen anyone talking about Druid + HIPAA, so I figured I'd share an outline of what it took.

The TL;DR on HIPAA can be summed up with 3 rules:

1. If data sits on disk, encrypt it.
2. If data goes over a network, encrypt it.
3. Your virtual machines may not co-exist with EC2 instances of another AWS customer.

So how does that fit with Druid, which has no built-in security, in a public cloud? Let's start with the easy parts first.

For EC2 instances like the index service and historical nodes, on-disk storage is used for holding segments. Amazon has first-class support for encrypted EBS volumes, so we use M4 class machines and mount encrypted EBS volumes for Druid to use. CloudFormation snippet:
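This is a minimal sketch of what that looks like, assuming a plain AWS::EC2::Instance resource with a gp2 data volume; the logical names, instance type, device name, and volume size are placeholders:

    "DruidHistoricalInstance": {
      "Type": "AWS::EC2::Instance",
      "Properties": {
        "InstanceType": "m4.2xlarge",
        "ImageId": { "Ref": "DruidAmiId" },
        "BlockDeviceMappings": [
          {
            "DeviceName": "/dev/sdf",
            "Ebs": {
              "VolumeSize": 500,
              "VolumeType": "gp2",
              "Encrypted": true,
              "DeleteOnTermination": true
            }
          }
        ]
      }
    }

The important bit is "Encrypted": true on the block device mapping; the volume is then encrypted at rest with no changes needed on the Druid side.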

Druid uses the JetS3t library for uploading files to S3, and S3 fortunately has built-in server-side encryption. Make a jets3t.properties file available on the Druid classpath, where you can enable encryption. We also placed an upload rejection policy on our S3 bucket to prevent unencrypted uploads: S3 checks the headers on each upload request and throws an exception if the upload isn't encrypted.
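A minimal sketch of that file, assuming JetS3t's s3service.server-side-encryption option is the setting in play:

    # jets3t.properties -- placed on the Druid classpath
    # Request S3 server-side encryption (AES256) for every object JetS3t uploads
    s3service.server-side-encryption=AES256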

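The rejection policy itself is just an S3 bucket policy that denies any PutObject request that doesn't carry the server-side-encryption header. A sketch, with a placeholder bucket name:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "DenyUnencryptedUploads",
          "Effect": "Deny",
          "Principal": "*",
          "Action": "s3:PutObject",
          "Resource": "arn:aws:s3:::your-druid-deep-storage-bucket/*",
          "Condition": {
            "StringNotEquals": {
              "s3:x-amz-server-side-encryption": "AES256"
            }
          }
        }
      ]
    }

With this in place, even a misconfigured node can't write an unencrypted segment to deep storage.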
The authors of the HIPAA guidelines must not trust virtualization tech, because as a company you cannot share physical machines with another company for any machine handling sensitive data. Fortunately, Amazon allows you to provision Dedicated Instances. They cost more, but the machine isn't shared with anyone else. We're currently using OpsWorks and specify instance tenancy as dedicated when adding nodes. So broker/historical/index service nodes run on dedicated instances, whereas the coordinator/overlord nodes run on a normal machine. The coordinator/overlord handle metadata about segments but do not actually touch the segments themselves, so we can save some dollars there.
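In CloudFormation terms, the equivalent setting on a raw EC2 instance is the Tenancy property; a minimal sketch, with the resource name, AMI parameter, and instance type as placeholders:

    "DruidBrokerInstance": {
      "Type": "AWS::EC2::Instance",
      "Properties": {
        "InstanceType": "m4.xlarge",
        "ImageId": { "Ref": "DruidAmiId" },
        "Tenancy": "dedicated"
      }
    }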
