Streaming Settings
Introduction
The Helios FHIR Server offloads transactional activity to the Cassandra cluster via Apache Kafka. Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
Architecture
Kafka affords Helios FHIR Server administrators both choice and flexibility. Certain events that occur in the Helios FHIR Server may be configured to flow data through Kafka Topics. Events are organized and durably stored in Topics. Very simplified, a Topic is similar to a folder in a filesystem, and the events are the files in that folder.
Using Apache Kafka, Helios FHIR Server administrators may decide:
- How to balance write performance vs. eventual consistency.
- How to affordably handle burst capacity using lowest cost compute resources such as AWS EC2 spot instances.
- How to route specific data to the most suitable and cost-effective hardware optimized for that workload.
Kafka Settings
There are three ways to configure these configuration settings.
- Manually modify the $KARAF_HOME/etc/kafka.properties.cfg file prior to, or after startup.
- Use the Helios FHIR Server administrative user interface (Default - http://localhost:8181/ui), Settings menu to view or modify these values. Changes to these values in the administrative user interface will overwrite the values in kafka.properties.cfg.
- Use environment variables to override the kafka.properties.cfg values. This approach is helpful when you need to pass configuration values to a Docker container or otherwise do not wish to modify the values in the kafka.properties.cfg file.
Each setting section below lists two values.
- The first is the kafka.properties.cfg value, and
- the second is the same value expressed as an environment variable.
Boostrap Servers
A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. This list should be in the form host1:port1,host2:port2,...
See Kafka documentation on bootstrap.servers.
bootstrapServers = kafka-server:9092
KAFKA_PROPERTIES_BOOTSTRAPSERVERS = kafka-server:9092
Number of Partitions
Topics are partitioned, meaning a topic is spread over a number of "buckets" located on different Kafka brokers.
See Kafka documentation on Main Concepts and Terminology.
numPartitions = 1
KAFKA_PROPERTIES_NUMPARTITIONS = 1
Replication Factor
Kafka replicates the log for each topic's partitions across a configurable number of servers. This allows automatic failover to these replicas when a server in the cluster fails so messages remain available in the presence of failures. See Kafka documentation on Replication.
replicationFactor = 1
KAFKA_PROPERTIES_REPLICATIONFACTOR = 1
Linger Time
The producer groups together any records that arrive in between request transmissions into a single batched request. Normally this occurs only under load when records arrive faster than they can be sent out. However in some circumstances the client may want to reduce the number of requests even under moderate load. This setting accomplishes this by adding a small amount of artificial delay—that is, rather than immediately sending out a record, the producer will wait for up to the given delay to allow other records to be sent so that the sends can be batched together. See Kafka documentation on linger.ms.
lingerMilliseconds = 100
KAFKA_PROPERTIES_LINGERMILLISECONDS = 100
Streaming Settings
In environments where it is desirable to have finer-grained control of the write transaction volume to a Cassandra cluster, the streaming settings below may be used to further improve on already excellent Cassandra write performance. This approach may be necessary in the following scenarios:
- Your Cassandra cluster may be subject to heavy write volume, such as during an initial data load.
- You wish to separate your data architecture according to a design pattern that enables greater performance and flexibility such as Command Query Responsibility (CQRS).
Stream Resource Inserts
Set to true
to stream all FHIR Resource inserts. If you set this to true
, set all other streaming settings below to false
.
streamResourceInserts = false
KAFKA_PROPERTIES_STREAMRESOURCEINSERTS = false
Stream Reference Inserts
Set to true
to stream FHIR References - i.e. the relationship between FHIR Resources such as in Patient the managingOrganization refers to an Organization FHIR Resource:
"managingOrganization" : {
"reference" : "Organization/1"
}
In this example, the organization search parameter, if enabled, will not return results until all references have completed streaming. Additionally, other FHIR capabilities that use references such as the $everything operation will similarly be impacted.
If you set this to true
, set streamResourceInserts
to false
.
streamReferenceInserts = false
KAFKA_PROPERTIES_STREAMREFERENCEINSERTS = false