Introduction

Geneva is a library that enables you to put code into YAML or JSON. It lets you call functions and define and reference variables there in the data.

Here's an example to get us started.

definition (edit)
fn:add:
- 40
- 2
result
42

This YAML calls an add function and passes in two arguments, 40 and 2.

The rules for Geneva are this.

  1. A key prefixed with fn: is a function call, and the value is the arguments passed to that function. To pass in multiple arguments, use an array.
  2. Any string prefixed with ref: is a reference to a value in the current context. An example might look like ref:fullName where fullName is the name of the variable.

Functions and references. That's it. Now you're an expert!

Geneva makes many functions available to use. Every function from Ramda and Ramda Adjunct is there, along with JSON Path and my own tool called Saunter.

Why would you do this to us, Stephen?

Now I'll be the first to admit that it sounds like a bad idea to put code like this in YAML. Developers don't want to write code in a serialization format. Non-developers don't want to learn to write code in their API definitions. It feels like a lose-lose situation.

However, this isn't new. People are putting in code in their YAML everywhere you look. Let me show a few examples—not to point out anyone is wrong, but to show that we've chosen this approach.

AWS does it

AWS CloudFormation has what they call Intrinsic Functions that people can use in their templates. Here is an example of their join command.

!Join
- ""
- - "arn:"
- !Ref AWS::Partition
- ":s3:::elasticbeanstalk-*-"
- !Ref "AWS::AccountId"

We see here a function call with !Join and a variable reference with !Ref. However, if you've had experience with this, you'll know this is not only tough to read (yikes), it's difficult to write and get correct!

AWS is not shy about calling this "infrastructure as code" even though it's YAML. We're all OK calling this code.

Azure does it

Azure Resource Manager (ARM) templates do it. They're better in ways because the code is easier to read. They're worse in ways because the code is all in strings, which makes it harder to write.

{
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"greeting": {
"type": "string",
"defaultValue": "Hello"
},
"name": {
"type": "string",
"defaultValue": "User"
},
"numberToFormat": {
"type": "int",
"defaultValue": 8175133
}
},
"resources": [],
"outputs": {
"formatTest": {
"type": "string",
"value": "[format('{0}, {1}. Formatted number: {2:N0}', parameters('greeting'), parameters('name'), parameters('numberToFormat'))]"
}
}
}

ARM templates let people define parameters to pass into the document, allowing them to generate different outputs depending on the context. This is a common pattern.

This approach is not only about using functions in the YAML. It's about generating different builds and outputs depending on what is passed in.

GitHub Actions does it

GitHub Actions is a newer service, and they have defined a YAML format that allows for expressions. Here we see an example of an if expression.

steps:
# ...
- name: The job has failed
if: ${{ failure() }}

Like the previously-mentioned formats, they support many functions.

JSON Schema does it

JSON Schema has if, then, else support among other things.

"type": "object",
"properties": {
"street_address": {
"type": "string"
},
"country": {
"enum": ["United States of America", "Canada"]
}
},
"if": {
"properties": { "country": { "const": "United States of America" } }
},
"then": {
"properties": { "postal_code": { "pattern": "[0-9]{5}(-[0-9]{4})?" } }
},
"else": {
"properties": { "postal_code": { "pattern": "[A-Z][0-9][A-Z] [0-9][A-Z][0-9]" } }
}
}

OpenAPI and AsyncAPI do it

Like JSON Schema, OpenAPI and AsyncAPI support a $ref keyword as a way to reference other values. It also supports server variables.

servers:
- url: https://{username}.gigantic-server.com:{port}/{basePath}
description: The production API server
variables:
username:
# note! no enum here means it is an open value
default: demo
description: this value is assigned by the service provider, in this example `gigantic-server.com`
port:
enum:
- "8443"
- "443"
default: "8443"
basePath:
# open meaning there is the opportunity to use special base paths as assigned by the provider, default is `v2`
default: v2

There's also a proposal for Overlays on the table that creates a code-like specification.

See, everyone else is doing it!

Again, this is to show a trend, not to say this is good or bad. As we see, it's not as wild as it sounds to treat YAML as code. We're doing it everywhere.

But is it shifting a bit? AWS has created CDK to let people write code in programming languages like Python rather than write YAML. They've changed the interface more than the functionality.

And there's a company Pulumi that banks on the idea of writing infrastructure in programming languages to address the YAML explosion.

Maybe you've heard of the Configuration Complexity Clock.

  1. People write configurations in real code
  2. Over time, they move toward a general specification
  3. The specification gets complex and people decide to move to code. Why use a limited language when you can use the real thing?
  4. Repeat

If there is a shift back to programming languages, this won't get rid of this need for dynamic features in specifications. As a matter of fact, it may make the need even more prevelant as people build tools across many programming languages.

The past is new again

To add to it, we've been here before with XML. We had XML Schema and XSLT as ways to validate XML and transform XML. We shifted to JSON and later YAML because we felt XML was overly complex. Now we're back in the same situation with JSON and JSON Schema, though we lack a more formal transformation tool like XSLT.

We're repeating the path as before, just a bit differently.

Why do it?

There are direct upsides to adding dynamic features to a specification. One reason is that it allows people to write logic or take advantage of reusability without installing development environments. A text editor is all you need.

Another reason is for safety. It's a risk to run other people's code in your system. Adding functions into a YAML file gives users freedom with guardrails.

As far as Geneva goes, one of the main reasons to think this way is to provide a way to simplify other specifications. Instead of adding these features to every specification, we could keep them all simple and rely on a specification or tool designed to handle the complex parts.