Industrial macros
Summary
Most programmers think macros are complicated and that their programming language of choice shouldn't have them.
But most industry codebases use code generation constantly:
- Serialization libraries like Avro and Protocol Buffers generate code that you import
- Database libraries like Java's jOOQ, Go's jet, and TS's pgtyped generate code to prepare SQL queries. They often read the SQL schema directly from the database.
- React-like UI frameworks expect you to use an extended JS syntax with embedded HTML. Babel plugins and transpiling are different words for syntactic macros.
Macros are code generation run by your programming language. For example, the Rust ecosystem uses macros to solve all those problems (e.g. serde for serialization, sqlx for SQL queries, Yew for UI).
So, next time you say "macros are bad" remember that, even if you are right, you still need them and use them all the time. You need code-generation so much that you import entire additional compilers into your project.
Code generation is everywhere
There is a common belief in the programming languages world that "macros are bad and complicated" which then translates into these conclusions:
- I don't use macros.
- Programming language X shouldn't have macros.
I think those conclusions don't quite follow even if you agree with the original statement that macros are bad and complicated.
You are most likely already using macros in your codebase[1] even if you ostensibly dislike them in the abstract.
Macro is the name we give to code generation that is built into the programming language. Once you replace "macro" with "code generation", you see that:
- Most industry codebases use code generation in one way or another
- They use code generation for practical concerns like serialization, rendering UIs, and database access.
In this post, I'll do a shallow overview of those industrial macros to show you how widespread they are. And if I succeed, hopefully you can see how this sentence is flawed:
My programming language doesn't need macros. They make it too complicated.
And instead we get:
If a programming language doesn't have macros, its ecosystem will use "templates", "code generation", "transpilers", "meta-programming" instead and they will be adopted by a majority of its users.
To be extra clear: I am not trying to convince you that "macros are good" or that "you should be using macros all the time". This post is simply a reminder that you are most likely already using macros, you chose them, and think they are better than the alternative of not using them.
On to the list of industrial macros.
Serialization: Avro and Protocol Buffers
Avro and Protocol Buffers work the same way:
- You define the data types for your messages, in a custom schema language they provide. This schema is shared across codebases and across programming languages.
- They give you a utility program that reads those schema definitions and generates code in your programming language of choice.
- You now include that generated code in your application and use it to serialize your data structures in and out of binary streams.
In other words, Protocol Buffers is an additional compiler that parses, analyzes, and generates code:
- Java: Here is documentation for where to put the generated code. The Java gRPC compiler generates files that you are supposed to check into your codebase. They wrote a compiler in C++ that writes Java files to your disk.
- JavaScript. They provide a CLI to generate JS files from your protobuf definitions. Here is where they tokenize and parse the schema files and here is where they generate the JS files.
- Rust. The Rust package is neatly organized into parser and codegen modules. They also write the generated Rust source files to disk.
- Python and Ruby. Here is how to invoke the protobuf compiler to generate Python and Ruby code.
Notice how in all these cases, somebody is:
- Parsing files
- Doing static analysis over them
- Generating source files by concatenating strings
- Writing them to disk
- Reading those generated programs from a different interpreter / compiler
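To make those steps concrete, here is a minimal sketch of such a tool in Rust. It is not how protoc or Avro are implemented: the schema syntax, the list of allowed types, and the function names (parse, check, generate, compile) are all made up for illustration, and the last step prints the result instead of writing it to disk.

```rust
use std::collections::HashSet;

// Toy AST for a made-up schema syntax: `message Name { field: type; ... }`.
struct Field {
    name: String,
    ty: String,
}

struct Message {
    name: String,
    fields: Vec<Field>,
}

/// Step 1: parse the schema text into the AST.
fn parse(schema: &str) -> Result<Message, String> {
    let rest = schema
        .trim()
        .strip_prefix("message")
        .ok_or("expected `message`")?;
    let (name, body) = rest.split_once('{').ok_or("expected `{`")?;
    let body = body.trim().strip_suffix('}').ok_or("expected `}`")?;
    let mut fields = Vec::new();
    for decl in body.split(';').map(str::trim).filter(|d| !d.is_empty()) {
        let (field, ty) = decl.split_once(':').ok_or("expected `name: type`")?;
        fields.push(Field {
            name: field.trim().to_string(),
            ty: ty.trim().to_string(),
        });
    }
    Ok(Message {
        name: name.trim().to_string(),
        fields,
    })
}

/// Step 2: a tiny static analysis (known types only, no duplicate fields).
fn check(message: &Message) -> Result<(), String> {
    // Hypothetical list of supported types for this sketch.
    const KNOWN_TYPES: [&str; 3] = ["i32", "i64", "String"];
    let mut seen = HashSet::new();
    for field in &message.fields {
        if !KNOWN_TYPES.contains(&field.ty.as_str()) {
            return Err(format!("unknown type `{}`", field.ty));
        }
        if !seen.insert(field.name.as_str()) {
            return Err(format!("duplicate field `{}`", field.name));
        }
    }
    Ok(())
}

/// Step 3: generate Rust source code by concatenating strings.
fn generate(message: &Message) -> String {
    let mut out = String::new();
    out.push_str("#[derive(Debug, Clone, PartialEq)]\n");
    out.push_str(&format!("pub struct {} {{\n", message.name));
    for field in &message.fields {
        out.push_str(&format!("    pub {}: {},\n", field.name, field.ty));
    }
    out.push_str("}\n");
    out
}

/// Runs the whole pipeline: parse, check, generate.
fn compile(schema: &str) -> Result<String, String> {
    let message = parse(schema)?;
    check(&message)?;
    Ok(generate(&message))
}

fn main() {
    let source = compile("message Point { x: i32; y: i32; }").expect("valid schema");
    // Step 4: a real tool would write this to disk (e.g. with std::fs::write)
    // so that a different compiler can read it later.
    print!("{}", source);
}
```

Even at this size, the sketch is a small compiler living outside the language's own compiler, which is the point of the list above.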
Avro is mostly the same[2]. In their documentation they do specify that they only generate code for statically typed languages:
Simple integration with dynamic languages. Code generation is not required to read or write data files nor to use or implement RPC protocols. Code generation as an optional optimization, only worth implementing for statically typed languages.
For serialization, Rust uses macros instead
Rust has serde, a library to serialize and deserialize Rust data structures. From their docs:

    use serde::{Deserialize, Serialize};

    #[derive(Serialize, Deserialize)]
    struct Point {
        x: i32,
        y: i32,
    }

    fn main() {
        let point = Point { x: 1, y: 2 };

        // Convert the Point to a JSON string.
        let serialized = serde_json::to_string(&point).unwrap();

        // Prints serialized = {"x":1,"y":2}
        println!("serialized = {}", serialized);

        // Convert the JSON string back to a Point.
        let deserialized: Point = serde_json::from_str(&serialized).unwrap();

        // Prints deserialized = Point { x: 1, y: 2 }
        println!("deserialized = {:?}", deserialized);
    }

Notice the #[derive(Serialize, Deserialize)] line. Those are macro invocations. Rust has macros! So, Rust is capable of generating code natively, without a special build system and without writing files to disk.
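To get a feel for what generating code natively means, here is a rough sketch using a declarative macro from the standard language, with no external crates. This is not how serde works (serde's derives are procedural macros and handle far more cases). The json_struct! macro and its to_json method are hypothetical names for this example, and the generated method only handles fields whose Display output is already valid JSON, such as integers.

```rust
// A declarative macro that takes a struct definition and generates both the
// struct and a `to_json` method for it. `json_struct!` is a made-up name.
macro_rules! json_struct {
    (struct $name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        // Emit the struct as the user wrote it...
        #[derive(Debug, PartialEq)]
        struct $name {
            $($field: $ty),*
        }

        // ...plus generated code that the user never types out.
        impl $name {
            fn to_json(&self) -> String {
                // One `"name":value` entry per field, built at compile time
                // from the field names in the struct definition.
                let fields: Vec<String> = vec![
                    $(format!("\"{}\":{}", stringify!($field), self.$field)),*
                ];
                format!("{{{}}}", fields.join(","))
            }
        }
    };
}

json_struct! {
    struct Point {
        x: i32,
        y: i32,
    }
}

fn main() {
    let point = Point { x: 1, y: 2 };
    // Prints {"x":1,"y":2}
    println!("{}", point.to_json());
}
```

The compiler expands the macro while compiling the file, so there is no extra build step and no generated file to check in.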
User Interfaces: HTML and JSX syntax
Most frontend JavaScript and TypeScript codebases use JSX syntax[3] to define a user interface with React or some other frontend framework. The Babel package that implements this, @babel/plugin-transform-react-jsx, explains the code that it generates for you:

    // Code you write
    const profile = (
      <div>
        <img src="avatar.png" className="profile" />
        <h3>{[user.firstName, user.lastName].join(" ")}</h3>
      </div>
    );

    // Code it generates
    import { jsx as _jsx } from "react/jsx-runtime";
    import { jsxs as _jsxs } from "react/jsx-runtime";

    const profile = _jsxs("div", {
      children: [
        _jsx("img", {
          src: "avatar.png",
          className: "profile",
        }),
        _jsx("h3", {
          children: [user.firstName, user.lastName].join(" "),
        }),
      ],
    });

I got lost in the Babel codebase but AFAICT, this file shows how JSX AST nodes are validated, transformed, and passed to the Babel compiler to be generated into regular JS.
If you are using Babel, you can think of each Babel plugin as a set of macros that generate code on your behalf. As of June 2024, the @babel/core package is downloaded ~50M times per week from npm.
For user interfaces, Rust uses macros instead
I don't know much about Rust's UI ecosystem, but here are some UI libraries that implement macros to get a similar feel to JSX:
- Dioxus Labs implements an rsx! macro, very similar to JSX.
- Slint is implemented in Rust and parses a DSL they created to specify what the UI should look like. You can define the UI in .slint files or in .rs files inside of the slint! macro.
- Yew uses an html! macro that lets you write something similar to HTML inside of your Rust source files:

    use yew::prelude::*;

    #[function_component]
    fn App() -> Html {
        let counter = use_state(|| 0);
        let onclick = {
            let counter = counter.clone();
            move |_| {
                let value = *counter + 1;
                counter.set(value);
            }
        };

        html! {
            <div>
                <button {onclick}>{ "+1" }</button>
                <p>{ *counter }</p>
            </div>
        }
    }

    fn main() {
        yew::Renderer::<App>::new().render();
    }

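As a very rough sketch of the idea behind these UI macros, here is a declarative macro that expands a markup-like notation into plain constructor calls, similar in spirit to how JSX becomes calls to a runtime function. It is not how Yew, Dioxus, or Slint are implemented. The Node type, the element! macro, and the view function are hypothetical names for this example, the notation uses square brackets rather than real HTML syntax, and the renderer does no escaping.

```rust
// A minimal UI tree: elements with children, or plain text.
#[derive(Debug, PartialEq)]
enum Node {
    Element { tag: &'static str, children: Vec<Node> },
    Text(String),
}

impl From<&str> for Node {
    fn from(text: &str) -> Self {
        Node::Text(text.to_string())
    }
}

impl From<String> for Node {
    fn from(text: String) -> Self {
        Node::Text(text)
    }
}

impl Node {
    /// Renders the tree as an HTML-like string (no escaping in this sketch).
    fn render(&self) -> String {
        match self {
            Node::Text(text) => text.clone(),
            Node::Element { tag, children } => {
                let inner: String = children.iter().map(Node::render).collect();
                format!("<{0}>{1}</{0}>", tag, inner)
            }
        }
    }
}

// `element!(tag [child, child, ...])` expands to a `Node::Element` value.
// Each child is converted with `Node::from`, so strings and nested
// `element!` invocations can be mixed freely.
macro_rules! element {
    ($tag:ident [ $($child:expr),* $(,)? ]) => {
        Node::Element {
            tag: stringify!($tag),
            children: vec![$(Node::from($child)),*],
        }
    };
}

// The code you write: a tree that reads a bit like markup.
fn view(counter: i32) -> Node {
    element!(div [
        element!(button ["+1"]),
        element!(p [counter.to_string()]),
    ])
}

fn main() {
    // Prints <div><button>+1</button><p>0</p></div>
    println!("{}", view(0).render());
}
```

Real UI macros parse something much closer to HTML, which typically needs a procedural macro, but the shape is the same: markup goes in, ordinary function calls come out.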
To be clear, not all Rust UI libraries use macros and macros are not necessary to write UIs. For example, Druid, Xilem, Rust bindings for fltk, and iced are UI frameworks without macros.
Databases: SQL queries
To have better APIs to deal with SQL, many programming language ecosystems resort to generated code in one way or another:
- Haskell has Persistent, which lets you define a schema with Haskell type declarations and then generates Haskell code with Template Haskell[4] to query a database with that expected schema.
- Java has jOOQ which "generates Java code from your database". It has a compiler that you run every time you change your schema, it reads from the database, and generates Java code that you can then call from your application to query SQL.
- Go has jet which does the same as jOOQ.
- TypeScript has pgtyped.
The SQL-to-code tradition has been going on for decades at this point. This fun article explains how, in 1995, COBOL programmers used Embedded SQL, a SQL-to-COBOL code generator.
For SQL, Rust uses macros instead
Once again, Rust uses its macro system to solve this problem. For example, sqlx comes with the sqlx::query! macro, which generates at compile time the Rust type your SQL query would return:

    let countries = sqlx::query!(
            "
            SELECT country, COUNT(*) as count
            FROM users
            WHERE organization = ?
            GROUP BY country
            ",
            organization
        )
        .fetch_all(&pool) // -> Vec<{ country: String, count: i64 }>
        .await?;

    // countries[0].country
    // countries[0].count

To do that, the macro finds a local database with the schema and introspects it to understand the types of country and organization in the users table. This macro reads from a database at compile time[5].
Common objections
You can object to many parts of this post:
Dynamically-typed languages don't need macros. Ruby and Python are not prominent in the examples above. Many of those examples come from an underlying need that is specific to statically-typed languages: when you need to specify types that depend on external systems (protobuf schema, SQL schema), it is error-prone to spell out those type definitions in two separate places (the programming language and the external system). But code generation is also used for UIs:
- Chameleon compiles HTML templates into Python bytecode: it generates strings of Python code and then loads them using Python's builtin compile and exec.
- Tilt generates Ruby code for each of its templates. It generates strings with Ruby code and then loads them into the Ruby VM.
Ruby and Python are dynamic enough to generate code and load code while your application is running. Their meta-programming solves many of the problems that macros solve elsewhere.
Those examples are all complicated things I don't like. Great! Do you still use them? If you write applications for a living and you don't use any of those code-generation patterns, good for you. But if you don't like them and still use them, hopefully you can see that you have a need that can be solved by code generation, even if you don't like that solution.
Programming language X doesn't have macros or code-generation and solves these problems. Great, consider porting those solutions to other ecosystems so that 99% of programmers can enjoy them! Also, send me an email with what that is.
These are all fine objections to the idea of macros but don't address the point of the post: somehow, everybody ends up using code generation anyway.
Takeaway?
If by now the following points are not obvious to you, then I have failed[6]:
- Most programming language ecosystems develop the need to generate code.
- If the programming language doesn't support macros directly, its ecosystem develops some way to do it. It is often called templates, code-generation, transpiling, or monkey-patching[7].
So, please, next time you dunk on macros, remember that you are using them too.
Footnotes
- [1] You are most likely already using macros in your codebase if you are reading a blogpost about macros in programming languages...
- [2] You can find their compiler implementations for C++ here and for C# here.
- [3] These days, when starting out, many programmers think JSX is part of the JS standard.
- [4] According to a friend who has been writing a Haskell application for years, many applications use Template Haskell, but many also stay away from it.
- [5] In case you don't want to depend on that database being live on your system, you can also prepare a representation of that database to check in to your source with a CLI they provide.
- [6] At least I have succeeded at writing an entire post about macros without using the L-word.
- [7] Maybe in a different post I could explain why these strategies are worse than directly supporting macros.