etl4s
Powerful, whiteboard-style ETL.
import etl4s._
val extract = Extract(100)
val half = Transform[Int, Int](_ / 2)
val double = Transform[Int, Int](_ * 2)
val print = Load[String, Unit](println)
val save = Load[String, Unit](s => println(s"[db] $s"))
val format = Transform[(Int, Int), String] {
case (h, d) => s"half=$h, double=$d"
}
val pipeline = extract ~> (half & double) ~> format ~> (print & save)
pipeline.unsafeRun()
// half=50, double=200
// [db] half=50, double=200
import etl4s._
case class DbConfig(host: String, port: Int)
val extract = Extract(List("a", "b", "c"))
val save = Load[List[String], Unit].requires[DbConfig] { db => data =>
println(s"Saving ${data.size} rows to ${db.host}:${db.port}")
}
val pipeline = extract ~> save
pipeline.provide(DbConfig("localhost", 5432)).unsafeRun(())
// Saving 3 rows to localhost:5432
import etl4s._
val A = Node[String, String](identity)
.lineage(name = "A", inputs = List("s1", "s2"), outputs = List("s3"))
val B = Node[String, String](identity)
.lineage(name = "B", inputs = List("s3"), outputs = List("s4", "s5"))
Seq(A, B).toMermaid
graph LR
classDef pipeline fill:#e1f5fe,stroke:#01579b,stroke-width:2px,color:#000
classDef dataSource fill:#f3e5f5,stroke:#4a148c,stroke-width:2px,color:#000
A["A"]
B["B"]
s1(["s1"])
s2(["s2"])
s3(["s3"])
s4(["s4"])
s5(["s5"])
s1 --> A
s2 --> A
A --> s3
s3 --> B
B --> s4
B --> s5
class A pipeline
class B pipeline
class s1,s2,s3,s4,s5 dataSource
import etl4s._
val process = Transform[List[String], Int] { data =>
Tel.withSpan("processing") {
Tel.addCounter("items", data.size)
data.map(_.length).sum
}
}
val data = List("a", "bb", "ccc") // sample input for illustration
// Dev: no-ops (zero cost)
process.unsafeRun(data)
// Prod: plug in your backend
implicit val tel: Etl4sTelemetry = MyOtelProvider()
process.unsafeRun(data)
Chain with ~>, branch with &, inject config with .requires, run with .unsafeRun(). Works in scripts, Spark, Flink, anywhere Scala runs.
Pipelines as values.
One file, zero dependencies. Lazy, composable, testable. Since pipelines are values, you can attach metadata, generate lineage diagrams, and share them across teams.
Type-safe composition.
Types must align or it won't compile. Misconnections are compile errors.
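To make the "types must align" idea concrete, here is a minimal sketch in plain Scala. It is a toy model, not etl4s's implementation: `Step` and `TypedCompositionSketch` are hypothetical names introduced only for illustration, and the operators are simplified to the bare function-composition idea.

```scala
// Toy model only: `Step` is a hypothetical name, not part of etl4s.
// A step is a plain function with a typed input and a typed output.
final case class Step[A, B](run: A => B) {
  // Sequential composition: the output type of this step must match
  // the input type of `next`, otherwise the call does not type-check.
  def ~>[C](next: Step[B, C]): Step[A, C] =
    Step[A, C](a => next.run(run(a)))

  // Branching: feed the same input to both steps and pair the results.
  def &[C](other: Step[A, C]): Step[A, (B, C)] =
    Step[A, (B, C)](a => (run(a), other.run(a)))
}

object TypedCompositionSketch {
  val half   = Step[Int, Int](_ / 2)
  val double = Step[Int, Int](_ * 2)
  val format = Step[(Int, Int), String] { case (h, d) => s"half=$h, double=$d" }

  // Compiles: (half & double) is Step[Int, (Int, Int)], which lines up with format.
  val pipeline: Step[Int, String] = (half & double) ~> format

  // Rejected by the compiler if uncommented: half yields Int, format expects (Int, Int).
  // val broken = half ~> format

  def main(args: Array[String]): Unit =
    println(pipeline.run(100)) // half=50, double=200
}
```

The parentheses around `half & double` matter in this sketch: Scala gives `~>` higher precedence than `&`, so without them the expression would group differently.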
Dependency injection, inferred.
Nodes declare what they need. Chain freely. The compiler merges and infers the combined type.
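One way a compiler can merge requirements is sketched below. This is an assumption for illustration, not a description of how etl4s does it: the sketch uses a contravariant environment parameter and intersection types, and `Needs`, `HasDb`, `HasApi`, `AppConfig`, and `InferredDepsSketch` are all hypothetical names.

```scala
// Toy model only: none of these names come from etl4s.
trait HasDb  { def dbHost: String }
trait HasApi { def apiKey: String }

// A step that needs an environment R before it can map A to B.
// R is contravariant: a step that needs less can run where more is available.
final case class Needs[-R, A, B](run: R => A => B) {
  // Chaining two steps yields a step that needs both environments.
  def ~>[R2, C](next: Needs[R2, B, C]): Needs[R with R2, A, C] =
    Needs[R with R2, A, C](env => a => next.run(env)(run(env)(a)))

  // Supplying the environment leaves an ordinary function.
  def provide(env: R): A => B = run(env)
}

object InferredDepsSketch {
  val fetch = Needs[HasApi, String, String](api => id => s"$id@${api.apiKey}")
  val save  = Needs[HasDb, String, String](db => row => s"saved $row to ${db.dbHost}")

  // No annotation needed: the combined requirement is HasApi with HasDb.
  val pipeline = fetch ~> save

  // One config value that satisfies both requirements.
  final case class AppConfig(dbHost: String, apiKey: String) extends HasDb with HasApi

  def main(args: Array[String]): Unit =
    println(pipeline.provide(AppConfig("localhost", "k1"))("42"))
    // saved 42@k1 to localhost
}
```

In this model, passing a config that only extends `HasDb` to `pipeline.provide` would be a compile error, which mirrors the claim that missing dependencies are caught before anything runs.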
Why etl4s?
Chaotic, framework-coupled ETL codebases bring dev teams to their knees. etl4s lets you structure your code as clean, typed graphs of pure functions.
(~>) is just *chef's kiss*. There are so many synergies here, haven't pushed for something this hard in a while.
Sr Engineering Manager, Instacart
...the advantages of full blown effect systems without the complexities, and awkward monad syntax!
u/RiceBroad4552
Battle-tested at Instacart 🥕