From Kakeibo Chaos to Sankey Clarity: Docker, SQLite, and Plotly Magic

If you’ve ever felt buried under expense tracking apps but craved an effective visual that makes your entire year’s cash flow easy to read with a Sankey diagrams this is for you. Today I’m sharing my exact journey from exporting a Kakeibo database to generating a publication-ready Sankey diagram using Docker, SQLite, Python, and Plotly.

The Spark: That Reddit Sankey Epiphany

Scrolling Reddit last December, I stumbled upon a post titled “Our 2025 Year – Family Finances.” The attached Sankey diagram was pure gold: income streams converging into a central “budget” node, then exploding into expenses. It wasn’t just pretty, it told a story of personal finance!!!! At a glance, you saw allocation efficiency, waste spots, and savings triumphs.

Sankey diagrams emphasize the major transfers or flows within a system. They help locate the most important contributions to a flow, regardless we are talking about a flow of energy, of water or…. a flow of money in my family

https://en.wikipedia.org/wiki/Sankey_diagram

My Kakeibo app (Japanese budgeting method, Android-based) had been logging transactions for the whole year. But I was lacking a clear way to visualize all of those figures. Time to liberate those numbers…..

Step 1: Kakeibo Export → SQLite Docker Rescue

Kakeibo backups are ZIPs containing kakebo.db, kakebo.db-shm, and kakebo.db-wal Extracted to a folder, I fired up Docker for analysis:

			
docker run -d --name sqlitebrowser --security-opt seccomp:unconfined \
  -e PUID=1000 -e PGID=1000 -e TZ=Europe/Zurich \
  -p 3000:3000 -p 3001:3001 -v ./kakebo:/config \
  --restart unless-stopped lscr.io/linuxserver/sqlitebrowser:latest

The above gave me a web-based UI for sqllite, it was then able to load the exported files into database tables

below a log snippet

			
Browser to localhost:3000, open /config/kakebo.db. Tables: Transactions, Categories. Key columns:  timestamp, amount, categoryId, subcategoryId, notes, type (1=income, 2=expense).

First query for 2025 outflows

			
SELECT c1.name AS source, COALESCE(m.notes, c2.name) AS target, SUM(m.amount1) AS value
FROM Transactions m
JOIN Category c1 ON m.categoryId = c1.id
LEFT JOIN Category c2 ON m.subcategoryId = c2.id
WHERE m.Timestamp >= 1735689600000 AND m.Timestamp < 1893456000000
  AND m.type = 2  -- Expenses only
GROUP BY c1.name, COALESCE(m.notes, c2.name);

		

Exported to input.csv. Raw data ready, but needed transformation for proper Sankey structure:

incomes left → “Budget 2025” center → expenses right.

Sankey data structure is composed by three fields: source (where this transaction is coming from, example: salary) target (where this transaction is flowing, exampe: budget 25) and the amount (1 Million)

Data should be then harmonized like this

Step 2: Vibecoding the Data Pipeline

Here’s where antigravity (my AI assistant) shone. I fed it specs.md: precise 4-step logic to reshape CSV into Sankey format (source,target,value).

Core specs

For income rows: source=note, target=name, value=amount
Aggregate step 1 → source=target (income cats), target="Budget 2025", value=sum
For expense rows: source=name, target=note, value=amount
Aggregate step 3 → source="Budget 2025", target=source (expense cats), value=sum

Antigravity generated gensankey.py using pandas—pure gold:

import pandas as pd

df = pd.read_csv('input.csv')

# Step 1-2: Incomes → Budget
..........
income_agg['source'] = income_agg['target']
income_agg['target'] = 'Budget 2025'

# Step 3-4: Expenses → Budget reverse
...........
expense_agg['source'] = 'Budget 2025'
expense_agg['target'] = expense_agg['source']

# Combine all
sankey_df = pd.concat([income_links, income_agg, expense_links, expense_agg])
sankey_df.to_csv('output.csv', index=False)

Dockerfile updated:

FROM python:3
WORKDIR /usr/src/app
RUN pip install pandas plotly
COPY gensankey.py .
CMD ["python", "gensankey.py"]

Build and run: docker build -t sankey-gen . && docker run -v $(pwd):/usr/src/app sankey-gen. Boom—output.csv ready for visualization.

Step 3: Plotly Sankey – From Ugly to Stunning

Initial Plotly attempts? Disaster. Default layouts mangled flows, nodes overlapped, no custom colors. Antigravity iterated via walkthrough.md:

Key fixes from debugging:

Strict 5-layer positioning: x=[0,0.25,0.5,0.75,1] forced Income Details → Cats → Budget → Expense Cats → Details.
Dynamic heights: Calculated via cumulative sums to prevent overlaps.
Custom node colors: Mapped from YAML dict to node.color.

Final displaydata.py snippet:

pythonimport plotly.graph_objects as go
df = pd.read_csv('output.csv')

# Unique nodes and positions
all_nodes = list(set(df['source'].tolist() + df['target'].tolist()))
node_dict = {node: i for i, node in enumerate(all_nodes)}
colors = {'Budget 2025': '#FF9800', 'Stipendi': '#2196F3', 'Affitto': '#F44336'}  # etc.

fig = go.Figure(go.Sankey(
    .............

Why This Workflow Wins

Zero vendor lock-in: Kakeibo → SQLite → CSV → portable Sankey.
Docker reproducibility: One command rebuilds anywhere
AI-assisted vibecoding: Specs.md → working code in iterations.
Scalable: Add colors.yaml, CLI args for multi-year, categories.

Total time: 4 hours debugging → lifelong template.

Files for you:

Dockerfile
gensankey.py
displaydata.py
Sample input.csv

What’s your cash flow story? Drop in comments.

Vibeops.one: DevOps experiments for side hustlers. Follow for more

VibeOps