updates architecture drawing & run steps

commit 1928ab73dd · parent 78729ebd1e · 2026-01-04 17:13:00 +01:00

README.md (143 changed lines)

# schleppe High Availability project
Goal is to have better webapp uptime than AWS.
Defines code which describes a HA, cached & scalable way of serving web applications.
## Architecture
```
+-----------------------------------------------------------+
| +-----DNS (Cloudflare)-----+ |
| | round-robin A records | |
| +--------------------------+ |
| |
| ┌─────────────────┴─────────────────┐ |
| │ │ |
| A: 193.72.45.133 B: 45.23.78.120 |
| (SITE A) (SITE B..N) |
+------------+-----------------------------------+----------+
└────────────────┐
v v
+----------------------------------------------------+ +--------------------+
| Site A (REGION: EU) | | Site B..N |
| | | (Copy of site A) |
| +----------- Floating IP (keepalived/etcd) ---+ | +--------------------+
| | | |
| | +-------------+ +-------------+ | |
| | | HAProxy-1 | | HAProxy-2 | | |
| | | (ACTIVE) | | (STANDBY) | | |
| | +------+------+ +-------+-----+ | |
| | └─── active / standby ──┘ | |
| | | |
| +----------------------+----------------------+ |
| |
| (SSL termination + readiness checks) |
| |
| v |
| +-------+---------+ |
| | haproxy (LB) | |
| +-----+----+--+---+ |
| │ │ A |
| direct │ │ │ via cache |
| │ v │ |
| │ +-+--+---------+ |
| │ | varnish (n) | |
| │ +------+-------+ |
| │ │ HIT / MISS |
| │ │ |
| └─────────┤ |
| │ |
| v |
| +---------+--------+ |
| | web servers (n) | |
| +------------------+ |
| |
+----------------------------------------------------+
```
Varnish & web server each run as a minimum of 2 instances. There are currently three regions: EU, US & schleppe on-prem.
There is always only a single haproxy (with a standby fallback) routing traffic per site, but multiple varnish instances & webservers, all connected together w/ shared routing tables.
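As a quick smoke test of the request path above (domain taken from the cert steps below; varnish only exposes HIT/MISS if the VCL adds an `X-Cache` header, so this is a sketch, not a guarantee):
```bash
# hit site A directly, bypassing DNS round-robin (IP from the diagram)
SITE_A=193.72.45.133

# request twice; on cacheable paths the second response should show
# a non-zero Age header (or X-Cache: HIT, if the VCL sets one)
for i in 1 2; do
  curl -s -o /dev/null -D - --resolve whoami.schleppe.cloud:443:$SITE_A \
    https://whoami.schleppe.cloud/ | grep -i -E '^(age|x-cache)'
done
```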
## Configure environment
Ensure that the following environment variables exist. It is smart to disable history in your terminal before pasting any API keys (`unset HISTFILE` for bash, or `fish --private` for fish).
- `CLOUDFLARE_API_TOKEN`: update DNS for given zones
- `HCLOUD_TOKEN`: permissions to create cloud resources
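For example (token values are placeholders):
```bash
# disable history first so the tokens are not written to disk
unset HISTFILE

export CLOUDFLARE_API_TOKEN="<cloudflare token with DNS edit access for the zones>"
export HCLOUD_TOKEN="<hetzner cloud token with read/write project access>"
```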
## Infrastructure
Cloud resources in Hetzner are configured with Pulumi.
```bash
cd hetzner-pulumi
# first time, init pulumi stack (name optional)
pulumi stack init kevinmidboe/hetzner
# required configuration values
pulumi config set sshPublicKey "$(cat ~/.ssh/id_ed25519.pub)"
pulumi config set --secret hcloud:token $HETZNER_API_KEY
# up infrastructure
pulumi up
```
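The stack outputs feed the ansible steps below. To inspect them (output names depend on what the Pulumi program exports, not listed here):
```bash
# list exported stack outputs (secret values are masked)
pulumi stack output
# reveal secrets too, if needed
pulumi stack output --show-secrets
```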
Ansible is used to provision the software, environments & services needed.
Get ansible configuration values from pulumi output:
```bash
cd ansible
# generate inventory (manually update inventory file)
./scripts/generate-inventory.sh | pbcopy
./scripts/update-config_webserver-ips.sh
```
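Before running plays, a quick sanity check that the generated inventory is reachable:
```bash
# ping every host in the inventory over SSH
ansible all -i hetzner.ini -m ping
```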
Run playbooks:
```bash
# install, configure & start haproxy
ansible-playbook plays/haproxy.yml -i hetzner.ini -l haproxy

# install & configure varnish cache
ansible-playbook plays/varnish.yml -i hetzner.ini -l varnish

# install docker & start the web app on webservers
ansible-playbook plays/docker.yml -i hetzner.ini -l web
ansible-playbook plays/web.yml -i hetzner.ini -l web
```
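A dry run against a single host first can catch template & variable problems (host pattern is a placeholder):
```bash
# show what would change without applying anything
ansible-playbook plays/haproxy.yml -i hetzner.ini --check --diff -l <one-haproxy-host>
```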
### ansible play: haproxy
roles:
- haproxy
- certbot
The vars `haproxy_varnish_ip` & `haproxy_traefik_ip` define the IPs iterated over when copying the template to hosts. They point to the available varnish cache servers & webservers, respectively.
> `certbot_cloudflare_domains` runs certbot to make sure valid certs exist for the instances serving traffic attached to DNS.
### ansible play: varnish
roles:
- varnish
Installs and configures varnish. Iterates over all `haproxy_traefik_ip` entries when copying the varnish.vcl template. Make sure to update these IPs to the current webservers varnish should point to; they should match the same webservers haproxy points at directly when not proxying through varnish.
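To confirm the rendered backends on a varnish host afterwards (assumes the `varnishadm` CLI is available on the instance):
```bash
# list the backends varnish knows about & their health
varnishadm backend.list
```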
### ansible play: docker + web
Installs docker & starts the web application on the `web` hosts (the `docker.yml` & `web.yml` plays above).
## Manual steps / TODO
Still issuing certs manually:
```bash
cd /root/.secrets/certbot
# create credential files & add the matching cloudflare API token to each
touch cloudflare_k9e-no.ini; touch cloudflare_planetposen-no.ini; touch cloudflare_schleppe-cloud.ini
certbot certonly --dns-cloudflare --dns-cloudflare-credentials /root/.secrets/certbot/cloudflare_schleppe-cloud.ini -d whoami.schleppe.cloud --agree-tos && \
certbot certonly --dns-cloudflare --dns-cloudflare-credentials /root/.secrets/certbot/cloudflare_k9e-no.ini -d k9e.no --agree-tos && \
certbot certonly --dns-cloudflare --dns-cloudflare-credentials /root/.secrets/certbot/cloudflare_planetposen-no.ini -d planetposen.no --agree-tos
cat /etc/letsencrypt/live/k9e.no/fullchain.pem /etc/letsencrypt/live/k9e.no/privkey.pem > /etc/haproxy/certs/ssl-k9e.no.pem && \
cat /etc/letsencrypt/live/planetposen.no/fullchain.pem /etc/letsencrypt/live/planetposen.no/privkey.pem > /etc/haproxy/certs/ssl-planetposen.no.pem && \
cat /etc/letsencrypt/live/whoami.schleppe.cloud/fullchain.pem /etc/letsencrypt/live/whoami.schleppe.cloud/privkey.pem > /etc/haproxy/certs/ssl-whoami.schleppe.cloud.pem
systemctl restart haproxy.service
```
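A sketch of how renewal could later be automated (paths mirror the commands above; `reload` avoids dropping connections):
```bash
# renew whatever is due, rebuild the combined pem haproxy expects, reload
certbot renew --quiet && \
  cat /etc/letsencrypt/live/k9e.no/fullchain.pem /etc/letsencrypt/live/k9e.no/privkey.pem \
    > /etc/haproxy/certs/ssl-k9e.no.pem && \
  systemctl reload haproxy.service
```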
We still need shared storage between all the instances, e.g. `etcd`.
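For illustration, etcd could hold that shared material; the key layout here is hypothetical:
```bash
# hypothetical key layout: publish the combined cert so every site can fetch it
etcdctl put /schleppe/certs/ssl-k9e.no.pem "$(cat /etc/haproxy/certs/ssl-k9e.no.pem)"
etcdctl get --print-value-only /schleppe/certs/ssl-k9e.no.pem
```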