Meeting Notes
- Explain project details
- Help set up dev environments
- Agree on a timeline
- Discuss deliverables
https://github.com/ceph/ceph/blob/main/src/pybind/mgr/balancer/module.py -- code you want to check out
Important ceph commands:
Check the status of your cluster: ./bin/ceph -s
Check balancer status: ./bin/ceph balancer status
Stop your cluster: ../src/stop.sh (good practice after you're done playing with the cluster)
Restart the mgr daemon: ./bin/init-ceph restart mgr (do if you want to test out a code change you made in the balancer)
Run "nproc" to see how many processors you have for compiling the project, then pass a slightly lower number to ninja:
ninja vstart -j<a little less than the nproc value>
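If you would rather compute the job count than pick it by hand, here is a small sketch that builds the ninja command in Python. The margin of 2 processors is an arbitrary example value, not a project recommendation, and the helper names are hypothetical. Note that os.cpu_count() reports all logical CPUs, which can be more than nproc reports when the process is restricted to a subset of them. The same idea as a shell one-liner is ninja vstart -j$(( $(nproc) - 2 )).

```python
import os


def ninja_jobs(cpu_count=None, margin=2):
    """Return a -j value slightly below the processor count (never below 1).

    margin=2 is an arbitrary example value.
    """
    if cpu_count is None:
        # os.cpu_count() may return None when the count cannot be determined.
        cpu_count = os.cpu_count() or 1
    return max(1, cpu_count - margin)


def ninja_vstart_cmd(cpu_count=None, margin=2):
    """Build the argv for 'ninja vstart -j<N>' (run it from your build directory)."""
    return ["ninja", "vstart", f"-j{ninja_jobs(cpu_count, margin)}"]


if __name__ == "__main__":
    print(" ".join(ninja_vstart_cmd()))
```

For example, on a machine with 16 processors, ninja_vstart_cmd(16) gives ninja vstart -j14.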
Working on a shared branch (don't worry about this for now; Laura will guide you through it)
Recording: https://drive.google.com/file/d/1BcrUELCXJc0V42xvs0VEhXPSKW4pZpAM/view
Follow these steps for VPN access: https://wiki.sepia.ceph.com/doku.php?id=vpnaccess
Relevant tracker ticket: https://tracker.ceph.com/issues/63084
Student action items:
- Watch the "Intro to Ceph" video: https://youtu.be/PmLPbrf-x9g?si=Zhrv9Nb6DR7rQKbd
- Read this page on balancer design: https://docs.ceph.com/en/reef/dev/balancer-design/
The goal is to get a better understanding of what Ceph is and how balancing works at a high level, which will help with your poster project.
Student action items:
- Experiment with a vstart cluster
- Experiment with the balancer commands
- Try committing something and pushing it to your own remote repository, e.g. your GitHub fork (so you can get comfortable sharing a link to your work)
- Get familiar with the balancer code (linked above)
Laura action items: Set up students with sepia lab computers to help with compile time
Recording: https://drive.google.com/file/d/1PGp841k0BW61DVbeXDhQ6h5lfmFgaVSb/view?usp=sharing
When collaborating in git:
git checkout <your local branch>
Always pull changes from the remote repo first!
git pull <remote repo> <remote branch> --rebase
If there are conflicts, resolve them:
Open the conflicting file (e.g. with vim)
Search for "HEAD"; git marks each conflict with a <<<<<<< HEAD line, a ======= line, and a >>>>>>> line
Edit the lines between the markers so they contain the code you want to keep, then delete the three marker lines
Save and exit the file
Stage the file (git add <file name>)
git rebase --continue
Finally, push your changes: git push <remote repo> <remote branch>
If there are no conflicts:
Make your commit as usual
git push <remote repo> <remote branch>
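Before running git add on a file you resolved, it helps to confirm that no conflict markers are left in it. During a rebase, the HEAD side of a conflict holds the upstream changes you just pulled and the other side holds your own commit. The helper below is a hypothetical sketch (not part of Ceph or git) that lists leftover marker lines with their line numbers.

```python
import re

# Git's default markers: "<<<<<<< <label>", "=======", ">>>>>>> <label>".
# With merge.conflictStyle=diff3 there is also a "||||||| <label>" line.
_MARKER = re.compile(r"^(<{7}|={7}|>{7}|\|{7})(?: .*)?$")


def find_conflict_markers(text):
    """Return (line_number, line) pairs for conflict-marker lines (1-indexed)."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if _MARKER.match(line):
            hits.append((number, line))
    return hits
```

For example, pass it the contents of the file you just edited; an empty list means no markers remain.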
Recording: https://drive.google.com/file/d/1FYhJky-LLLIQxuVLu-AgrU-k2eS74YpN/view
Demo from the recording: https://pad.ceph.com/p/unbalanced_cluster_scenario
Important Commands:
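Show the current evaluation of the cluster's distribution:
./bin/ceph balancer eval-verbose
Show the max PG deviation setting (to change it, use ./bin/ceph config set mgr mgr/balancer/upmap_max_deviation <value>):
./bin/ceph config get mgr mgr/balancer/upmap_max_deviation
Show the OSD map:
./bin/ceph osd dump
Turn the balancer off:
./bin/ceph balancer off
Turn the balancer on:
./bin/ceph balancer on
List the pools with their pool numbers:
./bin/ceph osd lspools
Manually remap a PG from one OSD to another (this moves the PG's data):
./bin/ceph osd pg-upmap-items <pgid> <from-osd> <to-osd>
Restart the manager:
./bin/init-ceph restart mgr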
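The pg-upmap-items command takes a PG ID followed by one or more from/to OSD pairs. The hypothetical helper below only builds the argument list, which makes it easy to print or log a remap before running it with subprocess.run(argv, check=True). The ./bin/ceph default assumes you are in the vstart build directory, as in the commands above. To undo a manual remap, ./bin/ceph osd rm-pg-upmap-items <pgid> removes the entry for that PG.

```python
def pg_upmap_items_cmd(pgid, pairs, ceph="./bin/ceph"):
    """Build argv for 'ceph osd pg-upmap-items <pgid> <from> <to> [<from> <to> ...]'.

    pairs is a list of (from_osd, to_osd) integer tuples.
    """
    if not pairs:
        raise ValueError("need at least one (from_osd, to_osd) pair")
    argv = [ceph, "osd", "pg-upmap-items", pgid]
    for from_osd, to_osd in pairs:
        if from_osd == to_osd:
            raise ValueError(f"{pgid}: from and to OSD are both {from_osd}")
        argv += [str(from_osd), str(to_osd)]
    return argv
```

For example, pg_upmap_items_cmd("2.1", [(2, 1)]) returns ['./bin/ceph', 'osd', 'pg-upmap-items', '2.1', '2', '1'], the kind of mapping that appears as pg_upmap_items 2.1 [2,1] in the osdmap excerpts below.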
- In-person poster project in a month! (Nov 17)
- Balancer demo (in video)
- Establish milestones for the poster project
Create an unbalanced cluster scenario:
# Start a cluster with 4 OSDs
OSD=4 ../src/vstart.sh --debug --new -x --localhost --bluestore
Items from osdmap (epoch 61)
- pg_upmap_items 2.5 [0,2]
- pg_upmap_items 3.b [0,2]
- pg_upmap_items 3.10 [0,2]
- pg_upmap_items 3.12 [0,2]
- pg_upmap_items 3.17 [0,2]
- pg_upmap_items 3.18 [0,2]
- pg_upmap_items 3.23 [0,1]
- pg_upmap_items 3.2e [0,1]
- pg_upmap_items 3.30 [3,1]
- pg_upmap_items 3.39 [0,2]
- pg_upmap_items 3.44 [0,1]
- pg_upmap_items 3.52 [0,2]
- pg_upmap_items 3.70 [0,2]
- pg_upmap_items 3.79 [3,2]
BEFORE unbalancing the cluster: 'cephfs.a.meta': {'pgs': {3: 12, 1: 13, 0: 12, 2: 11}
AFTER unbalancing the cluster: 'cephfs.a.meta': {'pgs': {3: 11, 1: 16, 0: 12, 2: 9}
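The BEFORE/AFTER numbers are per-OSD PG counts for the cephfs.a.meta pool. One simplified way to read them is to compare each OSD against the mean. The sketch below assumes all four OSDs have equal weight, so the mean stands in for each OSD's target; the real balancer derives targets from CRUSH weights, so treat this as an approximation. The function names are hypothetical. It shows the unbalancing moved the worst OSD from 1 PG off the mean to 4 PGs off, which you can compare against upmap_max_deviation.

```python
def pg_deviation(pgs_per_osd):
    """Return {osd: count - mean}, assuming equal OSD weights (simplification)."""
    mean = sum(pgs_per_osd.values()) / len(pgs_per_osd)
    return {osd: count - mean for osd, count in sorted(pgs_per_osd.items())}


def max_abs_deviation(pgs_per_osd):
    """Largest distance, in PGs, of any OSD from the mean."""
    return max(abs(d) for d in pg_deviation(pgs_per_osd).values())


# PG counts for 'cephfs.a.meta', copied from the notes above.
BEFORE = {3: 12, 1: 13, 0: 12, 2: 11}
AFTER_UNBALANCING = {3: 11, 1: 16, 0: 12, 2: 9}

print(pg_deviation(BEFORE))             # {0: 0.0, 1: 1.0, 2: -1.0, 3: 0.0}
print(pg_deviation(AFTER_UNBALANCING))  # {0: 0.0, 1: 4.0, 2: -3.0, 3: -1.0}
```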
From osdmap epoch 67:
- pg_upmap_items 2.1 [2,1]
- pg_upmap_items 2.5 [0,2]
- pg_upmap_items 2.7 [3,1]
- pg_upmap_items 2.e [2,1]
- pg_upmap_items 3.b [0,2]
- pg_upmap_items 3.10 [0,2]
- pg_upmap_items 3.12 [0,2]
- pg_upmap_items 3.17 [0,2]
- pg_upmap_items 3.18 [0,2]
- pg_upmap_items 3.23 [0,1]
- pg_upmap_items 3.2e [0,1]
- pg_upmap_items 3.30 [3,1]
- pg_upmap_items 3.39 [0,2]
- pg_upmap_items 3.44 [0,1]
- pg_upmap_items 3.52 [0,2]
- pg_upmap_items 3.70 [0,2]
- pg_upmap_items 3.79 [3,2]
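Comparing the epoch 61 and epoch 67 lists shows what the unbalancing added: three pool-2 mappings (2.1 [2,1], 2.7 [3,1], 2.e [2,1]). The sketch below parses plain-text pg_upmap_items lines (including the run-together form as originally copied), diffs two epochs, and replays the added mappings on the BEFORE counts. It assumes pool 2 is cephfs.a.meta and that each [from,to] pair moves exactly one PG from one OSD to the other; under those assumptions it reproduces the AFTER counts. All names are hypothetical helpers, not balancer APIs.

```python
import re

# Matches "pg_upmap_items <pgid> [from,to,...]", even when entries are run together.
_ITEM = re.compile(r"pg_upmap_items\s+(\S+)\s+\[([\d,\s]+)\]")


def parse_upmap_items(text):
    """Return {pgid: [(from_osd, to_osd), ...]} from plain-text osd dump lines."""
    items = {}
    for pgid, osds in _ITEM.findall(text):
        ids = [int(x) for x in osds.split(",")]
        items[pgid] = list(zip(ids[0::2], ids[1::2]))
    return items


def diff_upmap_items(old, new):
    """Return (added_or_changed, removed) between two parsed epochs."""
    added = {pg: m for pg, m in new.items() if old.get(pg) != m}
    removed = {pg: m for pg, m in old.items() if pg not in new}
    return added, removed


def apply_items(pgs_per_osd, items):
    """Replay mappings on per-OSD PG counts, assuming each pair moves one PG."""
    counts = dict(pgs_per_osd)
    for pairs in items.values():
        for from_osd, to_osd in pairs:
            counts[from_osd] -= 1
            counts[to_osd] = counts.get(to_osd, 0) + 1
    return counts


# Copied from the notes: epoch 61, and epoch 67 = epoch 61 plus three new items.
EPOCH_61 = """
pg_upmap_items 2.5 [0,2]
pg_upmap_items 3.b [0,2]
pg_upmap_items 3.10 [0,2]
pg_upmap_items 3.12 [0,2]
pg_upmap_items 3.17 [0,2]
pg_upmap_items 3.18 [0,2]
pg_upmap_items 3.23 [0,1]
pg_upmap_items 3.2e [0,1]
pg_upmap_items 3.30 [3,1]
pg_upmap_items 3.39 [0,2]
pg_upmap_items 3.44 [0,1]
pg_upmap_items 3.52 [0,2]
pg_upmap_items 3.70 [0,2]
pg_upmap_items 3.79 [3,2]
"""
EPOCH_67 = EPOCH_61 + "pg_upmap_items 2.1 [2,1]pg_upmap_items 2.7 [3,1]pg_upmap_items 2.e [2,1]"
BEFORE = {3: 12, 1: 13, 0: 12, 2: 11}  # 'cephfs.a.meta' PGs per OSD

added, removed = diff_upmap_items(parse_upmap_items(EPOCH_61), parse_upmap_items(EPOCH_67))
pool2_added = {pg: m for pg, m in added.items() if pg.split(".")[0] == "2"}
print(added)                             # {'2.1': [(2, 1)], '2.7': [(3, 1)], '2.e': [(2, 1)]}
print(apply_items(BEFORE, pool2_added))  # {3: 11, 1: 16, 0: 12, 2: 9}
```

Running diff_upmap_items the other way (epoch 67 to epoch 69) should report the same three entries as removed, matching the return to the BEFORE counts after rebalancing.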
AFTER rebalancing the cluster: 'cephfs.a.meta': {'pgs': {3: 12, 1: 13, 0: 12, 2: 11}
From osdmap epoch 69:
- pg_upmap_items 2.5 [0,2]
- pg_upmap_items 3.b [0,2]
- pg_upmap_items 3.10 [0,2]
- pg_upmap_items 3.12 [0,2]
- pg_upmap_items 3.17 [0,2]
- pg_upmap_items 3.18 [0,2]
- pg_upmap_items 3.23 [0,1]
- pg_upmap_items 3.2e [0,1]
- pg_upmap_items 3.30 [3,1]
- pg_upmap_items 3.39 [0,2]
- pg_upmap_items 3.44 [0,1]
- pg_upmap_items 3.52 [0,2]
- pg_upmap_items 3.70 [0,2]
- pg_upmap_items 3.79 [3,2]
Recording: https://drive.google.com/file/d/1okMd3D-nF2O5DLpwZwo-YY0h7mEGBBW6/view?usp=sharing
Meeting Notes - 2023-10-31
Recording: https://drive.google.com/file/d/1pWn_Zq74zaiqfXj2-aT2D4hPqlCFY1-d/view?usp=sharing
$ git diff
diff --git a/src/pybind/mgr/balancer/module.py b/src/pybind/mgr/balancer/module.py
index 1c40425115c..1c7294ae228 100644
--- a/src/pybind/mgr/balancer/module.py
+++ b/src/pybind/mgr/balancer/module.py
@@ -345,6 +345,7 @@ class Module(MgrModule):
'optimize_result': self.optimize_result,
'no_optimization_needed': self.no_optimization_needed,
'mode': self.get_module_option('mode'),
+ 'osdmap': self.get_osdmap().dump(),
}
return (0, json.dumps(s, indent=4, sort_keys=True), '')
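With the patch above applied (and the mgr restarted with ./bin/init-ceph restart mgr), balancer status also carries the osdmap dump, so pg_upmap_items can be read from the same JSON. A minimal sketch, assuming the dumped osdmap uses the same pg_upmap_items layout as ./bin/ceph osd dump -f json-pretty (a list of entries with "pgid" and "mappings": [{"from", "to"}]); the function name is hypothetical.

```python
import json


def upmap_items_from_status(status_json):
    """Return {pgid: [(from_osd, to_osd), ...]} from patched 'balancer status' JSON.

    Assumes the patched show_status() above, which adds an 'osdmap' key holding
    the osdmap dump; unpatched output has no such key.
    """
    status = json.loads(status_json)
    osdmap = status.get("osdmap")
    if osdmap is None:
        raise KeyError("no 'osdmap' key; is the patched balancer module loaded?")
    return {
        item["pgid"]: [(m["from"], m["to"]) for m in item["mappings"]]
        for item in osdmap.get("pg_upmap_items", [])
    }
```

For example, save the output with ./bin/ceph balancer status > status.json and pass the file's contents to upmap_items_from_status.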
Recording: https://drive.google.com/file/d/1Tm869Xwt29ivHvEjkTuSFSRU4P3rlDsC/view?usp=sharing
Watch the ceph status: watch ./bin/ceph -s
Recording: https://drive.google.com/file/d/1Cn731yvBs2tJ5rtIPXPf-0D8aFZ2G2Uf/view?usp=sharing
Before
"pg_upmap_items": [
    { "pgid": "3.10", "mappings": [ { "from": 0, "to": 2 } ] },
    { "pgid": "3.12", "mappings": [ { "from": 0, "to": 2 } ] },
    { "pgid": "3.14", "mappings": [ { "from": 3, "to": 1 } ] },
    { "pgid": "3.20", "mappings": [ { "from": 0, "to": 1 } ] },
    { "pgid": "3.53", "mappings": [ { "from": 0, "to": 2 } ] },
    { "pgid": "3.5f", "mappings": [ { "from": 0, "to": 1 } ] },
    { "pgid": "3.7d", "mappings": [ { "from": 0, "to": 2 } ] },
    { "pgid": "3.7f", "mappings": [ { "from": 0, "to": 2 } ] }
],
After
"pg_upmap_items": [
{
"pgid": "3.10",
"mappings": [
{
"from": 0,
"to": 2
}
]
},
{
"pgid": "3.12",
"mappings": [
{
"from": 0,
"to": 2
}
]
},
{
"pgid": "3.14",
"mappings": [
{
"from": 3,
"to": 1
}
]
},
{
"pgid": "3.20",
"mappings": [
{
"from": 0,
"to": 1
}
]
},
{
"pgid": "3.53",
"mappings": [
{
"from": 0,
"to": 2
}
]
},
{
"pgid": "3.5f",
"mappings": [
{
"from": 0,
"to": 1
}
]
},
{
"pgid": "3.7d",
"mappings": [
{
"from": 0,
"to": 2
}
]
},
{
"pgid": "3.7f",
"mappings": [
{
"from": 0,
"to": 2
}
]
},
{
"pgid": "4.0",
"mappings": [
{
"from": 3,
"to": 2
}
]
},
{
"pgid": "4.c",
"mappings": [
{
"from": 3,
"to": 2
}
]
},
{
"pgid": "4.f",
"mappings": [
{
"from": 1,
"to": 0
}
]
},
{
"pgid": "4.10",
"mappings": [
{
"from": 3,
"to": 0
}
]
},
{
"pgid": "4.18",
"mappings": [
{
"from": 1,
"to": 0
}
]
}
],
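Comparing the two snippets shows the balancer kept the eight pool-3 mappings and added five new ones for pool 4. The sketch below automates that comparison. It accepts a snippet exactly as copied from osd dump -f json-pretty (the "pg_upmap_items": [...] key with its trailing comma) by wrapping it in braces before parsing. The helper names are hypothetical.

```python
import json


def parse_snippet(snippet):
    """Parse a copied '"pg_upmap_items": [...],' fragment into a list of entries."""
    return json.loads("{" + snippet.strip().rstrip(",") + "}")["pg_upmap_items"]


def new_upmap_items(before, after):
    """Entries in `after` whose pgid is new or whose mappings changed."""
    old = {item["pgid"]: item["mappings"] for item in before}
    return [item for item in after if old.get(item["pgid"]) != item["mappings"]]


def summarize(items):
    """Format entries like the plain-text osd dump, e.g. 'pg_upmap_items 4.0 [3,2]'."""
    lines = []
    for item in items:
        osds = ",".join(f"{m['from']},{m['to']}" for m in item["mappings"])
        lines.append(f"pg_upmap_items {item['pgid']} [{osds}]")
    return lines


# Small example snippets in the copied format (trimmed from the output above).
before = parse_snippet('"pg_upmap_items": [ { "pgid": "3.10", "mappings": [ { "from": 0, "to": 2 } ] } ],')
after = parse_snippet(
    '"pg_upmap_items": [ { "pgid": "3.10", "mappings": [ { "from": 0, "to": 2 } ] },'
    ' { "pgid": "4.0", "mappings": [ { "from": 3, "to": 2 } ] } ],'
)
print(summarize(new_upmap_items(before, after)))  # ['pg_upmap_items 4.0 [3,2]']
```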
Instructions on creating changes in the balancer
1. Create a cluster with 4 OSDs:
   OSD=4 ../src/vstart.sh --debug --new -x --localhost --bluestore
2. Turn the balancer on:
   ./bin/ceph balancer on
3. Check the status to see if it says "Optimized plan created successfully" (this means the balancer has created mappings):
   ./bin/ceph balancer status
3.a. Or, check the osdmap to see if any pg_upmap_items entries have been created (the json-pretty format shows the osdmap exactly as it is structured when you access it in the code):
   ./bin/ceph osd dump -f json-pretty
4. Grab the pg_upmap_items output from the osdmap (copy the "pg_upmap_items" section):
   ./bin/ceph osd dump -f json-pretty
5. To make the cluster need to rebalance itself, create a pool (this creates more placement groups):
   ./bin/ceph osd pool create <pool_name>
6. Check the status again to see if it says "Optimized plan created successfully":
   ./bin/ceph balancer status
7. Grab the pg_upmap_items output from the osdmap again (copy the "pg_upmap_items" section):
   ./bin/ceph osd dump -f json-pretty
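The status checks above can be scripted by polling the optimize_result field that balancer status returns as JSON. A minimal sketch, assuming ./bin/ceph is run from the build directory; it matches on the substring "plan created successfully" so it does not depend on the exact wording of the message. optimize_result may reflect only the most recent optimization attempt, so a later run could overwrite it. The helper names, timeout, and polling interval are illustrative choices.

```python
import json
import subprocess
import time

# Substring match, so the check does not depend on the exact message wording.
SUCCESS_SUBSTRING = "plan created successfully"


def plan_created(status_json):
    """True if 'balancer status' JSON reports that an optimization plan was created."""
    result = json.loads(status_json).get("optimize_result") or ""
    return SUCCESS_SUBSTRING in result


def wait_for_plan(ceph="./bin/ceph", timeout=300.0, interval=10.0):
    """Poll 'balancer status' until a plan is reported; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        out = subprocess.run(
            [ceph, "balancer", "status"],
            capture_output=True, text=True, check=True,
        ).stdout
        if plan_created(out):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```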
Before
"pg_upmap_items": [
{
"pgid": "3.12",
"mappings": [
{
"from": 0,
"to": 2
}
]
},
{
"pgid": "3.15",
"mappings": [
{
"from": 0,
"to": 1
}
]
},
{
"pgid": "3.1d",
"mappings": [
{
"from": 0,
"to": 2
}
]
},
{
"pgid": "3.39",
"mappings": [
{
"from": 0,
"to": 2
}
]
},
{
"pgid": "3.51",
"mappings": [
{
"from": 0,
"to": 2
}
]
},
{
"pgid": "3.63",
"mappings": [
{
"from": 0,
"to": 1
}
]
},
{
"pgid": "3.69",
"mappings": [
{
"from": 0,
"to": 2
}
]
}
],
After:
"pg_upmap_items": [
{
"pgid": "3.12",
"mappings": [
{
"from": 0,
"to": 2
}
]
},
{
"pgid": "3.15",
"mappings": [
{
"from": 0,
"to": 1
}
]
},
{
"pgid": "3.1d",
"mappings": [
{
"from": 0,
"to": 2
}
]
},
{
"pgid": "3.39",
"mappings": [
{
"from": 0,
"to": 2
}
]
},
{
"pgid": "3.51",
"mappings": [
{
"from": 0,
"to": 2
}
]
},
{
"pgid": "3.63",
"mappings": [
{
"from": 0,
"to": 1
}
]
},
{
"pgid": "3.69",
"mappings": [
{
"from": 0,
"to": 2
}
]
},
{
"pgid": "4.b",
"mappings": [
{
"from": 1,
"to": 2
}
]
},
{
"pgid": "4.14",
"mappings": [
{
"from": 3,
"to": 2
}
]
},
{
"pgid": "4.18",
"mappings": [
{
"from": 3,
"to": 0
}
]
},
{
"pgid": "4.1b",
"mappings": [
{
"from": 3,
"to": 0
}
]
},
{
"pgid": "4.1c",
"mappings": [
{
"from": 1,
"to": 0
}
]
}
],
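As in the first experiment, the balancer kept the existing pool-3 mappings and added entries only for pool 4, presumably the newly created pool. The sketch below tallies the net number of PG moves per OSD for the five new entries: OSDs 1 and 3 give up PGs and OSDs 0 and 2 receive them, which hints at which OSDs the balancer treated as overfull. It counts mapping moves only, not bytes, and the function name is hypothetical.

```python
from collections import Counter


def net_osd_change(items):
    """Net PG moves per OSD: -1 for each mapping out of an OSD, +1 for each into it."""
    net = Counter()
    for item in items:
        for m in item["mappings"]:
            net[m["from"]] -= 1
            net[m["to"]] += 1
    return dict(sorted(net.items()))


# The five entries that appear only in the "After:" snippet above.
NEW_ITEMS = [
    {"pgid": "4.b", "mappings": [{"from": 1, "to": 2}]},
    {"pgid": "4.14", "mappings": [{"from": 3, "to": 2}]},
    {"pgid": "4.18", "mappings": [{"from": 3, "to": 0}]},
    {"pgid": "4.1b", "mappings": [{"from": 3, "to": 0}]},
    {"pgid": "4.1c", "mappings": [{"from": 1, "to": 0}]},
]

print(net_osd_change(NEW_ITEMS))  # {0: 3, 1: -2, 2: 2, 3: -3}
```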
Recording: https://drive.google.com/file/d/1exa416wyn581DyaKQky1PMzN9lsjv4xF/view?usp=sharing
```python
# Excerpt: a method of class Module in src/pybind/mgr/balancer/module.py
@CLIReadCommand('balancer status')
def show_status(self) -> Tuple[int, str, str]:
    """
    Show balancer status
    """
    # Debug line: log the current osdmap epoch
    self.log.debug("osdmap_rcos {}".format(self.get_osdmap().dump().get('epoch', '')))
    s = {
        'plans': list(self.plans.keys()),
        'active': self.active,
        'last_optimize_started': self.last_optimize_started,
        'last_optimize_duration': self.last_optimize_duration,
        'optimize_result': self.optimize_result,
        'no_optimization_needed': self.no_optimization_needed,
        'mode': self.get_module_option('mode'),
    }
    return (0, json.dumps(s, indent=4, sort_keys=True), '')
```
The actual optimization takes place at this line; use it as a reference when deciding when to update pg_upmap_items: https://github.com/ceph/ceph/blob/785c1083fa93f41d0dcbb7f16a651615bbb44771/src/pybind/mgr/balancer/module.py#L694