diff --git a/README.md b/README.md index b9ced4a..c0ee580 100755 --- a/README.md +++ b/README.md @@ -40,6 +40,7 @@ _Our method introduces a novel differentiable mesh extraction framework that ope - ⬛ Implement a simple training viewer using the GraphDeco viewer. - ⬛ Add the mesh-based rendering evaluation scripts in `./milo/eval/mesh_nvs`. +- ✅ Add DTU training and evaluation scripts. - ✅ Add low-res and very-low-res training for light output meshes (under 50MB and under 20MB). - ✅ Add T&T evaluation scripts in `./milo/eval/tnt/`. - ✅ Add Blender add-on (for mesh-based editing and animation) to the repo. @@ -244,6 +245,9 @@ with `--sampling_factor 0.1`, for instance. Please refer to the DepthAnythingV2 repo to download the `vitl` checkpoint required for Depth-Order regularization. Then, move the checkpoint file to `./submodules/Depth-Anything-V2/checkpoints/`. +You can also use the `train_regular_densification.py` script instead of `train.py` to replace the fast densification from Mini-Splatting2 with a more traditional densification strategy for Gaussians, as used in [Gaussian Opacity Fields](https://github.com/autonomousvision/gaussian-opacity-fields/tree/main) and [RaDe-GS](https://baowenz.github.io/radegs/). +By default, this script uses its own config file `--mesh_config default_regular_densification`. + ### Example Commands Basic training for indoor scenes with logging: @@ -271,6 +275,11 @@ Training with depth-order regularization: python train.py -s -m --imp_metric indoor --rasterizer radegs --depth_order --depth_order_config strong --log_interval 200 --data_device cpu ``` +Training with a traditional, slower densification strategy for Gaussians: +```bash +python train_regular_densification.py -s -m --imp_metric indoor --rasterizer radegs --log_interval 200 --data_device cpu +``` + ## 3. Extracting a Mesh after Optimization @@ -329,6 +338,19 @@ python mesh_extract_integration.py \ The mesh will be saved at either `/mesh_integration_sdf.ply` or `/mesh_depth_fusion_sdf.ply` depending on the SDF computation method. +### 3.3. Use regular, non-scalable TSDF + +We also propose a script to extract a mesh using traditional TSDF on a regular voxel grid. This script is heavily inspired by the awesome work [2D Gaussian Splatting](https://github.com/hbb1/2d-gaussian-splatting). Note that this mesh extraction process does not scale to unbounded real scenes with background geometry. +```bash +python mesh_extract_regular_tsdf.py \ + -s \ + -m \ + --rasterizer radegs \ + --mesh_res 1024 +``` + +The mesh will be saved at `/mesh_regular_tsdf_res.ply`. A cleaned version of the mesh will be saved at `/mesh_regular_tsdf_res_post.ply`, following 2DGS's postprocessing. + ## 4. Using our differentiable Gaussians-to-Mesh pipeline in your own 3DGS project @@ -562,6 +584,8 @@ If you get artifacts in the rendering, you can try to play with the various foll Click here to see content.
+### Tanks and Temples + For evaluation, please start by downloading [our COLMAP runs for the Tanks and Temples dataset](https://drive.google.com/drive/folders/1Bf7DM2DFtQe4J63bEFLceEycNf4qTcqm?usp=sharing), and make sure to move all COLMAP scene directories (Barn, Caterpillar, _etc._) inside the same directory. Then, please download ground truth point cloud, camera poses, alignments and cropfiles from [Tanks and Temples dataset](https://www.tanksandtemples.org/download/). The ground truth dataset should be organized as: @@ -637,6 +661,36 @@ python render.py \ python metrics.py -m # Compute error metrics on renderings ``` +### DTU + +MILo is designed for maximum scalability to allow for the reconstruction of full scenes, including background elements. We optimized our method and hyperparameters to strike a balance between performance and scalability. + +However, we also evaluate MILo on small object-centric scenes from the DTU dataset, to verify that our mesh-in-the-loop regularization does not hurt performance in highly controlled scenarios. + +For these smaller scenes, the aggressive densification strategy from Mini-Splatting2 is unnecessary. Instead, we use the traditional progressive densification strategy proposed in [GOF](https://github.com/autonomousvision/gaussian-opacity-fields/tree/main) and [RaDe-GS](https://baowenz.github.io/radegs/), which is better suited to such scenes. + +Similarly, since DTU scans focus on small objects of interest without background reconstruction, we employ a regular grid for mesh extraction after training (similar to GOF and RaDe-GS) rather than our scalable extraction method. + +We use the preprocessed DTU dataset from [2D GS](https://github.com/hbb1/2d-gaussian-splatting) for training. Please refer to the corresponding repo for downloading instructions. +Evaluation scripts are adapted from [GOF](https://github.com/autonomousvision/gaussian-opacity-fields/tree/main) and [RaDe-GS](https://baowenz.github.io/radegs/). + +Please run the following commands to evaluate MILo on a single DTU scan: +```bash +# Training with regular densification +python train_regular_densification.py -s -m -r 2 --rasterizer radegs --imp_metric indoor --mesh_config default_dtu --decoupled_appearance --log_interval 200 + +# Mesh extraction +python ./eval/dtu/mesh_extract_dtu.py -s -m -r 2 --rasterizer radegs + +# Evaluation +python ./eval/dtu/evaluate_dtu_mesh.py -s -m -r 2 --DTU --scan_id +``` + +You can also run the following script to evaluate on the full DTU dataset: +```bash +python scripts/evaluate_dtu.py --data_dir --gt_dir --rasterizer radegs --log_interval 200 +``` + ## 8. Acknowledgements diff --git a/milo/configs/mesh/default_dtu.yaml b/milo/configs/mesh/default_dtu.yaml new file mode 100644 index 0000000..d7e2222 --- /dev/null +++ b/milo/configs/mesh/default_dtu.yaml @@ -0,0 +1,62 @@ +# Regularization schedule +start_iter: 20_001 +mesh_update_interval: 1 +stop_iter: 30_000 + +# Depth loss weight +use_depth_loss: true +depth_weight: 0.01 # 0.05 +depth_ratio: 1.0 # 0.6 +mesh_depth_loss_type: "log" # "log" + +# Normal loss weight +use_normal_loss: true # true +normal_weight: 0.01 # 0.05 +use_depth_normal: true # true + +# Delaunay computation +delaunay_reset_interval: 500 # 500 +n_max_points_in_delaunay: 5_400_000 # -1 for no limit +delaunay_sampling_method: "surface" # "random", "surface" or "surface+opacity" +filter_large_edges: true # true or false? +collapse_large_edges: false # true or false?
+ +# Rasterization +use_scalable_renderer: false + +# SDF computation +sdf_reset_interval: 500 # 500 +sdf_default_isosurface: 0.5 # 0.5 +transform_sdf_to_linear_space: false # SHOULD BE FALSE +min_occupancy_value: 0.0000000001 # 0.0000000001 + +# > For Integrate +use_ema: true # true +alpha_ema: 0.4 # 0.4 + +# > For TSDF +trunc_margin: 0.002 # 0.002 + +# > For learnable +occupancy_mode: "occupancy_shift" # "occupancy_shift" or "density_shift" +# > Gaussian centers regularization +enforce_occupied_centers: true # true +occupied_centers_weight: 0.001 # 0.005 +# > Occupancy labels loss +use_occupancy_labels_loss: true # true or false? +reset_occupancy_labels_every: 200 # 200 +occupancy_labels_loss_weight: 0.001 # 0.005 +# > SDF reset +fix_set_of_learnable_sdfs: true # false or true? +learnable_sdf_reset_mode: "ema" # "none" or "ema" +learnable_sdf_reset_stop_iter: 25_001 +learnable_sdf_reset_alpha_ema: 0.4 # 0.4 + +method_to_reset_sdf: "depth_fusion" # "integration" or "depth_fusion" +n_binary_steps_to_reset_sdf: 0 # Use 0 to disable binary search +sdf_reset_linearization_n_steps: 20 +sdf_reset_linearization_enforce_std: 0.5 # 0.5 +depth_fusion_reset_tolerance: 0.1 + +# Foreground Culling +radius_culling: -1.0 # -1.0 for no culling \ No newline at end of file diff --git a/milo/eval/dtu/eval.py b/milo/eval/dtu/eval.py new file mode 100644 index 0000000..5892f76 --- /dev/null +++ b/milo/eval/dtu/eval.py @@ -0,0 +1,167 @@ +# adapted from https://github.com/jzhangbs/DTUeval-python +import numpy as np +import open3d as o3d +import sklearn.neighbors as skln +from tqdm import tqdm +from scipy.io import loadmat +import multiprocessing as mp +import argparse + +def sample_single_tri(input_): + n1, n2, v1, v2, tri_vert = input_ + c = np.mgrid[:n1+1, :n2+1] + c += 0.5 + c[0] /= max(n1, 1e-7) + c[1] /= max(n2, 1e-7) + c = np.transpose(c, (1,2,0)) + k = c[c.sum(axis=-1) < 1] # m2 + q = v1 * k[:,:1] + v2 * k[:,1:] + tri_vert + return q + +def write_vis_pcd(file, points, colors): + pcd = o3d.geometry.PointCloud() + pcd.points = o3d.utility.Vector3dVector(points) + pcd.colors = o3d.utility.Vector3dVector(colors) + o3d.io.write_point_cloud(file, pcd) + +if __name__ == '__main__': + mp.freeze_support() + + parser = argparse.ArgumentParser() + parser.add_argument('--data', type=str, default='data_in.ply') + parser.add_argument('--scan', type=int, default=1) + parser.add_argument('--mode', type=str, default='mesh', choices=['mesh', 'pcd']) + parser.add_argument('--dataset_dir', type=str, default='.') + parser.add_argument('--vis_out_dir', type=str, default='.') + parser.add_argument('--downsample_density', type=float, default=0.2) + parser.add_argument('--patch_size', type=float, default=60) + parser.add_argument('--max_dist', type=float, default=20) + parser.add_argument('--visualize_threshold', type=float, default=10) + args = parser.parse_args() + + thresh = args.downsample_density + if args.mode == 'mesh': + pbar = tqdm(total=9) + pbar.set_description('read data mesh') + data_mesh = o3d.io.read_triangle_mesh(args.data) + + vertices = np.asarray(data_mesh.vertices) + triangles = np.asarray(data_mesh.triangles) + tri_vert = vertices[triangles] + + pbar.update(1) + pbar.set_description('sample pcd from mesh') + v1 = tri_vert[:,1] - tri_vert[:,0] + v2 = tri_vert[:,2] - tri_vert[:,0] + l1 = np.linalg.norm(v1, axis=-1, keepdims=True) + l2 = np.linalg.norm(v2, axis=-1, keepdims=True) + area2 = np.linalg.norm(np.cross(v1, v2), axis=-1, keepdims=True) + non_zero_area = (area2 > 0)[:,0] + l1, l2, area2, v1, v2, 
tri_vert = [ + arr[non_zero_area] for arr in [l1, l2, area2, v1, v2, tri_vert] + ] + thr = thresh * np.sqrt(l1 * l2 / area2) + n1 = np.floor(l1 / thr) + n2 = np.floor(l2 / thr) + + with mp.Pool() as mp_pool: + new_pts = mp_pool.map(sample_single_tri, ((n1[i,0], n2[i,0], v1[i:i+1], v2[i:i+1], tri_vert[i:i+1,0]) for i in range(len(n1))), chunksize=1024) + + new_pts = np.concatenate(new_pts, axis=0) + data_pcd = np.concatenate([vertices, new_pts], axis=0) + + elif args.mode == 'pcd': + pbar = tqdm(total=8) + pbar.set_description('read data pcd') + data_pcd_o3d = o3d.io.read_point_cloud(args.data) + data_pcd = np.asarray(data_pcd_o3d.points) + + pbar.update(1) + pbar.set_description('random shuffle pcd index') + shuffle_rng = np.random.default_rng() + shuffle_rng.shuffle(data_pcd, axis=0) + + pbar.update(1) + pbar.set_description('downsample pcd') + nn_engine = skln.NearestNeighbors(n_neighbors=1, radius=thresh, algorithm='kd_tree', n_jobs=-1) + nn_engine.fit(data_pcd) + rnn_idxs = nn_engine.radius_neighbors(data_pcd, radius=thresh, return_distance=False) + mask = np.ones(data_pcd.shape[0], dtype=np.bool_) + for curr, idxs in enumerate(rnn_idxs): + if mask[curr]: + mask[idxs] = 0 + mask[curr] = 1 + data_down = data_pcd[mask] + + pbar.update(1) + pbar.set_description('masking data pcd') + obs_mask_file = loadmat(f'{args.dataset_dir}/ObsMask/ObsMask{args.scan}_10.mat') + ObsMask, BB, Res = [obs_mask_file[attr] for attr in ['ObsMask', 'BB', 'Res']] + BB = BB.astype(np.float32) + + patch = args.patch_size + inbound = ((data_down >= BB[:1]-patch) & (data_down < BB[1:]+patch*2)).sum(axis=-1) ==3 + data_in = data_down[inbound] + + data_grid = np.around((data_in - BB[:1]) / Res).astype(np.int32) + grid_inbound = ((data_grid >= 0) & (data_grid < np.expand_dims(ObsMask.shape, 0))).sum(axis=-1) ==3 + data_grid_in = data_grid[grid_inbound] + in_obs = ObsMask[data_grid_in[:,0], data_grid_in[:,1], data_grid_in[:,2]].astype(np.bool_) + data_in_obs = data_in[grid_inbound][in_obs] + + pbar.update(1) + pbar.set_description('read STL pcd') + stl_pcd = o3d.io.read_point_cloud(f'{args.dataset_dir}/Points/stl/stl{args.scan:03}_total.ply') + stl = np.asarray(stl_pcd.points) + + pbar.update(1) + pbar.set_description('compute data2stl') + nn_engine.fit(stl) + dist_d2s, idx_d2s = nn_engine.kneighbors(data_in_obs, n_neighbors=1, return_distance=True) + max_dist = args.max_dist + mean_d2s = dist_d2s[dist_d2s < max_dist].mean() + + pbar.update(1) + pbar.set_description('compute stl2data') + ground_plane = loadmat(f'{args.dataset_dir}/ObsMask/Plane{args.scan}.mat')['P'] + + stl_hom = np.concatenate([stl, np.ones_like(stl[:,:1])], -1) + above = (ground_plane.reshape((1,4)) * stl_hom).sum(-1) > 0 + stl_above = stl[above] + + nn_engine.fit(data_in) + dist_s2d, idx_s2d = nn_engine.kneighbors(stl_above, n_neighbors=1, return_distance=True) + mean_s2d = dist_s2d[dist_s2d < max_dist].mean() + + pbar.update(1) + pbar.set_description('visualize error') + vis_dist = args.visualize_threshold + R = np.array([[1,0,0]], dtype=np.float64) + G = np.array([[0,1,0]], dtype=np.float64) + B = np.array([[0,0,1]], dtype=np.float64) + W = np.array([[1,1,1]], dtype=np.float64) + data_color = np.tile(B, (data_down.shape[0], 1)) + data_alpha = dist_d2s.clip(max=vis_dist) / vis_dist + data_color[ np.where(inbound)[0][grid_inbound][in_obs] ] = R * data_alpha + W * (1-data_alpha) + data_color[ np.where(inbound)[0][grid_inbound][in_obs][dist_d2s[:,0] >= max_dist] ] = G + write_vis_pcd(f'{args.vis_out_dir}/vis_{args.scan:03}_d2s.ply', data_down, 
data_color) + stl_color = np.tile(B, (stl.shape[0], 1)) + stl_alpha = dist_s2d.clip(max=vis_dist) / vis_dist + stl_color[ np.where(above)[0] ] = R * stl_alpha + W * (1-stl_alpha) + stl_color[ np.where(above)[0][dist_s2d[:,0] >= max_dist] ] = G + write_vis_pcd(f'{args.vis_out_dir}/vis_{args.scan:03}_s2d.ply', stl, stl_color) + + pbar.update(1) + pbar.set_description('done') + pbar.close() + over_all = (mean_d2s + mean_s2d) / 2 + print(mean_d2s, mean_s2d, over_all) + + import json + with open(f'{args.vis_out_dir}/results.json', 'w') as fp: + json.dump({ + 'mean_d2s': mean_d2s, + 'mean_s2d': mean_s2d, + 'overall': over_all, + }, fp, indent=True) + diff --git a/milo/eval/dtu/evaluate_dtu_mesh.py b/milo/eval/dtu/evaluate_dtu_mesh.py new file mode 100644 index 0000000..304e278 --- /dev/null +++ b/milo/eval/dtu/evaluate_dtu_mesh.py @@ -0,0 +1,223 @@ +# The evalution code is fork from GOF https://github.com/autonomousvision/gaussian-opacity-fields/blob/main/evaluate_dtu_mesh.py +import os +import sys +BASE_DIR = os.path.dirname( # milo/ + os.path.dirname( # milo/eval/ + os.path.dirname(os.path.abspath(__file__)) # milo/eval/dtu/ + ) +) +sys.path.append(BASE_DIR) +import numpy as np +import torch +import torch.nn.functional as F +from scene import Scene +import cv2 +import random +from os import path +from argparse import ArgumentParser +from arguments import ModelParams, PipelineParams, get_combined_args +from gaussian_renderer import GaussianModel +import trimesh +from skimage.morphology import binary_dilation, disk + +def best_fit_transform(A, B): + ''' + Calculates the least-squares best-fit transform that maps corresponding points A to B in m spatial dimensions + Input: + A: Nxm numpy array of corresponding points + B: Nxm numpy array of corresponding points + Returns: + T: (m+1)x(m+1) homogeneous transformation matrix that maps A on to B + R: mxm rotation matrix + t: mx1 translation vector + ''' + + assert A.shape == B.shape + + # get number of dimensions + m = A.shape[1] + + # translate points to their centroids + centroid_A = np.mean(A, axis=0) + centroid_B = np.mean(B, axis=0) + AA = A - centroid_A + BB = B - centroid_B + + # rotation matrix + H = np.dot(AA.T, BB) + U, S, Vt = np.linalg.svd(H) + R = np.dot(Vt.T, U.T) + + # special reflection case + if np.linalg.det(R) < 0: + Vt[m-1,:] *= -1 + R = np.dot(Vt.T, U.T) + + # translation + t = centroid_B.T - np.dot(R,centroid_A.T) + + # homogeneous transformation + T = np.identity(m+1) + T[:m, :m] = R + T[:m, m] = t + + return T, R, t + + +def load_dtu_camera(DTU): + # Load projection matrix from file. + camtoworlds = [] + for i in range(1, 64+1): + fname = path.join(DTU, f'Calibration/cal18/pos_{i:03d}.txt') + + projection = np.loadtxt(fname, dtype=np.float32) + + # Decompose projection matrix into pose and camera matrix. 
+ camera_mat, rot_mat, t = cv2.decomposeProjectionMatrix(projection)[:3] + camera_mat = camera_mat / camera_mat[2, 2] + pose = np.eye(4, dtype=np.float32) + pose[:3, :3] = rot_mat.transpose() + pose[:3, 3] = (t[:3] / t[3])[:, 0] + pose = pose[:3] + camtoworlds.append(pose) + return camtoworlds + +import math +def fov2focal(fov, pixels): + return pixels / (2 * math.tan(fov / 2)) + +def cull_mesh(cameras, mesh): + + vertices = mesh.vertices + + # project and filter + vertices = torch.from_numpy(vertices).cuda() + vertices = torch.cat((vertices, torch.ones_like(vertices[:, :1])), dim=-1) + vertices = vertices.permute(1, 0) + vertices = vertices.float() + + sampled_masks = [] + + for camera in cameras: + c2w = (camera.world_view_transform.T).inverse() + w2c = torch.inverse(c2w).cuda() + mask = camera.gt_mask + fx = fov2focal(camera.FoVx, camera.image_width) + fy = fov2focal(camera.FoVy, camera.image_height) + intrinsic = torch.eye(4) + intrinsic[0, 0] = fx + intrinsic[1, 1] = fy + intrinsic[0, 2] = camera.image_width / 2. + intrinsic[1, 2] = camera.image_height / 2. + intrinsic = intrinsic.cuda() + + W, H = camera.image_width, camera.image_height + + with torch.no_grad(): + # transform and project + cam_points = intrinsic @ w2c @ vertices + pix_coords = cam_points[:2, :] / (cam_points[2, :].unsqueeze(0) + 1e-6) + pix_coords = pix_coords.permute(1, 0) + pix_coords[..., 0] /= W - 1 + pix_coords[..., 1] /= H - 1 + pix_coords = (pix_coords - 0.5) * 2 + valid = ((pix_coords > -1. ) & (pix_coords < 1.)).all(dim=-1).float() + + # dialate mask similar to unisurf + maski = mask[0, :, :].cpu().numpy().astype(np.float32) / 256. + # maski = torch.from_numpy(binary_dilation(maski, disk(14))).float()[None, None].cuda() + maski = torch.from_numpy(binary_dilation(maski, disk(6))).float()[None, None].cuda() + + sampled_mask = F.grid_sample(maski, pix_coords[None, None], mode='nearest', padding_mode='zeros', align_corners=True)[0, -1, 0] + + sampled_mask = sampled_mask + (1. 
- valid) + + sampled_masks.append(sampled_mask) + + sampled_masks = torch.stack(sampled_masks, -1) + + # filter + mask = (sampled_masks > 0.).all(dim=-1).cpu().numpy() + face_mask = mask[mesh.faces].all(axis=1) + + mesh.update_vertices(mask) + mesh.update_faces(face_mask) + return mesh + + +def evaluate_mesh(dataset : ModelParams, iteration : int, DTU_PATH : str, ply_name : str): + + gaussians = GaussianModel(dataset.sh_degree) + scene = Scene(dataset, gaussians, load_iteration=iteration, shuffle=False) + + train_cameras = scene.getTrainCameras() + test_cameras = scene.getTestCameras() + dtu_cameras = load_dtu_camera(args.DTU) + + gt_points = np.array([cam[:, 3] for cam in dtu_cameras]) + + points = [] + for cam in train_cameras: + c2w = (cam.world_view_transform.T).inverse() + points.append(c2w[:3, 3].cpu().numpy()) + points = np.array(points) + gt_points = gt_points[:points.shape[0]] + + # align the scale of two point clouds + scale_points = np.linalg.norm(points - points.mean(axis=0), axis=1).mean() + scale_gt_points = np.linalg.norm(gt_points - gt_points.mean(axis=0), axis=1).mean() + + points = points * scale_gt_points / scale_points + _, r, t = best_fit_transform(points, gt_points) + + + # load mesh + mesh_file = os.path.join(dataset.model_path, ply_name) + print("load") + mesh = trimesh.load(mesh_file) + + print("cull") + mesh = cull_mesh(train_cameras, mesh) + + culled_mesh_file = os.path.join(dataset.model_path, "recon_culled.ply") + mesh.export(culled_mesh_file) + + # align the mesh + mesh.vertices = mesh.vertices * scale_gt_points / scale_points + mesh.vertices = mesh.vertices @ r.T + t + + aligned_mesh_file = os.path.join(dataset.model_path, "recon_aligned.ply") + mesh.export(aligned_mesh_file) + + # evaluate + out_dir = os.path.join(dataset.model_path, "vis") + os.makedirs(out_dir,exist_ok=True) + # scan = dataset.model_path.split("/")[-1][4:] + scan = int(dataset.source_path.split("/")[-1][4:]) + + cmd = f"python ./eval/dtu/eval.py --data {aligned_mesh_file} --scan {scan} --mode mesh --dataset_dir {DTU_PATH} --vis_out_dir {out_dir}" + print(cmd) + os.system(cmd) + +if __name__ == "__main__": + # Set up command line argument parser + parser = ArgumentParser(description="Testing script parameters") + model = ModelParams(parser, sentinel=True) + pipeline = PipelineParams(parser) + parser.add_argument("--iteration", default=30_000, type=int) + parser.add_argument('--scan_id', type=str, help='scan id of the input mesh') + parser.add_argument('--DTU', type=str, default='dtu_eval/Offical_DTU_Dataset', help='path to the GT DTU point clouds') + parser.add_argument('--ply_name', type=str, default='recon_tsdf.ply', help='name of the input mesh') + + parser.add_argument("--quiet", action="store_true") + + args = get_combined_args(parser) + print("evaluating " + args.model_path) + print("ply_name: ", args.ply_name) + + random.seed(0) + np.random.seed(0) + torch.manual_seed(0) + torch.cuda.set_device(torch.device("cuda:0")) + + evaluate_mesh(model.extract(args), args.iteration, args.DTU, args.ply_name) \ No newline at end of file diff --git a/milo/eval/dtu/mesh_extract_dtu.py b/milo/eval/dtu/mesh_extract_dtu.py new file mode 100644 index 0000000..c621525 --- /dev/null +++ b/milo/eval/dtu/mesh_extract_dtu.py @@ -0,0 +1,137 @@ +import os +import sys +BASE_DIR = os.path.dirname( # milo/ + os.path.dirname( # milo/eval/ + os.path.dirname(os.path.abspath(__file__)) # milo/eval/dtu/ + ) +) +sys.path.append(BASE_DIR) +import torch +from scene import GaussianModel +from argparse import 
ArgumentParser +from arguments import ModelParams, PipelineParams +import math +import numpy as np +import open3d as o3d +import open3d.core as o3c +from scene.dataset_readers import sceneLoadTypeCallbacks +from utils.camera_utils import cameraList_from_camInfos + + +def load_camera(args): + if os.path.exists(os.path.join(args.source_path, "sparse")): + scene_info = sceneLoadTypeCallbacks["Colmap"](args.source_path, args.images, args.eval) + elif os.path.exists(os.path.join(args.source_path, "transforms_train.json")): + print("Found transforms_train.json file, assuming Blender data set!") + scene_info = sceneLoadTypeCallbacks["Blender"](args.source_path, args.white_background, args.eval) + return cameraList_from_camInfos(scene_info.train_cameras, 1.0, args) + + + + +def extract_mesh(dataset, pipe, checkpoint_iterations=None, occupancy_mode="occupancy_shift"): + gaussians = GaussianModel(dataset.sh_degree) + output_path = os.path.join(dataset.model_path,"point_cloud") + iteration = 0 + if checkpoint_iterations is None: + for folder_name in os.listdir(output_path): + iteration= max(iteration,int(folder_name.split('_')[1])) + else: + iteration = checkpoint_iterations + output_path = os.path.join(output_path,"iteration_"+str(iteration),"point_cloud.ply") + print(f"[INFO] Extracting mesh from model {output_path}") + + gaussians.load_ply(output_path) + try: + gaussians.set_occupancy_mode(occupancy_mode) + except: + print(f'[ WARNING ] Failed to set occupancy mode to {occupancy_mode}') + print(f'Loaded gaussians from {output_path}') + + kernel_size = dataset.kernel_size + + bg_color = [1, 1, 1] + background = torch.tensor(bg_color, dtype=torch.float32, device="cuda") + viewpoint_cam_list = load_camera(dataset) + + depth_list = [] + color_list = [] + alpha_thres = 0.5 + for viewpoint_cam in viewpoint_cam_list: + # Rendering offscreen from that camera + render_pkg = render(viewpoint_cam, gaussians, pipe, background, kernel_size) + rendered_img = torch.clamp(render_pkg["render"], min=0, max=1.0).cpu().numpy().transpose(1,2,0) + color_list.append(np.ascontiguousarray(rendered_img)) + depth = render_pkg["median_depth"].clone() + if viewpoint_cam.gt_mask is not None: + depth[(viewpoint_cam.gt_mask < 0.5)] = 0 + depth[render_pkg["mask"] DIST_MIN) & (dist < DIST_MAX) + Xc = Xc[mask] + C = C[mask] + dist = dist[mask] + + # Keep at most the POINTS_PER_FRAME closest points + if len(Xc) > POINTS_PER_FRAME: + idx = np.argsort(dist)[:POINTS_PER_FRAME] + Xc = Xc[idx] + C = C[idx] + + # Convert to world coordinates + Xc_t = Xc.T + Xw = (Rwc @ Xc_t + twc.reshape(3, 1)).T + + all_pts_world.append(Xw) + all_cols_world.append(C) + + # Merge points from all frames + all_pts_world = np.vstack(all_pts_world) + all_cols_world = np.vstack(all_cols_world) + + print(f"Final point cloud size: {len(all_pts_world)}") + + ############################################################### + # Write COLMAP format + ############################################################### + write_cameras_bin_txt( + f"{OUT_SPARSE}/cameras.bin", + f"{OUT_SPARSE}/cameras.txt" + ) + + write_images_bin_txt( + f"{OUT_SPARSE}/images.bin", + f"{OUT_SPARSE}/images.txt", + new_rgb_paths, + poses_cw + ) + + write_points3D_bin_txt( + f"{OUT_SPARSE}/points3D.bin", + f"{OUT_SPARSE}/points3D.txt", + all_pts_world, + all_cols_world + ) + + print("\nDone! COLMAP + Gaussian Splatting dataset written to dataset/") + + +if __name__ == "__main__": + main() diff --git a/milo/utils/custom2colmap_gui.py b/milo/utils/custom2colmap_gui.py new file mode 100644 index 0000000..683bf7a --- /dev/null +++ b/milo/utils/custom2colmap_gui.py @@ -0,0 +1,280 @@ +import os +import
struct +import numpy as np +from glob import glob +from scipy.spatial.transform import Rotation as R +import open3d as o3d +import cv2 + +# ========================================= +# CONFIG, adjust as needed +# ========================================= +FX = 617.832096 +FY = 621.743364 +CX = 427.447679 +CY = 233.692923 +WIDTH, HEIGHT = 848, 480 + +END_ID = 1000 + +INTERVAL = 1 # frame sparsification interval +DT_THRESH = 0.05 # translation threshold (m) +DROT_THRESH = np.deg2rad(0) # rotation threshold (degrees → radians) + +MAX_DIST = 4.0 # maximum distance of points kept from each ply (m) +VOXEL_SIZE = 0.01 # voxel size for downsampling the final global point cloud (m) + +OUT_DIR = "dataset" +SPARSE_DIR = "dataset/sparse/0" + +def numeric_sort(files): + return sorted(files, key=lambda x: int(os.path.splitext(os.path.basename(x))[0])) +# ========================================= +# Read PLY (ascii or binary) +# ========================================= +def load_ply(filename): + pcd = o3d.io.read_point_cloud(filename) + pts = np.asarray(pcd.points) + cols = np.asarray(pcd.colors) * 255 + return pts, cols + + +# ========================================= +# ORB-SLAM Tcw → Twc and Twc → Tcw +# ========================================= +def Tcw_to_Twc(Rcw, tcw): + Rwc = Rcw.T + twc = -Rwc @ tcw + return Rwc, twc + +def Twc_to_Tcw(Rwc, twc): + Rcw = Rwc.T + tcw = -Rcw @ twc + return Rcw, tcw + + + +# pose difference +def pose_difference(T1, T2): + R1, t1 = T1 + R2, t2 = T2 + dR = R2 @ R1.T + drot = np.linalg.norm(R.from_matrix(dR).as_rotvec()) + dt = np.linalg.norm(t2 - t1) + return dt, drot + + +# ========================================= +# Write cameras.bin / cameras.txt +# ========================================= +def write_cameras(path_bin, path_txt): + with open(path_bin, "wb") as f: + f.write(struct.pack(" END_ID: + continue + if i % INTERVAL == 0: + selected.append(i) + last_pose = Twc_list[i] + continue + + dt, drot = pose_difference(last_pose, Twc_list[i]) + if dt > DT_THRESH or drot > DROT_THRESH: + selected.append(i) + last_pose = Twc_list[i] + + selected = sorted(list(set(selected))) + print("Selected frames:", len(selected), "/", len(rgb_files)) + + # Write selected images + os.makedirs(f"{OUT_DIR}/images", exist_ok=True) + new_rgb_paths = [] + + for new_id, old_id in enumerate(selected): + img = cv2.imread(rgb_files[old_id]) + out = f"{OUT_DIR}/images/{new_id:06d}.png" + cv2.imwrite(out, img) + new_rgb_paths.append(out) + + selected_Tcw = [Tcw_list[i] for i in selected] + selected_Twc = [Twc_list[i] for i in selected] + + # ============================= + # Build Points3D (with distance filtering) + # ============================= + all_pts = [] + all_cols = [] + + for i, old_id in enumerate(selected): + Rcw, tcw = Tcw_list[old_id] + Rwc, twc = Twc_list[old_id] + + pts, cols = load_ply(ply_files[old_id]) + + # Distance filtering + dist = np.linalg.norm(pts, axis=1) + mask = dist < MAX_DIST + pts = pts[mask] + cols = cols[mask] + + # Camera → world + pts_w = (Rwc @ pts.T + twc.reshape(3, 1)).T + + all_pts.append(pts_w) + all_cols.append(cols) + + all_pts = np.vstack(all_pts) + all_cols = np.vstack(all_cols) + + print("Merged point count:", len(all_pts)) + + # ============================= + # Voxel downsampling (reduce point count) + # ============================= + pcd = o3d.geometry.PointCloud() + pcd.points = o3d.utility.Vector3dVector(all_pts) + pcd.colors = o3d.utility.Vector3dVector(all_cols / 255) + + pcd = pcd.voxel_down_sample(VOXEL_SIZE) + + all_pts = np.asarray(pcd.points) + all_cols = (np.asarray(pcd.colors) * 255).astype(np.uint8) + + print("Points after voxel filtering:", len(all_pts)) + + # ============================= + # Write COLMAP format + # ============================= + os.makedirs(SPARSE_DIR, exist_ok=True) + 
write_cameras(f"{SPARSE_DIR}/cameras.bin", f"{SPARSE_DIR}/cameras.txt") + write_images(f"{SPARSE_DIR}/images.bin", f"{SPARSE_DIR}/images.txt", + new_rgb_paths, selected_Tcw) + + write_points(all_pts, all_cols, + f"{SPARSE_DIR}/points3D.bin", + f"{SPARSE_DIR}/points3D.txt", + f"{SPARSE_DIR}/points3D.ply") + + # ============================= + # 可视化(相机轨迹 + 全局点云) + # ============================= + visualize(selected_Twc, all_pts) + + print("全部完成!") + + +if __name__ == "__main__": + main()