Doc2Xls Studio is a web application that converts Word document tables into structured Excel files based on a user-defined template. It extracts tables, form fields, and metadata from DOCX files and maps them to target Excel column structures.
For user documentation, see English | German (Deutsch).
- Frontend: React 19 + TypeScript + Vite
- Backend: Express.js + TypeScript
- Styling: Tailwind CSS v4 + shadcn/ui components
- Document Processing: mammoth.js (DOCX parsing) + xlsx (Excel generation)
├── client/ # Frontend React application
│ ├── src/
│ │ ├── components/ # UI components
│ │ │ ├── ui/ # shadcn/ui base components
│ │ │ ├── FileUploader.tsx # DOCX file upload component
│ │ │ ├── TemplateUploader.tsx # Excel template upload component
│ │ │ ├── ColumnMapper.tsx # Column mapping interface
│ │ │ └── DataPreview.tsx # Data preview and export component
│ │ ├── pages/
│ │ │ └── Converter.tsx # Main converter page (orchestrates workflow)
│ │ ├── hooks/ # Custom React hooks
│ │ ├── lib/ # Utility functions
│ │ ├── App.tsx # Root component with routing
│ │ └── index.css # Global styles and CSS variables
│ └── index.html # HTML entry point
├── server/ # Backend Express server
│ ├── index.ts # Server entry point
│ ├── routes.ts # API routes and document processing logic
│ ├── storage.ts # Storage interface (for future DB integration)
│ ├── static.ts # Static file serving
│ └── vite.ts # Vite dev server integration
├── shared/ # Shared types and schemas
│ └── schema.ts # Drizzle ORM schemas (for future use)
└── package.json # Dependencies and scripts
- User uploads an Excel template (.xlsx/.xls)
- System extracts column headers from the template
- Supports complex templates with multiple sheets
- Auto-detects the header row based on cell density
- User uploads one or more Word documents (.docx/.doc)
- Batch mode: Upload multiple files at once for combined processing
- System parses documents using mammoth.js
- Extracts multiple table types:
- Data tables: Traditional row/column format
- Form tables: Label-value pair format (e.g., "Test Case Type: Funktionstest")
- Document metadata: Filename, summary text
- Groups columns by source table with prefixes like
[Table Name] Column - In batch mode, displays file list with add/remove controls and processing status
- Visual interface to map source columns to target template columns
- Auto-mapping based on column name similarity
- Supports mapping from any source table to any target column
- Shows transformed data preview before download
- Batch mode: Displays combined data from all documents with source file column
- Generates single merged Excel file with data from all processed documents
- Preserves column order from template
Parses an Excel template and extracts column headers.
Request: multipart/form-data with file field
Response:
{
"columns": ["Column1", "Column2", ...],
"sheetName": "Sheet Name",
"headerRowIndex": 5,
"fileName": "template.xlsx",
"allSheets": ["Sheet1", "Sheet2", ...]
}Parses a single Word document and extracts all tables and metadata.
Request: multipart/form-data with file field
Response:
{
"tables": [
{
"name": "Document Info",
"columns": ["Dateiname", "Kurze Zusammenfassung"],
"rows": [{ "Dateiname": "...", "Kurze Zusammenfassung": "..." }]
},
{
"name": "Testfall-Erstellung",
"columns": ["Test Case Type", "Version/Release", ...],
"rows": [{ ... }]
}
],
"fileName": "document.docx",
"tablesFound": 5,
"freeTexts": ["Summary text..."],
"headerInfo": "Document title"
}Batch parses multiple Word documents (up to 50 files).
Request: multipart/form-data with files field (array)
Response:
{
"documents": [
{
"fileName": "doc1.docx",
"tables": [...],
"tablesFound": 5
},
{
"fileName": "doc2.docx",
"tables": [...],
"tablesFound": 4
}
],
"totalFiles": 2,
"processedFiles": 2
}Generates an Excel file from mapped data.
Request:
{
"targetColumns": ["Target Col 1", "Target Col 2", ...],
"mapping": { "Target Col 1": "[Source Table] Source Col", ... },
"sourceData": [{ "[Source Table] Source Col": "value", ... }]
}Response: Binary Excel file (.xlsx)
The parser handles two table formats:
-
Form-Style Tables: Tables with label-value pairs
- Detected by cells ending with ":"
- Extracts each label as a column name, next cell as value
- Example:
| Test Case Type: | Funktionstest |
-
Data Tables: Traditional tabular data
- First non-title row treated as headers
- Subsequent rows are data records
- Example: Standard tables with header row
- Filename: Extracted from document header or paragraph containing ".docx"
- Summary Text: Paragraphs following "Kurze Zusammenfassung" header
- Free Text: Non-table content between sections
mammoth.js converts DOCX to HTML, then custom regex-based parsing extracts:
<table>elements with all nested<tr>,<td>,<th><p>paragraphs for free text- Cell content cleaned of HTML tags, normalized whitespace
interface State {
step: 'template' | 'upload' | 'map' | 'preview';
file: File | null;
isProcessing: boolean;
parsedData: ParsedData | null; // Source document data
templateData: TemplateData | null; // Target template structure
mapping: Record<string, string>; // target -> source column mapping
}handleTemplateSelect→ Parse template → StoretemplateDatahandleFileSelect→ Parse document → StoreparsedDataColumnMapper→ User maps columns → UpdatemappinghandleDownload→ Generate Excel withgetCombinedData()+mapping
The getCombinedData() function merges data from all tables:
- Main table (most rows) provides base records
- Other tables' first row data is added to each record
- All columns prefixed with
[Table Name]for unique identification
Defined in client/src/index.css:
- Colors: Primary blue (
221 83% 53%), slate backgrounds - Typography: Inter (body), Space Grotesk (headings)
- Spacing: 0.25rem base unit
- Border radius: 0.5rem default
- shadcn/ui components for consistent UI
- Tailwind utilities for layout and spacing
- Framer Motion for animations and transitions
mammoth: DOCX to HTML conversionxlsx: Excel file reading/writingmulter: File upload handlingreact-dropzone: Drag-and-drop file uploadsframer-motion: Animations@radix-ui/*: Accessible UI primitives
vite: Build tool and dev servertsx: TypeScript executiontailwindcss: CSS frameworktypescript: Type checking
# Development
npm run dev
# Production build
npm run build
npm startThe application runs on port 5000 (configurable via PORT env variable).
- Database Integration: Store templates and conversion history using Drizzle ORM
- Template Management: Save/load custom Excel templates
- Batch Processing: Convert multiple documents at once
- Advanced Mapping: Rule-based transformations and data validation
- Authentication: User accounts for saved preferences