Status: Open
Labels: feature (New feature or request)
Description
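Problem description
When qq-chat-exporter exports a group chat history too large to process within memory limits, it can only use the JSONL format, which ChatLab does not support natively.
The format is as follows: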
= OutputFolder
  |- manifest.json
  |= chunks
     |- chunk_0001.jsonl
     |- chunk_0002.jsonl
     |- ...
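The group chat metadata and the description of the chunk sequence are stored in manifest.json; each chunk_xxxx.jsonl file stores only a list of chat messages.
Example manifest: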
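A minimal Python sketch of how an importer might read this layout. It assumes the field names shown in the example manifest (`metadata.format`, `chatInfo`, `statistics`, `chunked.chunks[].file`, `messages`, `startTime`, `endTime`) and that each `file` path is relative to the output folder. `load_chunked_manifest` and `ChunkInfo` are hypothetical names, not part of ChatLab or qq-chat-exporter.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ChunkInfo:
    path: Path       # location of the .jsonl file
    messages: int    # message count declared in the manifest
    start_time: int  # epoch milliseconds
    end_time: int    # epoch milliseconds


def load_chunked_manifest(output_folder):
    """Read manifest.json and return (chatInfo dict, list of ChunkInfo).

    Assumes the field names from the example manifest. Raises ValueError if
    the export is not chunked-jsonl or if the declared counts are inconsistent.
    """
    root = Path(output_folder)
    manifest = json.loads((root / "manifest.json").read_text(encoding="utf-8"))

    if manifest.get("metadata", {}).get("format") != "chunked-jsonl":
        raise ValueError("not a chunked-jsonl export")

    chunks = [
        ChunkInfo(
            path=root / c["file"],  # assumed relative to the output folder
            messages=c["messages"],
            start_time=c["startTime"],
            end_time=c["endTime"],
        )
        for c in manifest["chunked"]["chunks"]
    ]

    # Cross-check the per-chunk entries against the summary statistics.
    stats = manifest.get("statistics", {})
    if "chunkCount" in stats and stats["chunkCount"] != len(chunks):
        raise ValueError("chunkCount does not match the chunks list")
    total = sum(c.messages for c in chunks)
    if "totalMessages" in stats and stats["totalMessages"] != total:
        raise ValueError("totalMessages does not match the sum of chunk counts")

    return manifest.get("chatInfo", {}), chunks
```

Checking the per-chunk counts against `totalMessages` is a cheap way to catch a missing or truncated chunk list before importing anything.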
{
"metadata": {
"exportTime": "2025-12-27T03:33:25.460Z",
"version": "5.0.0",
"format": "chunked-jsonl"
},
"chatInfo": {
"name": "xxxxxxxxxxxx",
"type": "group",
"selfUid": "u_w-fLQ9bMQHLkZpHNzh_p6A",
"selfUin": "123456",
"selfName": "Ellu"
},
"statistics": {
"totalMessages": 1768832,
"chunkCount": 36
},
"chunked": {
"format": "jsonl",
"chunksDir": "chunks",
"chunkFileExt": ".jsonl",
"maxMessagesPerChunk": 50000,
"maxBytesPerChunk": 52428800,
"chunks": [
{
"file": "chunks/chunk_0001.jsonl",
"messages": 50000,
"bytes": 34676792,
"startTime": 1765504601000,
"endTime": 1766804055000
},
{
"file": "chunks/chunk_0002.jsonl",
"messages": 50000,
"bytes": 31139309,
"startTime": 1764508356000,
"endTime": 1765546217000
},
...
]
}
}
Example chunk .jsonl content. It is not standard JSON: each line is one complete JSON record, and records are separated by newlines.
{"id":"7587998612871331832","seq":"7107700","timestamp":1766718601000,"time":"2025-12-26T03:10:01.000Z","sender":{"uid":"u_xxxxxx_xxxxxxxxx","uin":"xxxxxxxxxx","name":"用户A","nickname":"用户A的昵称","groupCard":"用户A的群名片"},"type":"text","content":{"text":"[表情5][表情5][表情5]","html":"[表情5][表情5][表情5]","elements":[{"type":"face","data":{"id":"5","name":"/流泪"}},{"type":"face","data":{"id":"5","name":"/流泪"}},{"type":"face","data":{"id":"5","name":"/流泪"}}],"resources":[],"mentions":[]},"recalled":false,"system":false}
{"id":"7587998630100730690","seq":"7107701","timestamp":1766718605000,"time":"2025-12-26T03:10:05.000Z","sender":{"uid":"u_xxxxxx_xxxxxxxxx","uin":"xxxxxxxxxx","name":"用户B","nickname":"用户B的昵称","groupCard":"用户B的群名片","remark":"用户B的备注"},"type":"text","content":{"text":"有啊","html":"有啊","elements":[{"type":"text","data":{"text":"有啊"}}],"resources":[],"mentions":[]},"recalled":false,"system":false}
{"id":"7587998635703326372","seq":"7107702","timestamp":1766718605000,"time":"2025-12-26T03:10:05.000Z","sender":{"uid":"u_xxxxxx_xxxxxxxxx","uin":"xxxxxxxxxx","name":"用户C","groupCard":"用户C的群名片"},"type":"text","content":{"text":"maa有烧水","html":"maa有烧水","elements":[{"type":"text","data":{"text":"maa有烧水"}}],"resources":[],"mentions":[]},"recalled":false,"system":false}
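...
For now I have written a Python script that manually merges the JSONL files into one very large JSON file, which others with the same problem can use as a reference. But I think this could be built into ChatLab: after reading the manifest, if chunks are present, iterate over them one by one and add them to the database.
Anyway, thanks to the author for building such a fun group chat analysis tool.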
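A minimal sketch of the proposed built-in import, written in Python with the standard `sqlite3` module. It assumes the manifest and message fields from the examples above; the `messages` table layout and the `import_chunked_export` / `iter_messages` names are hypothetical and do not reflect ChatLab's actual database schema. Each chunk is read one line at a time and inserted in batches, so memory use is bounded by `batch_size` rather than by the size of the whole export.

```python
import json
import sqlite3
from itertools import islice
from pathlib import Path


def iter_messages(chunk_path):
    """Yield one parsed message per non-empty line of a .jsonl chunk."""
    with open(chunk_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def import_chunked_export(output_folder, db_path, batch_size=5000):
    """Stream every chunk listed in manifest.json into a SQLite table.

    The table layout is hypothetical. Returns the row count after import.
    """
    root = Path(output_folder)
    manifest = json.loads((root / "manifest.json").read_text(encoding="utf-8"))

    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id TEXT PRIMARY KEY, timestamp INTEGER NOT NULL,
               sender_uid TEXT, sender_name TEXT, type TEXT, text TEXT,
               recalled INTEGER, system INTEGER)"""
    )
    for chunk in manifest["chunked"]["chunks"]:
        # Lazily map each JSON record to a row tuple; nothing is buffered yet.
        rows = (
            (
                m["id"],
                m["timestamp"],
                m.get("sender", {}).get("uid"),
                m.get("sender", {}).get("name"),
                m.get("type"),
                m.get("content", {}).get("text"),
                int(m.get("recalled", False)),
                int(m.get("system", False)),
            )
            for m in iter_messages(root / chunk["file"])
        )
        # Pull fixed-size batches so at most batch_size rows sit in memory.
        while batch := list(islice(rows, batch_size)):
            conn.executemany(
                "INSERT OR IGNORE INTO messages VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                batch,
            )
        conn.commit()  # one transaction per chunk
    count = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
    conn.close()
    return count
```

In the example manifest, chunk_0001 covers a later time range than chunk_0002 and the two ranges slightly overlap, so the sketch does not rely on chunk order; chronological order is recovered at query time with `ORDER BY timestamp`. `INSERT OR IGNORE` on the message `id` is a defensive choice in case the same message appears in more than one chunk.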