Summary
Self-hosted supermemory-server v0.0.3 accepts an .xlsx upload and eventually marks the document done, but the stored chunks are raw ZIP/OpenXML bytes (PK\u0003\u0004...xl/worksheets/sheet1.xml...) rather than extracted spreadsheet cell text. Search then returns binary/OpenXML fragments instead of useful spreadsheet content.
Environment
- Release: server-v0.0.3
- Binary: supermemory-server-linux-x64
- OS: Linux/WSL
- Upload endpoint: POST /v3/documents/file
- File MIME: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Reproduction
Upload an Excel workbook:
curl -sS -X POST http://127.0.0.1:6767/v3/documents/file \
-H 'Authorization: Bearer <api-key>' \
-F 'file=@MAG Agent Data.xlsx;type=application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' \
-F 'containerTag=debug:xlsx' \
-F 'customId=debug-xlsx-1'
The document is accepted and later reports done:
{
  "id": "E63cFP2EuiVgMefznAxADD",
  "status": "done",
  "title": "MAG Agent Data.xlsx",
  "type": "text",
  "metadata": {
    "mimeType": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
  }
}
But GET /v3/documents/{id}/chunks returns chunks like this:
{
  "position": 0,
  "content": "PK\u0003\u0004...xl/worksheets/sheet1.xml..."
}
and later chunks include more OpenXML/ZIP internals instead of cell values:
xl/sharedStrings.xml
xl/styles.xml
xl/workbook.xml
[Content_Types].xml
Search also returns those binary/OpenXML chunks instead of spreadsheet rows/cells.
Suspected cause
The self-hosted content detection appears to classify Office MIME types as generic text:
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet -> text
Then the text extractor decodes the raw .xlsx ZIP bytes as text and embeds that, rather than using an XLSX/OpenXML extractor to unpack worksheets/shared strings and serialize cells.
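To illustrate the difference, here is a minimal Python sketch of what an XLSX extractor would do instead of decoding the archive as text. It is not the server's code: `extract_xlsx_text` and `build_sample_xlsx` are hypothetical names, the fixture contains only the two parts the extractor reads (so it is not a complete workbook), and sheets are labeled by part file name because the real sheet names in xl/workbook.xml are not resolved here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# SpreadsheetML main namespace used by worksheet and shared-string parts.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN_NS}
T_TAG = f"{{{MAIN_NS}}}t"  # text run element
C_TAG = f"{{{MAIN_NS}}}c"  # cell element


def _joined_text(element) -> str:
    """Concatenate all <t> runs below an element."""
    return "".join(t.text or "" for t in element.iter(T_TAG))


def extract_xlsx_text(data: bytes) -> list[str]:
    """Return one 'sheet!cell: value' line per non-empty cell (hypothetical helper)."""
    lines = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
        shared = []
        if "xl/sharedStrings.xml" in names:
            root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
            shared = [_joined_text(si) for si in root.findall("m:si", NS)]
        sheets = sorted(n for n in names
                        if n.startswith("xl/worksheets/") and n.endswith(".xml"))
        for name in sheets:
            label = PurePosixPath(name).stem  # e.g. "sheet1"
            for cell in ET.fromstring(zf.read(name)).iter(C_TAG):
                kind = cell.get("t")
                value = cell.findtext("m:v", default="", namespaces=NS)
                if kind == "s" and value:
                    text = shared[int(value)]  # index into shared strings
                elif kind == "inlineStr":
                    text = _joined_text(cell)
                else:
                    text = value  # numbers and other literal values
                if text:
                    lines.append(f"{label}!{cell.get('r')}: {text}")
    return lines


def build_sample_xlsx(value: str) -> bytes:
    """Build an in-memory archive holding only the parts read above (test fixture)."""
    shared = f'<sst xmlns="{MAIN_NS}"><si><t>{value}</t></si></sst>'
    sheet = (f'<worksheet xmlns="{MAIN_NS}"><sheetData><row r="1">'
             '<c r="A1" t="s"><v>0</v></c><c r="B1"><v>42</v></c>'
             '</row></sheetData></worksheet>')
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("xl/sharedStrings.xml", shared)
        zf.writestr("xl/worksheets/sheet1.xml", sheet)
    return buf.getvalue()


sample = build_sample_xlsx("unique-cell-value-123")
print(sample[:4])                 # b'PK\x03\x04', the bytes a plain-text decode starts with
print(extract_xlsx_text(sample))  # ['sheet1!A1: unique-cell-value-123', 'sheet1!B1: 42']
```

Decoding `sample` as text would begin with the same PK signature seen in the stored chunks, while the extractor yields the cell values with their cell references.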
Docs ambiguity
The docs are inconsistent:
- Supported Content Types says Microsoft Office Excel .xlsx is supported with content type xlsx.
- Upload Files / file upload docs list spreadsheets as CSV / Google Sheets, but not XLSX.
Either way, marking an uploaded .xlsx as done while indexing ZIP bytes is misleading. If XLSX is unsupported for file upload/self-hosted, the document should fail with a clear unsupported-content error instead of indexing binary content.
Expected behavior
One of:
- .xlsx uploads extract worksheet cell text into searchable chunks, preserving useful sheet/row context; or
- .xlsx uploads are rejected/marked failed with an explicit unsupported file type error.
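Either outcome comes down to a routing decision made before any bytes are decoded. The sketch below shows one way that decision could look. It is only an illustration: `choose_extractor`, `UnsupportedContentError`, and the `xlsx_supported` flag are hypothetical and do not come from the server's code.

```python
XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
ZIP_MAGIC = b"PK\x03\x04"  # ZIP local file header signature


class UnsupportedContentError(Exception):
    """Raised instead of indexing bytes that no extractor understands."""


def choose_extractor(filename: str, mime_type: str, data: bytes,
                     xlsx_supported: bool) -> str:
    """Pick an extractor name, or raise rather than fall back to plain text."""
    is_xlsx = mime_type == XLSX_MIME or filename.lower().endswith(".xlsx")
    if is_xlsx:
        if not xlsx_supported:
            raise UnsupportedContentError(f"unsupported file type: {filename}")
        return "xlsx"
    if data.startswith(ZIP_MAGIC):
        # A ZIP container of unknown kind must never be decoded as text.
        raise UnsupportedContentError(f"unsupported binary container: {filename}")
    return "text"
```

With `xlsx_supported=True` the workbook from this report would be routed to an XLSX extractor. With `xlsx_supported=False` it would fail with an explicit error instead of reaching the text path.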
Suggested fix
Add an XLSX/OpenXML extractor for uploaded files, or treat Excel MIME/extension as unsupported rather than text. A regression test could upload a minimal workbook with a unique cell value and assert that GET /v3/documents/{id}/chunks or document search contains that cell value, not PK ZIP bytes.
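The assertion part of such a regression test could be sketched as follows. The helper assumes the chunks arrive as a list of objects with the position and content fields shown in the reproduction. The response envelope of GET /v3/documents/{id}/chunks is not shown in this report, so unwrapping it is left to the caller, and `check_chunks` is a hypothetical name.

```python
ZIP_MAGIC_TEXT = "PK\u0003\u0004"  # ZIP signature as it appears in the JSON chunks
OPENXML_MARKERS = ("xl/worksheets/", "xl/sharedStrings.xml", "[Content_Types].xml")


def check_chunks(chunks: list[dict], expected_cell: str) -> list[str]:
    """Return a list of problems; an empty list means the chunks look extracted."""
    problems = []
    for chunk in chunks:
        content = chunk.get("content", "")
        if ZIP_MAGIC_TEXT in content or any(m in content for m in OPENXML_MARKERS):
            problems.append(
                f"chunk {chunk.get('position')} contains ZIP/OpenXML internals")
    if not any(expected_cell in c.get("content", "") for c in chunks):
        problems.append(f"no chunk contains the cell value {expected_cell!r}")
    return problems


# The chunk reported in this issue fails both checks.
bad = [{"position": 0, "content": "PK\u0003\u0004...xl/worksheets/sheet1.xml..."}]
print(check_chunks(bad, "unique-cell-value-123"))

# A chunk holding extracted cell text passes.
good = [{"position": 0, "content": "sheet1!A1: unique-cell-value-123"}]
print(check_chunks(good, "unique-cell-value-123"))  # []
```

A test would upload a workbook containing the unique value, wait for the document to report done, and then assert that `check_chunks` returns an empty list.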