ELPX Import Pipeline
Reference document for the end-to-end process that reads an .elpx (or legacy
.elp) file and populates a Yjs document with the project structure.
Related documents: export-pipeline.md | validation.md
Architecture Summary
The import path is entirely browser-side in the normal workarea flow. The
server never parses ELP/ELPX files during a regular import (AGENTS.md §7.8).
Two paths exist:
- Direct import (primary): user selects .elp/.elpx → browser imports in memory via ElpxImporter → UI refreshes from Y.Doc → saved to server on explicit save/autosave.
- Chunked upload (fallback): browser uploads in 15 MB chunks to POST /api/project/upload-chunk → server concatenates into a temp file (no parsing) → browser reloads workarea with ?import=… → browser calls DELETE /api/project/cleanup-import to remove the temp file.
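The chunked fallback amounts to slicing the file into fixed-size pieces before upload. The following is a minimal sketch of that slicing only: `splitIntoChunks` and `CHUNK_SIZE_BYTES` are hypothetical names, "15 MB" is assumed to mean 15 × 1024 × 1024 bytes, and the request format of the upload endpoint is deliberately left out because it is not described here.

```typescript
// Hypothetical helper, not part of the documented API.
// Assumption: "15 MB" is interpreted as 15 MiB.
const CHUNK_SIZE_BYTES = 15 * 1024 * 1024;

function splitIntoChunks(
  data: Uint8Array,
  chunkSize: number = CHUNK_SIZE_BYTES,
): Uint8Array[] {
  if (chunkSize <= 0) {
    throw new Error('chunkSize must be positive');
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    // subarray() returns a view, so no bytes are copied here.
    const end = Math.min(offset + chunkSize, data.length);
    chunks.push(data.subarray(offset, end));
  }
  return chunks;
}
```

Each returned chunk would then be sent in order, and the server only needs to append the bytes it receives.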
The server-side ElpxImporter (src/shared/import/ElpxImporter.ts) is also
used by CLI export commands and the external API when they need to build a Y.Doc
from a file on disk.
Phase 1 — Decompression
Responsibility: ElpxImporter.importFromBuffer() (ElpxImporter.ts:160)
const zip = fflate.unzipSync(buffer); // ElpxImporter.ts:173
fflate.unzipSync decompresses the entire ZIP into a
Record<string, Uint8Array> keyed by file path.
After decompression, unwrapSingleTopLevelDirectory() (ElpxImporter.ts:112)
strips a single top-level folder prefix if every entry shares the same prefix.
This handles archives exported from GitHub (e.g. repo-main/content.xml
becomes content.xml).
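The prefix-stripping rule can be illustrated as follows. This is a sketch under stated assumptions rather than the code at ElpxImporter.ts:112: the name `unwrapSingleTopLevelDirectorySketch` and the `ZipEntries` alias are hypothetical, and skipping the bare directory entry is an assumption about how such entries are handled.

```typescript
type ZipEntries = Record<string, Uint8Array>;

// Illustrative re-implementation; the real function may differ in detail.
function unwrapSingleTopLevelDirectorySketch(entries: ZipEntries): ZipEntries {
  const paths = Object.keys(entries);
  const first = paths[0];
  if (first === undefined) return entries;

  // Candidate prefix: first path segment of the first entry, e.g. "repo-main/".
  const slash = first.indexOf('/');
  if (slash <= 0) return entries; // a file already sits at the root
  const prefix = first.slice(0, slash + 1);

  // Only unwrap when every entry lives under that same folder.
  if (!paths.every((p) => p.startsWith(prefix))) return entries;

  const unwrapped: ZipEntries = {};
  for (const [path, bytes] of Object.entries(entries)) {
    const stripped = path.slice(prefix.length);
    // Assumption: the directory entry itself ("repo-main/") is dropped.
    if (stripped !== '') unwrapped[stripped] = bytes;
  }
  return unwrapped;
}
```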
Nested ELP detection: if neither content.xml nor contentv3.xml exists at
root, the importer searches for a single .elp/.elpx file at root level
(ElpxImporter.ts:183-195). If exactly one is found it is recursively
decompressed. If more than one is found an error is thrown.
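The nested-file rule reduces to a small decision over the root-level entries. In the sketch below, `findNestedProjectFile` is a hypothetical helper, the error wording is invented for illustration, and the case-insensitive extension match is an assumption.

```typescript
// Hypothetical helper: returns the single nested project file to unzip,
// or null when nothing needs unwrapping.
function findNestedProjectFile(paths: string[]): string | null {
  const hasContent =
    paths.includes('content.xml') || paths.includes('contentv3.xml');
  if (hasContent) return null;

  // Root level only: entries without a path separator.
  // Assumption: the extension check ignores case.
  const candidates = paths.filter(
    (p) => !p.includes('/') && /\.(elp|elpx)$/i.test(p),
  );
  if (candidates.length > 1) {
    // Illustrative message; the real error text is not documented here.
    throw new Error(`Multiple nested project files: ${candidates.join(', ')}`);
  }
  return candidates[0] ?? null;
}
```

When the function returns a path, the caller would unzip that entry's bytes and continue with the inner archive.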
Phase 2 — Format Detection
Responsibility: ElpxImporter.importFromBuffer() (ElpxImporter.ts:196-266)
The detection logic checks file presence in the following order:
content.xml present? --> modern ODE format
contentv3.xml present? --> legacy Python pickle format
EPUB/content.xml present? --> EPUB3 package (paths stripped of EPUB/ prefix,
then re-evaluated as modern ODE format)
none found? --> Error: "content.xml is missing"
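The presence checks above can be expressed as an ordered lookup. This sketch only reports which content file was chosen. `detectContentFile` and `DetectedContent` are hypothetical names, and the `epub` flag stands in for the prefix stripping and re-evaluation that the importer performs.

```typescript
type ContentFormat = 'modern' | 'legacy';

interface DetectedContent {
  path: string;
  format: ContentFormat;
  epub: boolean; // true when found under EPUB/, whose prefix must then be stripped
}

// Hypothetical helper mirroring the documented order of checks.
function detectContentFile(paths: string[]): DetectedContent {
  if (paths.includes('content.xml')) {
    return { path: 'content.xml', format: 'modern', epub: false };
  }
  if (paths.includes('contentv3.xml')) {
    return { path: 'contentv3.xml', format: 'legacy', epub: false };
  }
  if (paths.includes('EPUB/content.xml')) {
    return { path: 'EPUB/content.xml', format: 'modern', epub: true };
  }
  throw new Error('content.xml is missing');
}
```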
Once the content file is found and decoded, it is parsed with
@xmldom/xmldom's DOMParser (ElpxImporter.ts:241). Parse errors are raised
immediately.
Root element inspection determines the format branch
(ElpxImporter.ts:250-266):
| Root element | Format | Handler |
|---|---|---|
| <ode> | Modern ODE XML | importStructure() |
| <instance class="exe.engine.package.Package"> or <dictionary> | Legacy Python pickle | LegacyXmlParser + importLegacyStructure() |
Phase 3 — Asset Extraction
Responsibility: importAssets() called at the start of both
importStructure() and importLegacyStructure() (ElpxImporter.ts:376,
ElpxImporter.ts:547)
Assets live inside the ZIP under content/resources/. The AssetHandler
implementation (assetHandler) determines where they end up:
- Browser: the BrowserAssetHandler stores blobs in the Cache API under exe-assets-{uuid}.
- Server / CLI: the FileSystemAssetHandler writes files to FILES_DIR/assets/{projectUuid}/.
After extraction, assetMap is populated (Map<string, string>) mapping
original file paths to their new asset UUIDs or internal references. This map
is used later to convert {{context_path}}/content/resources/<file> paths back
to asset://<uuid> internal references inside htmlView and jsonProperties.
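A minimal sketch of the path rewrite follows. It assumes that assetMap is keyed by the archive path (content/resources/<file>) and maps directly to a UUID; the function name `toAssetRefs` is hypothetical and the real conversion may handle more cases, such as encoded paths.

```typescript
// Hypothetical sketch of the {{context_path}} → asset:// rewrite.
// Assumption: assetMap keys look like "content/resources/<file>".
function toAssetRefs(html: string, assetMap: Map<string, string>): string {
  // Capture the file path after the prefix, stopping at quotes,
  // whitespace, angle brackets, or a closing parenthesis.
  const pattern = /\{\{context_path\}\}\/content\/resources\/([^"'\s<>)]+)/g;
  return html.replace(pattern, (match: string, file: string) => {
    const uuid = assetMap.get(`content/resources/${file}`);
    // Unknown files are left untouched rather than rewritten to a bad ref.
    return uuid === undefined ? match : `asset://${uuid}`;
  });
}
```

The same function could be applied to every string value inside jsonProperties.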
Progress: reportProgress('assets', 10 → 50, ...) (ElpxImporter.ts:373, 379).
Phase 4 — XML Parsing and Page/Block/Component Reconstruction
4a — Modern ODE Format: importStructure()
Responsibility: ElpxImporter.ts:364
- findNavStructures(xmlDoc) collects all <odeNavStructure> elements.
- A pageMap is built keying each element by its <odePageId> text content.
- Root-level pages are identified by empty or missing <odeParentPageId>, then sorted by <odeNavStructureOrder>.
- buildFlatPageList() (ElpxImporter.ts:952) performs a depth-first traversal. For each page:
  - A fresh newPageId is generated (generateId('page')).
  - idRemap records originalId → newPageId for later link rewriting.
  - buildPageData() (ElpxImporter.ts:1012) extracts page name, order, properties, and then iterates <odePagStructure> elements to build BlockData objects via buildBlockData() (ElpxImporter.ts:1055).
  - Within each block, buildComponentData() (ElpxImporter.ts:1095) extracts the iDevice type, htmlView CDATA content, jsonProperties CDATA content (parsed as JSON), and structure properties.
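The traversal and ID regeneration described in the list above can be sketched on plain objects. `NavNode`, `FlatPage`, and `buildFlatPageListSketch` are hypothetical stand-ins for the XML elements and the real function, the ID generator is passed in so the example stays deterministic, and block and component extraction is omitted.

```typescript
interface NavNode { id: string; parentId: string | null; order: number; name: string }
interface FlatPage { id: string; parentId: string | null; name: string }

// Illustrative depth-first flattening with ID regeneration.
function buildFlatPageListSketch(
  nodes: NavNode[],
  generateId: (prefix: string) => string,
): { pages: FlatPage[]; idRemap: Map<string, string> } {
  // Group nodes by parent; an empty or missing parent marks a root page.
  const byParent = new Map<string | null, NavNode[]>();
  for (const node of nodes) {
    const key = node.parentId === '' ? null : node.parentId;
    const siblings = byParent.get(key) ?? [];
    siblings.push(node);
    byParent.set(key, siblings);
  }

  const pages: FlatPage[] = [];
  const idRemap = new Map<string, string>();

  const visit = (oldParentId: string | null, newParentId: string | null): void => {
    const children = [...(byParent.get(oldParentId) ?? [])]
      .sort((a, b) => a.order - b.order);
    for (const child of children) {
      const newId = generateId('page');
      idRemap.set(child.id, newId); // kept for later link rewriting
      pages.push({ id: newId, parentId: newParentId, name: child.name });
      visit(child.id, newId); // descend before moving to the next sibling
    }
  };
  visit(null, null);
  return { pages, idRemap };
}
```

Because only nodes reachable from a root are visited, an orphaned subtree is skipped in this sketch; how the real importer treats orphans is not covered here.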
ID remap for collisions: because the same ELPX file might be imported
multiple times into the same Y.Doc (incremental import), all page IDs are
regenerated on every import. If clearExisting = false (incremental import),
getNextAvailableOrder() is called to compute an order offset so new root pages
are appended after existing ones.
Type normalisation in buildComponentData() (ElpxImporter.ts:1101-1109):
the iDevice type is read first from odeIdeviceTypeDirName attribute, then
odeIdeviceTypeName element text. The LEGACY_TYPE_ALIASES map
(interfaces.ts:193) is applied:
| Old type name | Mapped to |
|---|---|
| download-package | download-source-file |
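The lookup order and alias step can be sketched as below. `normaliseIdeviceType` and `LEGACY_TYPE_ALIASES_SKETCH` are hypothetical names, the alias map contains only the entry listed in the table, and returning an empty string when both sources are empty is an assumption.

```typescript
// Only the alias documented in the table; the real map may hold more.
const LEGACY_TYPE_ALIASES_SKETCH = new Map<string, string>([
  ['download-package', 'download-source-file'],
]);

// dirName: value of odeIdeviceTypeDirName; typeName: odeIdeviceTypeName text.
function normaliseIdeviceType(
  dirName: string | null,
  typeName: string | null,
): string {
  // Prefer the directory name, fall back to the type name.
  const raw = (dirName ?? '').trim() || (typeName ?? '').trim();
  return LEGACY_TYPE_ALIASES_SKETCH.get(raw) ?? raw;
}
```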
4b — Legacy Python Pickle Format: LegacyXmlParser
Responsibility: src/shared/import/LegacyXmlParser.ts
LegacyXmlParser.parse() (LegacyXmlParser.ts:379) preprocesses the XML
(whitespace normalisation, hex escape decoding), then parses with
@xmldom/xmldom.
It finds all <instance class="exe.engine.node.Node"> elements
(findAllNodes()), builds a parent reference map, and reconstructs the page
hierarchy with buildPageHierarchy().
iDevice instances are read from <instance class="...Idevice"> elements inside
each node's idevices list. Type determination follows a three-way branch
(LegacyXmlParser.ts:1235-1313):
Branch 1 — JsIdevice (exe.engine.jsidevice.JsIdevice): the _iDeviceDir
dict entry path suffix maps to a modern type via jsIdeviceTypeMap
(LegacyXmlParser.ts:1242):
| Legacy _iDeviceDir suffix | Modern type |
|---|---|
| adivina-activity | guess |
| candado-activity | padlock |
| clasifica-activity | classify |
| completa-activity | complete |
| desafio-activity | challenge |
| descubre-activity | discover |
| flipcards-activity | flipcards |
| identifica-activity | identify |
| listacotejo-activity | checklist |
| mapa-activity | map |
| mathematicaloperations-activity | mathematicaloperations |
| mathproblems-activity | mathproblems |
| ordena-activity | sort |
| quext-activity | quick-questions |
| relaciona-activity | relate |
| rosco-activity | az-quiz-game |
| selecciona-activity | quick-questions-multiple-choice |
| seleccionamedias-activity | select-media-files |
| sopa-activity | word-search |
| trivial-activity | trivial |
| videoquext-activity | quick-questions-video |
| download-package | download-source-file |
| form-activity | form |
| rubrics | rubric |
| pbl-tools | text |
| (unknown) | text |
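The suffix lookup for Branch 1 might look like the sketch below. Only three rows of the table are reproduced, `mapJsIdeviceDir` is a hypothetical name, and accepting both `/` and `\` as separators is an assumption about the paths stored in legacy files.

```typescript
// Excerpt of the table above; the full map has one entry per row.
const jsIdeviceTypeMapSketch = new Map<string, string>([
  ['adivina-activity', 'guess'],
  ['rosco-activity', 'az-quiz-game'],
  ['sopa-activity', 'word-search'],
]);

function mapJsIdeviceDir(iDeviceDir: string): string {
  // Take the last non-empty path segment as the suffix.
  // Assumption: legacy paths may use either separator.
  const segments = iDeviceDir.split(/[\\/]/).filter((s) => s.length > 0);
  const suffix = segments[segments.length - 1] ?? '';
  // Unknown suffixes fall back to a plain text iDevice.
  return jsIdeviceTypeMapSketch.get(suffix) ?? 'text';
}
```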
Branch 2 — GenericIdevice: the __name__ dict entry is read and passed to
mapGenericIdeviceType().
Branch 3 — all other class names: mapIdeviceType(className)
(LegacyXmlParser.ts:1110) is called, which applies two lookup tables:
Text-based legacy types (all map to text):
FreeTextIdevice, FreeTextfpdIdevice, GenericIdevice, TextIdevice,
ActivityIdevice, TaskIdevice, ObjectivesIdevice, PreknowledgeIdevice,
ReadingActivityIdevice, ReflectionIdevice, ReflectionfpdIdevice,
ReflectionfpdmodifIdevice, TareasIdevice, ListaApartadosIdevice,
ComillasIdevice, NotaInformacionIdevice, NotaIdevice,
CasopracticofpdIdevice, CitasparapensarfpdIdevice, DebesconocerfpdIdevice,
DestacadofpdIdevice, OrientacionestutoriafpdIdevice,
OrientacionesalumnadofpdIdevice, ParasabermasfpdIdevice,
RecomendacionfpdIdevice, WikipediaIdevice, RssIdevice, AppletIdevice,
FileAttachIdevice, AttachmentIdevice
Interactive legacy types (interactiveTypeMap, LegacyXmlParser.ts:1151):
| Legacy class name | Modern type |
|---|---|
| TrueFalseIdevice | trueorfalse |
| VerdaderofalsofpdIdevice | trueorfalse |
| MultichoiceIdevice | form |
| EleccionmultiplefpdIdevice | form |
| MultiSelectIdevice | form |
| SeleccionmultiplefpdIdevice | form |
| ClozeIdevice | complete |
| ClozefpdIdevice | complete |
| ClozelangfpdIdevice | complete |
| ImageMagnifierIdevice | magnifier |
| GalleryIdevice | image-gallery |
| CasestudyIdevice | casestudy |
| EjercicioresueltofpdIdevice | casestudy |
| ExternalUrlIdevice | external-website |
| QuizTestIdevice | quick-questions |
| (anything else matching \w+Idevice) | text |
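The two-table lookup of Branch 3 can be sketched as follows. Both tables are abbreviated, and `mapIdeviceTypeSketch` is a hypothetical name. Taking the last dotted segment of the class attribute and mapping every unrecognised name to text (including names that do not end in Idevice) are assumptions of this sketch.

```typescript
// Abbreviated versions of the two lookup tables described above.
const TEXT_TYPES_SKETCH = new Set<string>([
  'FreeTextIdevice', 'ReflectionIdevice', 'WikipediaIdevice',
]);
const interactiveTypeMapSketch = new Map<string, string>([
  ['TrueFalseIdevice', 'trueorfalse'],
  ['ClozeIdevice', 'complete'],
  ['GalleryIdevice', 'image-gallery'],
]);

function mapIdeviceTypeSketch(className: string): string {
  // Assumption: a dotted class path is reduced to its last segment.
  const shortName = className.split('.').pop() ?? className;
  if (TEXT_TYPES_SKETCH.has(shortName)) return 'text';
  // Anything not listed as interactive is treated as text.
  return interactiveTypeMapSketch.get(shortName) ?? 'text';
}
```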
Phase 5 — Metadata Extraction and Screenshot
Responsibility: extractMetadata() (ElpxImporter.ts:881) and
extractScreenshotFromZip() (ElpxImporter.ts:866)
For modern ODE format, metadata is read from the <odeProperties> element via
getMetadataProperty() / getBooleanMetadataProperty(). Properties use the
pp_ prefix convention (pp_title, pp_author, pp_lang, pp_license, etc.)
(ElpxImporter.ts:901-921).
Theme is extracted from <userPreferences> first (key theme), falling back to
odeProperties key pp_style (ElpxImporter.ts:884-896).
setMetadata() (ElpxImporter.ts:927) writes all extracted values into the
Yjs metadata Y.Map.
Screenshot: extractScreenshotFromZip() looks for screenshot.png at the
archive root, base64-encodes it, and stores it as a data:image/png;base64,...
data URL in metadata.screenshot (ElpxImporter.ts:866-876, 423-426).
For legacy format, setLegacyMetadata() (ElpxImporter.ts:793) sets the same
keys. Legacy files always get theme = 'base' and addMathJax = false
(ElpxImporter.ts:800-811).
Phase 6 — Internal Link Remap
Responsibility: remapInternalPageLinks() (ElpxImporter.ts:1453)
Because all page IDs are regenerated during import, any href="exe-node:<oldId>"
links inside htmlView content and jsonProperties string values must be
updated to point to the new IDs.
A single regex is built from the union of all old IDs
(ElpxImporter.ts:1457-1458), then applied to both comp.htmlView and all
string values inside comp.properties recursively via remapLinksInObject()
(ElpxImporter.ts:1486). Anchor fragments (#section) are preserved.
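A sketch of the single-regex approach is shown below. The helper names are hypothetical. Sorting IDs longest-first and the trailing negative lookahead are choices made for this sketch, so that an ID which is a prefix of another does not match too early; the real implementation may guard against that differently.

```typescript
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Rewrites exe-node:<oldId> to exe-node:<newId> in one pass.
function remapLinksInString(text: string, idRemap: Map<string, string>): string {
  if (idRemap.size === 0) return text;
  const alternation = [...idRemap.keys()]
    .sort((a, b) => b.length - a.length) // longest first
    .map(escapeRegExp)
    .join('|');
  // The lookahead stops a match in the middle of a longer ID; "#anchor"
  // fragments follow the ID and are left untouched.
  const pattern = new RegExp(`exe-node:(${alternation})(?![\\w-])`, 'g');
  return text.replace(pattern, (_match: string, oldId: string) =>
    `exe-node:${idRemap.get(oldId) ?? oldId}`);
}

// Recursive variant for property objects: only string values are rewritten.
function remapLinksInValue(value: unknown, idRemap: Map<string, string>): unknown {
  if (typeof value === 'string') return remapLinksInString(value, idRemap);
  if (Array.isArray(value)) return value.map((v) => remapLinksInValue(v, idRemap));
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
      out[k] = remapLinksInValue(v, idRemap);
    }
    return out;
  }
  return value;
}
```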
The same remapping is applied in the legacy path via
convertLegacyPagesToPageData() (ElpxImporter.ts:686).
Phase 7 — Yjs Transaction
Responsibility: ElpxImporter.ts:466 (modern) / ElpxImporter.ts:573
(legacy)
All Y.Doc mutations are wrapped in a single ydoc.transact() call to produce
one combined undo step and avoid incremental observer firing.
Inside the transaction:
1. If clearExisting = true, the navigation Y.Array is cleared
(while (navigation.length > 0) navigation.delete(0)).
2. Metadata is written to the metadata Y.Map (only when clearing).
3. Each PageData in the flat list is converted to a Y.Map via
createPageYMap() (ElpxImporter.ts:1279) and pushed to navigation.
Progress advances from 50% to 80% over this phase.
After the transaction, assetHandler.preloadAllAssets() is called if available
(the precache progress phase, 80–100%).
Progress Phases Summary
| Phase constant | Percent range | Description |
|---|---|---|
| decompress | 0 → 10 | ZIP decompression |
| assets | 10 → 50 | Asset extraction |
| structure | 50 → 80 | Page/block/component reconstruction |
| precache | 80 → 100 | Asset preloading |
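The ranges in the table translate into an overall percentage by linear interpolation within each phase. The sketch below assumes that interpolation model. `overallPercent` and `PHASE_RANGES` are hypothetical names, and the real reportProgress() signature is not reproduced here.

```typescript
type ImportPhase = 'decompress' | 'assets' | 'structure' | 'precache';

// Start and end percentages taken from the table above.
const PHASE_RANGES: Record<ImportPhase, [number, number]> = {
  decompress: [0, 10],
  assets: [10, 50],
  structure: [50, 80],
  precache: [80, 100],
};

// fraction: progress within the phase, from 0 (just started) to 1 (done).
function overallPercent(phase: ImportPhase, fraction: number): number {
  const [start, end] = PHASE_RANGES[phase];
  const clamped = Math.min(1, Math.max(0, fraction));
  return Math.round(start + (end - start) * clamped);
}
```

For example, halfway through asset extraction corresponds to 30% overall.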
End-to-End Flow Diagram
Buffer (.elpx / .elp)
|
v
fflate.unzipSync() [ElpxImporter.ts:173]
|
v
unwrapSingleTopLevelDirectory()
|
+-- nested ELP? --> unzipSync again
|
v
content.xml? contentv3.xml? EPUB/content.xml?
| | |
v v v
DOMParser LegacyXmlParser strip EPUB/
| | prefix, retry
| |
| +----------+
| |
v v
importStructure() importLegacyStructure()
| |
+----------+----------+
|
v
importAssets() [content/resources/ --> AssetHandler]
|
v
extractMetadata() + extractScreenshotFromZip()
|
v
buildFlatPageList() / convertLegacyPagesToPageData()
-- generateId() for all page/block/component IDs
-- buildComponentData(): read htmlView, jsonProperties
-- convertContextPathToAssetRefs(): {{context_path}} --> asset://
|
v
remapInternalPageLinks() [exe-node:<old> --> exe-node:<new>]
|
v
ydoc.transact()
-- navigation.delete(0) x N (if clearExisting)
-- metadata.set(...)
-- navigation.push([createPageYMap(pageData)])
|
v
assetHandler.preloadAllAssets()
|
v
ElpxImportResult { pages, blocks, components, assets, theme, zipContents }