Using Memory_FS to Build a File-Based Representation of the GDPR Standard

by Dinis Cruz and ChatGPT Deep Research, 2025/07/02

Abstract

This white paper presents a method for converting the General Data Protection Regulation (GDPR) text into a structured, file-based representation using the Memory_FS framework. Targeted at technical professionals in compliance, data privacy, and regulatory data management, it outlines how to parse the official GDPR document into a hierarchical set of files (down to each article paragraph or bullet point) stored in Memory_FS. Each regulatory element is captured using Memory_FS’s three-file pattern – content, config, and metadata – providing a robust foundation for further processing. We detail the motivation for modeling GDPR as a graph and how Memory_FS serves as the foundational layer for this transformation. The methodology covers parsing strategies for the GDPR text, organizing content in Memory_FS, and ensuring fidelity through round-trip conversions (to Markdown and PDF). Implementation guidance is given with code examples illustrating how to ingest the GDPR document into Memory_FS and subsequently export or utilize the data. We also discuss how the resulting Memory_FS output can feed into graph databases (such as MGraph-DB) to generate knowledge graphs of the law, including storing intermediate representations (ZIP or SQLite) for cloud deployment. The paper highlights the advantages of Memory_FS’s pluggable storage backends (in-memory for development, ZIP/SQLite for versioned releases, S3 for scalable deployment) and introduces the concept of “G³” (Graphs-of-Graphs-of-Graphs), where Memory_FS archives are nested to enable recursive metadata structures.

Introduction

Modern compliance and data privacy initiatives demand structured regulatory data for robust analysis and automation. The GDPR, with its 99 Articles and 173 Recitals, is a prime example of a complex legal text that benefits from structured representation. For technical teams, representing such regulations in a machine-readable, hierarchical form enables easier querying, traceability, and integration with tools like knowledge graphs. Graph-based models for laws allow connecting related provisions, mapping obligations to controls, and visualizing relationships (e.g. which Recitals relate to which Articles) – essential for compliance management. Graph technology is well-suited for scenarios where relationships are as important as the data itself, and GDPR’s provisions are highly interlinked (articles reference each other, recitals contextualize articles, etc.).

Memory_FS provides an ideal foundation for this task. Memory_FS is a type-safe in-memory filesystem abstraction that offers a unified interface to store and retrieve files across multiple backends. It was designed with features that align with the needs of representing structured documents: strong typing and validation, a flexible folder/file model, and pluggable storage (memory, local disk, databases, cloud). Critically, it implements a “three-file pattern” wherein each logical file is backed by three physical files: a content file (holding the data), a config file (with file metadata and configuration), and a metadata file (with auto-generated metadata like hashes and timestamps). This pattern ensures each piece of information is richly described and versionable, which is advantageous for storing legal text with traceability.

This paper details the motivation and process for representing the GDPR as a graph of files using Memory_FS. We first discuss the methodology for deconstructing the GDPR document into a hierarchical filesystem reflecting its structure (Chapters, Articles, paragraphs). Then we dive into the implementation: using Memory_FS to create and manage these files, preserving the legal document’s structure down to each paragraph or bullet point. We demonstrate how to perform round-trip conversions – ingesting the official text into Memory_FS, exporting it to a convenient format like Markdown, and even regenerating a PDF – to verify fidelity and enable collaboration. Next, we outline how the Memory_FS output can be used to build a graph database representation (for example, loading into MGraph-DB, a memory-first graph database) and how intermediate formats (such as a ZIP archive or SQLite database) can facilitate cloud-based deployments of the data. We highlight the use of different Memory_FS storage providers in various stages: in-memory for development/testing, and persisted (on disk or cloud) for production scaling. Finally, we introduce an advanced concept dubbed G³ (Graphs of Graphs of Graphs) – a strategy for nesting Memory_FS instances within each other to create recursive graph structures. This concept shows how one can manage multiple layers of structured data (e.g., a corpus of standards) in a unified framework, which points to future applications beyond a single regulation.

Methodology

Representing GDPR as a Graph of Files: The first step is to define how the GDPR document’s structure will map to a filesystem hierarchy. The GDPR’s text is organized into Chapters, Sections (in some cases), Articles, and within articles, paragraphs and sub-points. We preserve this hierarchy by treating each structural unit as a folder or file in Memory_FS. The motivation is to capture the law in atomic units (individual provisions) that can be independently referenced, yet maintain the context of the larger hierarchy (which article a paragraph belongs to, etc.). By modeling each unit as a file node in a graph (the file system is inherently a tree structure), we lay the groundwork for building a full knowledge graph of the regulation.

Designing the Hierarchical Structure: We choose to represent top-level groupings (Chapters or Sections) as directories, and individual provisions (Articles and paragraphs) as files. For example, “Chapter 1 – General Provisions” can be a folder containing files for each article in that chapter. Each Article might be a subfolder containing files for each paragraph if fine-grained access to paragraphs is needed. Alternatively, each Article could be a single file if the text is short, but GDPR articles often have multiple numbered paragraphs and lettered sub-points, so representing each as a separate file makes it easier to link or annotate them individually. This approach aligns with the observation that most GDPR Articles have numbered paragraphs and sub-paragraphs, which we model as distinct file nodes. The hierarchy ensures that context isn’t lost: a paragraph file’s path encodes its location (e.g., GDPR/Chapter_2/Article_5/Art5-Para2.md could represent Article 5, paragraph 2 – Article 5 sits in Chapter 2). We preserve article and paragraph numbering in file names or IDs to maintain readability and ordering.

Parsing the Official Document: To build this hierarchy, we parse the official GDPR text (available as PDF via EUR-Lex, or websites like gdpr-info.eu). Parsing involves extracting the structure: identifying chapter titles, article headings, and paragraph breaks. This can be done with text processing: for example, detecting lines that match the pattern “Article X” (with X as a number) as article boundaries, and numerical or alphabetical labels as paragraph or sub-point boundaries. Tools like PDF text extractors or even regex on provided text can assist in splitting the document. The parsing logic must capture the nested structure correctly (for instance, an article may have paragraphs labeled 1., 2., etc., and within those, points (a), (b), etc.). We also capture Recitals (the prefatory statements numbered 1–173) as part of the structure – these could be stored in a separate top-level directory “Recitals” with one file per Recital, since each Recital is typically a distinct paragraph of text.

Using Memory_FS Three-File Pattern: Each identified text unit (e.g., a particular paragraph of an article) will be stored following Memory_FS’s three-file pattern. Concretely, for each unit we create:

  • a Content file containing the exact text of that unit (e.g., the paragraph’s text in Markdown or plain text format),
  • a corresponding Config file (.config extension) holding metadata like the file’s unique ID, type, and perhaps structured identifiers (article number, paragraph number),
  • and a Metadata file (.metadata extension) that Memory_FS automatically generates/updates with properties like content length, hash, and timestamps for version control.

The choice of file format for content is important. We use Markdown (.md) for textual content because it preserves formatting (lists, headings, emphasis) and is easily convertible to other formats (HTML, PDF). Memory_FS natively supports a Markdown file type via Memory_FS__File__Type__Markdown, which ensures content is handled as UTF-8 text and can be serialized/deserialized without losing structure. Alternatively, plain text could be used (Memory_FS__File__Type__Text), but Markdown lets us keep bullet lists and cross-references in their original formatting.

Preserving Structure and Metadata: We ensure each file’s config contains a logical identifier that encodes its position (for example, an ID like "Art5-para2" or a composite key). Memory_FS’s type-safe IDs (Safe_Id) can be used to enforce valid naming. The config can also include a human-readable title or reference (e.g., “Article 5 Paragraph 2”) if needed for documentation. The hierarchical placement (via directories or path prefixes) will already convey the structure; for instance, the path GDPR/Chapter_2/Article_5/Art5-para2.md inherently tells us the context. We leverage Memory_FS path strategies to handle these directories. Specifically, we can use a custom path strategy or explicitly set the file_paths in the file’s config to include the directory path. For example, specifying file_paths=["GDPR/Chapter_2/Article_5"] for a file config will place that file (and its config/metadata) under the desired folder structure. Memory_FS’s Path__Handler__Custom allows user-defined hierarchical paths for files, which we use to mirror the legal document outline in the filesystem.

Round-Trip Conversion Approach: A critical part of the methodology is ensuring that our file-based model is faithful to the source and can be converted back into a human-readable document. We plan a round-trip:

  1. Ingestion – the original GDPR PDF/Word document is parsed into Memory_FS (as described above).
  2. Export – the Memory_FS content is then programmatically exported to a compiled format such as a Markdown document or PDF. The export iterates through the structured files in the correct order and reconstructs the text (in Markdown we can also include the hierarchical headings).
  3. Verification – the generated Markdown or PDF is compared to the original to verify no text was lost or altered; a minimal comparison sketch follows this list. This round-trip ensures our parsing logic captured everything and that the Memory_FS representation is complete. It also demonstrates the utility of the structured data: once in Memory_FS, we can produce updated outputs (for instance, a company-specific GDPR handbook with annotations) simply by traversing the file graph.
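
As a minimal comparison sketch for the verification step (the helper name and the whitespace normalization are our choices), layout-only differences can be ignored like this:

def texts_match(original: str, reconstructed: str) -> bool:
    # Collapse all whitespace so line-wrapping and pagination differences are ignored
    normalize = lambda text: " ".join(text.split())
    return normalize(original) == normalize(reconstructed)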

By outlining these steps before implementation, we ensure that the approach is robust. The end result of the methodology is a plan to have each GDPR provision as a node in a file graph, stored with content and rich metadata. This forms the basis for a knowledge graph and further automated processing, as described in later sections.

Methodology Summary

  • Target Structure: Use Memory_FS to create a directory tree for GDPR (Chapters → Articles → Paragraphs/Points).
  • Parsing Strategy: Programmatically detect headings and numbering in the official text to split into discrete units while preserving hierarchy.
  • Memory_FS Mapping: For each unit, create a content file (text/Markdown) plus config and metadata files (three-file pattern) to store the unit’s data and metadata.
  • Path Convention: Utilize custom path strategies to place files in directories reflecting the GDPR sections. Use descriptive file IDs (e.g., “Art5-Para2”) for clarity.
  • Round-Trip Fidelity: After populating Memory_FS, concatenate or traverse the files to regenerate the full document (e.g., as Markdown, then PDF) and verify it matches the source. This ensures the file-based representation is lossless.

With this methodology established, we proceed to implement the process, providing concrete examples and code snippets to illustrate how Memory_FS is used to achieve the above.

Implementation

Parsing the GDPR Document into Memory_FS

Document Ingestion: We begin by acquiring the GDPR text. For reproducibility, one can use the official PDF from EUR-Lex or an existing text source (some websites provide the full GDPR text in HTML). Using a PDF parsing library (or manual copy if needed), we extract the text while preserving indicators of structure. Pseudo-code for parsing might look like:

text = extract_text("GDPR.pdf")  # use a PDF text extraction utility
sections = split_into_sections(text)  # custom logic to split by Chapter/Article

The function split_into_sections would implement rules to detect lines like “Chapter 1”, “Article 5”, etc., and break the text accordingly. For example, whenever a line matches the regex ^Article\s+(\d+), that indicates a new Article. The following lines up to the next article heading belong to that article. Within an article, paragraphs are often numbered (“1. …”, “2. …”) – we can further split those. Many articles contain lists labeled (a), (b), … which we treat as sub-paragraphs. We handle those by splitting paragraphs by patterns like “(a) ” if needed. Each identified unit (be it a full Article or an individual paragraph) will be represented as a node in Memory_FS.
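
As a hedged sketch of what split_into_sections could do at the article level (the regex and helper name are ours; real PDF extractions also need cleanup of running headers, footers, and hyphenation):

import re

ARTICLE_RE = re.compile(r"^Article\s+(\d+)", re.MULTILINE)

def split_into_articles(text):
    # Locate every "Article N" heading and slice the text between consecutive headings
    matches = list(ARTICLE_RE.finditer(text))
    articles = {}
    for i, match in enumerate(matches):
        start = match.start()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        articles[int(match.group(1))] = text[start:end].strip()
    return articles

The same slicing pattern applies one level down, using paragraph markers (“1.”, “2.”) and sub-point markers (“(a)”, “(b)”) as boundaries.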

Creating Memory_FS Files: With the text units identified, we instantiate a Memory_FS object and create file entries for each unit. Below is an example code snippet illustrating how to create Memory_FS entries for a single paragraph. In practice, this would be inside loops iterating over chapters/articles/paragraphs:

from memory_fs.Memory_FS import Memory_FS
from memory_fs.file_types.Memory_FS__File__Type__Markdown import Memory_FS__File__Type__Markdown
from memory_fs.schema.Schema__Memory_FS__File__Config import Schema__Memory_FS__File__Config

# Initialize the in-memory filesystem
memory_fs = Memory_FS()

# Example: Save Article 5, Paragraph 2 text into Memory_FS
paragraph_text = "2. The controller shall ... (text of Article 5(2))"
file_config = Schema__Memory_FS__File__Config(
    file_id="Art5-Para2",  # a unique ID for the file
    file_type=Memory_FS__File__Type__Markdown(),
    file_paths=["GDPR/Chapter_2/Article_5"]  # hierarchical path within the FS
)
memory_fs.save().save(paragraph_text, file_config)

In this snippet, Memory_FS__File__Type__Markdown() is used to specify that the content is Markdown text. The file_paths parameter is a list of directory paths within the Memory_FS storage where this file should be saved – here we indicate the file lives under GDPR/Chapter_2/Article_5. Memory_FS will ensure that the content, config, and metadata files for Art5-Para2 are all stored in that directory, with appropriate naming (e.g., Art5-Para2.md, Art5-Para2.md.config, Art5-Para2.md.metadata). The first call to memory_fs.save().save(data, config) will automatically generate the trio of files: a config JSON (immutable after creation, containing file identity and settings), the content file with the paragraph text, and a metadata file with properties like content length and a hash. Each subsequent save (if content is edited) would update the content and metadata files, but not the config (since config is fixed per file identity).

We repeat this process for all content units. For instance, we create a file for each Recital (with file_paths=["GDPR/Recitals"] and IDs like “Recital-1”), for each Article title or full article text (if storing as a whole), and for each paragraph within articles. In a structured approach, one might create a directory for each Article and store each paragraph as a separate file within it, as shown. Alternatively, to simplify, one could store each article’s entire text as a single Markdown file (with internal headings or list for paragraphs); however, we choose granular files to enable fine-grained graph nodes and easier cross-referencing of specific clauses.

Configuring File Metadata: The Schema__Memory_FS__File__Config we instantiate typically includes the file’s unique ID, type, and path as shown. Additional metadata (like human-friendly names, cross-references) can be embedded in various ways:

  • The config file (a JSON) might include fields for article number, paragraph number, etc., if the schema allows extension. By default, the config captures the file_id, file_type, and possibly default storage info. We primarily rely on naming conventions and hierarchy to convey identity.
  • The metadata file (automatically maintained) logs size, hash, and timestamps. This is useful for verifying integrity (e.g., ensuring no content corruption – vital if multiple people or tools modify the data). For instance, each save operation updates a cryptographic hash in the metadata, so one can detect if the content differs from an expected version.

Memory_FS ensures strong consistency across these three files. Using its API, we can check existence or retrieve info easily. For example, after creation, memory_fs.load().load_data(file_config) would load and return the paragraph text we saved, and memory_fs.data().exists(file_config) would indicate the file’s presence. These high-level APIs wrap around the core File_FS operations (create, exists, delete, etc.), simplifying file handling. This means once our parsing loop populates Memory_FS, we have an in-memory repository of GDPR content that we can query or manipulate with simple calls.
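
Continuing the Article 5(2) example, the high-level calls described above look like this in use (a small sketch; it assumes load_data returns the text exactly as saved):

stored_text = memory_fs.load().load_data(file_config)   # read the paragraph back
assert stored_text == paragraph_text                    # content round-trips unchanged
assert memory_fs.data().exists(file_config)             # the file trio is present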

Bulk Insertion and Automation: In practice, creating hundreds of files (99 articles plus recitals and subdivisions) is feasible through loops. We can optimize by reusing Schema__Memory_FS__File__Config objects or caching frequently used components (like the Markdown file_type object). Memory_FS is efficient in-memory; creating many small files in memory is not expensive. If needed, the process can be batched or transactional. For example, one could accumulate all file configs and content first, then save them in one pass. Memory_FS could also potentially use transactions to ensure that either all files are saved or none (this feature might tie into future transaction support). For initial implementation, straightforward iterative creation is sufficient.
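
A minimal sketch of that iterative creation, assuming the parsing phase produced a parsed_units dictionary mapping (chapter, article, paragraph) keys to text (that data structure is our assumption, not part of Memory_FS):

markdown_type = Memory_FS__File__Type__Markdown()        # reuse one file_type object across files

for (chapter, article, para), text in parsed_units.items():
    config = Schema__Memory_FS__File__Config(
        file_id=f"Art{article}-Para{para}",
        file_type=markdown_type,
        file_paths=[f"GDPR/Chapter_{chapter}/Article_{article}"],
    )
    memory_fs.save().save(text, config)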

Round-Trip Conversion (Export to Markdown/PDF)

Once the GDPR content is stored in Memory_FS, we validate the representation by reconstructing the document from it. This round-trip conversion demonstrates that our file-based model retains all information and ordering.

Export to Markdown: We traverse the Memory_FS structure in the logical order of the document. Because we used directory names and file IDs that sort in the correct sequence (e.g., “Chapter_1” comes before “Chapter_2”; within a chapter directory, “Article_5” comes before “Article_6”; within an article, “Para1” before “Para2”, etc.), we can rely on lexicographic order of file paths to assemble the content. One caveat: plain lexicographic sorting places “Article_10” before “Article_5”, so numeric components should be zero-padded (e.g., “Article_05”) or sorted with a numeric-aware key. Memory_FS provides a files__paths() method that lists all stored file paths; we can filter and sort that list, or navigate directory by directory using the known structure.

For example, to reconstruct Chapter 2’s text:

chapter2_paths = [p for p in memory_fs.storage.files__paths()
                  if p.startswith("GDPR/Chapter_2/") and p.endswith(".md")]  # content files only
chapter2_parts = []                                      # collect content in document order
for path in sorted(chapter2_paths):
    content = memory_fs.storage.file__str(path)          # read the content file as a string
    chapter2_parts.append(content)
chapter2_markdown = "\n\n".join(chapter2_parts)          # assemble the chapter's Markdown

Here, memory_fs.storage.file__str(path) reads the content file bytes and decodes to string. We then append to an output, adding appropriate Markdown headings for chapter and article titles if those are stored separately. (If we saved article titles as separate files or as part of the first paragraph file, we handle that accordingly). In the simplest approach, if each article’s first paragraph file actually contains the article title as part of content (e.g., in Markdown "# Article 5 – Title" followed by the text), then the export is straightforward concatenation in order.

After assembling all chapters into a Markdown text, we have a full Markdown version of GDPR. This can be checked for consistency. Each article and paragraph should appear exactly as in the original, just now each piece came from our structured store.

Conversion to PDF: To complete the round-trip, we convert the Markdown to PDF. This can be done with existing tools (for instance, using pandoc or a Markdown-to-PDF library). The specifics are outside the scope of Memory_FS, but an example command might be: pandoc GDPR.md -o GDPR_Reconstructed.pdf. The expectation is that GDPR_Reconstructed.pdf should match the official PDF (barring minor formatting differences). This demonstrates that our Memory_FS representation did not lose any content and that it could serve as a reliable source of truth.

Benefits of Round-Trip: This flow not only validates correctness but also opens up useful workflows:

  • Editable Markdown: The exported Markdown could be given to subject-matter experts to annotate or redline changes (for example, if GDPR is amended, they could edit the Markdown). Those changes could then be parsed back in – since our Memory_FS structure is already in Markdown, one could even directly edit the Memory_FS content in memory or reload from an updated Markdown file. This two-way editability means our file-based representation can integrate into document editing pipelines.
  • Automated Document Generation: With content in Memory_FS, generating different formats (HTML for a web viewer, PDF for printing, etc.) is straightforward. It also allows generating partial documents, like just a single chapter’s PDF, by extracting that subset of files.
  • Version Control: Because Memory_FS content files are text (Markdown), they can be version-controlled (e.g., via git) at a very granular level. Each paragraph file can have its own history. Moreover, the Memory_FS metadata (hashes, timestamps) provides an additional layer of version tracking within the system. This is particularly important if using a cloud backend or if multiple systems generate updated content – you can detect divergence by comparing hashes.

In summary, the implementation of round-trip conversion confirms that Memory_FS acts as a lossless intermediate representation for the GDPR document, enabling both programmatic transformations and human-readable outputs as needed.

Generating a Graph Database from Memory_FS Output

With GDPR content structured in Memory_FS, we can now leverage it to create a graph representation – turning the file hierarchy into nodes and edges in a graph database. The impetus for this is to enable complex queries like “show all references to consent across the regulation” or “link Recital 50 to the Articles it explains”. While Memory_FS itself organizes data in a tree, a graph database can store richer relations (including cross-links that are not strictly hierarchical).

Using MGraph-DB (Memory-First Graph Database): The MGraph-DB (also referred to as MGraph-AI in some contexts) is a graph database designed to work seamlessly with JSON and file-based storage, prioritizing in-memory operation for speed. We can use the GDPR Memory_FS output as input data for such a graph DB. There are two primary ways to do this:

  1. Direct Loading: If the graph database can directly consume the Memory_FS structure (for example, by reading the JSON config files or a consolidated export), we write a loader that goes through each Memory_FS file and creates graph nodes and edges. Each paragraph file becomes a node (with properties like “text” = content, “article” = number, etc.). We also create nodes for higher-level constructs like Articles or Chapters – these could either be explicit nodes or implicitly inferred from grouping nodes.
  2. Intermediate Format: Alternatively, we first export the entire Memory_FS as an intermediate format, such as a SQLite database or a single ZIP file, and then have the graph database ingest that. Memory_FS’s storage abstraction can help here; for instance, one could use the Storage_FS__Sqlite backend (once implemented) to save all files into a SQLite DB. This would yield a file (say gdpr.db) containing tables/entries for all content, config, metadata. A graph tool could then read from this database to create nodes and edges. Similarly, using a ZIP archive (via a Storage_FS__Zip if available, or by zipping the Memory_FS’s content from memory) could package all JSON and MD files into one archive that a graph ingestion script can parse.

Defining Graph Nodes and Edges: In our graph model, we define:

  • Node types: e.g., Article, Paragraph, Recital. Each node carries attributes from Memory_FS:
      ◦ Paragraph nodes contain the text content (and perhaps an ID like “Art5(2)” for reference).
      ◦ Article nodes carry the title text and number.
      ◦ Recital nodes carry the recital text and number.
  • Edges: We add hierarchical edges such as Article_contains_Paragraph (linking an Article node to each of its Paragraph nodes) and Chapter_contains_Article (linking chapter groupings if we model Chapter as a node type). These edges can be derived from the directory structure and naming, since our file paths already encode which article a paragraph belongs to (a sketch of this derivation follows the code below). Additionally, we can create reference edges: if a paragraph mentions another article (common in legal texts: “...as provided in Article 25”), we can detect such mentions via regex and create an edge Paragraph_refers_to_Article. Memory_FS does not automatically capture these references, but since the content is readily accessible as text, we can scan for patterns like “Article 25” and, where found, link the corresponding nodes in the graph.
  • Graph Storage: With MGraph-DB, which is JSON-based, we can create the graph by writing a script or using MGraph’s API. MGraph likely provides Python classes or methods to add nodes/edges (given it was designed to be type-safe and layered, similar in philosophy to Memory_FS). We could do:
from mgraph_db import MGraph
graph = MGraph()
# create nodes and edges using data from memory_fs
for para_file in all_paragraph_files:
    node = graph.edit().add_node(node_id=para_file.file_id, attributes={"text": para_file.content, "type": "Paragraph"})

(This is pseudo-code; actual API may differ, but conceptually we add nodes with attributes.)
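
In the same spirit, containment edges can be derived from the Memory_FS paths alone, as promised above (a hedged sketch; the helper and the path layout follow our earlier naming convention):

def containment_edges(paths):
    # "GDPR/Chapter_2/Article_5/Art5-Para2.md" -> ("Article_5", "contains", "Art5-Para2")
    edges = []
    for path in paths:
        parts = path.split("/")
        if path.endswith(".md") and len(parts) == 4:     # chapter/article/paragraph layout
            article_id = parts[2]                        # e.g. "Article_5"
            para_id = parts[3][:-3]                      # strip the ".md" suffix
            edges.append((article_id, "contains", para_id))
    return edges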

Once the graph is constructed, we can persist it using MGraph’s storage (likely also JSON or a file-based snapshot). At this point, we have effectively transformed the GDPR text into a knowledge graph, where each clause is a node and relationships like containment or reference are edges. This graph can then be queried with graph queries (e.g., find all paragraphs related to a certain topic, traverse from a Recital to related Articles, etc.). It can also support advanced use cases like semantic search or linking GDPR to other regulations (if those are also ingested similarly).

Storing Intermediate Representations: As mentioned, intermediate formats help in scaling deployment:

  • If using SQLite via Memory_FS backend, the entire Memory_FS content is in one .db file. This can be versioned (for releases like “GDPR graph v1.0”) and distributed. It’s ACID-compliant, meaning updates can be done safely if needed.
  • If using ZIP archives, we can distribute the GDPR Memory_FS as a .zip containing the folder structure of JSON and MD files. This is convenient for cloud functions or client-side apps: they can download one file and load it into Memory_FS (Memory_FS could have a method to import from a zip, or we manually iterate through zip entries and populate Memory_FS). A packaging sketch follows this list.
  • For cloud deployment, the S3 backend of Memory_FS (planned) can be used. In that scenario, each file’s content, config, and metadata are stored as objects in an S3 bucket. This is highly scalable – multiple instances of an application can read from the same bucket. And versioning in S3 can track changes to the files over time. For example, an official update to GDPR could be applied by updating some content files in the bucket; clients could detect newer versions via changed metadata.
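
As a packaging sketch for the ZIP option above (it assumes files__paths() and file__str() cover all three file kinds; error handling omitted):

import zipfile

def memory_fs_to_zip(memory_fs, zip_path):
    # Write every stored file (content, .config, .metadata) into one archive
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in memory_fs.storage.files__paths():
            zf.writestr(str(path), memory_fs.storage.file__str(path))

memory_fs_to_zip(memory_fs, "gdpr.zip")                  # one distributable artifact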

Using these intermediate forms, organizations can deploy the structured GDPR in a cloud environment where it can be accessed via APIs. One could imagine a service that serves parts of GDPR via a GraphQL API, backed by Memory_FS + S3. In fact, Memory_FS’s roadmap includes integration points like a GraphQL/REST API and CLI tools, which would allow querying the filesystem (and thus the regulatory content) in flexible ways. For instance, a GraphQL query could retrieve an article by number, or fetch all paragraphs that contain a keyword, all powered by the structured store.

Validation in Graph Form: A quick verification when generating the graph is to ensure node counts and certain relationships match expectations. For example, ensure that “contains” edges count matches the number of paragraphs per article from the source (Article 5 had X paragraphs, so the Article 5 node should have X outgoing contains edges to paragraph nodes). This cross-checks the integrity of the graph against the Memory_FS source. Since Memory_FS is the source of truth, if any discrepancy is found in the graph, it likely means an error in the graph construction logic, which can be corrected.

In conclusion, the downstream graph generation is a natural extension of the Memory_FS representation. Memory_FS provides the clean, file-granular data needed to populate a graph database effectively. By using intermediate formats and the flexibility of Memory_FS to plug into different storage, we ensure that this step can scale from a local environment (loading from a file or memory) to an enterprise setup (loading from a cloud store or database). The resulting graph (which could be implemented in MGraph-DB or even Neo4j, etc.) becomes a powerful tool for compliance experts, enabling queries that answer complex questions about the regulation’s content and its interrelationships.

Deployment Considerations: Memory_FS Storage Providers

During development and prototyping, we use the in-memory storage of Memory_FS (the default Storage_FS__Memory), which stores files in Python dictionaries in memory. This is fast and convenient for testing, as all operations are simply manipulating in-memory bytes. However, for persistence and sharing of data, Memory_FS supports multiple storage providers via its Storage Abstraction Layer. We leverage these to transition from development to release:

  • In-Memory (Volatile) Storage: Ideal for development, unit tests, or ephemeral analysis. We utilized it for initially building the GDPR structure. It offers quick reads/writes and easy teardown (simply clear the storage or let it go out of scope). For example, Storage_FS__Memory is used by default for Memory_FS__Storage. In our process, once the Memory_FS is populated, we can call memory_fs.storage.clear() to reset if needed (after we’ve exported or saved it) – during iterative development this helps to re-run parsing without persistence issues.

  • Local Disk Storage: Although not yet implemented at the time of writing (marked as planned), a Storage_FS__Local_Disk would map the Memory_FS structure onto the actual OS filesystem. This could be useful for debugging (one could inspect the created files on disk) or for small-scale usage where a simple folder of files is acceptable. Once this backend is available, saving the GDPR Memory_FS to disk would produce a folder “GDPR/” with all subfolders and files, which is human-readable and can be managed with standard tools.

  • SQLite Storage: A planned Storage_FS__Sqlite provider would embed the filesystem into a single SQLite database file. The benefit here is having one compact file representing the entire dataset, with transaction support and the ability to query content with SQL if needed. For distributing the GDPR representation, a SQLite file is excellent – it’s a well-understood format, can be zipped further for compression, and ensures atomic updates. In a versioned release scenario, we might publish gdpr_v1.sqlite which contains the snapshot of the Memory_FS. Clients can load this via Memory_FS by initializing storage with that SQLite file. The snippet in the technical debrief shows how one could specify a custom storage: Memory_FS__Storage(storage_fs=Storage_FS__Sqlite(db_path="data.db")) to use SQLite.

  • S3 (Cloud Object) Storage: For cloud deployments and multi-user access, Storage_FS__S3 is a planned backend that would connect Memory_FS to an AWS S3 bucket (or by extension, any object storage). Using S3 provides durability, scalability, and access control. If we deployed the GDPR Memory_FS to S3, any number of serverless functions or microservices could access the regulation data without each needing their own copy in memory – they could query specific keys (files) on-demand through Memory_FS’s interface which would fetch from S3. This also enables centralized updates: updating a file in the S3 store would make the new content available to all clients (assuming eventual consistency is handled). The metadata files could be used to ensure consistency and versioning on the client side (e.g., verifying the hash after download).

Benefits of Pluggable Storage: The ability to switch storage by changing a single configuration means our implementation is not locked into one environment. We used in-memory for parsing and validation; we can then save to a durable format for distribution. For example, after building the in-memory FS, we could instantiate a new Memory_FS with a SQLite storage and programmatically copy files over (or in the future, Memory_FS might support directly migrating storage). This yields a production-ready artifact. In a continuous integration pipeline, one could automate: parse latest GDPR -> produce Memory_FS -> save to gdpr.zip (zip storage) or gdpr.db (SQLite) -> publish artifact. Consumers of the artifact can either use Memory_FS to load it or use it as input to their database as described. The pluggable layer thus supports both versioned releases (where you want read-only snapshots) and live services (where cloud storage enables concurrent read/write).
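
A hedged sketch of that handover, reusing the (text, config) pairs retained from the ingestion loop (Storage_FS__Sqlite is a planned backend, and the exact wiring of the storage object into the Memory_FS constructor is our assumption):

# Build a durable SQLite-backed copy of the in-memory GDPR filesystem
sqlite_storage = Memory_FS__Storage(storage_fs=Storage_FS__Sqlite(db_path="gdpr_v1.sqlite"))
durable_fs = Memory_FS(storage=sqlite_storage)           # constructor wiring assumed

for text, config in ingested_units:                      # pairs kept from the parsing phase
    durable_fs.save().save(text, config)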

Future Work and Advanced Concepts

Graphs of Graphs of Graphs (G³) – Nested Memory_FS Structures

One forward-looking concept is G³ (Graphs of Graphs of Graphs), which in our context translates to nesting Memory_FS representations within one another to create layered graphs. Practically, this means we can treat an entire Memory_FS (like our GDPR file graph) as a single file in a higher-level Memory_FS. For instance, imagine we have multiple regulatory standards (GDPR, CCPA, HIPAA, etc.), each parsed and stored as a Memory_FS archive (perhaps each saved as a ZIP or SQLite file). We could create a top-level Memory_FS called “Compliance_Library” and store each regulation’s archive as a binary content file in it. Each of those files (say gdpr.zip stored in the parent FS) carries its own config and metadata, and perhaps metadata indicating it’s a Memory_FS archive of a regulation (a custom file type or a flag in config could denote this). The parent Memory_FS thus becomes a graph of graphs – its files are themselves graphs (the regulations), and you could even nest further if needed (graphs of graphs of graphs).

The benefit of this G³ approach is a recursive metadata structure: the parent layer can store high-level metadata about each graph (e.g., GDPR version, last updated date, jurisdiction), and the child layers (the actual Memory_FS of GDPR) store the fine-grained metadata of the content. This enables complex queries like: “find a concept in all regulations” – one could open each nested Memory_FS in turn and search within, or even index them collectively at the parent level by storing summary info in the parent config. It also mirrors how knowledge graphs can have sub-graphs or partitions; here each Memory_FS archive is an independent sub-graph that can be connected via the containing structure.

Implementing G³ with Memory_FS is straightforward given its design:

  • We treat a Memory_FS archive file as an opaque binary from the parent’s perspective. For example, we could use Memory_FS__File__Type__Data() for storing the binary (since Memory_FS supports binary file types). The content file would be the raw bytes of the zip or DB file representing the child graph. The config might label it as type “regulation-archive” and include metadata like name or schema version. (A sketch follows this list.)
  • To inspect or use a child graph, one would extract that file (e.g., unzip it to a Memory_FS instance, or load the SQLite via Memory_FS storage). This can be done on-the-fly when needed, enabling lazy loading of graphs.
  • The parent Memory_FS can also store relationships between those archives, if applicable (for example, if we had a meta-graph connecting GDPR and another law through related concepts, the parent FS could have a JSON file enumerating those links).
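
A minimal sketch of the first point, storing a child graph's ZIP as an opaque binary in a parent Memory_FS (the import path for the Data file type is assumed by analogy with the Markdown type shown earlier):

from memory_fs.Memory_FS import Memory_FS
from memory_fs.schema.Schema__Memory_FS__File__Config import Schema__Memory_FS__File__Config
from memory_fs.file_types.Memory_FS__File__Type__Data import Memory_FS__File__Type__Data  # path assumed

with open("gdpr.zip", "rb") as f:                        # child graph packaged earlier
    archive_bytes = f.read()

library_fs = Memory_FS()                                 # the parent "Compliance_Library" FS
archive_config = Schema__Memory_FS__File__Config(
    file_id="GDPR-archive",
    file_type=Memory_FS__File__Type__Data(),
    file_paths=["Compliance_Library/EU"],
)
library_fs.save().save(archive_bytes, archive_config)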

As Dinis Cruz notes, G³ is about having “ontologies of ontologies” and ways to connect multiple graphs/domains. In our implementation context, using Memory_FS recursively provides a clean, file-based way to encapsulate and connect these multi-graph structures. It opens the door to composite regulatory graphs – e.g., a European Privacy graph that contains GDPR plus related regulations, each as subgraphs, all managed in a unified system.

Enhancements and Future Features

While the current approach successfully represents GDPR in Memory_FS and exports it, there are several areas for future enhancement:

  • Automated Cross-References: We discussed adding edges in the graph for references (Article-to-Article mentions). In the file representation, we could also embed cross-reference metadata. For instance, if Article 17 refers to Article 12, we might add a note in Article 17’s metadata file or content (as a hyperlink in Markdown) which could later be picked up by a graph builder. Automating detection of these references could be integrated into the parsing step. This would enrich the dataset with relational data before even constructing the graph database.
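
A hedged sketch of that detection step (the pattern captures only the article number; paragraph-level references like “Article 12(3)” would need a richer regex):

import re

ARTICLE_REF_RE = re.compile(r"Article\s+(\d+)")

def find_article_refs(paragraph_text):
    # Return the set of article numbers mentioned in a paragraph's text
    return {int(num) for num in ARTICLE_REF_RE.findall(paragraph_text)}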

  • Semantic Tagging: Beyond the raw text, adding semantic metadata (tags for topics like “consent”, “data breach”, etc.) to each paragraph could be very useful. This could be done manually by experts or via NLP. Memory_FS can accommodate this by adding additional JSON files or extending the config. For example, one could have a parallel structure of metadata files (distinct from the auto-generated .metadata) that store tags or classification for each node. These could then translate into attributes on graph nodes or facilitate advanced queries (e.g., find all provisions tagged “Security”).

  • Performance Considerations: Memory_FS is in-memory, so extremely large documents or many concurrent accesses may require tuning. The technical architecture emphasizes type safety and layering which might introduce overhead. In practice, GDPR text is not huge (under a few hundred KB), so performance is fine. But if scaling to a library of thousands of documents, one should profile memory usage and perhaps use the SQLite backend to handle larger volume on disk. Caching strategies (Memory_FS could cache frequently accessed files or use lazy loading for rarely used parts) might become relevant, and indeed the design of Target_FS allows for future caching layers.

  • Collaboration and Editing: In a multi-user scenario (for example, a team curating annotations on GDPR), we might want a mechanism for concurrent editing. Future features like file locking, diffing versions, or transaction support across multiple file edits would be valuable. The Memory_FS metadata (hash) can detect conflicts (if two edits happen on the same base version). A possible future addition is a merge tool for Memory_FS, to intelligently merge two versions of the same file graph (much like git merges text changes). This would be useful if, say, one branch adds tags while another updates text – a three-way merge could integrate both.

  • Integration APIs: As noted in the roadmap, adding a GraphQL or REST API on top of Memory_FS would greatly facilitate building applications over this data. A GraphQL API could allow queries like { article(number:5) { paragraphs { text } } } which the service would resolve by reading Memory_FS. This would abstract away the file system details and present the data as structured JSON to clients. Implementing such an API is more about building on top of Memory_FS rather than changing Memory_FS itself, but it’s a logical next step for making the structured GDPR widely accessible.

  • Security and Access Control: If deploying this in a real environment, especially using cloud storage, features like encryption and access control are important. The roadmap mentions encryption at rest and ACLs as future items. For our GDPR use case, we might not need to restrict read access (it’s public law), but if this approach is used for internal standards or documents, one might need Memory_FS to enforce permissions on certain files or directories. Similarly, audit logging (tracking who accessed or modified which file) could be useful in a multi-user compliance platform.

Broader Applications

While developed for GDPR, the file-based graph representation approach can be applied to any structured text (other laws, standards, policies). The combination of Memory_FS and a graph database provides a pipeline from unstructured documents to a rich, queryable knowledge base. Future work could include applying this to build a “compliance knowledge graph” containing multiple regulations linked by common concepts (using the G³ approach). Another avenue is integrating with AI tools: for instance, using LLMs to automatically summarize each file (paragraph) and storing the summary in a parallel file structure, or using the structured data as a retrieval source for GPT-based question answering. Memory_FS’s type-safe design and serialization support ensures that even if we store complex objects (like embedding vectors for paragraphs, or Type_Safe objects), they can be serialized to JSON or binary and stored alongside the text. This means we could augment the regulatory graph with AI-generated insights without leaving the Memory_FS-managed ecosystem.

In summary, the work done here to represent GDPR in Memory_FS is a foundational step. Future enhancements will deepen the intelligence of the data (through metadata and links) and broaden its accessibility (through APIs and integration into cloud systems). The concept of graphs of graphs will enable scaling to entire libraries of knowledge, maintaining structure at every level. By continuing to leverage Memory_FS’s flexible architecture and extending it where needed, we can build robust, scalable systems for managing and querying complex bodies of text like the GDPR.

References

  1. Memory-FS Technical Debrief (2025) – Architecture and Key Features of Memory_FS, esp. type safety, storage backends, and three-file pattern.
  2. GDPRhub – Overview of GDPR – description of GDPR structure (173 recitals, 99 articles with paragraphs and sub-points).
  3. Neo4j Blog (2018) – Graph Databases as GDPR Compliance Tools – discusses why graph relationships are crucial for GDPR data management.
  4. LinkedIn Post by Dinis Cruz (2025) – On G³ (Graphs of Graphs of Graphs) – explains the concept of connecting multiple graphs and ontologies.
  5. Dinis Cruz (2025) – MGraph-AI introduction – outlines a memory-first graph database approach that complements Memory_FS for persistence and querying of graph-structured JSON data.