July 2025 Published Materials

During July 2025, I published 12 comprehensive research documents with a strong focus on the practical applications of semantic knowledge graphs in cybersecurity and enterprise systems. The month's publications demonstrate particular emphasis on cybersecurity innovations (7 papers covering SIEM reimagination, GDPR compliance, IAM evolution, and risk management), the evolution of AI from experimental LLMs to production-ready systems (2 papers on small models and AI-assisted development), and strategic approaches to modern challenges in news monetization and SaaS pricing models. My research this month showcases how ephemeral, serverless architectures combined with graph-based knowledge representation can fundamentally transform traditional enterprise security and compliance approaches.

The publications reveal several interconnected themes in my work: the shift from continuous data ingestion to on-demand, context-aware processing (exemplified in the Ephemeral GenAI SIEM), the progression from large language models to deterministic code as AI matures, and the critical importance of finding "good enough" thresholds in risk management and product development. My exploration of GDPR compliance through Memory_FS and interactive Q&A graphs demonstrates practical implementations of theoretical concepts, while the proposals for graph-based cloud IAM and Project VulnAI show how these ideas scale to enterprise-level security challenges. Throughout July, I consistently explored how semantic knowledge graphs, sustainable AI practices, and serverless architectures converge to create more efficient, transparent, and economically viable solutions for modern organizations.

July 2025 Research Publications

Overview

| Date | Title | Focus Area | Key Concepts |
|---|---|---|---|
| 07/02 | Ephemeral GenAI SIEM: A Serverless, Graph-Driven Approach to Security Event Management | Cyber Security | Serverless architecture, semantic knowledge graphs, LLM integration, LETS pipeline |
| 07/02 | Using Memory_FS to Build a File-Based Representation of the GDPR Standard | Cyber Security | Memory_FS framework, GDPR compliance, file-based storage, three-file pattern |
| 07/03 | LLM-Driven GDPR Compliance Q&A Graph – Technical Brief | Cyber Security | Interactive chatbot, GDPR compliance, knowledge graphs, client-side implementation |
| 07/04 | FAQ - Evolving Semantic Graphs and Ontologies with LLMs and MGraph-DB | Graphs | MGraph-DB, semantic graphs, ontology evolution, confidence scoring |
| 07/04 | From Free Scraping to Fair Compensation: Cloudflare's GenAI Crawler Charges and the Future of News Monetization | The Future of News | AI crawler fees, content monetization, micropayments, trust services |
| 07/04 | The Joy of Programming in the Age of AI-Assisted Development | Development and GenAI | Vibe coding, flow state, citizen developers, no-code development |
| 07/04 | Usage-Based Billable Entities: Aligning SaaS Pricing with Customer Usage | Projects | Consumption-based pricing, value metrics, SaaS billing, usage tracking |
| 07/06 | Finding the "Good Enough" Threshold: Optimizing Risk, Creativity, and Product Decisions | Cyber Security | Risk appetite, inefficiency delta, Wardley Maps, EVTP model |
| 07/06 | From Large Language Models to Small Models and Code: The Evolution of AI Solutions | Development and GenAI | LLMs, SLMs, deterministic code, AI commoditization, edge computing |
| 07/06 | Graph-Based Cloud IAM in the GenAI Agentic World | Cyber Security | Cloud IAM, least privilege, ephemeral permissions, semantic knowledge graphs |
| 07/06 | Semantic Knowledge Graphs, G³, and Sustainable AI: Aligning Innovations with ESG Objectives | Cyber Security | ESG alignment, sustainable AI, G³ concept, carbon-efficient computing |
| 07/27 | Project VulnAI: AI-Powered Vulnerability Risk Management Platform | Projects | Risk-based vulnerability management, AI analysis, semantic graphs, serverless architecture |

Detailed Summaries

Ephemeral GenAI SIEM: A Serverless, Graph-Driven Approach to Security Event Management

July 2, 2025

This comprehensive white paper introduces the Ephemeral GenAI SIEM, a revolutionary approach to Security Information and Event Management that addresses the critical shortcomings of traditional SIEM solutions. The paper highlights how conventional SIEMs struggle with scale, context, and cost, often ingesting massive volumes of log data while generating thousands of alerts with little context. The proposed solution leverages serverless computing, semantic knowledge graphs, and generative AI to fundamentally redefine how security data is collected, analyzed, and acted upon, with a focus on on-demand data collection rather than continuous ingestion.

The architecture employs a deterministic LETS (Load, Extract, Transform, Save) pipeline pattern where every processing step writes its output to persistent storage, enabling full traceability and reproducibility. The system uses Large Language Models for tasks requiring human-like reasoning while maintaining explainability through structured output schemas and controlled sandboxing. By treating cloud object storage as the database and using MGraph-DB as the in-memory graph engine, the solution achieves high scalability and cost-efficiency while providing context-rich investigations that bridge the gap from data detection to actionable intelligence.
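As a rough illustration of the LETS idea (this is not code from the paper), the sketch below runs four steps and writes each step's output to disk so that any stage can be inspected or replayed. The log format, file names, and the `save_step` and `run_lets` helpers are assumptions made up for this example; a local temporary directory stands in for cloud object storage.

```python
import json
import tempfile
from pathlib import Path


def save_step(root: Path, step: str, data: dict) -> Path:
    """Persist one step's output as JSON so it can be inspected or replayed later."""
    path = root / f"{step}.json"
    path.write_text(json.dumps(data, sort_keys=True, indent=2), encoding="utf-8")
    return path


def run_lets(raw_events: list[str], root: Path) -> dict:
    """Run Load -> Extract -> Transform -> Save, writing every intermediate result."""
    # Load: record the raw input exactly as received.
    loaded = {"events": raw_events}
    save_step(root, "1_load", loaded)

    # Extract: split each "<user> <action>" line (a made-up log format).
    records = []
    for line in loaded["events"]:
        user, _, action = line.partition(" ")
        records.append({"user": user, "action": action})
    save_step(root, "2_extract", {"records": records})

    # Transform: group actions by user.
    by_user: dict[str, list[str]] = {}
    for rec in records:
        by_user.setdefault(rec["user"], []).append(rec["action"])
    transformed = {"actions_by_user": by_user}
    save_step(root, "3_transform", transformed)

    # Save: write the final result alongside the intermediate files.
    save_step(root, "4_save", transformed)
    return transformed


with tempfile.TemporaryDirectory() as tmp:
    result = run_lets(["alice login", "bob login", "alice delete_bucket"], Path(tmp))
    written = sorted(p.name for p in Path(tmp).iterdir())
    print(result)
    print(written)
```

Because every step leaves a file behind, a later investigation can start from any intermediate output instead of re-running the whole pipeline.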

Using Memory_FS to Build a File-Based Representation of the GDPR Standard

July 2, 2025

This technical white paper presents an innovative method for converting the General Data Protection Regulation text into a structured, file-based representation using the Memory_FS framework. The approach parses the official GDPR document into a hierarchical set of files, with each regulatory element (down to individual paragraphs or bullet points) captured using Memory_FS's three-file pattern consisting of content, config, and metadata files. This structured representation provides a robust foundation for further processing and analysis of the complex 99-article regulation.
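To make the three-file pattern concrete, here is a minimal sketch in plain Python. It does not use the Memory_FS API; the file names (`content.txt`, `config.json`, `metadata.json`), the JSON fields, and the `write_element` helper are assumptions chosen for illustration, and the paragraph text is a placeholder rather than actual GDPR wording.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def write_element(root: Path, element_id: str, text: str, kind: str) -> list[Path]:
    """Store one regulatory element as content + config + metadata files.

    File names and fields are illustrative assumptions, not Memory_FS's layout.
    """
    folder = root / element_id
    folder.mkdir(parents=True, exist_ok=True)

    # Content file: the element's text, unchanged.
    content = folder / "content.txt"
    content.write_text(text, encoding="utf-8")

    # Config file: how the element should be identified and treated.
    config = folder / "config.json"
    config.write_text(json.dumps({"id": element_id, "kind": kind}), encoding="utf-8")

    # Metadata file: facts derived from the content, such as a hash and length.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    metadata = folder / "metadata.json"
    metadata.write_text(json.dumps({"sha256": digest, "length": len(text)}), encoding="utf-8")

    return [content, config, metadata]


with tempfile.TemporaryDirectory() as tmp:
    files = write_element(Path(tmp), "article-5/paragraph-1", "Example paragraph text.", "paragraph")
    names = [p.name for p in files]
    meta = json.loads(files[2].read_text(encoding="utf-8"))
    print(names, meta["length"])
```

Keeping the hash in a separate metadata file is one way a round-trip conversion could be checked for fidelity: re-hash the exported text and compare.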

The methodology demonstrates how to maintain fidelity through round-trip conversions between formats (including Markdown and PDF) while ensuring the legal document's structure is preserved. The paper provides detailed implementation guidance with code examples for ingesting the GDPR document into Memory_FS and subsequently exporting or utilizing the data. The resulting Memory_FS output can feed into graph databases like MGraph-DB to generate knowledge graphs of the law, with support for intermediate representations (ZIP or SQLite) for cloud deployment, showcasing the advantages of Memory_FS's pluggable storage backends.

LLM-Driven GDPR Compliance Q&A Graph – Technical Brief

July 3, 2025

This technical brief outlines the development of an interactive chatbot UI that guides users through GDPR compliance questions while dynamically building a knowledge graph from their answers. Unlike static questionnaires, this approach uses Large Language Models to adapt questions to the user's context in real time, creating a personalized Q&A flow of up to 10 questions about GDPR compliance. The system constructs a graph of nodes and edges representing the user's context and answers, with the LLM using this growing graph to inform subsequent questions.

The implementation is entirely client-side with no server storage, leveraging the user's browser and local storage to maintain state while using an LLM API for intelligence. The workflow includes a confirmation step where the system shows users what information has been captured for verification, followed by generating a final guidance report with personalized GDPR recommendations. The brief provides detailed prompt schemas for each phase of interaction and discusses front-end implementation details including chat interface design, state management, and optional graph visualization components.
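The brief describes a browser-based implementation; the Python sketch below only illustrates the kind of data structure such a flow could maintain. It shows question and answer nodes joined by edges, a cap of 10 questions, and a summary that a confirmation step could show back to the user. The `QAGraph` class, its field names, and the sample questions are hypothetical.

```python
from dataclasses import dataclass, field

MAX_QUESTIONS = 10  # the brief describes a flow of up to 10 questions


@dataclass
class QAGraph:
    nodes: dict[str, dict] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)
    asked: int = 0

    def record_answer(self, question: str, answer: str) -> bool:
        """Add a question node, an answer node, and an edge; return False once the cap is hit."""
        if self.asked >= MAX_QUESTIONS:
            return False
        self.asked += 1
        q_id, a_id = f"q{self.asked}", f"a{self.asked}"
        self.nodes[q_id] = {"type": "question", "text": question}
        self.nodes[a_id] = {"type": "answer", "text": answer}
        self.edges.append((q_id, "answered_by", a_id))
        return True

    def summary(self) -> list[str]:
        """Lines a confirmation step could show back to the user."""
        return [f"{self.nodes[q]['text']} -> {self.nodes[a]['text']}" for q, _, a in self.edges]


graph = QAGraph()
graph.record_answer("Do you process personal data of EU residents?", "Yes")
graph.record_answer("Do you have a data protection officer?", "No")
print(graph.summary())
```

In the flow the brief describes, the growing graph would be serialized into each LLM prompt so that the next question reflects earlier answers; that call is omitted here.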

FAQ - Evolving Semantic Graphs and Ontologies with LLMs and MGraph-DB

July 4, 2025

This FAQ-style white paper addresses key technical questions about Dinis Cruz's approach to building and evolving semantic knowledge graphs using Large Language Models and the MGraph-DB platform. The paper explores how this architecture differs from traditional Semantic Web frameworks like OWL ontologies, focusing on concept identification, confidence scoring, reasoning methods, and graph lifecycle management. It explains how concepts are handled more fluidly than in OWL, with human-friendly labels and contextual uniqueness rather than rigid IRIs, allowing for emergent alignment through iterative refinement.

The document details how confidence and relevance scores are calculated and used throughout the system, including LLM-generated relevance ratings and algorithmic composite scores. It describes the storage and management of graphs using MGraph-DB's file-based, version-controlled approach, and introduces the concept of an "ontology of ontologies" (G³) that enables multiple domain ontologies to coexist and interlink. This federated approach to knowledge management allows different teams to maintain their own taxonomies while enabling interoperability, representing a shift from centralized to distributed knowledge organization.

From Free Scraping to Fair Compensation: Cloudflare's GenAI Crawler Charges and the Future of News Monetization

July 4, 2025

This white paper examines Cloudflare's groundbreaking initiative to block AI crawlers by default on websites in its network unless the crawlers have permission or compensate content owners. The paper contextualizes this move within the broader crisis facing content publishers, who bear the infrastructure costs of AI scraping while being cut out of the value chain, because AI models can regurgitate aggregated knowledge without directing users back to the original sites. Cloudflare's permission-based model forces AI companies to obtain consent and potentially strike licensing deals before ingesting content.

The paper outlines a comprehensive vision for news monetization in the GenAI era that extends beyond crawler access fees to include structured content APIs, trust and verification services, and micropayments from readers. It provides detailed technical implementation guidance for enabling fair access and payments, including traffic identification and control, authentication mechanisms, metering and logging usage, and billing integration. The analysis demonstrates how publishers can create multiple, complementary revenue streams while maintaining quality journalism, with Cloudflare's initiative representing just the first step toward a more balanced ecosystem where content creators are compensated when their work generates value.

The Joy of Programming in the Age of AI-Assisted Development

July 4, 2025

This white paper explores how recent advances in generative AI are unlocking the joy of programming for a new wave of developers through "vibe coding" or No Code Development (NCD). The paper examines how AI-powered platforms dramatically lower barriers to entry, enabling citizen developers across business functions to rapidly turn ideas into working applications. It analyzes the psychological aspects of programming, including the flow state and creative delight that have long motivated software engineers, and how AI tools now deliver that instant interactivity to non-coders through natural language interfaces.

The paper presents compelling evidence that this democratization of programming will increase rather than decrease demand for professional developers, who will be needed to provide robust architectures, governance, and maintainability for the explosion of citizen-created applications. It explores how organizations can harness this influx of new programmers while maintaining software quality, with professional developers evolving into mentors and architects who build platforms and guardrails for safe no-code development at scale. The analysis concludes that programming skills are becoming widespread, with the creative joy of coding transitioning from a specialist's privilege to a universal experience.

Usage-Based Billable Entities: Aligning SaaS Pricing with Customer Usage

July 4, 2025

This comprehensive white paper examines the shift from traditional fixed subscription models to usage-based pricing in SaaS, where customers pay only for what they actually use based on discrete billable units. The paper addresses the fundamental misalignment in traditional subscription models where customers often pay for unused capacity while heavy users consume far more resources than their subscription fee covers. It presents usage-based billing as a solution that aligns revenue with value, delivering fairness to customers and sustainability to providers.

The document provides detailed guidance on defining billable units or value metrics, from cloud infrastructure units and API calls to data volume and operational events. It outlines the technical requirements for building a usage-based billing system, including instrumentation and metering, usage attribution, billing engines, real-time visibility, and payment management. Through analysis of successful implementations at companies like AWS, Snowflake, Twilio, and Stripe, the paper demonstrates how usage-based models enable faster growth, higher net retention, and better alignment between provider success and customer value.
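The metering, attribution, and billing-engine steps can be sketched in a few lines. This is an illustrative toy, not a design taken from the paper: the `UsageEvent` type, the metric names, and the unit prices are all invented for the example. `Decimal` is used so that currency arithmetic avoids binary floating-point rounding.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class UsageEvent:
    customer: str
    metric: str    # the billable unit, e.g. "api_call"
    quantity: int


# Hypothetical unit prices; real prices would come from a pricing catalogue.
UNIT_PRICES = {"api_call": Decimal("0.002"), "gb_stored": Decimal("0.10")}


def meter(events: list[UsageEvent]) -> dict[str, dict[str, int]]:
    """Usage attribution: total quantity per customer and per metric."""
    totals: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for e in events:
        totals[e.customer][e.metric] += e.quantity
    return {customer: dict(metrics) for customer, metrics in totals.items()}


def invoice(totals: dict[str, dict[str, int]]) -> dict[str, Decimal]:
    """Billing engine: quantity times unit price, summed per customer."""
    return {
        customer: sum((UNIT_PRICES[m] * q for m, q in metrics.items()), Decimal("0"))
        for customer, metrics in totals.items()
    }


events = [
    UsageEvent("acme", "api_call", 1000),
    UsageEvent("acme", "gb_stored", 5),
    UsageEvent("globex", "api_call", 250),
    UsageEvent("acme", "api_call", 500),
]
totals = meter(events)
bills = invoice(totals)
print(totals)
print(bills)
```

Keeping metering separate from pricing means the same usage totals can feed both the invoice and a real-time usage dashboard.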

Finding the "Good Enough" Threshold: Optimizing Risk, Creativity, and Product Decisions

July 6, 2025

This white paper explores the critical concept of the "good enough" threshold across three scenarios: cybersecurity risk management, independent music production, and product design. The paper introduces the concept of the "Inefficiency Delta", the gap between optimal outcomes and over-engineered or over-cautious approaches that yield diminishing returns. Through detailed case studies, it demonstrates how overshooting or undershooting the optimal point can hurt outcomes in each domain.

The analysis applies Wardley Maps' Explorers-Villagers-Town Planners (EVTP) model to show how success comes from operating in the middle zone between chaotic exploration and rigid execution. The paper provides actionable strategies for finding the optimal point, including clearly defining acceptance criteria, using time boxes and cost caps, leveraging early feedback, breaking big bets into small ones, and aligning incentives with optimal risk-taking. The conclusion emphasizes that perfectionism, over-engineering, and ultra-conservatism are all manifestations of failing to define "good enough" and to act once it is met.

From Large Language Models to Small Models and Code: The Evolution of AI Solutions

July 6, 2025

This white paper charts the evolution from Large Language Models to Small Language Models and ultimately to deterministic code, presenting this as the natural maturation of AI technology. The paper frames this evolution through Wardley Maps, showing how language processing capabilities progress from novel, chaotic solutions (LLMs) to stable, commoditized utilities (code/APIs). It explains how LLMs served as exploratory tools that demonstrated what's possible, but their expense, non-determinism, and black-box nature make them unsuitable for many production use cases.

The document details the rise of Small Language Models (SLMs) as focused specialists that can match or surpass large models on specific tasks while offering lower costs, faster performance, on-device processing capabilities, and better customizability. It then explores the path to full determinism, where AI capabilities are promoted into hard-coded solutions or API calls once thoroughly understood. The analysis concludes that the future of AI is not one mega-model to rule them all, but rather a constellation of smaller models, micro-models, and conventional software working in concert, with organizations using the right tool for each job.
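One way to picture the "promotion to code" step is a handler that tries deterministic code first and calls a model only when the code cannot answer. The sketch below is an assumption-laden illustration rather than anything from the paper: the date-extraction task, the `extract_date` function, and the `fake_model` stand-in (which replaces a real SLM or LLM call) are all hypothetical.

```python
import re
from typing import Callable, Optional

# A well-understood case (ISO dates) that no longer needs a model.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_date_with_code(text: str) -> Optional[str]:
    """Deterministic path: cheap, fast, and repeatable."""
    match = DATE_PATTERN.search(text)
    return match.group(0) if match else None


def extract_date(
    text: str, model_fallback: Callable[[str], Optional[str]]
) -> tuple[str, Optional[str]]:
    """Try code first; fall back to a model only for inputs the code cannot handle."""
    result = extract_date_with_code(text)
    if result is not None:
        return ("code", result)
    return ("model", model_fallback(text))


def fake_model(text: str) -> Optional[str]:
    """Stand-in for an SLM/LLM call; a real one would query a model endpoint."""
    return "2025-07-06" if "sixth of July 2025" in text else None


print(extract_date("Published 2025-07-06.", fake_model))
print(extract_date("Published on the sixth of July 2025", fake_model))
```

Tracking which path handled each request shows which remaining cases are frequent enough to be worth promoting into code next.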

Graph-Based Cloud IAM in the GenAI Agentic World

July 6, 2025

This white paper proposes a revolutionary graph-based IAM and permission workflow for cloud providers to address the security challenges posed by generative AI agents that can dynamically invoke cloud APIs in unpredictable ways. The paper highlights how current cloud IAM systems lead to over-provisioned permissions, where applications run with far more privileges than actually needed, creating significant security risks that are amplified when AI agents have flexible decision spaces influenced by prompts or even malicious injections.

The proposed solution models cloud permissions, resources, and API calls as a semantic knowledge graph that can precisely determine the exact privileges required per action and issue ephemeral, context-specific credentials. The architecture would enable just-in-time permissions minted per action, with AI agents receiving temporary credentials scoped to exactly the permissions needed for each operation. Implementation details include using semantic knowledge graphs like MGraph-DB, creating policy decision engines, and potentially establishing cloud-agnostic abstraction layers. This approach would dramatically reduce the blast radius of compromised components while enabling intelligent, context-aware cloud security workflows.
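A toy version of this idea, under stated assumptions, might look like the following: actions and permissions form a small graph, a traversal collects exactly the permissions an action reaches, and a short-lived credential is minted with only those permissions. The action names, the `perm:` prefix, the `Credential` type, and the 60-second lifetime are invented for illustration; no cloud provider API or MGraph-DB call is used.

```python
import time
from dataclasses import dataclass

# Toy graph: an action points to sub-actions or to "perm:" leaf nodes.
GRAPH: dict[str, list[str]] = {
    "publish_report": ["read_report", "perm:storage.put"],
    "read_report": ["perm:storage.get"],
}


@dataclass(frozen=True)
class Credential:
    action: str
    permissions: frozenset[str]
    expires_at: float

    def allows(self, permission: str, now: float) -> bool:
        """Valid only before expiry and only for the scoped permissions."""
        return now < self.expires_at and permission in self.permissions


def required_permissions(action: str) -> frozenset[str]:
    """Walk the graph from an action and collect every permission it reaches."""
    found: set[str] = set()
    seen: set[str] = set()
    stack = [action]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node.startswith("perm:"):
            found.add(node)
        else:
            stack.extend(GRAPH.get(node, []))
    return frozenset(found)


def mint(action: str, now: float, ttl_seconds: float = 60.0) -> Credential:
    """Issue a just-in-time credential scoped to one action."""
    return Credential(action, required_permissions(action), now + ttl_seconds)


now = time.time()
cred = mint("read_report", now)
print(sorted(cred.permissions))
```

An action that is missing from the graph resolves to an empty permission set, so the sketch denies by default.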

Semantic Knowledge Graphs, G³, and Sustainable AI: Aligning Innovations with ESG Objectives

July 6, 2025

This comprehensive white paper explores how semantic knowledge graphs and the G³ concept (Graphs of Graphs of Graphs) align with efforts to make IT and AI systems more sustainable and ESG-compliant. The paper examines how these practices intersect with carbon-efficient computing, ethical AI, stakeholder transparency, open-source collaboration, responsible data governance, and AI explainability. G³ advocates connecting multiple graphs and ontologies rather than forcing a single dominant hierarchy, enabling systems to remain flexible, adaptive, and inclusive of diverse viewpoints.

The analysis demonstrates how semantic knowledge graphs contribute to environmental sustainability through efficiency gains and knowledge reuse, support ethical AI by avoiding one-dimensional bias and ensuring transparency in meaning-making, and enable stakeholder empowerment through open-source development and knowledge sharing. The paper shows how these approaches facilitate responsible data governance by making relationships and rules explicit in machine-readable form, and enhance AI explainability through provenance-enabled systems that can trace their reasoning. The conclusion positions Dinis Cruz's work as exemplifying a path where advanced technology and ESG ideals exist in harmony, with knowledge elevated to the same stature as data and code in engineering priorities.

Project VulnAI: AI-Powered Vulnerability Risk Management Platform

July 27, 2025

This detailed project brief presents VulnAI, a next-generation SaaS platform for AI-driven vulnerability management that prioritizes risk context over raw vulnerability counts. The platform leverages semantic knowledge graphs, automated AI analysis, and a deterministic data pipeline to unify diverse security data into a coherent risk knowledge base. Unlike traditional vulnerability management tools that overwhelm teams with endless lists, VulnAI helps security teams and developers make smarter decisions by focusing remediation efforts where they matter most to the business.

The platform features risk-centric prioritization that contextualizes every vulnerability with business impact and exploitability, AI-powered analysis with full traceability through a controlled LETS pipeline, and a semantic knowledge graph backbone that enables complex querying and mapping of technical issues to business concerns. The architecture employs ephemeral and serverless components for cost-efficiency and scalability, with an open-source core that fosters community contributions and transparency. The implementation plan outlines a phased approach from foundation through targeted MVP to production hardening, with a clear business case demonstrating value for both customers (through improved risk posture and efficiency) and the SaaS business (through a large addressable market and differentiated offering).
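Risk-centric prioritization can be illustrated with a deliberately simple scoring function. The brief as summarized here does not give a formula, so the multiplicative score, the field ranges, and the sample findings below are assumptions made for the example, not VulnAI's actual model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    vuln_id: str
    severity: float         # 0-10 technical severity
    exploitability: float   # 0-1 likelihood the issue can be exploited
    business_impact: float  # 0-1 importance of the affected asset


def risk_score(finding: Finding) -> float:
    """Assumed scoring rule: severity weighted by exploitability and business impact."""
    return finding.severity * finding.exploitability * finding.business_impact


def prioritize(findings: list[Finding]) -> list[Finding]:
    """Highest contextual risk first, rather than highest raw severity first."""
    return sorted(findings, key=risk_score, reverse=True)


findings = [
    Finding("VULN-A", severity=9.8, exploitability=0.1, business_impact=0.2),
    Finding("VULN-B", severity=6.5, exploitability=0.9, business_impact=0.8),
    Finding("VULN-C", severity=7.5, exploitability=0.5, business_impact=0.5),
]
ranked = [f.vuln_id for f in prioritize(findings)]
print(ranked)
```

In this sample, the finding with the highest raw severity (VULN-A) ranks last once exploitability and business impact are taken into account, which is the behavior the brief contrasts with count-based vulnerability lists.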