This page presents the scientific papers that I published.

  • Papers
    -Language Description
    -Theoretic Foundations
    -Implementation algorithms
  • Slides

See also:

Papers and talks



  • Théorie, conception et réalisation d'un langage adapté à XML. A. Frisch. Ph.D thesis (in French). See this page.

    (English) This thesis describes the theoretical foundations of a type-safe and higher-order functional language, adapted to the manipulation of XML documents. The first part presents the semantical bases: type algebra with recursive types, boolean combination, arrow and product constructors; definition of a semantic subtyping relation via a set-theoretic notion of model for types; description of the functional kernel of the language, in particular its type system and its type-driven dynamic semantics. The second part focuses on the algorithmical aspects: computing the subtyping relation and compiling pattern matching with optimizations. The third part presents the CDuce language, built on top of the functional kernel, together with some of the original techniques used in its implementation.

    (French) Cette thèse décrit les fondements théoriques d'un langage de programmation fonctionnel d'ordre supérieur, typé, adapté à la manipulation de documents XML. La première partie présente les bases sémantiques: algèbre de types avec types récursifs, combinaisons boolénnes et constructeurs flèche et produit; définition d'une relation de sous-typage sémantique en passant par une notion de modèle ensembliste des types; présentation du noyau fonctionnel du langage, en particulier son système de types et sa sémantique dynamique dirigée par les types. La deuxième partie étudie les aspects algorithmiques: calcul de la relation de sous-typage et compilation optimisée du filtrage par motifs. La troisième partie présente le langage CDuce, construit au dessus du noyau fonctionnel, ainsi que certaines des techniques originales mises en oeuvre dans son implémentation.

Language Description

  • CDuce: An XML-Centric General-Purpose Language. V. Benzaken, G. Castagna, and A. Frisch. Proceedings of the ACM International Conference on Functional Programming, 2003.

    We present the functional language CDuce, discuss some design issues, and show its adequacy for working with XML documents. Distinctive features of CDuce are a powerful pattern matching, first class functions, overloaded functions, a very rich type system (arrows, sequences, pairs, records, intersections, unions, differences), precise type inference for patterns and error localization, and a natural interpretation of types as sets of values. We also outline some important implementation issues; in particular, a dispatch algorithm that demonstrates how static type information can be used to obtain very efficient compilation schemas.

  • CDuce: a white paper. V. Benzaken, G. Castagna, and A. Frisch. Workshop PLAN-X: Programming Language Technologies for XML Pittsburgh PA, Oct. 2002.

    Superseded by the previous paper

Theoretic Foundations

  • A Gentle Introduction to Semantic Subtyping. G. Castagna, and A. Frisch. Second workshop on Programmable Structured Documents (Hakone).

    Subtyping relations are usually defined either syntactically by a formal system or semantically by an interpretation of types into an untyped denotational model. In this work we show step by step how to define a subtyping relation semantically in the presence of functional types and dynamic dispatch on types, without the complexity of denotational models, and how to derive a complete subtyping algorithm.

    The presentation is voluntarily kept informal and discursive and the technical details are reduced to a minimum since we rather insist on the motivations, the intuition, and the guidelines to apply the approach, the technical details being already published in the paper below.

  • Semantic subtyping. A. Frisch, G. Castagna, and V. Benzaken. In LICS 2002.

    Usually subtyping relations are defined either syntactically by a formal system or semantically by an interpretation of types in an untyped denotational model. In this work we show how to define a subtyping relation semantically, for a language whose operational semantics is driven by types; we consider a rich type algebra, with product, arrow, recursive, intersection, union and complement types. Our approach is to "bootstrap" the subtyping relation through a notion of set-theoretic model of the type algebra.

    The advantages of the semantic approach are manifold. Foremost we get "for free" many properties (e.g., the transitivity of subtyping) that, with axiomatized subtyping, would require tedious and error prone proofs. Equally important is that the semantic approach allows one to derive complete algorithms for the subtyping relation or the propagation of types through patterns. As the subtyping relation has a natural (inasmuch as semantic) interpretation, the type system can give informative error messages when static type-checking fails. Last but not least the approach has an immediate impact in the definition and the implementation of languages manipulating XML documents, as this was our original motivation.

  • The Relevance of Semantic Subtyping. M. Dezani-Ciancaglini, A. Frisch, E. Giovannetti, and Y. Motohama. In Intersection Types and Related Systems Electronic Notes in Theoretical Computer Science 70 No.1 (2002).

    We compare Meyer and Routley's minimal relevant logic B+ with the recent semantics-based approach to subtyping introduced by Frisch, Castagna and Benzaken in the definition of a type system with intersection and union. We show that - for the functional core of the system - such notion of subtyping, which is defined in purely set-theoretical terms, coincides with the relevant entailment of the logic B+.

  • Types récursifs, combinaisons booléennes et fonctions surchargées: application au typage de XML. A. Frisch. In french. Mémoire du DEA « Programmation: Sémantique, Preuves et Langages » (Paris VII)

    Nous étudions un lambda-calcul typé avec une opération de filtrage qui permet d'exprimer des fonctions surchargées. L'algèbre de types a des types de base, les types produit et flèche, les types récursifs, les combinaisons booléennes finies arbitraires. Nous considérons une notion de sous-typage sémantique, issue de l'interprétation des types comme ensembles de valeurs du langage.

    Les caractéristiques présentes dans le calcul et l'algèbre de types sont motivées par l'utilisation du calcul comme un noyau pour un langage adapté à la manipulation de documents XML.

    Nous utilisons une approche sémantique pour étudier l'algèbre de types, tout en conservant une preuve syntaxique de préservation du typage par réduction.

Implementation algorithms

  • Regular tree language recognition with static information. A. Frisch. In TCS 2004 (Toulouse).

    This paper presents our compilation strategy to produce efficient code for pattern matching in the CDuce compiler, taking into account static information provided by the type system.

  • Regular tree language recognition with static information. A. Frisch. In PLAN-X 2004. Preliminary version of the TCS paper.

    This paper presents our compilation strategy to produce efficient code for pattern matching in the CDuce compiler, taking into account static information provided by the type system. Indeed, this information allows in many cases to compute the result (that is, to decide which branch to consider) by looking only at a small fragment of the tree. Formally, we introduce a new kind of deterministic tree automata that can efficiently recognize regular tree languages with static information about the trees and we propose a compilation algorithm to produce these automata.

  • Greedy regular expression matching. A. Frisch, and L. Cardelli. In ICALP 2004 (Turku).

    This paper studies the problem of matching sequences against regular expressions in order to produce structured values.

  • Greedy regular expression matching. A. Frisch, and L. Cardelli. In PLAN-X 2004. Preliminary version of the ICALP paper.

    This paper studies the problem of matching sequences against regular expressions in order to produce structured values. More specifically, we formalize in an abstract way a greedy disambiguation policy and propose efficient matching algorithms. We also formalize and address a folklore problem of non-termination in naive implementations of the greedy semantics.

    Regular expression types and patterns have been introduced in the setting of XML-oriented functional languages. Traditionnaly, all the XML values and sequences share a common uniform runtime representation. Our work suggests an alternative implementation technique, where regular expression types define not only a set of abstract flat sequences, but also a custom structured representation for such values. This paves the way to a variety of language designs and implementations to integrate XML structural types in existing languages (class-based OO languages, imperative features, constrained runtime environment, ..).


  • Error Mining for Regular Expression Patterns. Giuseppe Castagna, Dario Colazzo, and Alain Frisch. In the 9th Italian Conference On Theoretical Computer Science, ICTCS 2005.

    In the design of type systems for XML programming languages based on regular expression types and patterns the focus has been over result analysis, with the main aim of statically checking that a transformation always yields data of an expected output type. While being crucial for correct program composition, result analysis is not sufficient to guarantee that patterns used in the transformation are correct. In this paper we motivate the need of static detection of incorrect patterns, and provide a formal characterization based on pattern matching operational semantics, together with locally exact type analysis techniques to statically detect them.

  • Parametric polymorphism for XML. H. Hosoya, A. Frisch, and G. Castagna. In The 32nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2005.

    Although several type systems have been investigated for XML, parametric polymorphism is rarely treated. This well-established typing discipline can also be useful in XML processing in particular for programs involving "parametric schemas," i.e., schemas parameterized over other schemas (e.g., SOAP). The difficulty in treating polymorphism for XML lies in how to extend the "semantic" approach used in the mainstream (monomorphic) XML type systems. A naive extension would be "semantic" quantification over all substitutions for type variables. However, this approach reduces to an NEXPTIME-complete problem for which no practical algorithm is known, and, moreover, the type system obtained in this way is characterized by a hardly useful expressiveness. In this paper, we propose a lighter-weight way of extending the semantic approach, where we interpret type variables as markings in data values for indicating the parameterized subparts. As a result, we can construct a sensible polymorphic type system both with a semantic flavor and a set of practical algorithms needed for typechecking. Most of these algorithms can be obtained by local modifications to existing ones for a monomorphic system.

  • OCaml + XDuce. A. Frisch. ICFP 2006

    This paper presents the core type system and type inference algorithm of OCamlDuce, a merger between OCaml and XDuce. The challenge was to combine two type checkers of very different natures while preserving the best properties of both (principality and automatic type reconstruction on one side; very precise types and implicit subtyping on the other side). Type inference can be described by two successive passes: the first one is an ML-like unification-based algorithm which also extracts data flow constraints about XML values; the second one is an XDuce-like algorithm which computes XML types in a direct way. An optional preprocessing pass, called strengthening, can be added to allow more implicit use of XML subtyping. This pass is also very similar to an ML type checker.