1. What’s literate programming?
2. Why another preprocessor?
Because no literate programming preprocessor allows any document language and any programming language. I, for one, like to embed fragments of OCaml in asciidoc files.
Here are the requirements and short rationals:
R1. Must be able to build itself
That’s the fun part.
R2. Must be bootstrapable with funnelweb
Because funnelweb is quite usable. Especially, it is agnostic with regard to the programming language and almost so regarding the documentation language.
It follows from these requirements that we should either follow funnelweb syntax, which is ugly, or build a tool that’s flexible enough to act like funnelweb (or at least, that can understand a workable subset of funnelweb syntax).
Ideally, all escaping sequences of the macro system can be redefined. When bootstrapping (with the actual funnelweb) we do not mind the quality of the generated documentation since we can produce a better documentation (and source code, thanks to requirement R3 & R4) from recompiling with the bootstrapped processor.
R3. Add correct file and line number informations into generated source code
Whatever the programming language that’s being used of course.
R4. Do not output doc; rather, make code extraction flexible enough for the doc to be written in any documentation language in the first place
The documentation being the important part, do not interfere with it.
R5. Code blocks may be inline
Like short mathematical formulas are better inline.
R6. Must not require to have all the text, nor all the code, present in memory
Funnelweb builds a whole source code representation in memory before outputing anything. This is frightening. Despite I’ve never used literate style for anything but trivial program I believe the technique suits huge programs just as well.
R7. Can split content into several files
Quite obvious.
R8. Named code fragments may be defined from several places (even in different files)
In literate programming, the human reader must remember the main names used by the program as well as the names used to reference important code fragments. In order to limit the number of unessential definitions, Funnelweb allows to build a macro definition incrementally so that you must not introduce temporary names just to insert commentary in between two related code fragments.
R9. Support an include directive
Since we do not care about the order in which the code fragments will be encountered, and we do not care neither in what order we scan the documentation (since we do not produce a documentation according to R4), then we can merely be given a list of files to scan from the command line. We do not need an include directive as funnelweb (and other) does.
Still, many documentation language has an include directive and if we were able to follow it then we could alleviate the user from the need to maintain this list of files (since we could find everything from the root document).
So we do both: we will scan everything from the command line and additionally, if we are taught how to spot an include directive, then we will try to follow it.
3. Overview
For generating source code (remember R4: we do not have to generate the doc), we need a few directives:
-
define and name a block of code (see R5)
-
reference a block of code by name (from another block of code or from the literate text)
-
define and output to a file a block of code
-
add a file to the list of processed files (see R9)
We are going to use the OCaml language because it’s compendious yet fast.
The program basically reads its configuration file (or load its compiled configuration file), then proceed with reading the given file, building a dictionary of block definitions and loading additional files along the way, keeping track of code blocks definitions and modification time of the used files, and checking everything.
Then it outputs the code, rereading the files to fetch code blocks so that we do not have to hold in memory a quantity of information equivalent of the resulting source code.
So the basic skeleton, given a set of configuration files plugins
and a list
of source files to proceed srcfiles
, looks like:
@$@<Skeleton@>==@{ List.iter (load_lib !libdir) !plugins ; List.iter PortiaParse.parse !srcfiles ; Output.all () @}
Notice that configuration does not appear here, nor does harvested definitions.
They lies in global variables, which suits this short lived program just fine.
We won’t worry about configuration parameters yet. Suffice to say that all of
these global parameters, regular expressions and functions (remember some
filters may be functions) are references defined in a module unambiguously
named Config
.
By loading a conf, we merely want to load a compiled .cmo file:
@$@<ConfigLoad@>==@{@- let load_lib libdir fname = let libname = libdir ^"/"^ fname ^".cmo" in PortiaLog.debug "loading lib %s\n" libname ; Dynlink.(loadfile (adapt_filename libname)) @}
This is enough for a user to choose between several plugins (funnelweb, …).
So we only need this entry point to parse command line arguments and we are done with the boring work:
@$@<EntryPoint@>==@{@- let main = let plugins = ref [] in let libdir = ref PkgConfig.plugindir in let srcfiles = ref [] in let addlst l s = l := s :: !l in Arg.(parse [ "-syntax", String (addlst plugins), "Name of the plugin to use for parsing files \ (default to funnelweb)" ; "-libdir", Set_string libdir, "Where to read plugins from" ; "-ignore-missing", Set PortiaDefinition.ignore_missing, "Referenced but never defined blocks are not \ an error" ; "-debug", Set PortiaLog.verbose, "Output debug messages" ] (addlst srcfiles) "portia - literate programming preprocessor\n\ \n\ portia [options] files...\n\ Will output source code from given files.\n") ; if !plugins = [] then addlst plugins "funnelweb" ; (* default syntax *) @<Skeleton@> @}
So we have the Main module (linked last):
@O@<main.ml@>==@{@-
open Batteries
@<ConfigLoad@>
@<EntryPoint@>
@}
Let’s focus now on our main data type, the code block definition.
4. Code block definitions
It is well known that naming things is the most difficult part in programming. This has to do with the fact that a large part of programming consists in inventing many simple but abstract concepts with no counterpart in actual life or pre-existing abstractions from other intellectual fields. Sometime we can borrow a term or two from mathematics but mostly we have to give names to things that are relevant only to our program, maybe not even for its full lifetime.
So we have to repeatedly look for designations close enough to what we actually mean that, with enough context, it will become clear what concept they refer to. Of course, this context is not easy to acquire without prior knowledge of these somewhat arbitrary definitions, so it takes some time and effort to pull oneself out of this catch-22 situation.
So, after this short introduction have hopefully inclined my reader to leniency, let’s ask ourself what the name of a program fragment should be. Shall we keep calling it a "program fragment"? But this implies that this is part of a program, which is not required. Should it be an "extract", considered this fragment is separated from the rest for display purpose? Or a "phrase", considered it’s part of a larger "discourse" (the program)?
Or, if we forget what we manipulate to consider how we manipulate it, should it be called a "macro body"? Merely a "macro" or "body"? Or a "definition"? This point of view is seducing since it makes our program a more general purpose text processor rather than a specialized tool for literate programming (of course a type or variable name is not really part of the running program and so cannot alter its behavior in any way, but I believe in the power of names to influence our reasoning about abstractions and that giving generic names help building generic programs).
Let’s call these code fragments "definitions", then.
What information is there in a definition? We have already seen that we need its location used both for error reporting (file name, line number and column) and for fetching it quickly on demand (offset and size in bytes).
Also note the recording of the mtime so that we can tell, when fetching the body, that the file have not changed since we collected this definition.
@$@<Location@>==@{@- type location = { file : string ; mtime : float ; lineno : int ; colno : int ; offset : int ; size : int }@}
with the convenient convention that line and column numbers are actually offset from the start, thus start at 0, which allows us to add them naturally.
Rather than having a single location, we want to allow for a definition to be split across many locations (the body of the definition is then the concatenation of all fragment in order of appearance - which is specified by the depth first exploration of included files, the ordering of files in command line and finally the order we met definitions in a given file)
-
an identifier (unique for the file its located in) which can be any string:
@$@<DefName@>==@{type id = string@}
-
a flag to tell us if this definition is supposed to be output in a file, with two consequences:
-
of course, the expanded body of this definition will be written into a file (which name will be the identifier);
-
and you are not allowed to refer to this definition from another one.
-
The user should be warned about any code fragment that is not, directly or indirectly, referenced from an output definition.
@$@<DefinitionType@>==@{@- @<Location@> @<DefName@> type t = { locs : location list ; (* reverse order *) id : id ; output : bool } @}
Notice that it’s possible that there is no location at all (empty list),
meaning the definition was missing (might be conveniently allowed with
ignore_missing
flag, to make it possible to generate a valid program and
write the extensions later on).
It’s always a good idea to write proper printers for any new type. This may looks fastidious but you are actually doing yourself a favor: better have these printers ready before they are needed than to have to write them quickly while struggling with a bug. Especially when using Batteries which make writing and using such printers so easy.
So here they are:
@$@<Definitions@>+=@{@- let location_print fmt loc = Printf.fprintf fmt "%s:%d.%d-%d" loc.file loc.lineno loc.colno (loc.colno+loc.size) let mtime_print = Float.print (* TODO: user friendly date&time? *) let rec locations_print fmt = function | [] -> Printf.fprintf fmt "undefined" | [loc] -> location_print fmt loc | _loc::locs' -> locations_print fmt locs' (* print only the first location *) let print fmt t = Printf.fprintf fmt "%s@%a" t.id locations_print t.locs @}
Now obviously we also want to fetch a definition body from its file (checking mtime):
@$@<Definitions@>+=@{@- exception FileChanged of string let fetch_loc loc = let open Unix in let fname = loc.file in if (stat fname).st_mtime > loc.mtime then raise (FileChanged fname) ; read_file fname loc.offset loc.size @}
Then, we will need a way to add definitions to a global registry, and the associated lookup function. Definitions are created from a file, offset and length (line number and column number are not given and will be computed when registering, so that plugins author work is limited to the minimum) and of course the identifier for the definition.
@$@<Definitions@>+=@{@- let ignore_missing = ref false let registry = Hashtbl.create 31 let add id output fname off sz = let loc = location_in_file fname off sz in PortiaLog.debug "Add definition for %s at position %a\n" id location_print loc ; Hashtbl.modify_opt id (function | None -> Some { id ; output ; locs = [loc] } | Some t -> Some { t with locs = loc :: t.locs }) registry let lookup id = try Hashtbl.find registry id with Not_found -> if !ignore_missing then { id ; output = false ; locs = [] } else ( Printf.fprintf stderr "Cannot find definition for '%s'\n" id ; exit 1 ) @}
Where location_in_file is responsible to return a correct location (up to proper line and column numbers) from the file name, offset and size:
@$@<LocationInFile@>==@{@- let location_in_file file offset size = let mtime = Unix.((stat file).st_mtime) in let txt = read_file file 0 offset in let colno = colno_at txt and lineno = lineno_at offset txt in { file ; offset ; size ; mtime ; lineno ; colno } @}
Now the last part: expansion. Given a function PortiaConfig.find_references (supplied by the configuration) that’s able to spot all expansion points from a non expanded body, and the registry of all known definitions, let’s build a function that will return the complete expanded body (or signal a problem).
@$@<Definitions@>+=@{@- let line_start txt offset = let rec loop c = if c < 1 || txt.[c-1] = '\n' then c else loop (c-1) in let start_pos = loop offset in String.sub txt start_pos (offset - start_pos) (*$= line_start & ~printer:identity (line_start "glop" 0) "" (line_start "glop" 2) "gl" (line_start "glop\npas glop\n" 4) "glop" (line_start "glop\npas glop\n" 5) "" (line_start "glop\npas glop\n" 7) "pa" *) let indent_at unexpanded start = line_start unexpanded start |> String.fold_left (fun (need_nl, len) c -> (if Char.is_whitespace c then need_nl else true), len+1) (false, 0) (*$= indent_at & ~printer:dump (indent_at "glop" 0) (false, 0) (indent_at "glop" 2) (true, 2) (indent_at "glop\npas glop\n" 4) (true, 4) (indent_at "glop\npas glop\n" 5) (false, 0) (indent_at "glop\npas glop\n" 7) (true, 2) *) let rec expanded_loc tab loc = let unexpanded = fetch_loc loc in PortiaLog.debug "expand '%s'\n" unexpanded ; (* Start with line number information. *) let txt = !PortiaConfig.linenum loc.lineno loc.file in (* find_references returns a list of (id, start_offset, stop_offset) *) let refs = !PortiaConfig.find_references unexpanded |> List.sort (fun (_,o1,_) (_,o2,_) -> compare o1 o2) in PortiaLog.debug "found references: %a\n" (List.print (Tuple3.print String.print Int.print Int.print)) refs ; let txt, last_stop = List.fold_left (fun (txt,last_stop) (id,start,stop) -> assert (start >= last_stop) ; let need_new_line, tab' = indent_at unexpanded start in (* If we do not need a new_line it's because we have only blanks before the expansion, and we do not want to copy those because every body must start at column 0 *) let start = if need_new_line then start else start - tab' in if not need_new_line then ( PortiaLog.debug "no new line needed, tab=%d, tab'=%d\n" tab tab' ) ; let txt = txt ^ indent tab (String.sub unexpanded last_stop (start - last_stop)) in if not need_new_line then ( PortiaLog.debug "appended '%s'\n" (String.sub unexpanded last_stop (start - last_stop)) ); let txt = if need_new_line then txt ^ "\n" else txt in let t' = lookup id in let body = expanded_body (tab+tab') t' in let txt = txt ^ body in (* add a linenum indication that we are back in this block *) let ln = !PortiaConfig.linenum (loc.lineno + (lineno_at stop unexpanded)) loc.file in let txt = txt ^ (if String.length ln > 0 && String.length txt > 0 && txt.[String.length txt - 1] != '\n' then "\n" else "") ^ ln in txt, stop) (txt, 0) refs in (* Complete with what's left *) let rest = String.length unexpanded - last_stop in txt ^ indent tab (String.sub unexpanded last_stop rest) and expanded_body tab t = List.rev t.locs |> List.map (expanded_loc tab) |> String.concat "" @}
Where each substituted definition is properly indented according to its insertion point. We must now complete this module with the functions we used up to here for helping dealing with text files and locations:
@$@<TxtHelpers@>==@{@- let indent = let open Str in let re = regexp "\n\\([^\n]\\)" in fun tab str -> let spaces = String.make tab ' ' in spaces ^ global_replace re ("\n"^ spaces ^"\\1") str (*$= indent & ~printer:identity "glop" (indent 0 "glop") " glop" (indent 2 "glop") *) (* first char is at column 0 *) let colno_at txt = let rec aux colno p = if p = 0 || txt.[p-1] = '\n' then colno else aux (colno+1) (p-1) in aux 0 (String.length txt) (* first line is 0 *) let lineno_at pos txt = let rec aux p n = if p >= pos then n else aux (p+1) (if txt.[p] = '\n' then n+1 else n) in aux 0 0 let read_file fname offset size = let open Unix in let fd = openfile fname [O_RDONLY] 0 in lseek fd offset SEEK_SET |> ignore ; let str = String.create size in let rec read_chunk prev = if prev < size then let act_sz = read fd str prev (size-prev) in read_chunk (prev + act_sz) in read_chunk 0 ; close fd ; str @}
Regarding linenum, this function depends on the programming language used. The default implementation from Config will not output linenum directives. But dedicated plugins are easy to write. First, for ocaml:
@O@<ocaml.ml@>==@{@- let linenum lineno fname = Printf.sprintf "# %d \"%s\"\n" (lineno+1) fname let () = PortiaConfig.linenum := linenum @}
and for C:
@O@<c.ml@>==@{@- let linenum lineno fname = Printf.sprintf "#line %d \"%s\"\n" (lineno+1) fname let () = PortiaConfig.linenum := linenum @}
Notice that those directives follow the GNU convention that:
Line numbers should start from 1 at the beginning of the file, and column
numbers should start from 1 at the beginning of the line.
One more word about the linenum directive in OCaml. It is documented in the chapter 6.1 (lexical conventions) of the OCaml manual, and from this documentation it appears that it is not constrained to appear alone on a line. We do make some effort to place these directives on dedicated lines, in order to generate better looking source files.
Also, notice that we must insert a linenum directive at the insertion point of each definition body and after each expansion to return to previous location.
With these functions we are now ready to start the real job of parsing input files(s) and writing output definitions.
@O@<portiaDefinition.ml@>==@{@- open Batteries @<DefinitionType@> @<TxtHelpers@> @<LocationInFile@> @<Definitions@> @}
5. Parsing
Parsing is a pretentious appellation, since we merely need to spot three things in the input files:
-
optional include command (with its filename parameter) to instruct us how to gather other file names to inspect;
-
code definitions;
-
in the body of a definition, references to other definitions.
For now we do not want to impose any format to these marks so in all generality we are going to read in memory a whole file and ask a configuration provided function to return the list of additional files to scan and the list of definitions that can be found in the file content.
So "parsing" is just:
@$@<Parsing@>==@{@- let read_whole_file file = let ic = Unix.(openfile file [O_RDONLY] 0 |> input_of_descr) in IO.read_all ic (* autoclosed *) let rec parse file = PortiaLog.debug "Parsing file %s\n" file ; let txt = read_whole_file file in !PortiaConfig.find_definitions txt |> List.iter (fun (id, output, start, stop) -> PortiaDefinition.add id output file start (stop-start)) ; !PortiaConfig.find_inclusions txt |> List.iter parse @}
That we can group, with some helper functions to be defined later, in a parse module:
@O@<portiaParse.ml@>==@{@- open Batteries @<Parsing@> @<ParsingHelpers@> @}
Also, we want to be able to attach several code fragments to the same name (see R8), with the actual expansion being composed of the concatenation of these fragments. To handle this, we will merely register several definitions with the same name, and when writing the output of a given definition we will append all bodies in order of appearance.
5.1. FunnelWeb
Now of course the real difficulty lies in the find_definitions
and
find_inclusions
functions, which by default could be the one we need to
bootstrap (ie. funnelweb compatible).
So let’s implement at first the simpler of both. For inclusion, funnelweb uses
a very straightforward syntax: a line consisting only of @i somefilename
.
This simple regular expressions will easily collect all such commands for us:
@$@<RegexForInclusion@>==@{"^@i +\\(.+\\) *$"@}
Which leads to this find_inclusions
function:
@$@<FW_FindInclusions@>==@{@- let find_inclusions = let re = Str.regexp @<RegexForInclusion@> in fold_all_groups (fun l p -> match l with | [Some (f, _, _)] -> f::p | _ -> assert false) [] re @}
With the almighty fold_all_groups
, folding over all groups matched in a given
string:
@$@<ParsingHelpers@>==@{@- let fold_all_groups f p re str = let open Str in let rec aux p o = try search_forward re str o |> ignore ; let rec fetch_grps n groups = try let g = try Some (matched_group n str, group_beginning n, group_end n) with Not_found -> None in fetch_grps (n+1) (g::groups) with Invalid_argument _ -> List.rev groups in let groups = fetch_grps 1 [] in aux (f groups p) (Str.match_end ()) with Not_found -> p in aux p 0 |> List.rev @}
Regarding code definitions, the regular expression is more complex but can still handle the job. We have to take greater care here since code blocks typically spans several lines and regular expressions are greedy. We handle this by forbidding the ending marker (@ followed by }) from the definition; hopefully this marker is both improbable and short.
We end up with this regular expression:
@$@<RegexForDefinition@>==@{@- "^@\\(\\$\\|O\\)@<\\([^@]+\\)@>\\(==\\|\\+=\\)@{\ \\(@-\n\\)?\\(\\([^@]\\|@[^}]\\)*\\)@}"@}
Here we met another difficulty: we must be able to write strings and regular expressions that describes funnelweb special commands without triggering funnelweb (nor portia in funnelweb mode) to interpret them as actual commands! In other words we must write a regular expression that does not match itself. The easy trick is to split the regular expression into several lines right in the middle of problematic token sequences.
With the corresponding find_definitions
:
@$@<FW_FindDefinitions@>==@{@- let find_definitions = let re = Str.regexp @<RegexForDefinition@> in fold_all_groups (fun l p -> PortiaLog.debug "found def: %a\n" (List.print (Option.print (Tuple3.print String.print Int.print Int.print))) l ; match l with | [Some (c, _, _); Some (id, _, _); _; _; Some (_, start, stop); _] -> (id, c = "O", start, stop) :: p | _ -> assert false) [] re @}
Now to finish with our regular expressions, we must be able to spot references to other definitions from within definition bodies. Funnelweb uses a straightforward syntax for that, again relying on the unlikelihood of the (short) sequence of @ followed by < or >:
@$@<RegexForReference@>==@{@- "\\(@<\\([^@]+\\)@>\\)"@}
With the corresponding find_references
(identical to find_inclusions
but
with another regular expression):
@$@<FW_FindReferences@>==@{@- let find_references = let re = Str.regexp @<RegexForReference@> in fold_all_groups (fun l p -> match l with | [Some (_, start, stop); Some (id, _, _)] -> (id, start, stop)::p | _ -> assert false) [] re @}
This function will be used later when untangling code fragments into output files.
Last and least, funnelweb (and probably other literate programming preprocessors as well) uses an escape character that can be used to include its control character (@) in the source code. Thus, before outputting the code we must run a final scan to unquote all these characters, especially since we have made a heavy use of this quoting mechanism in this document:
@$@<FW_Postprocess@>==@{@- let postprocess str = String.nreplace ~str ~sub:"@@" ~by:"@" @}
Of course, all these regular expressions and substring replacement do not add up to a proper parser for funnelweb syntax, which is much richer than that. It’s enough, though, to bootstrap Portia source code, so we will leave this funnelweb module here and return to the more interesting topic of generating output files.
@O@<funnelweb.ml@>==@{@- open Batteries open PortiaParse @<FW_FindInclusions@> @<FW_FindDefinitions@> @<FW_FindReferences@> @<FW_Postprocess@> let () = PortiaConfig.find_definitions := find_definitions ; PortiaConfig.find_references := find_references ; PortiaConfig.find_inclusions := find_inclusions ; PortiaConfig.postprocess := postprocess @}
6. Output
Once all definitions have been gathered we can iterate over all of those which must be written into a file, retrieve their (expanded) body then write it into that file. We will not directly overwrite the destination file, though, rather create a temporary file and replace the older file only if the new one is different. We do this to avoid unnecessary touching files, thus triggering whole rebuilds, each time a single compilation unit is effectively modified.
@$@<Output@>==@{@- open PortiaDefinition let read_file filename = (BatFile.lines_of filename |> List.of_enum |> String.concat "\n") ^ "\n" (* output a given definition *) let definition filename def = if def.output then ( PortiaLog.debug "Generating %s...\n%!" filename ; let text = expanded_body 0 def |> !PortiaConfig.postprocess in let content_is_new = match read_file filename with | exception _ -> true | old_text -> old_text <> text in if content_is_new then ( PortiaLog.debug "Writing output file %s\n" filename ; output_file ~filename ~text ) else PortiaLog.debug "Skipping same file %s\n" filename ) else ( PortiaLog.debug "No output file for %s\n%!" filename ) (* output all registered definitions *) let all () = Hashtbl.iter definition registry @}
And that’s all we need in our Output module:
@O@<output.ml@>==@{@- open Batteries @<Output@> @}
7. Configuration
We have seen so far only five parameters taken from the configuration, the first three being references to functions taking a file content as a string and returning substrings of interest:
-
find_definitions
, that spots new definitions -
find_references
, that spots references to definitions in definition bodies -
find_inclusions
, that spots declarations of other files to parse
and the others being simpler function to post-process or beautify the output:
-
postprocess
, that perform whatever modification is required on the expanded code -
linenum
, a function to output line number indications for the compiler
So that our Config module thus far is merely:
@O@<portiaConfig.ml@>==@{@- let find_definitions = ref ((fun _txt -> []) : string -> (string * bool * int * int) list) let find_references = ref ((fun _txt -> []) : string -> (string * int * int) list) let find_inclusions = ref ((fun _txt -> []) : string -> string list) let postprocess = ref ((fun txt -> txt) : string -> string) let linenum = ref ((fun _n _f -> "") : int -> string -> string) @}
Notice that separate compilation of this module imposes that we have to declare the types of these references.
Remember form the Main module that we will load by default the funnelweb
plugin, so when running portia without option it will behave (loosely) like
funnelweb. This plugins will not implement linenum
, though, so no line
number directives will be outputted. It would be nice if by default the
linenum
function was relying on output file name to choose from a set of
predefined implementations, though.
8. Misc
We have glossed over many trivial details to get there, but the program would not be complete without those.
8.1. Log
For such a simple tool, we merely want to display debug messages or nothing at
all, so the only implemented function is PortiaLog.debug
and, depending on
flag debug
is either print on stderr or does nothing:
@O@<portiaLog.ml@>==@{@- open Batteries let verbose = ref false let debug fmt = if !verbose then Printf.fprintf stderr fmt else Printf.ifprintf stderr fmt @}
8.2. Asciidoc
Last but not least, let’s provide the configuration (in the form of the extraction functions) for asciidoc documents, both as an example and because that’s the documentation format I intend to use in the future:
@O@<asciidoc.ml@>==@{@- open Batteries open PortiaParse let ext_of_lang = function | "shell" | "bash" | "csh" -> "sh" | "autoconf" -> "m4" | "docbook" -> "xml" | x -> x let find_code_definitions = let re = Str.regexp ("^\\.\\([^:\n]+\\)\\(:[^\n]*\\)?\n" ^ "\\[source,\\([^]]+\\)\\]\n" ^ "----\n" ^ "\\(\\(\\([^-\n].*\\)?\n\\)+\\)" ^ "----\n") in fold_all_groups (fun l p -> match l with | Some (id,_,_)::_::Some (lang,_,_)::Some (_def, start, stop)::_ -> let is_file = String.ends_with id ("." ^ ext_of_lang lang) in (id, is_file, start, stop) :: p | _ -> assert false) [] re (*$= find_code_definitions & ~printer:dump [ "Foo.ml", true, 30, 37 ] \ (find_code_definitions \ ".Foo.ml: bar\n\\ [source,ml]\n\\ ----\n\\ glop.\n\\ \n\\ ----\n\\ I'm out!\n\\ ----\n") [ "Foo", false, 27, 38 ] \ (find_code_definitions \ ".Foo: bar\n\\ [source,ml]\n\\ ----\n\\ pas glop.\n\\ \n\\ ----\n") *) let find_file_content = let re = Str.regexp ("^\\.Content of \\([^\n]*\\)\n" ^ "\\[source,[^\n]*\\]\n" ^ "----\n" ^ "\\(\\(\\([^-].*\\)?\n\\)+\\)" ^ "----\n") in fold_all_groups (fun l p -> match l with | Some (id,_,_)::Some (_def, start, stop)::_ -> (id, true, start, stop)::p | _ -> assert false) [] re let find_definitions str = find_code_definitions str @ find_file_content str let find_references = let re = Str.regexp "\\((\\* \\.\\.\\.\\([^\n\\*]+\\)\\.\\.\\. \\*)\\)" in fold_all_groups (fun l p -> match l with | [ Some (_, start, stop); Some (id, _, _) ] -> (id, start, stop)::p | _ -> assert false) [] re (* Note that we have to break up the comment mark in order not to confuse qtest. *) (*$= find_references & ~printer:dump (find_references ("xx ("^"* ...Inventory.Make functor... *"^") yy")) \ [ "Inventory.Make functor", 3, 37 ] *) let find_inclusions = let re = Str.regexp "^include::\\([^\\[\n]+\\)\\[\\([^\\[\n]*\\)\\]\n" in fold_all_groups (fun l p -> match l with | [ Some (f, _, _); _ ] -> f::p | _ -> p) [] re let () = PortiaConfig.find_definitions := find_definitions ; PortiaConfig.find_references := find_references ; PortiaConfig.find_inclusions := find_inclusions ; @}
Notice here find_file_content
which allows to specify in the documentation
some content to be copied verbatim into a file. This is handy to generate files
out of band for testing fixture for instance.
The small function ext_of_lang
tries to match languages names that are
understood by asciidoc (actually, by GNU source-highlight
, its default source
code highlighter) back to file extension. Most of the time, though, asciidoc
also understand the file extension itself so we assume that’s what’s specified
in the source
command to make this function shorter.
Notice that for find_references
we use a format that conveniently (for OCaml
programmers) looks like OCaml comments. Since those expansion points will be
completely replaced by their definition it does not really matter and those
would of course work just as well regardless of the language used around. One
may prefer to use comments for that language in order not to confuse the
documentation generator when syntax-highlighting the code bloc. Therefore, it
would be nice to allow for more comment style here, or to pick the proper one
from another configuration file…
9. TODO
A mode in which portia just output Makefile compliant dependencies.
Indenting the generated source code makes column positions reported by compilers (and other tools such as annot for OCaml) different from the one in the source document, which somewhat defeats the line number information feature. Therefore it should be possible to disable indentation altogether for more accurate position reporting.
Add a warning at the beginning of generated files that they are automatically generated and should not be edited manually.
Do not output linenum for shell because they mess with the shebang.
tests/*.expected
is another case of annoying linenum. Implement per
language linenum
as suggested at the end of config.fw
. In other words,
find_definitions
should return the language (the file extension is good
enough).