Wednesday, October 27, 2021

On Haskell layout rule

Layout rule in Haskell allows programmers to omit writing braces and semicolons in let, where and do by a lexer and a parser who insert them automatically. Here are some references for Haskell layout rule:

When I read through the sections, I feel that the layout rule is quite intuitive but its formal description is a bit difficult to understand. So, I have decided to try to explain the formal description by example so that I can understand it better, which is the purpose of this article.

Suppose a simple Haskell program as follows.

1:module Main where

2:

3:main = do

4:    putStr hello

5:    putStrLn world

6:  where

7:    hello = "Hello"

8: 

9:world = " World!!"


GHC will automatically translate it into one in the following. 

1:module Main where

2: 

3:{main = do

4:    {putStr hello

5:    ;putStrLn world

6:  }where

7:    {hello = "Hello"

8: 

9:};world = " World!!"

0:}


This is a very useful feature for programmers. However, the compiler writers will be more burdensome because the implementation of the layout rule is quite tricky. The layout rule demands a lexer and a parser to work together. You can never develop a standalone Haskell lexer correctly without support by any parsers. 


In this article, let me try to explain this layout rule by showing a running example of the layout translation function in the Haskell 2010 language report. This function explains the layout rule by translating one using the layout rule into another with explicit braces and semicolons (so without using the layout rule). 


L (tokens, indentations) = tokens' where

  • tokens : indentation sensitive tokens
  • indentations : nested indentation context (e.g., [5, 1])
  • tokens' : indentation insensitive tokens using explicit braces and semicolons

To recognize spaces in Haskell lexer, special tokens <n> and {n} are introduced. 
  • Firstly, {n} indicates that there is no { right after let, where, do, or of. The integer, n, is the column (indentation) of the next lexeme. It is defined as 0 when there is no lexeme and the end of file is reached. 
  • Secondly, <n> indicates that there are only white spaces in front of a lexeme in a line. The integer, n, is the column (indentation) of the lexeme. 

Note that the first column is 1, not 0. The zero as an indentation is used specially, which will be explained later.

The definition of L function in Section 10.3 is as follows. 

(Group 1)
L (<n>: ts) (m : ms) 
  = ; : (L ts (m : ms))    if m = n
  = } : (L (< n >: ts) ms) if n < m

L (<n>: ts) ms = L ts ms


(Group 2)
L ({n} : ts) (m : ms) = { : (L ts (n : m : ms))  if n > m 
L ({n} : ts) []       = { : (L ts [n])           if n > 0
L ({n} : ts) ms       = { : } : (L (<n>: ts) ms)


(Group 3)
L (} : ts) (0 : ms) = } : (L ts ms)
L (} : ts) ms       = parse-error
L ({ : ts) ms       = { : (L ts (0 : ms))


(Group 4)
L (t : ts) (m : ms) = } : (L (t : ts) ms)  if m /= 0 and parse-error(t)

L (t : ts) ms = t : (L ts ms)


(Group 5)
L [ ] [] = []
L [ ] (m : ms) = } : L [ ] ms if m /= 0

For readability, I suggest to group similar rules as above. 

With the Haskell example, the L function will be explained. We start with

L (module : Main : where : ..., []) 
 = module : L (Main : where :..., [])
 = module : Main : L (where :..., [])
 = module : Main : where : L (..., [])

By 2nd rule of Group 2 since there is no { after where and before main and also main is in the first column i.e., n=1, 

 = module : Main : where : L ({1} : main : = : do : ..., [])
 = module : Main : where : { : L (main : = : do : ..., [1])
 = module : Main : where : { : main : L (= : do : ..., [1])
 = module : Main : where : { : main : = : L (do : ..., [1])
 = module : Main : where : { : main : = : do : L (..., [1])

By the same reason as above but with 1st rule of Group 2 since the indentation context [1] is not empty,

 = module : Main : where : { : main : = : do : L ({5} : putStr : hello : ... , [1])
 = module : Main : where : { : main : = : do : { : L (putStr : hello : ... , [5,1])
 = module : Main : where : { : main : = : do : { : putStr : L (hello : ... , [5,1])
 = module : Main : where : { : main : = : do : { : putStr : hello : L (... , [5,1])

This time it is <5>, not {5} since the lexeme processed just before is an identifier hello, not a keyword, such as do or where. So, we insert ; by applying 1st rule of Group 1.  

 = module : Main : where : { : main : = : do : { : putStr : hello : L (<5> : putStrLn : world : ... , [5,1])
 = module : Main : where : { : main : = : do : { : putStr : hello : ; : L(putStrLn : world : ... , [5,1])
 = module : Main : where : { : main : = : do : { : putStr : hello : ; : putStrLn : L(world : ... , [5,1])
 = module : Main : where : { : main : = : do : { : putStr : hello : ; : putStrLn : world : L(... , [5,1])

Since n=3<5=m, we insert } by applying 2nd rule of Group 1. 

 = module : Main : where : { : main : = : do : { : putStr : hello : ; : putStrLn : world : L(<3> : where : ... , [5,1])
 = module : Main : where : { : main : = : do : { : putStr : hello : ; : putStrLn : world : } : L(<3> : where : ... , [1])

Now the 3rd rule of Group 1 is applied to remove <3>. Note that the side conditions of the 1st and 2nd rules are not satisfied. 

 = module : Main : where : { : main : = : do : { : putStr : hello : ; : putStrLn : world : } : L(where : ... , [1])
 = module : Main : where : { : main : = : do : { : putStr : hello : ; : putStrLn : world : } : L(where : {5} : hello : = : "world" : ... , [1])
 = module : Main : where : { : main : = : do : { : putStr : hello : ; : putStrLn : world : } : where : L({5} : hello : = : "world" : ... , [1])

By applying 1st rule of Group 1, 

 = module : Main : where : { : main : = : do : { : putStr : hello : ; : putStrLn : world : } : where : { : L(hello : = : "world" : ... , [5,1])
 = module : Main : where : { : main : = : do : { : putStr : hello : ; : putStrLn : world : } : where : { : hello : L(= : "world" : ... , [5,1])
 = module : Main : where : { : main : = : do : { : putStr : hello : ; : putStrLn : world : } : where : { : hello : = : L("world" : ... , [5,1])
 = module : Main : where : { : main : = : do : { : putStr : hello : ; : putStrLn : world : } : where : { : hello : = : "world" : L(... , [5,1])

By applying 2nd rule of Group 1,

 = module : Main : where : { : main : = : do : { : putStr : hello : ; : putStrLn : world : } : where : { : hello : = : "hello" : L(<1> : world : = : " world!!" : ... , [5,1])
 = module : Main : where : { : main : = : do : { : putStr : hello : ; : putStrLn : world : } : where : { : hello : = : "hello" : } :  L(<1> : world : = : " world!!" : ... , [1])

By applying 1st rule of Group ,

 = module : Main : where : { : main : = : do : { : putStr : hello : ; : putStrLn : world : } : where : { : hello : = : "hello" : } :  ; : L(world : = : " world!!" : ... , [1])
 = module : Main : where : { : main : = : do : { : putStr : hello : ; : putStrLn : world : } : where : { : hello : = : "hello" : } :  ; : world : L(= : " world!!" : ... , [1])
 = module : Main : where : { : main : = : do : { : putStr : hello : ; : putStrLn : world : } : where : { : hello : = : "hello" : } :  ; : world : = : L(" world!!" : ... , [1])
 = module : Main : where : { : main : = : do : { : putStr : hello : ; : putStrLn : world : } : where : { : hello : = : "hello" : } :  ; : world : = : " world!!" : L(... , [1])

Now we arrive at the end of file. By applying 2nd rule of Group 5, 

 = module : Main : where : { : main : = : do : { : putStr : hello : ; : putStrLn : world : } : where : { : hello : = : "hello" : } :  ; : world : = : " world!!" : L([] , [1])
 = module : Main : where : { : main : = : do : { : putStr : hello : ; : putStrLn : world : } : where : { : hello : = : "hello" : } :  ; : world : = : " world!!" : } : L([] , [])

We finish by applying 1st rule of Group 5:

 = module : Main : where : { : main : = : do : { : putStr : hello : ; : putStrLn : world : } : where : { : hello : = : "hello" : } :  ; : world : = : " world!!" : } : []

This is how we've got the layout insensitive version of the program!


We have several rules that have not been used in the running example. 
  • 3rd rule of Group 2
  • Three rules of Group 3
  • 1st rule of Group 4
Wait, did we use 2nd rule of Group 4? Yes, we did where we consume lexemes that are not actually relevant to the layout rule, for example, as in:

L (module : Main : where : ..., []) 
 = module : L (Main : where :..., [])

The 3rd rule of Group 2 is used in a corner case that there is a use of where but is followed by nothing (i.e., EOF). In the case, we just use write { and } after the keyword where. 

The three rules of Group 3 handle programs that use braces explicitly. Let us start with the 3rd rule of Group 3. When there is an explicit opening brace, this brace is produced as it is. Importantly, the indentation context now has zero in the head position indicating the presence of an explicit opening brace.

Note that explicit braces written by programmers cannot be matched against implicit braces that are inserted by the layout rule. When there were such a case, a parse error would be issued. That is done by 1st and 2nd rules of Group 3.

In 1st rule of Group 3, an explicit closing brace is matched against the head indentation, which must be zero. Otherwise, in 2nd rule of Group 3, there is a parse error. 

The 1st rule of Group 4 is most interesting (and also problematic since this rule demands a lexer to be quite tightly coupled with a parser). Let us consider another Haskell example:


let x = 1 in x

which is expected to be translated into

let {x = 1 }in x


Let us start with 

L( let :  {5} : x : = : 1 : in : x : [], [])
 = let : L({5} : x : = : 1 : in : x : [], [])
 = let : { : L(x : = : 1 : in : x : [], [5])  
 = let : { : x : L(= : 1 : in : x : [], [5])  
 = let : { : x : = : L(1 : in : x : [], [5])  
 = let : { : x : = : 1 : L(in : x : [], [5])  

At this moment, the Haskell parser will give rise to a parse error because of the absence of a matching closing brace against the opening one inserted by the Haskell lexer. Also, note that m is 5, which is not zero. 

By applying 1st rule of Group 4, we can managed to overcome this parse error as:

 = let : { : x : = : 1 : } : L(in : x : [], [5])  
 = let : { : x : = : 1 : } : in : L(x : [], [])  
 = let : { : x : = : 1 : } : in : x : L([], [])  
 = let : { : x : = : 1 : } : in : x : []

This is what Haskell layout rule is! 

This exercise have helped me better understand the Haskell layout rule. I hope it will help you too if you are not familiar with the formal description of Haskell layout rule.  And, I will welcome any comments or corrections on this article. 

  







Friday, October 08, 2021

How to set up a HTTPS server using cohttp (in O'Caml)

Here is a short tutorial on how to set up a HTTPS server written in O'Caml using cohttp. 

Basically, there is a tutorial on writing a HTTP server in the cohttp package that we start with:


Step I. You prepare a certificate file and a key file using opensl as:

   $ openssl genrsa -out key.pem
   $ openssl req -new -key key.pem -out csr.pem
   $ openssl x509 -req -days 9999 -in csr.pem -signkey key.pem -out cert.pem
   $ rm csr.pem

Step II. You write a server using cohttp with HTTPS mode. 

As a mode for creating a server, 
  • `TCP (`Port 8000) for HTTP
  • `TLS (`Crt_file_path "cert.pem", `Key_file_path "key.pem", `No_password, `Port 8000) for HTTPS
where the port number, file names, and password can be changed for your own purpose. 

$ vi server.ml

#use "topfind";;
#require "lwt";;
#load "unix.cma";;
#load "threads.cma";;
#require "cohttp-lwt-unix";;

open Lwt
open Cohttp
open Cohttp_lwt_unix

let server =
  let callback _conn req body =
    let uri = req |> Request.uri |> Uri.to_string in
    let meth = req |> Request.meth |> Code.string_of_method in
    let headers = req |> Request.headers |> Header.to_string in
    ( body |> Cohttp_lwt.Body.to_string >|= fun body ->
      Printf.sprintf "Uri: %s\nMethod: %s\nHeaders\nHeaders: %s\nBody: %s" uri
        meth headers body )
    >>= fun body -> Server.respond_string ~status:`OK ~body ()
  in
  (* [HTTP] *)
  (*   Lwt_main.run ( Server.create ~mode:(`TCP (`Port 8000)) (Server.make ~callback ())) *)

  (* [HTTPS] *)
  Lwt_main.run ( Server.create ~mode:(`TLS (`Crt_file_path "cert.pem", `Key_file_path "key.pem", `No_password, `Port 8000)) (Server.make ~callback ()))

Step III. Run

$ ocaml -I +threads server.ml

In case you meet errors such as undefined things, you may need to install some packages as:

 $ opam install cohttp-lwt-unix cohttp-async tls lwt




Friday, June 11, 2021

Modular Happy Packages

I am so happy to find modular Happy packages. 


Happy is a parser generator system for Haskell, similar to the tool Yacc for C, OCamlYacc for OCaml, and so on. It has been used for generating a GHC's Haskell parser. 


Since its birth, it has been a monolithic architecture. Now it is time to make it modular so that Happy can be used more than just GHC's Haskell parser. 

- [PR] Modularize happy #191

- piknotech/happy{modularization branch}


It consists of happy-{frontend,middleend,backend,core,test} packages. Happy-frontend is for reading .y files and parsing them into the Grammar datatype. Happy-middleend applies LR/GLR parsing algorithms to the grammar to have action tables and goto tables. Happy-backend generates template-based Haskell code using these tables. Happy-core defines common interfaces among these packages, and Happy-test is about a testing framework. It is said to have the same CLI as the original monolithic Happy so that the transition from the existing one to this one can be as smooth as possible. Isn't it great?


This is how to install this modular Happy packages in modular_happy directory.

$ mkdir modular_happy; cd modular_happy

$ git init

$ git remote add origin https://github.com/piknotech/happy

$ git pull origin modularization 

(You don't have to build it for now.)


You can test these packages with GHC's Parser:

- Parser.y


Just to read this Parser.y and to print it in Grammar datatype, this simple program is enough.

$ stack new example; cd example       (Assume example and modular_happy are in the same level)

(Include the following in package.yaml)

dependencies:

- base >= 4.7 && < 5

- happy-frontend >= 1.21.0

- happy-middleend >= 1.21.0

- happy-backend >= 1.21.0

- happy-core >= 1.21.0

(Include the following in stack.yaml)
packages:
- .
- ../modular_happy/packages/frontend
- ../modular_happy/packages/middleend
- ../modular_happy/packages/backend
- ../modular_happy/packages/core

(Edit app/Main.hs as this)
module Main where

import Happy.Frontend.CLI
import Happy.Core.Grammar

main :: IO ()
main = do
  either <- parseAndRun [] "Parser.y" "Parser"
  case either of
    Left error -> putStrLn error
    Right grammar -> putStrLn $ show $ grammar
    
$ stack build

(Download the Parser.y into the top-level directory of stack project)

$ stack exec example-exe

(It will be able to read Parser.y and to print it.)

Yeah!!

Note: If you have a build error, it might be caused by the absence of alex and happy in your system.

 - sudo apt-get install alex happy


(Special thanks to the author of the modular Happy packages, David Knothe.)

On 25 March 2023, David Knothe's proposal seems to be gradually accepted to Happy as:

Saturday, May 15, 2021

GHCUP for managing GHC versions

GHCUP for managing GHC versions 

https://qfpl.io/posts/multiple-ghcs-ghcup/


When I develop a Haskell project, I use Stack. But some Haskell projects that I am interested in rely on Cabal (BTW, I am still confused by cabal-install, which is a Haskell package name, or by cabal, which is a command-line. Refer to [1]). 


Alex and Happy are such projects. For those who do not know what they are, Alex is a lexical analyzer tool, and Happy is a syntax analyzer tool. Both of them have their own specification languages and after some processing, they produce a Haskell program that do the analysis. 


For some reason, I wanted to build these two projects from source. This required me to install GHC (including cabal). If I use Stack, GHC is automatically downloaded when a project is built just by a command-line, stack build. If I use Cabal, it doesn't seem to be so. I need to install GHC by myself. This is where GHCUP helps me very much. 


Build systems, package installers, and package managers seem to be a huge obstacle to Haskell beginners such as myself. 


[1] Repeat after me: "Cabal is not a Package Manager"

   : https://ivanmiljenovic.wordpress.com/2010/03/15/repeat-after-me-cabal-is-not-a-package-manager/


Tuesday, January 19, 2021

How to build GHCJS

This is a story of buidling GHCJS in Ubuntu 20.04. The GHC version that GHCJS depends on is 8.6.x. 

 -  https://github.com/ghcjs/ghcjs


Preliminaries: 

 - Download GHC8.6.x binary and Cabal-3.2.0.0 binary in a path accessible by PATH

    : https://www.haskell.org/ghc/download.html

    : https://www.haskell.org/cabal/download.html

 - Install alex and happy by sudo apt-get install alex happy

(According to the Happy web site, the source of happy should be installed by git clone and cabal install happy but it failed due to a reason that I do not know. It seems that building the happy source needs a happy binary. :) If you have any hint, please leave me a message.)


(During the installation of ghc-8-6-x, you may need to install libtinfo.so.5, which is available by sudo apt-get install libtinfo.)


Now you follow the GHCJS installation instruction


1. Getting and preparing the source tree

 - git clone --branch ghc-8.6 https://github.com/ghcjs/ghcjs.git

 - cd ghcjs

 - git submodule update --init --recursive

 - ./utils/makePackages.sh

(You may need to install autoconf by sudo apt-get install autoconf before you run the shell script.)


2. Building the compiler

 - cabal new-configure (When cabal new-configure was failed, you may try cabal configure, but this may be due to some cabal version problem.)

 - cabal new-build  (When cabal new-build was failed, you may try cabal build, but this may be also due to the version problem.)

 (You may need sudo apt-get install libtinfo-dev when the build fails due to the absence of libtinfo.)


It will take long time...


 - cd $YOURBIN

  (I assume that the path $YOURBIN is accessible by $PATH.)


 - ln -s $GHCJSHOME/utils/dist-newstyle-wrapper.sh ghcjs

 - ln -s $GHCJSHOME/utils/dist-newstyle-wrapper.sh ghcjs-pkg

 - ln -s $GHCJSHOME/utils/dist-newstyle-wrapper.sh haddock-ghcjs

 - ln -s $GHCJSHOME/utils/dist-newstyle-wrapper.sh hsc2hs-ghcjs

 - ln -s $GHCJSHOME/utils/dist-newstyle-wrapper.sh ghcjs-boot

 - ln -s $GHCJSHOME/utils/dist-newstyle-wrapper.sh ghcjs-run

 (There is no instruction about ghcjs-run but I attempted to add it after I found a ghcj-run-not-found error on ghcjs-boot in the next stage.)


3. Booting GHCJS


 (You may need node by sudo apt-get install nodejs, and npm by sudo apt-get install npm.)

 - ghcjs-boot --no-haddoc -no-prof -s ./lib/boot

 (In the specified directory, boot.yaml exists. Without -s option, it will get stuck with a boot.yaml-not-found error. The default instruction was without the option. I am wondering when the default one would work. )

(--no-haddoc is given as an option not to get an error, "Haddock's resource directory does not exist!")


It will take long time again...


(Umm... Sorry I switched to Miso who is a layer on top of GHCJS. You don't have to get tangled with the installation of GHCJS any more.)