3.1. Files

3.1.1. Modules

Every Lean file defines a module. A module's name is derived from a combination of its filename and the way in which Lean was invoked: Lean has a root directory in which it expects to find code, and the module's name is the names of the directories from the root to the filename, with dots (.) interspersed and .lean removed. For example, if Lean is invoked with Projects/MyLib/src as its root, the file Projects/MyLib/src/Literature/Novel/SciFi.lean would contain a module named Literature.Novel.SciFi.

Describe case sensitivity/preservation for filenames here

3.1.1.1. Encoding and Representation

Lean modules are Unicode text files encoded in UTF-8. Figure out the status of BOM and Lean Lines may end either with newline characters ("\n", Unicode 'LINE FEED (LF)' (U+000A)) or with a form feed and newline sequence ("\r\n", Unicode 'CARRIAGE RETURN (CR)' (U+000D) followed by 'LINE FEED (LF)' (U+000A)). However, Lean normalizes line endings when parsing or comparing files, so all files are compared as if all their line endings are "\n".

Marginal note: this is to make cached files and #guard_msgs and the like work even when git changes line endings. Also keeps offsets stored in parsed syntax objects consistent.

3.1.1.2. Concrete Syntax

Lean's concrete syntax is extensible. In a language like Lean, it's not possible to completely describe the syntax once and for all, because libraries may define syntax in addition to new constants or datatypes. Rather than completely describing the language here, the overall framework is described, while the syntax of each language construct is documented in the section to which it belongs.

3.1.1.2.1. Whitespace

Tokens in Lean may be separated by any number of whitespace character sequences. Whitespace may be a space (" ", Unicode 'SPACE (SP)' (U+0020)), a valid newline sequence, or a comment. xref Neither tab characters nor carriage returns not followed by newlines are valid whitespace sequences.

3.1.1.2.2. Comments

Comments are stretches of the file that, despite not being whitespace, are treated as such. Lean has two syntaxes for comments:

Line comments

A -- that does not occur as part of a token begins a line comment. All characters from the initial - to the newline are treated as whitespace.

Block comments

A /- that does not occur as part of a token and is not immediately followed by a - character begins a block comment. /-- begins a documentation string xref rather than a comment.

3.1.1.2.3. Keywords and Identifiers

An identifier consists of one or more identifier components, separated by '.'.

Identifier components consist of a letter or letterlike character or an underscore ('_'), followed by zero or more identifier continuation characters. Letters are English letters, upper- or lowercase, and the letterlike characters include a range of non-English alphabetic scripts, including the Greek script which is widely used in Lean, as well as the members of the Unicode letterlike symbol block, which contains a number of double-struck characters (including ℕ and ℤ) and abbreviations. Identifier continuation characters consist of letters, letterlike characters, underscore ('_'), exclamation mark (!), question mark (?), subscripts, and single quotes ('). As an exception, underscore alone is not a valid identifier.

Identifiers components may also be surrounded by double guillemets ('«' and '»'). Such identifier components may contain any character at all, aside from '»', even '«' and newlines.

Some potential identifier components may be reserved keywords. The specific set of reserved keywords depends on the set of active syntax extensions, which may depend on the set of imported modules and the currently-opened xref/deftech for namespace namespaces; it is impossible to enumerate for Lean as a whole. These keywords must also be quoted with guillemets to be used as identifier components in most syntactic contexts. Contexts in which keywords may be used as identifiers without guillemets, such as constructor names in inductive datatypes, are raw identifier contexts.

Identifiers that contain one or more '.' characters, and thus consist of more than one identifier component, are called hierarchical identifiers. Hierarchical identifiers are used to represent both module names and names in a namespace.

3.1.1.3. Structure

syntax
Parser for a Lean module. We never actually run this parser but instead use the imperative definitions below that
return the same syntax tree structure, but add error recovery. Still, it is helpful to have a `Parser` definition
for it in order to auto-generate helpers such as the pretty printer. module ::=
    Parser for a Lean module. We never actually run this parser but instead use the imperative definitions below that
return the same syntax tree structure, but add error recovery. Still, it is helpful to have a `Parser` definition
for it in order to auto-generate helpers such as the pretty printer. header (command)*
    

A module consists of a module header followed by a sequence of commands.

3.1.1.3.1. Module Headers

The module header consists of a sequence of import statements.

syntax
header ::=
    (import)*
    

An optional keyword prelude, for use in Lean's implementation, is also allowed:

header ::= ...
    | prelude (import)*
      
syntax
prelude ::=
    prelude
    

The prelude keyword indicates that the module is part of the implementation of the Lean prelude, which is the code that is available without explicitly importing any modules—it should not be used outside of Lean's implementation.

syntax
import ::= ...
    | import ident
      

Imports the module. Importing a module makes its contents available in the current module, as well as those from modules transitively imported by its imports.

Modules do not necessarily correspond to namespaces. Modules may add names to any namespace, and importing a module has no effect on the set of currently open namespaces.

The imported module name is translated to a filename by replacing dots ('.') in its name with directory separators. Lean searches its include path for the corresponding importable module file.

3.1.1.3.2. Commands

Commands are top-level statements in Lean. Some examples are inductive type declarations, theorems, function definitions, namespace modifiers like open or variable, and interactive queries such as #check. The syntax of commands is user-extensible. Specific Lean commands are documented in the corresponding chapters of this manual, rather than being listed here.

Make the index include links to all commands, then xref from here

3.1.1.4. Contents

A module includes an def and xref environment, which includes both the datatype and constant definitions from an environment and any data stored in xref its environment extensions. As the module is processed by Lean, commands add content to the environment. A module's environment can be serialized to a .olean file, which contains both the environment and a compacted heap region with the run-time objects needed by the environment. This means that an imported module can be loaded without re-executing all of its commands.

3.1.2. Packages, Libraries, and Targets

Lean code is organized into packages, which are units of code distribution. A package may contain multiple libraries or executables.

Code in a package that is intended for use by other Lean packages is organized into libraries. Code that is intended to be compiled and run as independent programs is organized into executables. Together, libraries and executables are referred to as targets in Lake, the standard Lean build tool. section xref

Write Lake section, coordinate this content with it