LiteComment

A lightweight text formatter focused on extreme simplicity and speed.

  1. Features
  2. Formatting
    2.1. Typographics
  3. Usage
    3.1. Methods
    3.2. Properties
  4. Configuration
  5. Extensibility
    5.1. Custom commands via special insets
      5.1.1. Embedding YouTube videos
        5.1.1.1. Bonus
    5.2. Representing URLs as something else than text
      5.2.1. Is it really by extension?..
    5.3. New simple format (e.g. italic)
    5.4. New multiline format
    5.5. E-mail masking methods

Nearly a year ago I needed a simple, easy-to-use formatting script that would make user-entered text look neat without much processing overhead and without requiring users to learn a markup language beforehand. The project in question was The Imageboard Search Engine v2.
That's when I came up with this script, and I called it LiteComment.

Long after its creation I learned about Markdown and was very surprised to discover that its formatting is extremely similar to that of LiteComment. However, the two are still slightly different.

If you need a full-fledged formatting framework for serious processing, take a look at UverseWiki (a modelling text processor), which among other places powers this blog and the i-Tools.org project.

Download LiteComment PHP script. You can check the LiteComment's sandbox here.

Features

  • Extremely fast processing using just one regular expression: a single call to preg_replace_callback per document.
    • 100 KiB of moderately formatted English text is rendered into HTML within ≈0.2 sec.
    • The whole module is under 550 lines of PHP code (60 lines of which are taken by the regular expression in /x mode), including comments and blank lines.
  • Intuitive and intentionally limited markup.
  • URLs never clutter the output: up to 2 words before a URL are used as its caption or, if there are none, only the domain name is shown instead of the full URL string.
  • Some essential typographics (dashes, ellipsis and such).
  • Ability to prevent formatting of strings by placing them into `code blocks`.
  • Several options for automatic e-mail masking (with JavaScript, without it, or both).
  • Extensibility (albeit limited) and special inset syntax for custom commands: [- arg | name=value | arg | ... -]. By default it creates an explicit link.
  • LiteComment can optionally process nested statements (for chosen formatting elements), so when this is turned off >*bold* will be an inline quote but its content will be output as plain text without bolding.
  • Full Unicode (UTF-8) support.
  • Valid XHTML with support for both semantic and <span>ish markup (customizable).
  • CSS-ready; each and every element produced by the formatter has a set of CSS classes.
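
To give a feel for the «one regular expression, one callback» idea from the first bullet, here is a minimal sketch. It is not LiteComment's actual code: the function name tiny_format and the two formats it supports are assumptions made up for this illustration.

```php
<?php
// Hypothetical mini-formatter: one regular expression in /x mode,
// one preg_replace_callback() call per document.
function tiny_format($text) {
  $regexp = '/
      \*  ((?=\S) [^*\n]+? (?<=\S))  \*    # group 1: *bold*
    | `   ((?=\S) [^`\n]+? (?<=\S))  `     # group 2: `code`
  /xu';

  return preg_replace_callback($regexp, function ($m) {
    // Group 2 is only present when the `code` alternative matched.
    if (isset($m[2]) && $m[2] !== '') {
      return '<code>'.$m[2].'</code>';
    }
    return '<strong>'.$m[1].'</strong>';
  }, $text);
}

// HTML is escaped first, exactly as the formatter expects.
echo tiny_format(htmlspecialchars('*bold* and `code` <b>')), "\n";
```

Every new format in such a design is another alternative in the same pattern, which is why the capturing group numbers matter so much when the regexp is changed.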

Formatting

== Heading of level 1 ==
=== Heading of level 2 ===
... up till:
======= The smallest heading of level 6 =======

You can specify any number of "="s on
the right or omit them at once:
======= Like this ==
======= ...this =
======= ...or this!

*bold text*, `monospaced` (code)

`
Multiline code
(unformatted text)
`

"
Multiline quotation
(blockquote)
"

>>> inline quote
>> more recent saying
>most recent (spaces after > are optional)

Citations: "Long time ago when the world was ruled by Nevermore..."
Or using single apostrophes: 'Long time ago'...

Links - http://google.com   Or this www.embarcadero.com
Up to 2 preceding words are used for the caption: www.ya.ru
[- http://google.com | Caption of the link -]
E-mail masking: e@mail.em.com

Pictures: icon.png, [- logo.png | Image-link -]
With a thumbnail: [- logo.png | icon.png -]

Rules - 4 types (CSS classes); they are lines
consisting of 3 or more identical symbols:
---
~~~~
=====
++++++
LiteComment

Heading of level 1

Heading of level 2


up till:
The smallest heading of level 6


You can specify any number of "="s on
the right or omit them at once:
Like this
this
or this!


*bold text*, `monospaced` (code)
Multiline code
(unformatted text)
Multiline quotation
(blockquote)


>>> inline quote
>> more recent saying
> most recent (spaces after > are optional)

Citations: «Long time ago when the world was ruled by Nevermore»
Or using single apostrophes: «Long time ago»...

Links Or this
Up to 2 preceding words are used for the caption
Caption of the link
E-mail masking: e~@~mail.em~com

Pictures: icon.png, Image-link
With a thumbnail: icon.png

Rules 4 types (CSS classes); they are lines
consisting of 3 or more identical symbols:



Typographics

Dashes:       short (en) - dash, long (em) -- dash, the ---- same
Ellipsis:     2.. (one removed) or more... dots......
Symbols:      (c) (r) (p) (tm)
Numbers:      #1  #33  1st  55th  #100th
Matrix:       10*10  10x10  10X10  10 x 10
Division:     10/10  10:10  10 / 10 (spaces are fine)
Plus-minus:   +-  +-5  +- 5

Arrows (number of "=" doesn't matter if it's >= 2):
<=>  <=  =>  <====  ====>  <=====>

Flood protection against repeated chars:
OMG!!!  WTF?!?  Erm????
LiteComment
Dashes: short (en) dash, long (em) dash, the same
Ellipsis: 2. (one removed) or more dots
Symbols: © ® §
Numbers: 1 33 1st 55th 100th
Matrix: 10×10 10×10 10×10 10×10
Division: 10÷10 10÷10 10÷10 (spaces are fine)
Plus-minus: ± ±5 ± 5

Arrows (number of «» doesn't matter if it's >= 2):


Flood protection against repeated chars:
OMG WTF?? Erm??
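
A few of the rules above can be sketched as plain string and regexp replacements. This is only an illustration: tiny_typographics is a made-up name, it covers just the symbol, plus-minus and matrix rules, and it makes no claim about how LiteComment implements them.

```php
<?php
// Hypothetical helper covering a small subset of the typographic rules shown above.
function tiny_typographics($text) {
  // Fixed symbol replacements, as in the rendered sample.
  $text = strtr($text, array('(c)' => '©', '(r)' => '®', '(p)' => '§'));
  // Plus-minus: "+-" becomes "±" (a following number or space is left alone).
  $text = str_replace('+-', '±', $text);
  // Matrix: "10x10", "10X10", "10*10" and "10 x 10" all become "10×10".
  $text = preg_replace('/(\d)\s*[xX*]\s*(\d)/u', '$1×$2', $text);
  return $text;
}

echo tiny_typographics('(c) 10x10 +-5'), "\n";
```

The source file is assumed to be saved as UTF-8, since the replacement characters are written literally.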

Usage

The simplest way of formatting your text is to call the static Format method:

PHP
require 'litecomment.php';
$str = htmlspecialchars('*bold* <link>');
echo LiteComment::Format($str);   // => <strong>bold</strong> &lt;link&gt;

$str can also be an array (see the method's description).

This leaves you no chance to customize things, but it's simple. In fact, there isn't much room for tweaking a class instance anyway, since most settings are static. Moreover, there shouldn't be many reasons to construct the class and then operate on it, because an instance is constructed once per document and should not be used afterwards.

Note: before calling format methods you need to escape HTML on your own. You'd normally do this with htmlspecialchars like in the example above.

Methods

static Format($text)
Returns formatted HTML; $text can be a string or an array of strings (just like with preg_replace), in which case every item is formatted.
SetAntiSpamMode($mode)
Sets the e-mail masking mode; by default it can be one of the following:
obfuscate (default)
Puts <a> with an obfuscated e-mail (e.g. «mail~@~at.here~com»).
asis
Puts <a> with the plain e-mail address, welcoming the spammers.
js
Puts an obfuscated <a> for non-scripting users and an <a> with the plain address using JavaScript. Usually the best way, but it nearly doubles the size of each e-mail link in the source; that's why obfuscate is the default mode.
jsonly
Puts a plain <a> by means of JavaScript alone; e-mails are hidden from non-scripting users.
SetFileExtensions($extstr)
Sets the list of extensions that will be recognized in the source document and parsed as links. Must be a regular expression ready to be inserted into brackets, e.g. ($extstr) becomes (bmp|png|zip). Default value: w?bmp|xbm|gif|jpe?g|png|svg.
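
The obfuscated form shown for the obfuscate mode can be reproduced with a few lines. This is a sketch inferred from the two examples in this article («mail~@~at.here~com» and «e~@~mail.em~com»); the function name obfuscate_email is hypothetical and the real method may work differently.

```php
<?php
// Hypothetical helper: splits an address into account, domain and zone,
// then joins the parts the way the examples in this article look.
function obfuscate_email($email) {
  if (!preg_match('/^([^@\s]+)@(.+)\.([^.\s]+)$/u', $email, $m)) {
    return $email;   // not something we recognize, leave it untouched
  }
  list(, $account, $domain, $zone) = $m;
  return "$account~@~$domain~$zone";
}

echo obfuscate_email('mail@at.here.com'), "\n";
```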

Properties

As most properties are static and represent various settings, see the Configuration section for their descriptions.

Configuration

The following static properties of LiteComment class can be of interest:

maxQuoteLevel
Inline quotes are defined using a number of > symbols at line start: >>hello!. This property sets the level at which the quote level wraps around: by default it's 4, so >quote is of level 1, >>>>quote is of level 4 and >>>>>quote is again of level 1.
  • this only affects CSS class name.
minHeadingLevel
Sets the minimum heading level (for the <h#> tag); by default it's 3, meaning that ==heading== produces <h3>heading</h3>, =====heading== produces <h6>, and anything longer, e.g. ======heading==, also produces an <h6> tag.
maxHeadingLevel
See minHeadingLevel above.
tagAliases
Controls the tag names used, like an alias table; by default they're set to match semantic markup guidelines (e.g. <em> for italic), but you can change them to <div>s and <span>s (or anything else) to rely on CSS classes alone (which are attached to each tag regardless of its name).
fileExtensions
This property can't be changed directly as it affects the regular expression; use SetFileExtensions method to set it.
jsAntiSpamFuncName
Sets the name of the JavaScript function that will be called to output a masked e-mail address (only for masking methods like js and jsonly). That function will be passed 3 arguments: '$account', '$domain', '$zone' (e.g. 'my', 'e.mail', 'net' for «my@e.mail.net»).
  • when set to an empty value, the default implementation is used, which simply outputs an e-mail link built from the given components.
antiSpamJSDefinition
Definition for default JavaScript masking function inserted into resulting HTML if jsAntiSpamFuncName is unset.
leftWordBoundaryChars
Defines the list of symbols which are considered «left word boundary». May contain regular expression-specific characters (they're quoted). For example, "*bold*" is a quote but will only be formatted as bold if " is listed both in left and right word boundary lists.
rightWordBoundaryChars
Defines the list of symbols which are considered «right word boundary».
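
The arithmetic behind maxQuoteLevel and the heading level limits can be written down in two small functions. Both are a sketch: the function names are made up, the formulas are inferred from the examples given above, and the default of 6 for the maximum heading level is an assumption.

```php
<?php
// Hypothetical helpers; the formulas are inferred from the examples above.

// Quote level wraps around: with $max = 4, five ">" give level 1 again.
function quote_css_level($markerCount, $max = 4) {
  return ($markerCount - 1) % $max + 1;
}

// "==" maps to the minimum heading level; longer runs are clamped to the maximum.
function heading_tag_level($equalsCount, $min = 3, $max = 6) {
  return min($max, $min + $equalsCount - 2);
}

echo quote_css_level(5), ' ', heading_tag_level(6), "\n";
```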

The following instance properties of LiteComment class can be of interest:

keepFormatChar
If true, simple format characters are kept in the output: *bold* is shown in bold together with its asterisks. If false they're removed, leaving just the bold text.
recursive
Can be an array, true (same as an array with all elements set to true) or false. If false, nested formatting isn't processed, which is extremely quick (1 regexp run per source), but "a *b* c" will result in «a *b* c», while if this is true the inner *b* will be formatted as bold too.
externalLinkAttributes
A string that is added to <a> tags that point to external resources. By default it's target="_blank" but you can add more and make it, for example, target="_blank" rel="nofollow".

Extensibility

Although LiteComment is intentionally limited in features, it nevertheless has a few tricks in its pockets (and it has about 65 pockets) that allow you to extend the markup in different ways.

The most brute-force approach is, obviously, changing the regexp LiteComment uses, but you'll also need to shift all pocket offsets after inserting new capturing brackets. That's been made easier thanks to LiteComment->MatchesFrom() but it still requires some effort.
And besides, I've preserved a few «extensible air holes» just for this occasion.

Custom commands via special insets

This is the most common and flexible way to add your own markup or features. It uses what I call "special insets". The syntax is:
[- arg | arg | name=value | ... -]

If an argument doesn't have a name (no =... part) it'll be assigned an index. Spaces after [-, before -] and around | are optional.

How to add your own handler? Two methods of LiteComment class deal with special insets:

FormatSpecialInsetInHTML(&$contents)
The main routine; $contents is what was found inside [-...-].
SpecialStrToSettings(&$str)
Gets called by the above method; splits the string into parameters separated by pipes (|), assigning names to them (name=value).
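
To make the parameter splitting more concrete, here is a minimal stand-in for what such a parser might do. The function name parse_inset is hypothetical and the real SpecialStrToSettings may differ in details; the sketch only follows the rules described above (pipes separate arguments, name=value assigns a name, everything else gets the next index).

```php
<?php
// Hypothetical stand-in for SpecialStrToSettings(); the real method may differ.
function parse_inset($contents) {
  $settings = array();
  foreach (explode('|', $contents) as $part) {
    $part = trim($part);                  // spaces around "|" are optional
    if ($part === '') { continue; }
    $pos = strpos($part, '=');
    if ($pos === false) {
      $settings[] = $part;                // unnamed argument: gets the next index
    } else {
      $name = trim(substr($part, 0, $pos));
      $settings[$name] = trim(substr($part, $pos + 1));   // name=value
    }
  }
  return $settings;
}

print_r(parse_inset(' logo.png | title=My logo | icon.png '));
```

Note that with this kind of splitting a URL containing = ends up with part of itself in the key, which is why a handler has to glue the key and the value back together for the first argument.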

So to add your command you need to go to FormatSpecialInsetInHTML and examine its code:

PHP
$settings = $this->SpecialStrToSettings($contents);

$urlKey = array_shift(array_keys($settings));
$url = array_shift($settings);
if (!is_int($urlKey)) {
  $url = "$urlKey=$url";  // URL contains '=', reinsert it.
}

$title = &$settings['title'];
$title or $title = &$settings[0];
return $this->MakeHTMLLink($url, $title);

Now your actions depend on how you want to extend the special inset.

  1. The first line splits the string into a command and its arguments. If you want to override the default syntax of the special inset ([- cmd | arg=value | arg | ...-]), insert your custom code right before the first line.
  2. The rest of the method is the default handler, which inserts links (syntax: [- URL | caption-] or [- URL | title=caption -]). So the next block takes the first argument off the argument list (it considers it a URL). If you want to have a command like [- my_command | arg | ... -], add your code after this block: $url will contain the name of the command (which is the first argument).
  3. The last block looks for the title argument, taking the argument at index 0 if none is found, and finally inserts a link. You probably won't need to change anything here.

Embedding YouTube videos

Let's say we want to be able to embed YouTube videos using LiteComment formatting. The syntax can really be anything within the [-...-] construction (special inset), so I've chosen this one: [- youtube | myCyJJdhhDk -].

First, let's locate FormatSpecialInsetInHTML method and make changes there. Having the code of this function before our eyes (in previous section) we'll add our code after the 2nd block:

PHP
if (strtolower($url) === 'youtube') {
  $id = $settings[0];
  $html = '<iframe title="YouTube video player" width="480" height="390"
                   src="http://www.youtube.com/embed/'.$id.'?rel=0"
                   frameborder="0" allowfullscreen></iframe>';
  return $html;
}

// original code follows:
$title = &$settings['title'];
...

That's all! Now we can use this syntax to insert a YouTube video:

Here's a tutorial on getting started in well-known ModPlug Tracker:
[- youtube | myCyJJdhhDk -]

Bonus

To demonstrate other possibilities of extending special insets, let's say that I also want to support this shorter syntax: [- youtube myCyJJdhhDk -]. The difference here is that the video's ID («myCyJJdhhDk») is no longer an argument, so we need to do the parsing on our own.
That's not difficult at all, though: simply insert the following code right at the beginning of the FormatSpecialInsetInHTML function:

PHP
if (stripos($contents, 'youtube ') === 0) {
  $id = substr($contents, strlen('youtube '));
  // the rest is the same as in the previous example with [-youtube | ID-]:
  $html = '<iframe title="YouTube video player" width="480" height="390"
                   src="http://www.youtube.com/embed/'.$id.'?rel=0"
                   frameborder="0" allowfullscreen></iframe>';
  return $html;
}

// original code follows:
$settings = $this->SpecialStrToSettings($contents);
...

Possibilities for extending special insets are quite endless.

Representing URLs as something else than text

By default, when you format a URL (like http://google.com or [- goo.com -]) you'll get a link with a text caption. However, this is boring, and sometimes we want to display a thumbnail pointing to the full image, where a text caption just isn't good enough.
And that's what you can do – make plain URLs look different.

By default LiteComment already includes handlers that will show URLs with picture extensions (such as .png) as images (<img />) so that when you write: picture.jpg or [-http://my-home | thumb.png-] you'll see an image linking to the actual page.

You can extend this by adding your own handlers (handlers are triggered based on file (URL) extension).
Adding a custom handler is as simple as adding a method named HTMLByExt_<EXTENSION> to the LiteComment class. For example, if we want to provide a download counter for archives, and also a neat icon before the link, we can add this method for ZIP archives:

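PHP
function HTMLByExt_ZIP($url) {
  // Load the stored counters; start from scratch if the file is missing or invalid.
  $counts = is_file('dl-counter.txt') ? unserialize(file_get_contents('dl-counter.txt')) : array();
  if (!is_array($counts)) { $counts = array(); }

  // Increment first so that the very first hit is reported as 1.
  $counts[$url] = isset($counts[$url]) ? $counts[$url] + 1 : 1;
  $thisCount = $counts[$url];
  file_put_contents('dl-counter.txt', serialize($counts), LOCK_EX);

  return '<img src="images/zip-link.png" />'.basename($url).
         " ($thisCount downloads so far)";
}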

Remember that link (<a>) will be added automatically. Also, since this method is called when formatting texts the counter won't update if you're caching formatted HTML (not that it's necessary with the kind of speed LiteComment has).

Now the following snippet will be neatly formatted:

In this archive: litecomment.zip you'll find the software with all necessary instructions.

Note that LiteComment will only recognize in-text extensions that were registered using SetFileExtensions().
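
How can a method name like HTMLByExt_ZIP be found from a URL? One possible way is sketched below. The class TinyLinkFormatter and its HandlerFor and Caption methods are assumptions invented for this example, not LiteComment's actual code; only the HTMLByExt_<EXTENSION> naming comes from the text above.

```php
<?php
// Hypothetical sketch of extension-based dispatch; not LiteComment's actual code.
class TinyLinkFormatter {
  // Returns the handler method name for $url, or null when no handler exists.
  function HandlerFor($url) {
    $path = (string) parse_url($url, PHP_URL_PATH);
    $ext = strtoupper(pathinfo($path, PATHINFO_EXTENSION));
    $method = "HTMLByExt_$ext";
    return ($ext !== '' && method_exists($this, $method)) ? $method : null;
  }

  // A trivial handler standing in for a real one.
  function HTMLByExt_ZIP($url) {
    return '[zip] '.basename($url);
  }

  // Uses the handler if there is one, otherwise falls back to the URL itself.
  function Caption($url) {
    $method = $this->HandlerFor($url);
    return $method ? $this->$method($url) : $url;
  }
}

$formatter = new TinyLinkFormatter;
echo $formatter->Caption('http://example.org/dl/litecomment.zip'), "\n";
```

With this layout, supporting a new extension means adding one method and nothing else, which matches the convenience described above.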

Is it really by extension?..

Erm, well, when I said that handlers are triggered based on file extension I tricked you a little :) They're actually triggered based on the whole URL and «extension trigger» is just the simplest method of adding a new handler.
What do I mean?

Let's say we want to warn users about links to a particular site. Such URLs don't have to be of one extension – just linking to the same resource. Say, spammer.org.
We'll start off by locating the GetHTMLForURL method and examining its code:

PHP
if ($methodName = $this->GetHTMLFileMethodFor($url)) {
  return $this->$methodName($url);
}

Pretty straightforward, eh? This function accepts the $url argument, which holds the entire URL that was passed to a special inset ([-url|caption-]) or was found in the text (like www.site.ru/file). Now we know what to do: add an extra condition to the beginning of this method:

PHP
if (stripos($url, 'spammer.org') !== false) {
  return '<em>This site might harm your system.</em>';
} elseif ($methodName = $this->GetHTMLFileMethodFor($url)) {
  // original code follows.

Now each link to spammer.org will have that text included: «Visit spammer.org» becomes Visit <a href="..."><em>This site might harm your system.</em></a>.

New simple format (e.g. italic)

A «simple format» is text placed between 2 identical strings. For example, the default simple formats are bold (*bold*) and preformatted (`code`) text. As you can see, they are created by the * and ` symbols respectively. Note that a simple format doesn't have to use a single symbol: it can be a string (e.g. ##).
You can easily add a new simple format if it follows this rule.

Let's say we want to underline text. It'll have this syntax: _underlined_ and it will be using <ins> tag.

Side note: <u> isn't HTML5-compliant, while <ins> is displayed underlined in all browsers as far as I have tested. It's a similar story with <del> and the <s>, <strike> tags: the first is semantic and is displayed struck through by default.

We need to do 3 things:

  1. Add format symbol to the regexp: find a place in the code that looks like this:
    ([\*\`]) ((?=[^\s]) .+? (?<=[^\s])) \30 # formatting char (eg. *)
    Let's add _ there: ([\*\`_]) ... (the rest of line is the same).
  2. Add the tag alias to the static $tagAliases property. Example:
    static $tagAliases = array('code' => 'code', 'strong' => 'strong', ...
    Let's change this line to the following:
    static $tagAliases = array('ins' => 'ins', 'code' => 'code', ...
  3. Finally, add your tag to the list of simple formats in the HTMLReplaceCallback() method:
PHP
} elseif ($formatChar = &$matches[30]) {
  static $charToClass = array('*' => array('strong', 'emphasis'), '`' => array('tt', 'monotype'));
  ...

Simply add an item to $charToClass:

PHP
static $charToClass = array('_' => array('ins', 'underlined'), ...

The first item ('ins') is our tag, the second ('underlined') is the CSS class to assign to it. The second item can also be an array to specify several classes, for example:

PHP
array('_' => array('ins', array('underlined', 'inserted')), ...
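
To show how an entry of this shape could be turned into markup, here is a small sketch. The helper wrap_simple_format is hypothetical and it skips the tag alias lookup; it only demonstrates that the class part may be either a string or an array.

```php
<?php
// Hypothetical helper: turns one $charToClass entry into an HTML element.
function wrap_simple_format($entry, $text) {
  list($tag, $classes) = $entry;
  $class = implode(' ', (array) $classes);   // a string or an array of class names
  return "<$tag class=\"$class\">$text</$tag>";
}

$charToClass = array(
  '*' => array('strong', 'emphasis'),
  '_' => array('ins', array('underlined', 'inserted')),
);
echo wrap_simple_format($charToClass['_'], 'text'), "\n";
```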

New multiline format

A «multiline format», or a block, spans multiple lines (obviously). Default multiline formats are code and blockquotes:

`
code goes here
and isn't processed
`

"
this is a blockquote
with many lines
"
LiteComment
code goes here
and isn't processed
this is a blockquote
with many lines

Similarly to simple formats multiline formats are created by identical strings (1 or more characters) placed on separate lines.

Let's say we want to add some «attention box» that will be expressed via <div> tag with CSS class set to attention. It will be created with this markup:

!!
Please use the forum search function
before asking your questions!
!!

To implement this we need 3 things – just like with a simple format:

  1. Add format symbol to the regexp: find a place in the code that looks like this:
    \n+ (`|&quot;)\s*?\n ([\s\S]+?) \n+\3\s*? $ # multiline insets; [\s\S] is like . + \n
    Let's add !! there: \n+ (`|&quot;|!!)\s* ... (the rest of line is the same).
  2. Add the tag alias to the static $tagAliases property. Example:
    static $tagAliases = array('code' => 'code', 'strong' => 'strong', ...
    By default it already includes the tag we want to use (<div>), so there's no need to do anything here.
  3. Finally, add your tag to the list of multiline formats in the HTMLReplaceCallback() method:
    static $multilineToTag = array('`' => array('code', 'code'),
    We need to change this line to:
    ...array('`' => array('code', 'code'), '!!' => array('div', 'attention'), ...

That's it!
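
If you'd like to try the block rule outside of LiteComment, the following stand-alone sketch does roughly the same thing for two markers. It is a simplification under stated assumptions: the function name format_blocks is made up, the group numbers differ from LiteComment's, and only ` and !! are handled.

```php
<?php
// Hypothetical stand-alone version of the block rule; group numbers differ from LiteComment's.
function format_blocks($text) {
  $tags = array('`' => array('code', 'code'), '!!' => array('div', 'attention'));
  // Opening marker on its own line, lazy body, then the same marker on its own line.
  $regexp = '/^ (`|!!) [ \t]* \n ([\s\S]+?) \n \1 [ \t]* $/xm';

  return preg_replace_callback($regexp, function ($m) use ($tags) {
    list($tag, $class) = $tags[$m[1]];
    return "<$tag class=\"$class\">".$m[2]."</$tag>";
  }, $text);
}

echo format_blocks("!!\nPlease use the forum search function\nbefore asking your questions!\n!!"), "\n";
```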

E-mail masking methods

LiteComment is able to replace plain e-mail addresses in texts with obfuscated alternatives. By default it has 3 spam-protection methods (obfuscating, JavaScript protection and JavaScript protection with obfuscating for those who have JavaScript turned off) but you can always add more. How?

Quite easily:

  1. Add a record to SetAntiSpamMode();
  2. Add a method named MakeEmail_<NAME> to the LiteComment class.
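
The two steps fit together roughly like this. The class below is a hypothetical miniature, not LiteComment's code: TinyMailMasker, its Mask method and the exception thrown for unknown modes are assumptions; only the MakeEmail_<NAME> naming and the three arguments come from this article.

```php
<?php
// Hypothetical miniature of the mode-to-method mapping; not LiteComment's actual code.
class TinyMailMasker {
  protected $antiSpamMethod = 'MakeEmail_obfuscate';

  // Step 1: every supported mode has a "case" record here.
  function SetAntiSpamMode($mode) {
    switch ($mode) {
    case 'obfuscate':
    case 'asis':
      $this->antiSpamMethod = "MakeEmail_$mode";
      break;
    default:
      throw new InvalidArgumentException("Unknown anti-spam mode: $mode");
    }
  }

  // Step 2: one MakeEmail_<NAME> method per mode.
  function MakeEmail_obfuscate($account, $domain, $zone) {
    return "$account~@~$domain~$zone";
  }

  function MakeEmail_asis($account, $domain, $zone) {
    return "$account@$domain.$zone";
  }

  // Calls whichever method the current mode points to.
  function Mask($account, $domain, $zone) {
    return $this->{$this->antiSpamMethod}($account, $domain, $zone);
  }
}

$masker = new TinyMailMasker;
echo $masker->Mask('my', 'e.mail', 'net'), "\n";
```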

Let's say we want to display e-mails as images. For this, let's go to the SetAntiSpamMode() function and add a new case 'our_method' there. Example:

PHP
...
case 'jsonly':
case 'image':   // <- added
  $this->antiSpamMethod = "MakeEmail_$mode";
}

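Now let's create the MakeEmail_image method (it needs the GD library for PHP):

PHP
function MakeEmail_image($account, $domain, $zone) {
  $img = imagecreatetruecolor(100, 30);
  // A truecolor image starts out black, so fill it with white to keep the black text visible.
  imagefill($img, 0, 0, imagecolorallocate($img, 255, 255, 255));
  $color = imagecolorallocate($img, 0, 0, 0);
  imagestring($img, 3, 0, 0, "$account@$domain.$zone", $color);

  // imagepng() prints the image, so capture its output in a buffer.
  ob_start();
  imagepng($img);
  $data = ob_get_clean();

  $base64 = chunk_split(base64_encode($data));
  $html = '<img src="data:image/png;base64,'.$base64.'" alt="E-mail" />';
  return $html;
}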

The code working with GD is purely for demonstration and has some limitations (e.g. we could determine the width of e-mail string before creating the image). However, I think it demonstrates well how things generally work.

Download LiteComment. You can check its sandbox here.

Please drop a comment below if you have questions or if you're using LiteComment in your project!
