With the /Format option, you can produce detailed output about regex results. A format string contains any text you want, plus specifiers (tokens that begin with %) that are replaced with portions of your regex match. Using a format string, you can display the specific matched text, captured groups, or individual captures.

A format string may produce more than one line of output for each matching line in your source. Using match, group, or capture tokens also increases regx's processing time somewhat.


DETAIL LEVEL

Search results have four levels of detail: Line, Match, Group, and Capture. The format string is examined to determine the greatest level of detail needed. The greatest level of detail determines:

  • Whether non-matching source lines have any meaning. If the format string shows information about Matches, Groups, or Captures, then lines that don't match (and therefore have no Matches) will not be displayed. This means that such a format string overrides any value given to /Output (and any /v).
  • How deeply the results are examined. Higher levels of detail cost more to produce.
  • How many lines of output are generated per matching source line. If the format string requires details about each match, there will be one line of output per match, as opposed to one per matching source line. If the format string displays groups, there will be one line of output per group per match.

    There is one exception to this: the %mgroups token is replaced with the name and value of each group in the match, all on the same line. This still requires examining the matches of each line but does not produce additional lines of output.
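The rule for picking the greatest level of detail can be sketched in Python. This is a hypothetical illustration, not regx's source: the token list is taken from the sections of this page, and reading the format string left to right (so that %%mval is a literal percent sign followed by "mval") is an assumption.

```python
import re

# Hypothetical token scanner. "%" is listed first so "%%" is consumed as an
# escaped percent sign before any other token can match.
TOKEN = re.compile(r"%(%|tab|nline|#|lnum|mgroups|mid|mval|mpos|gid|gpos|gval|cid|cpos|cval|f|i|r)")

LEVELS = ["Line", "Match", "Group", "Capture"]
LEVEL_OF = {
    "mid": "Match", "mval": "Match", "mpos": "Match", "mgroups": "Match",
    "gid": "Group", "gpos": "Group", "gval": "Group",
    "cid": "Capture", "cpos": "Capture", "cval": "Capture",
}

def detail_level(fmt: str) -> str:
    """Return the greatest level of detail required by any token in fmt."""
    best = 0
    for m in TOKEN.finditer(fmt):
        level = LEVEL_OF.get(m.group(1), "Line")  # constants and line tokens
        best = max(best, LEVELS.index(level))
    return LEVELS[best]

print(detail_level("%f(%lnum): %i"))   # Line
print(detail_level("ID %mid: %mval"))  # Match
print(detail_level("%mval %gval"))     # Group
print(detail_level("100%%mval"))       # Line
```

Note that %mgroups counts as Match level: it needs the matches examined but, as described above, adds no extra lines.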

CONSTANT TEXT

You can use any text in a format string except the % character (but see %%, below). This lets you produce comma-separated output, for example.

Example:
regx pattern /format "Found a matching line!"

This will output "Found a matching line!" once for every line that matches, but not any information about the line.

You can also insert a few special constants:

%% The % character
%tab A tab character
%nline A newline character
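Because %% is itself an escape, the constants have to be expanded in a single left-to-right pass. The Python sketch below (a hypothetical helper, not regx code) shows why: chaining string replacements turns the literal text "%%tab" into a tab.

```python
import re

# The three constant tokens listed above, matched in one scan of the string.
CONSTANTS = {"%": "%", "tab": "\t", "nline": "\n"}
CONST_TOKEN = re.compile(r"%(%|tab|nline)")

def expand_constants(fmt: str) -> str:
    """Replace %%, %tab and %nline in a single pass."""
    return CONST_TOKEN.sub(lambda m: CONSTANTS[m.group(1)], fmt)

print(repr(expand_constants("a%tabb")))    # 'a\tb'
print(repr(expand_constants("100%%tab")))  # '100%tab' (literal text, no tab)

# Chained replacements get the escape wrong: "%%" becomes "%", and the
# resulting "%tab" is then turned into a tab character.
naive = "100%%tab".replace("%%", "%").replace("%tab", "\t")
print(repr(naive))                         # '100\t'
```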

SEARCH CONTEXT

%# A counter. It is increased once per line of output produced.
%f The name of the source for the search. For a file, this is the filename. For a web page, this is the URL used to retrieve the page. For a filter, this is an empty string.

LINE INFORMATION

%i The matching line (or the not-matching line in the case of /v)
%r The result of the replacement, if you provided a replace pattern. If the line didn't match the search pattern (and therefore no replacements were made), %r outputs the source line.
%lnum The line number in the source text
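Taken together, the constant, search-context, and line tokens are enough to build one output line per source line. The sketch below is a hypothetical Python helper (`format_line` and the filename "notes.txt" are illustrative, not part of regx); it follows the rule above that %r falls back to the source line when no replacement was made.

```python
import re

LINE_TOKEN = re.compile(r"%(%|tab|nline|#|lnum|f|i|r)")

def format_line(fmt, counter, source, lnum, line, replaced=None):
    """Expand constant, search-context and line tokens for one output line.

    replaced is the line after the replace pattern was applied, or None when
    no replacement was made (then %r shows the source line).
    """
    values = {
        "%": "%", "tab": "\t", "nline": "\n",
        "#": str(counter),   # increases once per line of output
        "f": source,         # filename or URL; empty string for a filter
        "lnum": str(lnum),
        "i": line,
        "r": line if replaced is None else replaced,
    }
    return LINE_TOKEN.sub(lambda m: values[m.group(1)], fmt)

print(format_line("%f(%lnum): %i", 1, "notes.txt", 12, "fox in socks"))
# notes.txt(12): fox in socks
print(repr(format_line("%#%tab%r", 2, "", 3, "fox in socks", "box in socks")))
# '2\tbox in socks'
```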

MATCH INFORMATION

These cause regx to produce one line of output for each match in each matching source line (unless a Group or Capture token is also in the format string; see below). They also mean that only lines containing a match will produce output.

%mid The index of the match within the input line, starting at 1.
%mval The value of the matching text.
%mpos The position of the start of the match in the input line. The first character in the line is position 1.

Example:


> echo "fox in socks" |
    regx "fox|socks" /format "ID %mid at position %mpos: %mval"

  ID 1 at position 1: fox 
  ID 2 at position 8: socks
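The same one-line-per-match behavior can be reproduced with Python's `re.finditer`, as a rough sketch (the helper `match_lines` is hypothetical, and Python's regex engine stands in for the one regx uses). Python positions are 0-based, so 1 is added to get the 1-based %mpos; a line with no match yields no output at all.

```python
import re

MATCH_TOKEN = re.compile(r"%(mid|mpos|mval)")

def match_lines(pattern, line, fmt):
    """Return one formatted string per match of pattern in line."""
    out = []
    for mid, m in enumerate(re.finditer(pattern, line), start=1):
        values = {
            "mid": str(mid),             # 1-based match counter
            "mpos": str(m.start() + 1),  # convert 0-based index to 1-based
            "mval": m.group(0),          # the whole matched text
        }
        out.append(MATCH_TOKEN.sub(lambda t: values[t.group(1)], fmt))
    return out

for row in match_lines("fox|socks", "fox in socks", "ID %mid at position %mpos: %mval"):
    print(row)
# ID 1 at position 1: fox
# ID 2 at position 8: socks
```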

%mgroups

This special value outputs each group in the match and its value without creating an additional line for each group.

The format it uses is the group name (or number), an equals sign, and the value of the group enclosed in double quotes. There is no space around the equals sign.

Example:


> echo "socks on foxes" |
    regx "(?'f'fox)|(?'feet'socks)|on" /format "%mgroups" 

   feet="socks"
   f="fox" 
   

Example:


> echo "boxes of socks" |
    regx "(?'rhyme'\w(?'twoletter'o\w)\w*)" /format "%mgroups" 

   1="boxes" 2="ox"
   1="socks" 2="oc"

Example:


> echo "John Smith, (123)-546-1902, DOB 12/2/1965" |
    regx "(?'name'[\w\s]+).*(?'date'(?<=DOB )\d\d?/\d\d?/\d\d\d{2}?)" /format "%mgroups"

   name="John Smith" date="12/2/1965"
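A rough Python equivalent of %mgroups is sketched below; `mgroups` is a hypothetical helper, not regx code. Python spells named groups (?P<name>...) rather than (?'name'...), and the date pattern is written with \d{4}, which matches the same four digits as the .NET \d\d\d{2}?. Following the first example above, groups that took no part in a match are left out.

```python
import re

def mgroups(pattern, line):
    """Return one name="value" row per match, listing participating groups."""
    rx = re.compile(pattern)
    names = {num: name for name, num in rx.groupindex.items()}
    rows = []
    for m in rx.finditer(line):
        parts = []
        for num in range(1, rx.groups + 1):   # group 0 (the whole match) is skipped
            value = m.group(num)
            if value is not None:             # group did not take part in this match
                parts.append(f'{names.get(num, num)}="{value}"')
        rows.append(" ".join(parts))
    return rows

print(mgroups(r"(?P<name>[\w\s]+).*(?P<date>(?<=DOB )\d\d?/\d\d?/\d{4})",
              "John Smith, (123)-546-1902, DOB 12/2/1965"))
# ['name="John Smith" date="12/2/1965"']
print(mgroups(r"(\w(o\w)\w*)", "boxes of socks"))
# ['1="boxes" 2="ox"', '1="socks" 2="oc"']
```

The second call uses unnamed groups, which are reported by number.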

GROUP INFORMATION

If any Group tokens are present in the format string, regx will output one line per Group per Match per source line.

Unnamed groups are assigned numbers, starting from 1. Named groups do not get numbers. If you use the /ExplicitCaptures parameter (or set the Explicit Captures Only property in the search pattern), only named groups will be displayed.

The implicit "group 0," which contains the entire match, is not displayed by regx.

%gid The name or number of the group
%gpos The position of the first character of the group in the source line
%gval The value of the group (the matching text)

Note that a group with a quantifier may match more than once in a single match of the regex. The %gval token returns the last value matched. To retrieve every value matched, use Capture tokens.
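The group tokens can be sketched in Python as one output line per participating group per match. This is a hypothetical helper (`group_lines`), and two details are assumptions: %gpos is taken to be 1-based like %mpos, and groups are listed in Python's left-to-right numbering, whereas .NET numbers unnamed groups before named ones, so regx's ordering may differ. Python, like the behavior described above, keeps only the last value of a quantified group.

```python
import re

GROUP_TOKEN = re.compile(r"%(gid|gpos|gval)")

def group_lines(pattern, line, fmt):
    """Return one formatted string per participating group per match."""
    rx = re.compile(pattern)
    names = {num: name for name, num in rx.groupindex.items()}
    out = []
    for m in rx.finditer(line):
        for num in range(1, rx.groups + 1):   # implicit group 0 is not reported
            if m.group(num) is None:          # group took no part in this match
                continue
            values = {
                "gid": str(names.get(num, num)),  # name if named, else number
                "gpos": str(m.start(num) + 1),    # assumed 1-based
                "gval": m.group(num),             # last value for a quantified group
            }
            out.append(GROUP_TOKEN.sub(lambda t: values[t.group(1)], fmt))
    return out

print(group_lines(r"(\w)+", "abc", "%gid at %gpos: %gval"))
# ['1 at 3: c']
```

The quantified group (\w)+ matched "a", "b", and "c", but only the last value survives as the group's value.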

CAPTURE INFORMATION

The presence of any Capture token in a format string causes regx to produce one line of output for each Capture per Group per Match per source line.

Unless a Group has a quantifier on it, it will only produce one Capture, which can be found using Group tokens. If a Group has a quantifier, it may (or may not) capture more than once in a Match. The Capture tokens return information on every time the Group captured text. Captures are numbered in the order the text was captured, so the highest-numbered Capture is the latest one, which is the same value %gval returns.

%cid The index of the capture (starting from 1)
%cpos The position of the first character of the capture in the source line
%cval The value of the capture (the matching text)


DEFAULT FORMAT STRINGS

If you don't specify a format string with /Format, regx uses one of these:

If you don't have a replace pattern:

/Detail None          %i
/Detail LineNumber    %lnum: %i
/Detail FileAndLine   %f(%lnum): %i


If you do have a replace pattern, regx uses:

/Detail None          %r
/Detail LineNumber    %lnum: %r
/Detail FileAndLine   %f(%lnum): %r
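The two lists differ only in whether %i or %r shows the line, which a small lookup captures. This Python sketch is illustrative (`default_format` is a hypothetical name); the strings themselves are copied from the lists above.

```python
# Default format strings keyed by the /Detail value, copied from the lists above.
DEFAULT_FORMATS = {
    "None": "%i",
    "LineNumber": "%lnum: %i",
    "FileAndLine": "%f(%lnum): %i",
}

def default_format(detail: str, has_replace_pattern: bool) -> str:
    fmt = DEFAULT_FORMATS[detail]
    # With a replace pattern, the replaced line (%r) is shown instead of %i.
    return fmt.replace("%i", "%r") if has_replace_pattern else fmt

print(default_format("FileAndLine", False))  # %f(%lnum): %i
print(default_format("LineNumber", True))    # %lnum: %r
```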


EXAMPLES

Finding a portion of lines:

Source    Now is the time for all good men to come to the aid of their party
Pattern   \ball \w+ men\b
Format    %mval
Output    all good men

You can use %mval to pull out the exact portion of the input line that matches. If your pattern matches twice in a line, you will get two lines of output. To find only the first match, you will need to use groups.
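The same pattern can be checked with Python's `re.findall`, used here only as a stand-in for regx. Each list element corresponds to one line of %mval output, so a second match in the same line yields a second element.

```python
import re

pattern = r"\ball \w+ men\b"
print(re.findall(pattern, "Now is the time for all good men to come to the aid of their party"))
# ['all good men']
print(re.findall(pattern, "all good men and all bad men"))
# ['all good men', 'all bad men']
```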


Finding multiple portions of a line:

Source    A circle in a spiral a wheel within a wheel
Pattern   \w+ in a spiral|\w+ within a wheel
Format    %mval
Output    circle in a spiral
          wheel within a wheel

Alternately:

Source    A circle in a spiral a wheel within a wheel
Pattern   \w+ (with)?in a \w+
Format    %mval
Output    circle in a spiral
          wheel within a wheel

It is often easier to write several simple regexes that each match part of what you want, combine them with alternation (the | operator), and retrieve the results with %mval.
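A quick Python check (a stand-in for regx, with a hypothetical `mvals` helper) confirms that the alternation and the single pattern produce the same %mval lines. It also shows a pitfall of Python's `re.findall`: when a pattern contains a capture group, it returns the group values rather than the whole match, so the whole match (group 0) is read from `re.finditer` instead.

```python
import re

source = "A circle in a spiral a wheel within a wheel"

def mvals(pattern):
    # group(0) is the whole match, which is what %mval shows.
    return [m.group(0) for m in re.finditer(pattern, source)]

print(mvals(r"\w+ in a spiral|\w+ within a wheel"))
print(mvals(r"\w+ (with)?in a \w+"))
# both print ['circle in a spiral', 'wheel within a wheel']

# findall returns the (with) group instead: '' where it did not participate.
print(re.findall(r"\w+ (with)?in a \w+", source))  # ['', 'with']
```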
            

Finding parts of a match

Start with a simple pattern that can match a few different things:

Source    You've broken the speed of the sound of loneliness
Pattern   the \w+|of (the )?\w+

First, what are the matches?

Format    %mval
Output    the speed
          of the sound
          of loneliness

Now, what are the groups?

Format    %gval
Output    the

The parentheses in "(the )?" create a capture group. The only other group is the implicit "group 0," which regx disregards.
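The matches and the single participating group can be reproduced in Python as a rough check (Python's `re` standing in for regx). Only the second match uses the "(the )?" group, so it is the only group value produced.

```python
import re

line = "You've broken the speed of the sound of loneliness"
rx = re.compile(r"the \w+|of (the )?\w+")

matches = list(rx.finditer(line))
mval_lines = [m.group(0) for m in matches]                          # like %mval
gval_lines = [m.group(1) for m in matches if m.group(1) is not None]  # like %gval
print(mval_lines)  # ['the speed', 'of the sound', 'of loneliness']
print(gval_lines)  # ['the ']
```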
   
So let's add some explicit groups:

Pattern   the (\w+)|of (the )?(\w+)
Format    %gval
Output    speed
          sound
          the
          loneliness

Let's get that "the" group out of there. It's just an artifact of needing the quantifier. The (?:...) construct makes a non-capturing group, which is useful for applying quantifiers but doesn't add to the Groups collection.

Pattern   the (\w+)|of (?:the )?(\w+)
Format    %gval
Output    speed
          sound
          loneliness
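Python's `re` treats (?:...) the same way, which makes the effect easy to check: the compiled pattern reports one fewer group, and only the two noun groups produce values. This is a sketch for illustration; `gvals` is a hypothetical helper.

```python
import re

line = "You've broken the speed of the sound of loneliness"
capturing = re.compile(r"the (\w+)|of (the )?(\w+)")
non_capturing = re.compile(r"the (\w+)|of (?:the )?(\w+)")

print(capturing.groups, non_capturing.groups)  # 3 2

def gvals(rx):
    # One entry per participating group per match, like %gval output.
    return [g for m in rx.finditer(line) for g in m.groups() if g is not None]

print(gvals(non_capturing))  # ['speed', 'sound', 'loneliness']
```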

Note that using "%mgroups" would still produce three lines of output because the pattern matches three separate times.

Named and nested example:

Source    You've broken the speed of the sound of loneliness
Pattern   (?'phrase'the (?'noun'\w+)|of (?:the )?(?'noun'\w+)|You've (?'verb'\w+))
Format    %mgroups
Output    phrase="You've broken" verb="broken"
          phrase="the speed" noun="speed"
          phrase="of the sound" noun="sound"
          phrase="of loneliness" noun="loneliness"

Using Capture tokens with balancing constructs


The Capture tokens are especially useful with balancing groups, since each nesting of the balancing definition results in a separate, named Capture.

Source    abc (def (ghi (jkl) mno) pqr) stu
Pattern   ((?'Open'\()[^()]*)+((?'Close-Open'\))[^()]*)+
Format    Group '%gid', capture %cid: %cval
Output    Group 'Close', capture 1: jkl
          Group 'Close', capture 2: ghi (jkl) mno
          Group 'Close', capture 3: def (ghi (jkl) mno) pqr

Note that outputting groups (with %gval instead of %cval) would output only the latest capture for the group (in this case, capture 3).
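Python's `re` has no balancing groups, so the sketch below emulates what the Close-Open group records with an explicit stack: each "(" pushes the position just after it, and each ")" pops the most recent one and records the text in between. It is an illustration of the idea under that assumption (`close_captures` is hypothetical), not the .NET engine, but it yields the same three captures in the same order as the output above.

```python
def close_captures(text):
    """Emulate the Close-Open captures: innermost parenthesized text first."""
    stack, captures = [], []
    for i, ch in enumerate(text):
        if ch == "(":
            stack.append(i + 1)            # text starts after the "("
        elif ch == ")" and stack:
            start = stack.pop()            # pair with the most recent "("
            captures.append(text[start:i])
    return captures

for cid, cval in enumerate(close_captures("abc (def (ghi (jkl) mno) pqr) stu"), start=1):
    print(f"Group 'Close', capture {cid}: {cval}")
# Group 'Close', capture 1: jkl
# Group 'Close', capture 2: ghi (jkl) mno
# Group 'Close', capture 3: def (ghi (jkl) mno) pqr
```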

Last edited Oct 15, 2012 at 6:14 AM by SethMorris, version 4
