regex Archives - rweber.net

Bookmarkable Ajax-Driven Pages

Rebecca — Mon, 09 Jul 2018 12:00:56 +0000

If someone might reasonably expect to bookmark or link others to content, I like that to be possible. With ajax-updated pages it doesn’t come for free, though. The newest addition to the Utilities repository is a little bit of get/set code for query strings to support bookmarkability.

The code consists of two functions, parseQuery and updateQuery. They should live inside a scope (perhaps a self-executing anonymous function) with variables holding the default and current values of the parameters that determine the content of the page.

var default1 = "default1";
var param1;

Parsing queries is straightforward and can be done in multiple ways. This version uses a regular expression to grab everything between the parameter name with equals sign, and the next ampersand if any, provided that’s at least one character long.

function parseQuery() {
  var queryString = window.location.search.substring(1);
  // set all parameters to their default values
  param1 = default1;
  if (queryString.length > 0) {
    // if there's a query string, check for each param within it
    // (?:^|&) anchors the name so "myparam1=" is not mistaken for "param1="
    var val1 = queryString.match(/(?:^|&)param1=([^&]+)/i);
    if (val1) {
      param1 = val1[1];
    }
  }
}

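To handle multiple parameters you’d repeat the blocks right after the comments: set all variables to their defaults, and then, within the single “if there’s a query string” check, do a match for each of them and test whether it succeeded. The length check is built into the regular expression, since [^&]+ requires at least one character.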
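
With more than a couple of parameters, the repetition can be folded into a loop. What follows is a minimal sketch rather than the code from the Utilities repository: parseQueryString, the defaults object, and the sample parameter names are all hypothetical, and it assumes parameter names contain no characters that are special in a regular expression. It takes the query string as an argument instead of reading window.location, so it can be tried outside a browser.

```javascript
// Hypothetical generalization of parseQuery to any number of parameters.
// "defaults" maps each parameter name to its default value.
function parseQueryString(queryString, defaults) {
  var values = {};
  Object.keys(defaults).forEach(function (name) {
    // start from the default
    values[name] = defaults[name];
    // (?:^|&) keeps "param1" from matching inside a longer name like "myparam1"
    var pattern = new RegExp('(?:^|&)' + name + '=([^&]+)', 'i');
    var match = queryString.match(pattern);
    if (match) {
      values[name] = match[1];
    }
  });
  return values;
}

var parsed = parseQueryString('param1=abc&param2=', {
  param1: 'default1',
  param2: 'default2'
});
// parsed.param1 is "abc"; parsed.param2 stays "default2" because its value is empty
```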

You can also split the query string into an array, but it’s a little more difficult to deal with exceptional cases like parameter names that lack values.
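
For comparison, here is a rough sketch of the splitting approach, with the exceptional cases handled explicitly. The function name and the defaults object are hypothetical, and this is not what the repository code does. Unlike the regular expression version, it matches parameter names case-sensitively.

```javascript
// Hypothetical split-based parser; names and behavior are illustrative only.
function parseQueryBySplitting(search, defaults) {
  var values = {};
  Object.keys(defaults).forEach(function (name) {
    values[name] = defaults[name];
  });
  // accept the string with or without its leading question mark
  var queryString = search.charAt(0) === '?' ? search.substring(1) : search;
  queryString.split('&').forEach(function (pair) {
    var eq = pair.indexOf('=');
    // exceptional cases: "name" alone (eq is -1) and "=value" (eq is 0)
    if (eq < 1) {
      return;
    }
    var name = pair.substring(0, eq);
    var value = pair.substring(eq + 1);
    // ignore empty values and parameters this code does not own
    if (value.length > 0 && Object.prototype.hasOwnProperty.call(defaults, name)) {
      values[name] = value;
    }
  });
  return values;
}
```

The extra checks on eq and value.length are the bookkeeping that the single regular expression gets for free.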

Updating the query is more complicated. I wanted my query string to consist of parameters with non-default values only, plus any parameters that don’t belong to this code. My original version simply rewrote the whole query string to consist exactly of all the parameters – default or not – and would not permit other parameters to persist, such as those that you might use to track (or deactivate) A/B tests. The reason to keep default parameters out of the query string was that one of those had a default value dependent on the date; if you bookmarked the “now” version of the page I wanted it to still be “now” when you came back a month later.

The answer was once again regular expressions (when is it not?).

function updateQuery() {
  var newUrl = window.location.href;
  // clean out valueless parameters to simplify ensuing matching
  newUrl = newUrl.replace(/(.*[?&])param1(&(.*))?$/, "$1$3");
  if (param1 !== default1) {
    if (newUrl.match(/[?&]param1=/)) {
      newUrl = newUrl.replace(/(.*[?&]param1=)[^&]*(.*)/, 
      '$1' + param1 + '$2');
    } else if (newUrl.indexOf('?') > 0) {
      newUrl = newUrl + '&param1=' + param1;
    } else {
      newUrl = newUrl + '?param1=' + param1;
    }
  } else {
    newUrl = newUrl.replace(/(.*[?&])param1=[^&]*&?(.*)/, '$1$2');
  }

  // tidy up
  if (newUrl.match(/[?&]$/)) {
    newUrl = newUrl.slice(0, -1);
  }    
  window.history.pushState('', '', newUrl);
}

For each parameter in turn, clean out any valueless instance of it (really meaning “without an equals sign”; if the browser allows a valueless parameter with an equals sign, it will be handled in the if statements). Then, if the parameter has a non-default value, replace its value or add it as a new parameter to the string. If it has the default value, remove it from the string. That is, the whole section between the two comments would be repeated for each parameter. All of this business might leave a trailing question mark or ampersand, so clean that away if needed and push the new URL into the browser history.
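
To make the per-parameter repetition concrete, here is one way it might be wrapped up. This is a sketch under assumptions, not the repository code: setUrlParam, buildUrl, and the params/defaults objects are hypothetical names, parameter names are assumed to be free of regex special characters, and URLs are assumed to have no #fragment. The functions work on plain strings, so the result would still need to be handed to window.history.pushState.

```javascript
// Hypothetical helper: applies the clean/replace/append/remove steps
// for one parameter to a URL string.
function setUrlParam(url, name, value, defaultValue) {
  // clean out a valueless instance (name with no equals sign)
  var newUrl = url.replace(new RegExp('(.*[?&])' + name + '(&(.*))?$'), '$1$3');
  if (value !== defaultValue) {
    if (new RegExp('[?&]' + name + '=').test(newUrl)) {
      // parameter already present: swap in the new value
      newUrl = newUrl.replace(new RegExp('([?&]' + name + '=)[^&]*'), function (m, prefix) {
        return prefix + value;
      });
    } else {
      // parameter absent: append it with ? or & as appropriate
      newUrl += (newUrl.indexOf('?') > 0 ? '&' : '?') + name + '=' + value;
    }
  } else {
    // default value: remove the parameter entirely
    newUrl = newUrl.replace(new RegExp('(.*[?&])' + name + '=[^&]*&?(.*)'), '$1$2');
  }
  return newUrl;
}

// Hypothetical wrapper: runs every known parameter, then tidies up.
function buildUrl(url, params, defaults) {
  var newUrl = url;
  Object.keys(defaults).forEach(function (name) {
    newUrl = setUrlParam(newUrl, name, params[name], defaults[name]);
  });
  // a trailing ? or & may be left behind
  return newUrl.replace(/[?&]$/, '');
}

var updated = buildUrl('http://x.test/page?ab=1&param1=old',
  { param1: 'default1' }, { param1: 'default1' });
// updated is "http://x.test/page?ab=1": param1 is dropped, ab survives
```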

There’s a sample webpage in the repo as well, in which you can try this out, though you’ll need to update the internal links on the page to match its location on your localhost.

Kate Greenaway illustrations as bookmarks via Emmie_Norfolk on Pixabay.

The post Bookmarkable Ajax-Driven Pages appeared first on rweber.net.

Google Analytics: Simple RegExp for Advanced Filtration

Rebecca — Mon, 14 Aug 2017 12:00:27 +0000

Just a little bit of special syntax for describing patterns can greatly increase the flexibility of your filters in Google Analytics. This post is to give you that bit.

What are we working with?

In Google Analytics you can filter using what I’ll call the basic filtration box, that input box with the magnifying glass button above the table of data, and the advanced filtration area which opens if you click the “advanced” link next to the basic filtration box.

I’ll assume in this post that we’re looking at my craft blog’s analytics, specifically the Behavior > Site Content > All Pages report, with the default primary dimension of Page.

The basic filtration box will give you generic pattern-matching: typing “crochet” will give you all URLs that have “crochet” anywhere from the beginning to the end. In the advanced area you can further specify that the URL begin with, exactly match, or end with your search string. In both locations you can use regular expressions.

Regular expressions are a way to describe a pattern to be matched. In full generality the language is extensive and can express very complex patterns. We don’t need the full language (and GA doesn’t support all parts of it anyway), but a little RegExp goes a long way toward easily filtering to the data you’re interested in.

Your first batch of syntax

Regular expressions work by having a collection of reserved characters, symbols that hold special meaning in the RegExp context.

The most useful in GA is | (pipe), found above the return key along with backslash. It means “or.” For example, I did a series about embroidery on crochet where the introductory post’s slug is embroider-crochet and the later posts’ slugs begin embroidery-crochet. I can capture both together with
embroider-crochet|embroidery-crochet

Portions of a regular expression can be enclosed in parentheses. This does nothing by itself, but can be combined with other operations. Enclosing an “or” expression in parentheses lets you make it part of a longer expression. This lets me shorten my previous filter to
(embroidery|embroider)-crochet

Since regular expressions are their own singular option in the advanced filters, you have to use RegExp symbols to get “begins with,” “ends with,” and “exactly matches” filters (unless otherwise specified RegExps match like “contains”). Preceding your expression with ^ means “begins with” and following your expression with $ means “ends with.” Using both gives you “exactly matches.”

For example, if I filtered by /embroidery, I would get both posts in the embroidery category (they begin with /embroidery) and the posts in the “embroidery on crochet” series (which contain /embroidery but begin /crochet). To limit myself to posts in the embroidery category I can filter with ^/embroidery. If for some reason I wanted to filter to just the main blog page, which shows up as /, I could filter with ^/$.
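
If you’d like to experiment outside of Google Analytics, the same patterns can be tried in a browser console. This is only an approximation: JavaScript’s regular expression engine is not GA’s, though the symbols used so far are common to essentially every flavor. The page paths and the filterPages function are made up for illustration.

```javascript
// Hypothetical page paths, standing in for the Page column of the report.
var pages = [
  '/',
  '/crochet/embroider-crochet/',
  '/crochet/embroidery-crochet-flowers/',
  '/embroidery/satin-stitch/'
];

// Keep the pages matching a pattern; the "i" flag mimics GA's
// case-insensitive default.
function filterPages(pattern) {
  var regex = new RegExp(pattern, 'i');
  return pages.filter(function (page) {
    return regex.test(page);
  });
}

var series = filterPages('(embroidery|embroider)-crochet'); // the two series posts
var containing = filterPages('/embroidery'); // one series post plus the category post
var category = filterPages('^/embroidery');  // only the category post
var home = filterPages('^/$');               // only "/"
```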

Summary

exp1|exp2 : matches strings matching exp1 or exp2
^exp1 : matches strings beginning with a match to exp1
exp1$ : matches strings ending with a match to exp1
(exp1) : allows exp1 to be part of a longer pattern

Special characters versus ordinary characters

What if you need to use a reserved character literally? Very few reserved characters would ever appear in a URL, but they could in page titles and elsewhere.

There is a straightforward means to get your regular expression to interpret a character as the ordinary version and not the special RegExp version: precede it with a backslash. This is called escaping the character. For example, \( and \) get you literal parentheses.

Characters that need to be escaped are: \ ^ $ . | ? * + ( ) [ {

I have a Related Posts plugin on the craft blog that adds query parameters to its links. If I put /?related into the filtration box, it wouldn’t give me what I was expecting. The ? needs to be escaped: /\?related.
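
A quick way to see the difference is to try both versions against a couple of sample paths. The paths and function name below are hypothetical. Note that inside a JavaScript string the backslash itself has to be doubled; in the GA filter box you type a single one.

```javascript
// Hypothetical page paths; "related" appears with and without a query string.
var samplePaths = ['/hats/?related=1', '/related-projects/', '/hats/'];

function matchingPaths(pattern) {
  var regex = new RegExp(pattern, 'i');
  return samplePaths.filter(function (path) {
    return regex.test(path);
  });
}

// Unescaped: ? makes the slash optional, so anything containing "related" matches.
var unescaped = matchingPaths('/?related');

// Escaped: only a literal "/?related" matches. '\\?' in source is \? in the pattern.
var escaped = matchingPaths('/\\?related');
```

Here unescaped holds the first two paths, while escaped holds only the first.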

Cautionary notes

In the basic filtration box, you always need to escape reserved characters since it assumes you’ve typed a regular expression by default (though GA is smart enough to interpret a lone or leading ?, say, as a literal character – meaning in our last example filtering on ?related without the / would work just fine).

In the advanced filtration area, the match type drop-down must be set to “Matching RegExp” for the filter to be interpreted as a regular expression. In that case you must escape special characters, but in any other case the backslash will be interpreted literally and break your filter.

A second batch of syntax

What’s above may meet all of your needs. However, you may find situations in which you can’t quite get where you need to be with pipe, parens, caret and dollar sign, or where filters based on those are cumbersome.

The wildcard

A period in a regular expression will match any single character. For example, /page/./ will match /page/2/ but not /page/10/. /page/../ will match /page/10/ but not /page/2/, unless it happened to actually be /page/2//. Since I know my data doesn’t include any URLs with double slashes, I can see ultra-deep dives into content by filtering on /page/../ to get only pages 10 and up.
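
The wildcard behavior is easy to check in a console. The sketch below uses JavaScript’s engine as a stand-in for GA’s, with made-up paths.

```javascript
var oneChar = new RegExp('/page/./');
var twoChars = new RegExp('/page/../');

var results = {
  oneOnPage2: oneChar.test('/crochet/page/2/'),      // true
  oneOnPage10: oneChar.test('/crochet/page/10/'),    // false
  twoOnPage10: twoChars.test('/crochet/page/10/'),   // true
  twoOnPage2: twoChars.test('/crochet/page/2/'),     // false
  twoOnDoubleSlash: twoChars.test('/crochet/page/2//'), // true: the second . matches a slash
  twoOnPage100: twoChars.test('/crochet/page/100/')  // false: three characters is too many
};
```

The last line is worth noticing: two periods mean exactly two characters, so a page 100 would need a third period or one of the repeat characters.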

Repeats

Instead of typing some large number of periods to match a longer string that varies, we can use characters that indicate repetition. This also allows us to match when the varying string does not always have the same length.

Repetition is indicated by one of three “suffix” characters: question mark, asterisk, or plus sign. They mean, respectively, 0 or 1 repeat, 0 or more repeats, 1 or more repeats. For an example:
A.? matches A, AB, A5; does not match ABC, AB12
A.* matches A, AB, A5, ABC, AB12
A.+ matches AB, A5, ABC, AB12; does not match A
(the lists of strings matched or not matched are representative, not comprehensive)

Going back to the page number example, I’d like to look at engagement with pages 2 and later of all category archives. I know the URL structure will be /category/[category-name]/page/[number]/, and that the part from “page” on doesn’t exist on the first page.

Basically I need /category/ and /page/ with something in between, so here is my RegExp:
/category/.+/page/
.* could be used interchangeably with .+ here, because there won’t be a match to category//page.

All three modifiers – ?, +, and * – can be used on any character, not just the period. This lets us simplify our “embroidery on crochet” filter even further. The only difference between embroidery-crochet and embroider-crochet is the y, so embroidery?-crochet will match both. It will not match embroiders-crochet, though either embroider.?-crochet or embroider(y|s)?-crochet would match all three.
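
These claims can be checked the same way as before, again with JavaScript standing in for GA’s engine. The sample URLs and the slugsMatching function are hypothetical.

```javascript
// Category archives: pages 2 and later have a /page/ segment.
var deepCategory = new RegExp('/category/.+/page/');
var onPageTwo = deepCategory.test('/category/crochet/page/2/'); // true
var onFirstPage = deepCategory.test('/category/crochet/');      // false

// The three slugs discussed above.
var slugs = ['embroider-crochet', 'embroidery-crochet', 'embroiders-crochet'];

function slugsMatching(pattern) {
  var regex = new RegExp(pattern);
  return slugs.filter(function (slug) {
    return regex.test(slug);
  });
}

var optionalY = slugsMatching('embroidery?-crochet');      // first two slugs
var anyOneChar = slugsMatching('embroider.?-crochet');     // all three
var yOrS = slugsMatching('embroider(y|s)?-crochet');       // all three
```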

Summary

. : matches any single character
? : indicates the part of the pattern preceding it can occur 0 or 1 times
* : indicates the part of the pattern preceding it can occur 0 or more times
+ : indicates the part of the pattern preceding it can occur 1 or more times

One little side note

All of my regular expressions so far have matched the case of the URLs I was trying to filter down to. By default, though, Google Analytics makes matches in a case-insensitive manner, meaning “thread” would match “Thread” and “THREAD” as well as the all-lowercase version. This generally is a helpful simplification but if capitalization is meaningful for your site, be aware you can’t filter for it simply by capitalizing in your RegExp.

The full reference list

Characters that need to be escaped (preceded with a backslash) to be interpreted literally:
\ ^ $ . | ? * + ( ) [ {

`|`	or	`exp1|exp2` matches strings matching `exp1` or `exp2`
`^`	beginning	`^exp1` matches strings beginning with a match to `exp1`
`$`	end	`exp1$` matches strings ending with a match to `exp1`
`()`	enclosure	`(exp1)` allows `exp1` to be part of a longer pattern
`.`	wildcard	`.` matches any single character
`?`	optional	`AB?` matches A and AB
`*`	unlimited	`AB*` matches A, AB, ABB, ABBB, ABBBB, …
`+`	at least 1	`AB+` matches AB, ABB, ABBB, ABBBB, … but not A

Coffee photo by miheco on Flickr.

Regular Expressions

Rebecca — Wed, 30 Jul 2014 12:00:46 +0000

I am not sure how someone with my background got to this point in life without learning regular expressions. I minored in computer science in college and took a few more classes in graduate school. I took a class on models of computation that included regular languages. I ended up in the area of mathematics most strongly associated with Stephen Kleene, computability/recursion theory, and am a great admirer of his.

And yet, here I was, with an understanding of regular expressions generously described as “rudimentary.” I decided to fix that recently with the help of Kevin Skoglund, Regular-Expressions.info, and Regular Expressions 101. That last site lets you input an expression and text and not only shows you matches, but takes apart your expression and describes what it’s doing, and shows you the content of any captures that were made. Learning regex felt like being given a secret decoder ring, and it didn’t take long at all to learn (the Lynda course, which I recommend, is a bit over 5.5 hours, but the last two are examples).

I also used single pages found via Google searches and on Stack Overflow, and the Perl documentation, though unfortunately the Perl site is nigh unnavigable. Finally, though it’s flagged with multiple issues, Wikipedia has a comparison of regular expression engines that includes which additions to ERE are and are not supported by different regex flavors.

I’ve typed up my notes from all these sites, and should I decide to make them pretty you’ll probably see them here.