From ebcf0fbbb6bf2c994bb99288f0bd120d210b17d5 Mon Sep 17 00:00:00 2001
From: Janis Lesinskis
Date: Tue, 19 Mar 2019 18:53:46 +1100
Subject: [PATCH 1/5] Start page on regex verbose mode

---
 .../regex2019/regex_verbose_mode.md | 22 +++++++++++++++++++
 1 file changed, 22 insertions(+)
 create mode 100644 content/tutorial-pages/regex2019/regex_verbose_mode.md

diff --git a/content/tutorial-pages/regex2019/regex_verbose_mode.md b/content/tutorial-pages/regex2019/regex_verbose_mode.md
new file mode 100644
index 00000000..f381d9ba
--- /dev/null
+++ b/content/tutorial-pages/regex2019/regex_verbose_mode.md
@@ -0,0 +1,22 @@
+---
+title: "How verbose mode makes your regex more easy to use"
+authors:
+  - "Janis Lesinskis"
+date:
+tags:
+  - Python
+  - Regex
+contentType: "tutorial"
+callToActionText: "Have you got a project that requires in depth knowledge of regex? We'd love to hear about it so fill in the form below with some details."
+hideCallToAction: false
+---
+
+Verbose mode makes your regex far more readable and maintainable, here's how you can use it.
+
+
+
+Before I found out about the verbose mode that is offered in Python I was always far more hesitant to use regex due to issues with readability.
+
+Let's look at an example, parsing emails with regex. 
+
+(Note that parsing emails with regex is **hard**, see for example )
\ No newline at end of file

From ed6207804134bd78f5a00c8ba3e591937038f5b2 Mon Sep 17 00:00:00 2001
From: Janis Lesinskis
Date: Tue, 19 Mar 2019 19:45:58 +1100
Subject: [PATCH 2/5] Used officeIpsum for some sample text

---
 .../tutorial-pages/regex2019/regex_verbose_mode.md | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/content/tutorial-pages/regex2019/regex_verbose_mode.md b/content/tutorial-pages/regex2019/regex_verbose_mode.md
index f381d9ba..e33f47a4 100644
--- a/content/tutorial-pages/regex2019/regex_verbose_mode.md
+++ b/content/tutorial-pages/regex2019/regex_verbose_mode.md
@@ -17,6 +17,14 @@ Verbose mode makes your regex far more readable and maintainable, here's how you
 
 Before I found out about the verbose mode that is offered in Python I was always far more hesitant to use regex due to issues with readability.
 
-Let's look at an example, parsing emails with regex.
+Let's look at an example, parsing emails with regex. For some example text lets use [Office Ipsum](http://officeipsum.com/) with a sprinkling of emails and not-quite-emails:
 
-(Note that parsing emails with regex is **hard**, see for example )
\ No newline at end of file
+```python
+office_ipsum = """
+It's a simple lift and shift job. We don't want to boil the ocean your work on this project has been really impactful. Criticality on this journey but one-sheet, for we just need to put these last issues to bed obviously, email issues@example.com. Locked and loaded organic growth@10%. Wheelhouse out of scope. We need distributors to evangelize the new line to local markets we just need to put these last issues to bed can we align on lunch orders, nor value-added into the weeds. Fire up your browser. Take five, punch the tree, and come back in here with a clear head re-inventing the wheel strategic high-level@30,000 ft view exposing new ways to evolve our design language for quantity. 
Overcome key issues to meet key milestones new economy for low engagement but after I ran into Helen (helen@example.com) at a restaurant. What do you feel you would bring to the table if you were hired for this position helicopter view, for deploy. Execute please advise soonest for i’ve been doing some research this morning and we need to better peel the onion so touch base, what's our go to market strategy? imagineer. Closing these latest prospects is like putting socks on an octopus. Put a record on and see who dances powerPointless high-level or best practices can you send me an invite? (list@invites.example.com). Thought shower low-hanging fruit. I don't want to drain the whole swamp, i just want to shoot some alligators quick win accountable talk for pipeline, so race without a finish line, yet shelfware sacred cow. Punter forcing function . Low-hanging fruit. When does this sunset?
+"""
+```
+
+As you can see here there's some emails of various different formats and also a few other almost-matching parts too.
+
+(Note accurately parsing all valid emails with regex is **hard**, TODO: example links)
\ No newline at end of file

From ed0f6caf850d5149042a5782e38b04d95c8cc7a7 Mon Sep 17 00:00:00 2001
From: Janis Lesinskis
Date: Tue, 19 Mar 2019 20:27:59 +1100
Subject: [PATCH 3/5] Start putting together an example

---
 .../regex2019/regex_verbose_mode.md | 22 ++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/content/tutorial-pages/regex2019/regex_verbose_mode.md b/content/tutorial-pages/regex2019/regex_verbose_mode.md
index e33f47a4..85029b78 100644
--- a/content/tutorial-pages/regex2019/regex_verbose_mode.md
+++ b/content/tutorial-pages/regex2019/regex_verbose_mode.md
@@ -27,4 +27,24 @@ It's a simple lift and shift job. We don't want to boil the ocean your work on t
 
 As you can see here there's some emails of various different formats and also a few other almost-matching parts too.
 
-(Note accurately parsing all valid emails with regex is **hard**, TODO: example links)
\ No newline at end of file
+(Note accurately parsing all valid emails with regex is **hard**, TODO: example links)
+
+Here's an email pattern specified in the verbose mode:
+
+```python
+email_pattern = r'''
+\S+ # username: one or more non-whitespace chars
+@
+\S+ # domain: 1+ non-whitespace chars
+\. # matches exactly a dot
+[a-zA-Z]+
+'''
+```
+
+Non verbose mode looks like this:
+
+```python
+email_p = r'\S+@\S+\.[a-zA-Z]+'
+```
+
+I find the readability is starting to already be a win with verbose mode here, but it gets more pronounced as the regex gets longer.
\ No newline at end of file

From 64dffd72ec590008c4a6094d8de1d4082df4a56e Mon Sep 17 00:00:00 2001
From: Janis Lesinskis
Date: Tue, 19 Mar 2019 20:38:08 +1100
Subject: [PATCH 4/5] Added in a some more code samples

---
 .../regex2019/regex_verbose_mode.md | 68 ++++++++++++++++++-
 1 file changed, 67 insertions(+), 1 deletion(-)

diff --git a/content/tutorial-pages/regex2019/regex_verbose_mode.md b/content/tutorial-pages/regex2019/regex_verbose_mode.md
index 85029b78..9dbd1bed 100644
--- a/content/tutorial-pages/regex2019/regex_verbose_mode.md
+++ b/content/tutorial-pages/regex2019/regex_verbose_mode.md
@@ -47,4 +47,70 @@ Non verbose mode looks like this:
 email_p = r'\S+@\S+\.[a-zA-Z]+'
 ```
 
-I find the readability is starting to already be a win with verbose mode here, but it gets more pronounced as the regex gets longer.
\ No newline at end of file
+I find the readability is starting to already be a win with verbose mode here, but it gets more pronounced as the regex gets longer.
+
+## Capture groups
+
+Let's try to capture the name in email:
+
+```python
+email_name_only_re = r'''
+(?P<name> # capture this following block into name
+    \S+ # username: one or more non-whitespace chars
+)
+@
+\S+ # domain: 1+ non-whitespace chars
+\. # matches exactly a dot
+[a-zA-Z]+ # matches characters
+'''
+```
+
+This will capture the name. 
You can use the indentation to show the scope of the capture group.
+
+```python
+names = re.findall(email_name_only_re, office_ipsum, flags=re.VERBOSE)
+print(names)
+```
+
+TODO: output
+
+## Capture of the entire domain
+
+```python
+email_capture_re = r'''
+(?P
+    (?P<name> # capture this following block into name
+        \S+ # username: one or more non-whitespace chars
+    )
+    @
+    (?P<domain> # capture this following block into domain
+        \S+ # domain: 1+ non-whitespace chars
+        \. # matches exactly a dot
+        [a-zA-Z]+ # matches characters
+    )
+)
+'''
+```
+
+
+## Capturing the TLD
+
+TODO test this
+
+```python
+email_capture_re = r'''
+(?P
+    (?P<name> # capture this following block into name
+        \S+ # username: one or more non-whitespace chars
+    )
+    @
+    (?P<domain> # capture this following block into domain
+        \S+ # domain: 1+ non-whitespace chars
+        \. # matches exactly a dot
+        (?P<tld> # capture this following block into tld
+            [a-zA-Z]+ # matches characters
+        )
+    )
+)
+'''
+```
\ No newline at end of file

From d65625c96056a62e9d063e6f1894eaed1510217f Mon Sep 17 00:00:00 2001
From: Janis Lesinskis
Date: Tue, 19 Mar 2019 21:51:32 +1100
Subject: [PATCH 5/5] Added in some output from running regex

---
 .../tutorial-pages/regex2019/regex_verbose_mode.md | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/content/tutorial-pages/regex2019/regex_verbose_mode.md b/content/tutorial-pages/regex2019/regex_verbose_mode.md
index 9dbd1bed..07109525 100644
--- a/content/tutorial-pages/regex2019/regex_verbose_mode.md
+++ b/content/tutorial-pages/regex2019/regex_verbose_mode.md
@@ -21,7 +21,7 @@ Let's look at an example, parsing emails with regex. For some example text lets
 
 ```python
 office_ipsum = """
-It's a simple lift and shift job. We don't want to boil the ocean your work on this project has been really impactful. Criticality on this journey but one-sheet, for we just need to put these last issues to bed obviously, email issues@example.com. Locked and loaded organic growth@10%. 
Wheelhouse out of scope. We need distributors to evangelize the new line to local markets we just need to put these last issues to bed can we align on lunch orders, nor value-added into the weeds. Fire up your browser. Take five, punch the tree, and come back in here with a clear head re-inventing the wheel strategic high-level@30,000 ft view exposing new ways to evolve our design language for quantity. Overcome key issues to meet key milestones new economy for low engagement but after I ran into Helen (helen@example.com) at a restaurant. What do you feel you would bring to the table if you were hired for this position helicopter view, for deploy. Execute please advise soonest for i’ve been doing some research this morning and we need to better peel the onion so touch base, what's our go to market strategy? imagineer. Closing these latest prospects is like putting socks on an octopus. Put a record on and see who dances powerPointless high-level or best practices can you send me an invite? (list@invites.example.com). Thought shower low-hanging fruit. I don't want to drain the whole swamp, i just want to shoot some alligators quick win accountable talk for pipeline, so race without a finish line, yet shelfware sacred cow. Punter forcing function . Low-hanging fruit. When does this sunset?
+It's a simple lift and shift job. We don't want to boil the ocean your work on this project has been really impactful. Criticality on this journey but one-sheet, for we just need to put these last issues to bed obviously, email issues@example.com. Locked and loaded organic growth@10%. Wheelhouse out of scope. We need distributors to evangelize the new line (overseas-line@example.com.au) to local markets we just need to put these last issues to bed can we align on lunch orders, nor value-added into the weeds. Fire up your browser. 
Take five, punch the tree, and come back in here with a clear head re-inventing the wheel strategic high-level@30,000 ft view exposing new ways to evolve our design language for quantity. Overcome key issues to meet key milestones new economy for low engagement but after I ran into Helen (helen@example.com) at a restaurant. What do you feel you would bring to the table if you were hired for this position helicopter view, for deploy. Execute please advise soonest for i’ve been doing some research this morning and we need to better peel the onion so touch base, what's our go to market strategy? imagineer. Closing these latest prospects is like putting socks on an octopus. Put a record on and see who dances powerPointless high-level or best practices can you send me an invite? (list@invites.example.com). Thought shower low-hanging fruit. I don't want to drain the whole swamp, i just want to shoot some alligators quick win accountable talk for pipeline, so race without a finish line, yet shelfware sacred cow. Punter forcing function . Low-hanging fruit. When does this sunset?
 """
 ```
 
@@ -67,6 +67,17 @@ email_name_only_re = r'''
 
 This will capture the name. You can use the indentation to show the scope of the capture group.
 
+```python
+>>> re.findall(email_pattern, office_ipsum, flags=re.VERBOSE)
+['issues@example.com',
+ '(overseas-line@example.com.au',
+ '(helen@example.com',
+ '(list@invites.example.com']
+```
+
+As you can see this is already not working, the first character with a parenthesis has matched when we didn't want it to.
+
+
 ```python
 names = re.findall(email_name_only_re, office_ipsum, flags=re.VERBOSE)
 print(names)