Pattern Matching Using Python, in CSC250

Due Monday October 15, class time

You are very much encouraged to work with one or two partners.

Refer to the Pattern Matching Lab held on October 10.
Answer the following questions. Store everything in a single file called relab.py. Include proper documentation at the top of the file, using Python's comment character, #.
Something like:
# Names: Judy Franklin, Jean-Luc Ponty, and Stanley Clarke
# Class: csc250
# Contents: functions and text answers for relab
# Date: October 15, 2007

import re
Don't forget the import re statement, which makes the re module's functions available. Use Python function definitions to test your regular expressions; this is easier than retyping and editing them at the Python interpreter prompt. You will submit this file electronically, by Monday October 15, class time, by typing
submit relab relab.py
from your 250a-?? account on beowulf.
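One way to follow the advice about function definitions is a small reusable helper near the top of relab.py. This is a minimal sketch, not something the lab provides: the name try_pattern is hypothetical, and it uses print(...) with parentheses, which runs under both Python 2 and Python 3.

```python
import re

def try_pattern(pattern, text, flags=0):
    """Search text for pattern, print what matched, and return the match (or None).

    Hypothetical helper: try_pattern is not part of the re module.
    """
    match = re.search(pattern, text, flags)
    if match is None:
        print('no match')
    else:
        # group(0) is the whole matched text; groups() holds the captured subgroups.
        print(match.group(0))
        print(match.groups())
    return match

try_pattern(r'\d{1,3}', 'room 250a')
```

Each question can then call try_pattern with its own pattern and test strings, so re-running the file re-checks every answer.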
  1. Question 1
    When we left the lab last Wednesday, we used backreferencing to match a pair of HTML tags (see this web page, http://www.regular-expressions.info/named.html). Write a more complex expression that uses two backreferences to match two sets of HTML tags, one nested inside the other. Make sure it matches the string
    >>> s = r'<html><title> The fall 2007 foundations class</title></html>'
    as well as
    >>> s = r'<body><h3> The fall 2007 foundations class</H3></BODY>'
    
    Don't forget to turn off case sensitivity.
    Recall that for a single set of tags we used
    >>> match = re.search(r'<([A-Z][A-Z0-9]*)[^>]*>(.*?)</\1>', s, re.IGNORECASE)
    
    and typed both
    >>> print(match.group(0))
    and
    >>> print(match.group(1))
    
    to see the results. Do this in a function definition in Python, in your file called relab.py.
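    As a sketch of the "do this in a function definition" step, here is the single-tag search from the lab wrapped in a function. The names SINGLE_TAG and find_tag_pair are hypothetical; the nested, two-backreference pattern is the part the question leaves for you.

```python
import re

# Single-tag pattern from the lab: group 1 captures the tag name,
# group 2 lazily captures everything up to the closing tag,
# and \1 requires the closing tag to repeat the same name.
SINGLE_TAG = r'<([A-Z][A-Z0-9]*)[^>]*>(.*?)</\1>'

def find_tag_pair(s):
    """Search s for one opening/closing tag pair, ignoring case."""
    match = re.search(SINGLE_TAG, s, re.IGNORECASE)
    if match:
        print(match.group(0))  # the entire matched text
        print(match.group(1))  # the tag name captured by ([A-Z][A-Z0-9]*)
    return match

find_tag_pair(r'<html><title> The fall 2007 foundations class</title></html>')
find_tag_pair(r'<body><h3> The fall 2007 foundations class</H3></BODY>')
```

    For the second string, group(1) is body and the closing </BODY> still matches, because re.IGNORECASE also applies to the \1 backreference.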

  2. Question 2
    Type all of your answers to this part into the same file, relab.py. Start each line of text with Python's comment symbol, #.
    For example:
    # \b is a word boundary
    # \d{1,3} indicates between 1 and 3 digits
    # etc.
    
    The IP address example on the same web site's examples page, http://www.regular-expressions.info/examples.html, gives three regular expressions. Explain exactly how each of them works:
    1. \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
    2. \b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
       (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
       (shown on two lines here, but it is a single expression with no line break)
    3. \b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
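    A sketch for checking your written explanations against real input: it runs all three expressions, copied from the list above, on the same string. The names LOOSE_IP, STRICT_IP, COMPACT_IP, and compare_ip_patterns are hypothetical. Pattern 2 is split across lines using Python's adjacent-string-literal concatenation, which produces the same single one-line expression.

```python
import re

# Pattern 1, copied from the examples page.
LOOSE_IP = r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b'

# Pattern 2: Python joins these adjacent raw strings into one expression.
STRICT_IP = (r'\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.'
             r'(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.'
             r'(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.'
             r'(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b')

# Pattern 3, copied from the examples page.
COMPACT_IP = r'\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b'

def compare_ip_patterns(text):
    """Return what each of the three patterns finds in text (None if no match)."""
    results = []
    for pattern in (LOOSE_IP, STRICT_IP, COMPACT_IP):
        match = re.search(pattern, text)
        results.append(match.group(0) if match else None)
    return results

print(compare_ip_patterns('192.168.1.1'))
print(compare_ip_patterns('999.999.999.999'))
```

    Strings such as 999.999.999.999 or 256.1.1.1, which pattern 1 accepts and patterns 2 and 3 reject, are useful test cases. Comparing .groups() on the matches from patterns 2 and 3 shows what the (?:...) syntax changes.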
  3. Question 3
    Read http://www.regular-expressions.info/completelines.html, which explains how to use regular expressions to find complete lines of text.
    We have already started on this with our brief description of positive and negative lookahead in the lab (PatternLab.html). Write a regular expression that matches a complete line of text containing all of the words
    "melody", "similarity", and "computer", in any order. Put the regular expression and your test examples in a function definition in your file relab.py.
    Describe in detail how your regular expression works. Again, use Python comments (lines starting with #) to answer the text part of this homework.
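    As a starting point, here is a sketch that finds complete lines containing a single whole word, using one positive lookahead and re.MULTILINE. The function name lines_containing and the sample text are hypothetical; handling all three words in any order is what the question asks you to work out and explain.

```python
import re

def lines_containing(word, text):
    """Return every complete line of text that contains word as a whole word."""
    # (?=.*\bWORD\b) is a positive lookahead: it checks that the word appears
    # later on the line without consuming characters. Because . does not match
    # a newline, the lookahead cannot look past the end of the current line.
    pattern = r'^(?=.*\b' + re.escape(word) + r'\b).*$'
    # re.MULTILINE makes ^ and $ match at the start and end of each line.
    return re.findall(pattern, text, re.MULTILINE)

sample = 'a melody on the computer\nno match here\nmelodyless line\nthe melody again'
print(lines_containing('melody', sample))
```

    Because the lookahead consumes nothing, the .*$ after it still starts at the beginning of the line, so the whole line is returned rather than just the word. The \b anchors keep "melodyless" from counting as "melody".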
    

    Don't forget to submit by class time Mon Oct 15:
    submit relab relab.py