little perl tools for convenience

These are probably not unique in any way and have been made by lots of people individually for their own use... feel free to avoid rewriting them and rather use the ones that are already there.

The License for these is:
Do to them whatever you want: use, modify, redistribute or flame them, as long as credit to me (hellweg@snark.de) remains with them. There are no guarantees, that these tools do or do not perform like i told you or like you want. They might even do bad things to your cat.

replace is a perl script that applies a perl substitution expression to each line of all the files on the command line and then replaces the original file with the modified version (or it works on standard-in if no files are named)

Example (change some text):

replace -v 's/(hate Bill Gates)/$1 & Steve Balmer/gi' *.html

Example (rip font tags out of html pages):

replace -v 's/<\/?font[^\>]*\>//gi' *.html

Another one (replace the background color in a whole web tree):

find -iname "*.htm*" | xargs replace -v 's/\<BODY ([^\>]*)(BGCOLOR\s*=\s*[^\>\s]+)?([^\>]*)\>/\<BODY $1 BGCOLOR="#FEFEFE" $3>/i'

rename applies a perl substitution expression to each file argument on the commandline and renames the files accordingly

It has two features, that exceed the standard rename implementations:

Since the systems underlying 'rename()' implementation is used, the script can move files within the same partition by renaming to something with a new path at the beginning. Usually, this fails if the resulting directory does not exist already since the systems 'rename()' call does not imply the creation of directories. My script takes care of that and creates directories if neccessary.
as a special gimick, each files 'last-modified-date' can be inserted in the replacement pattern to create names that show this date. The perl variables '$d', '$m' and '$y' stand for the files modification day, month and year respectively and the variable '$T' stands for the complete date in the format YYYYMMDD

Example (add a prefix):

rename -v s/^/dummy/ *.txt

Another one (reshuffle the digits in filenames that look like MM-DD-YYYY to look like YYYYMMDD):

rename 's/(\d\d)-(\d\d)-(\d{4})/$3$1$2/' text*2000.tex

You can use other operators than s/// - for example use tr/// to lowercase all files:

rename tr/A-Z/a-z/ *.HTML

The strangest abuse of this script: move files into subdirectories for years and months based on the files modification date - creating the directories if neccessary:

rename -v 's/^/$y\/$m\//' *.txt

filter is a perl script for finding or exluding specific lines of text. I find this more convenient than piping multiple instances of grep into each other (or building a complex anded/ored regexp for grep) for a quick look at some logfile.

you can specify any number of fixed strings, prefixed by +, - or : and any number of regular expressions prefixed by ++, -- or ::.

filter then only prints those lines that meet the following criteria:

none of the - expressions match the line
all of the + expressions match the line
at least one of the : expressions (if present) matches the line

Example (search a httplog for other people looking at a particular set of files):

filter -127.0.0.1 -localhost '++particularFileNumber\d+.html' /var/log/httpd/access_log | less

Another one: look at the system log, ignoring some standard messages

filter -MARK -automount -xscreensaver '-message repeated' -CRON /var/log/messages | less

trelay is a tiny relaying http proxy (no caching - just forwarding).

This can be very usefull, if you want to access a couple of intranet sites that are only accessible from within the sites domain (your company) and you also have an account on a ssh-able machine within your companys domain - but you are currently sitting at home and are accessing the net with some generic dial-up provider. Use ssh to build a tunnel from some port of your home machine to some other port on the ssh-able machine in your companys domain and run the tiny relay proxy on that machine and port.
Now you can point your web browsers proxy configuration to the local port on your local machine and browse all you want. To the intranet servers it will look like your requests come from within the companys domain and all is good.

Example:

ssh -L 5678:localhost:6789 -l myaccount ssh-host.domain.edu "trelay -v -p 6789"

(now just point your browsers proxy config to "localhost:5678")
All communication from your browser to the relay proxy will be protected by sshs builtin encryption so your intranet admin should be able to sleep without fear of new security breaches.

If you need a ssh client for windows that can create such a tunnel look here or here.

roamie is a small server for storing and retrieving netscape roaming user profiles.

It implements the HTTP methods GET, HEAD, PUT, MOVE and DELETE, so that the Netscape client can store user preferences and bookmarks on the server and retrieve them again. This is very usefull if you want to use different Netscape installations in different environments but still keep your bookmarks in sync (at least, that's what i use it for).

You do not need to set up special user accounts or password databases for roamie. Upon connection, roamie asks the client for a username and password but instead of trying to authenticate this information in some way, it simply uses this name/password combination to create the name of a subdirectory. If this directory does not exist, it is created. All subsequent requests from this client are then executed within this directory. This way, new accounts are created by simply connecting to the server with a new username/password combination.

This imposes the main danger of roamie however: Everybody who can read the directory structure of the roamie-data can reconstruct the usernames and passwords the users chose. You should warn your users, not to reuse sensitive passwords from other accounts with roamie !!! (I mean it !!!)

To restrict abuse, roamie allows you to define regular expressions for the ip address ranges that are allowed to connect at all and for those that are allowd to create new accounts. If you open account creation to the whole world, your roamie server might be used as a relay storage by warez kiddies. In order to avoid this, a size limit is imposed on each user subdirectory. 2MB are usually enough to store even the biggest bookmark and preference collection for a single netscape user...

I typically run roamie like this:

./roamie -c '192\.168\.1\..+' -l ./roamieLogfile -b ./roamieBasedir

nph-zipcat is a cgi script which can serve the contents of zipfiles, lying in the same directory.

I use it mainly for large documentation trees which i use only seldom but which i want to have indexed by the fulltext search engine on my workstation. The best example for this might be Suns Java Tutorial which is available online as one big zipfile. I just drop it into a directory of my workstations webserver, together with nph-zipcat.cgi and voila - there it is, browsable like i had it unpacked on my harddisk. All relative links work like they should because nph-zipcat.cgi uses the PATH_INFO for specifying the elements within the zip archive. To a browser, his looks just like another URL path in which relative addressing like "../image.gif" gets translated automagically.

see for yourself:
Within the subdirectory zipcat is the script, together with an example zipfile with the following content:

hh@hh:~/tmp/ > unzip -l zipsample.zip
Archive:  /home/hh/tmp/zipsample.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
      235  03-04-02 12:03   burst.gif
       63  03-04-02 12:05   sometext.txt
      217  03-04-02 12:04   hand.right.gif
        0  03-04-02 13:02   subdir/
      166  03-04-02 13:02   subdir/somehtml.html
        0  03-04-02 13:02   subdir/subDirWithIndex/
      179  03-04-02 13:02   subdir/subDirWithIndex/index.html
      532  03-04-02 13:02   htmlWithReferences.html
 --------                   -------
     1392                   8 files

You should not use nph-zipcat.cgi for heavily used documents because it puts a substential load on your webserver: the unzip binary is called over and over again for every document (and every little inlined image). For rarely used documentation trees it is however quite usefull (to me - your mileage may vary).

other usefull (but much more specific) tools are my comix collector and wgrab

Author: Heiko Hellweg,
last modified: March 4th 2002