The following snippet of code makes the primary key of a Django model an
8-character hexadecimal string instead of an integer.
I wrote this earlier today, and it works fine, but I have a bad feeling about
it. I don't know why yet, but something doesn't feel right. Nevertheless, I am
putting it here as a note to myself. You should probably not use it. If you do,
note that if you exhaust the possible IDs, generate_id() will recurse
forever.
import os
from binascii import hexlify
from django.db import models


def generate_id():
    '''Generate an 8-character long hexadecimal ID'''
    # hexlify() returns bytes on Python 3, so decode it to text
    possible = hexlify(os.urandom(4)).decode('ascii')
    # If this possible ID exists, run again:
    if Person.objects.filter(ID=possible).exists():
        return generate_id()
    return possible


# Create your models here.
class Person(models.Model):
    '''Hold a Person object'''
    ID = models.CharField(max_length=8, primary_key=True, editable=False,
                          default=generate_id)
    first_name = models.CharField(max_length=240)
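One way to avoid the endless recursion is to cap the number of attempts. The following is only a sketch and not part of the model above: generate_unique_id(), its exists argument and the max_attempts limit are hypothetical names, and exists stands in for whatever database lookup is used.

```python
import os
from binascii import hexlify


def generate_unique_id(exists, max_attempts=100):
    """Return an 8-character hex ID for which exists(ID) is false.

    `exists` is a hypothetical callable standing in for a database lookup.
    """
    for _ in range(max_attempts):
        # 4 random bytes become 8 hexadecimal characters
        candidate = hexlify(os.urandom(4)).decode('ascii')
        if not exists(candidate):
            return candidate
    # Give up instead of looping forever when no free ID turns up
    raise RuntimeError('no free ID found after %d attempts' % max_attempts)


# Stand-in for the IDs already stored in the database
taken = {'deadbeef'}
new_id = generate_unique_id(lambda candidate: candidate in taken)
print(new_id)
```

In a Django model, the exists callable could wrap a query such as Person.objects.filter(ID=candidate).exists(), assuming the field is named ID as above.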
This article is a look into the performance of one of the
regular expressions used in the
python-markdown2 Python module for
converting Markdown syntax to HTML. It was initially written for pure fun, and
in celebration of its own pointlessness, but eventually the changes proposed
here made it upstream in
pull request 207.
Replace tabs with spaces
This snippet of code replaces tab characters with a predefined number of
spaces. It is a Python port of the Perl code mentioned by
Bart Lateur in a post about
turning tabs to spaces in Perl.
A detour to Perl
The initial post in that thread was replacing tabs like this:
#!/usr/bin/perl -pi
s/\t/    /;
That code misses one point: if there is any text before a tab, it simply
adds four spaces after that text. However, that is not how tabs work. What
should happen is that just enough spaces are added so that the length of the
preceding text plus the newly added spaces reaches the next multiple of
four. So, the suggested substitution in Perl becomes:
s/(.*?)\t/$1.(' ' x (4-length($1)%4))/ge;
There are two flags used there: g applies the substitution to all matches of
the pattern on the left ((.*?)\t). Without that flag, only the first match
would be processed. The second flag, e, forces the replacement
($1.(' ' x (4-length($1)%4))) to be evaluated as an expression.
Without this flag, the replacement would be handled as a plain string.
The _detab_re object is a compiled regular expression object, built with the
same pattern as the one used in the Perl example, and with the multiline flag
enabled (re.M). You can test this out at RegExr. The subn()
method of that object is called in the last line. It takes two parameters: the
_detab_sub() function, and the text to be processed. For every match of the
pattern, _detab_sub() is called with the corresponding match object, and its
return value replaces the matched text. Finally, subn() returns a tuple with
the text after substitution, and the number of substitutions that
happened. From that result, only the text is kept, with subn()[0],
which seems a bit redundant, since the sub() method would return just the
text without requiring the [0] subscript.
No regular expressions please
Here is a Python snippet that does the same thing as the previous one, without
using regular expressions:
In the previous article on regular expressions in python-markdown2 I
dismissed the difference between a substring substitution with re.sub()
versus str.replace() as being negligible, but in this case it seems that it
is more substantial. This simple example already indicates some difference:
text = '''We are NOT\tin Kansas any more!'''
%timeit _detab(text)
100000 loops, best of 3: 6.14 us per loop
%timeit _detab_no_re(text)
100000 loops, best of 3: 3.82 us per loop
# Change some spaces in the beginning of lines with tabs:
sed -i 's/^ /\t/' bzip2.c
sed -i 's/^\t /\t\t/' bzip2.c
# Lines with tabs:
grep -c '\t' bzip2.c
3032
# Total lines:
wc -l bzip2.c
6998 bzip2.c
This article is a look into the performance of one of the
regular expressions used in the
python-markdown2 Python module for
converting Markdown syntax to HTML. It was initially written for pure fun, and
in celebration of its own pointlessness, but eventually the changes proposed
here made it upstream in
pull request
204.
Standardize line endings
This regular expression appears very early in the conversion process:
text = re.sub("\r\n|\r", "\n", text)
Its use is fairly obvious: it changes all single carriage returns (\r) and
all carriage returns followed by a newline (\r\n) to single newlines (\n).
The same effect can be achieved in Python with two str.replace() calls,
and in fact that would be much faster. The following example uses timeit,
which comes with the IPython shell:
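As a stand-in that also runs outside IPython, here is a sketch of the same comparison using the timeit module from the standard library. The sample text and the helper names with_re() and with_replace() are made up for this example, and the timings it prints will vary from machine to machine.

```python
import re
import timeit

# Hypothetical sample with mixed line endings
text = 'one\r\ntwo\rthree\nfour\r\n'


def with_re(s):
    return re.sub("\r\n|\r", "\n", s)


def with_replace(s):
    # Order matters: handle \r\n first so that it does not become \n\n
    return s.replace("\r\n", "\n").replace("\r", "\n")


# Both approaches must produce the same normalized text
assert with_re(text) == with_replace(text) == 'one\ntwo\nthree\nfour\n'

runs = 100000
for func in (with_re, with_replace):
    seconds = timeit.timeit(lambda: func(text), number=runs)
    print('%s: %.0f ns per call' % (func.__name__, seconds / runs * 1e9))
```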
So the two runs of str.replace() add up to 465 nanoseconds, whereas one run
of re.sub() takes 2.31 microseconds, that is 2310 nanoseconds, or about
five times slower.
The question is: Does it matter? Well, my copy of
The Hitch Hiker's Guide to the Galaxy, which includes all five books in the
series, is 776 pages long, and each full page has 42 lines (yes, I counted
twice, and now I am wondering if it was done on purpose). Following up on the
previous calculations, if you had to convert that book (about 32592 lines)
from Markdown to HTML, it would take you a whole 0.02 seconds to do that with
re.sub(), or about 0.004 seconds to do that with str.replace().
Therefore, the answer to my previous question: Does it matter? is 42.
Now the question becomes: Does it really matter? Well, if you had to
convert all 30 million paperback books that Amazon has for sale (number found
through a search on amazon.com), and assuming each book is as healthy in size
as THHGTTG, then it
would take you a week to do that with re.sub(), but only a day and a
half to do it with str.replace(). Thus, for the Python developer out there
who is pondering converting 30 million books from Markdown to HTML, the
answer is: Go with str.replace(). For the rest of us it's still 42.
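The arithmetic behind those figures can be sketched as follows. The timings, page count, line count and number of books are the ones quoted above. The 4-line size of the timed sample (LINES_PER_RUN) is an assumption: it is not stated in the text, but it is the value that makes the quoted timings agree with the quoted totals.

```python
# Figures quoted in the text
RE_SUB_SECONDS = 2.31e-6     # one re.sub() run
REPLACE_SECONDS = 465e-9     # two str.replace() runs
PAGES, LINES_PER_PAGE = 776, 42
BOOKS = 30 * 10**6

# Assumption: each timed run processed a 4-line sample
LINES_PER_RUN = 4

lines = PAGES * LINES_PER_PAGE
runs = lines / LINES_PER_RUN

# Time to normalize the line endings of one book
re_sub_book = runs * RE_SUB_SECONDS
replace_book = runs * REPLACE_SECONDS

# Time to do the same for every book, in days
SECONDS_PER_DAY = 86400
re_sub_days = BOOKS * re_sub_book / SECONDS_PER_DAY
replace_days = BOOKS * replace_book / SECONDS_PER_DAY

print('lines per book: %d' % lines)
print('one book: %.3f s vs %.4f s' % (re_sub_book, replace_book))
print('all books: %.1f days vs %.1f days' % (re_sub_days, replace_days))
```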
These are a couple of random links from things mentioned in talks during PyCon
2015, which took place in Dublin in October 2015. There were two tracks of
talks and two tracks of workshops. One of the non-workshop tracks was almost
entirely dedicated to data processing with Python, and the other covered
various subjects.
I followed the latter track.
PyCon 2015 was organized by Python Ireland, and
took place on the 24th and 25th of October 2015.
redact-py is an Open Source Redis ORM, very simple to use and
with good performance.
I miss having the free time to attend Percona Webinars. They are really good.
This is a recording from a webinar presented by a Technical Account Manager at
Percona, regarding experiences gained from the migration of a sizeable MySQL
installation from onsite to AWS RDS. It's packed with valuable technical
information, as are Percona webinars, typically.
This is a recording of a webinar that I watched this week, titled
"Balancing Ecommerce Security with Performance". The talks refer to
companies Imperva, American Eagle and Incapsula, and there is some product
plugging going on during the talks. However, there is also some
introductory web application security information.
The most useful bit, in my opinion, was the pointer to demo.testfire.net, a
test website that is open to SQL injection for demonstration purposes, and
can thus be useful as security awareness training material.
These instructions will allow you to run the ancient 3.6
version of Firefox on a recent Ubuntu installation, namely 15.04, but it could
apply to versions of Debian, Ubuntu and Linux Mint released close to 15.04.
The reasons why you might want to run such an old version of Firefox are
irrelevant to this post. For me, this solves a problem of very limited scope:
having to run some browser tests, written in JavaScript as bookmarklets, that
last worked correctly in Firefox 3.6. Those tests access user
information that is not available to the JavaScript engine in versions of
Firefox newer than 3.6, since Mozilla tightened its security and no longer
exposes the user's browsing history.
Now, I suppose I could migrate my tests out of the browser, read the browsing
history from some SQLite file in the user's Firefox profile, and simulate the
browser with something like Selenium, but I just cannot be bothered.
The guide
Download firefox-3.6.tar.bz2 from ftp.mozilla.org.
Decompressing this archive will give you a directory named firefox.
Move the firefox directory into /opt/. The goal of these instructions
is to get /opt/firefox/firefox to execute without errors.
Trying to run /opt/firefox/firefox at this point results in 'library missing'
errors for libgtk-2.0-0 and libdbus-glib-1-2. Both these libraries
exist in an Ubuntu 15.04 installation, but they are 64-bit libraries,
whereas Firefox 3.6 was only ever released as a 32-bit application.
Both problems are solved by installing the 32-bit versions of those
libraries:
Run /opt/firefox/firefox now and you should be able to enjoy the retro
experience of times gone by, with no Flash or any other plugin for that
matter. A note of caution: running such an old version of a browser is
very unsafe. Don't do anything other than testing with it, use a clean
profile (run with -P option and create a test profile), and if possible,
sandbox the application so that it can't touch anything on your main
system.
A note about library paths: Firefox 3.6 looks for libraries in its
installation directory (in this case /opt/firefox), in addition to the
directories in the library path. Therefore, if you hit an issue where the
browser can't locate libraries that exist on the system, it is easier and
probably safer to create symbolic links to those libraries in
/opt/firefox rather than altering your library path just to
accommodate the needs of this old application.
Recently, one of my BackupPC clients running CentOS failed to back up, with the
contents of the host log being:
2015-06-10 01:40:10 incr backup started back to 2015-05-16 08:56:42 (backup #600) for directory /
2015-06-10 21:40:18 Aborting backup up after signal ALRM
2015-06-10 21:40:18 Got fatal error during xfer (fileListReceive failed)
...and the last bad XferLOG containing:
fileListReceive() failed
This happened a couple of times in a row, and the interval between the start
time of the backup and the failure was consistently 20 hours. While checking,
I noticed that an rsync process started on the client by BackupPC had been
running for about a week. I ran strace -p <PID> on the process ID of rsync and
noticed that it was trying to stat an old NFS export, mounted from a server
that no longer exists.
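The 20-hour interval can be checked from the two timestamps in the host log above, as in this small sketch. The result is close to 72000 seconds, which I believe is the default value of BackupPC's $Conf{ClientTimeout} setting. Treat that default, and whether this installation used it, as an assumption.

```python
from datetime import datetime

# Timestamp layout used in the BackupPC host log shown earlier
LOG_FORMAT = '%Y-%m-%d %H:%M:%S'

started = datetime.strptime('2015-06-10 01:40:10', LOG_FORMAT)
aborted = datetime.strptime('2015-06-10 21:40:18', LOG_FORMAT)

# Time between the start of the backup and the ALRM abort
elapsed = aborted - started
print(elapsed, int(elapsed.total_seconds()))
```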
As a note to myself and a future reference, here is a list of features of
VMware ESXi 6 that get disabled when the 60-day evaluation period expires:
I registered as a Beta tester and installed this evaluation version of ESXi 6
before it went into general availability, so there might be some differences
compared to GA versions, but it's not likely.
These are some notes on improving the security of a Raspberry Pi running a
fresh installation of Raspbian, before exposing it to the world, either by
giving it a public IP, or with some NAT/PAT configuration.