From Issue #158
June 2007
The Internet Engineering Task Force (IETF) document, RFC 3696, “Application
Techniques for Checking and Transformation of Names” by John
Klensin,
gives several valid e-mail addresses that are rejected by many PHP
validation routines. The addresses:
Abc\@def@example.com,
customer/department=shipping@example.com and
!def!xyz%abc@example.com
are all valid. One of the more popular regular expressions found in the
literature rejects all of them:
"^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$"
This regular expression allows only the underscore (_) and hyphen
(-) characters, numbers and lowercase alphabetic characters. Even
assuming a preprocessing step that converts uppercase alphabetic
characters to lowercase, the expression rejects addresses with
valid characters, such as the slash (/), equal sign (=), exclamation
point (!) and percent (%). The expression also requires that the
top-level domain component has only two or three characters, thus
rejecting valid domains, such as .museum.
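The behavior described above can be checked directly. The following is a minimal sketch, not code from the article: it assumes the quantifier in the pattern is {2,3} (as "two or three characters" implies), wraps the pattern in PCRE delimiters for preg_match, and adds two hypothetical addresses (curator@example.museum and user@example.com) to exercise the top-level domain rule and to serve as a control.

```php
<?php
// The first pattern from the text, wrapped in "/" delimiters for preg_match.
// Assumption: the final quantifier is {2,3}.
$pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$/';

$addresses = [
    'Abc\@def@example.com',                      // valid per RFC 3696
    'customer/department=shipping@example.com',  // valid per RFC 3696
    '!def!xyz%abc@example.com',                  // valid per RFC 3696
    'curator@example.museum',                    // hypothetical, long top-level domain
    'user@example.com',                          // hypothetical control address
];

$results = [];
foreach ($addresses as $address) {
    // Lowercase first, mirroring the preprocessing step mentioned in the text.
    $results[$address] = preg_match($pattern, strtolower($address)) === 1;
}

foreach ($results as $address => $accepted) {
    echo $address, ' => ', $accepted ? 'accepted' : 'rejected', "\n";
}
```

Under these assumptions, only the control address is accepted. The three RFC 3696 examples fail on their local parts, and the .museum address fails on the length of its top-level domain.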
Another favorite regular expression solution is the following:
"^[a-zA-Z0-9_.-]+@[a-zA-Z0-9-]+.[a-zA-Z0-9-.]+$"
This regular expression rejects all the valid examples in the preceding paragraph.
It does have the grace to allow uppercase alphabetic characters, and
it doesn't make the error of assuming a top-level domain name has only
two or three characters. It allows invalid domain names, such as
example..com.
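A short check makes both failure modes concrete. This sketch is an illustration rather than code from the article: it uses the pattern exactly as printed above, wrapped in PCRE delimiters for preg_match, and the addresses user@example..com and User@Example.com are hypothetical test inputs.

```php
<?php
// The second pattern from the text, as printed, wrapped in "/" delimiters.
$pattern = '/^[a-zA-Z0-9_.-]+@[a-zA-Z0-9-]+.[a-zA-Z0-9-.]+$/';

$addresses = [
    'Abc\@def@example.com',                      // valid per RFC 3696
    'customer/department=shipping@example.com',  // valid per RFC 3696
    '!def!xyz%abc@example.com',                  // valid per RFC 3696
    'User@Example.com',                          // hypothetical, mixed case
    'user@example..com',                         // hypothetical, invalid domain
];

$results = [];
foreach ($addresses as $address) {
    $results[$address] = preg_match($pattern, $address) === 1;
}

foreach ($results as $address => $accepted) {
    echo $address, ' => ', $accepted ? 'accepted' : 'rejected', "\n";
}
```

The pattern rejects the three valid RFC 3696 addresses and accepts the mixed-case address, but it also accepts the address whose domain contains two consecutive dots, because the final character class permits a dot anywhere in the remainder of the domain.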
Listing 1 shows an example from PHP Dev Shed (www.devshed.com/c/a/PHP/Email-Address-Verification-with-PHP/2).
The code contains (at least) three errors. First, it fails to recognize
many legitimate e-mail address characters, such as percent (%). Second, it
splits the e-mail address into user name and domain parts at the at sign
(@). E-mail addresses that contain a quoted at sign, such as
Abc\@def@example.com, will break this code. Third, it fails to check
for host address DNS records. Hosts with a type A DNS entry will accept
e-mail and may not necessarily publish a type MX entry. I'm not
picking on the author at PHP Dev Shed. More than 100 reviewers gave
this a four-out-of-five-star rating.
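The second and third errors suggest two small techniques, sketched here under stated assumptions rather than as a complete validator. One way to cope with a quoted at sign is to split the address at its last at sign, on the assumption that the domain part never contains one. For DNS, the sketch accepts a host that publishes either an MX or an A record, using PHP's checkdnsrr function. The function names splitAtLastAtSign and domainAcceptsMail are hypothetical, and the optional $lookup parameter exists only so the logic can be exercised without network access.

```php
<?php
/**
 * Split an address at its last at sign.
 * Returns [localPart, domain], or null if there is no at sign.
 */
function splitAtLastAtSign(string $email): ?array
{
    $atIndex = strrpos($email, '@');
    if ($atIndex === false) {
        return null;
    }
    return [substr($email, 0, $atIndex), substr($email, $atIndex + 1)];
}

/**
 * Return true if the domain publishes an MX record or an A record.
 * $lookup defaults to checkdnsrr; pass a stub to test without DNS.
 */
function domainAcceptsMail(string $domain, ?callable $lookup = null): bool
{
    $lookup = $lookup ?? 'checkdnsrr';
    return $lookup($domain, 'MX') || $lookup($domain, 'A');
}

// The quoted at sign stays in the local part; the domain is what follows
// the last at sign.
$parts = splitAtLastAtSign('Abc\@def@example.com');
echo 'local part: ', $parts[0], "\n";
echo 'domain: ', $parts[1], "\n";
```

A call such as domainAcceptsMail($parts[1]) performs live DNS queries, so its result depends on the network and on the records the domain publishes at the time.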