parsing raw email in php

I'm looking for good/working/simple-to-use PHP code for parsing raw email into parts.

I've written a couple of brute-force solutions, but every time, one small change/header/space/something comes along and my whole parser fails and the project falls apart.

And before I get pointed at PEAR/PECL: I need actual code. My host has some screwy config or something, and I can never seem to get the extensions to build right. Even when I do get one built, some difference in path/environment/php.ini means it isn't always available (Apache vs. cron vs. CLI).

Oh, and one last thing: I'm parsing the raw email text, NOT POP3 and NOT IMAP. It is being piped into the PHP script via a .qmail email redirect.
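For context, a script on the receiving end of such a pipe gets the whole message on standard input. The snippet below is a minimal sketch of that step; `read_raw_message()` is a hypothetical helper name, and normalising line endings to "\n" is an assumption made here to simplify later parsing.

```php
<?php
// Hypothetical helper: read a complete raw message from a stream.
// A script fed by a pipe receives the message on php://stdin.
function read_raw_message($source = 'php://stdin')
{
    $raw = file_get_contents($source);
    if ($raw === false) {
        return '';
    }
    // Assumption: convert CRLF and bare CR to LF so that later code only
    // has to deal with "\n".
    return str_replace(array("\r\n", "\r"), "\n", $raw);
}
```

In the piped script this would be called once as `$raw = read_raw_message();` before any parsing happens.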

I'm not expecting SOF to write it for me; I'm looking for some tips/starting points on doing it "right". This is one of those "wheel" problems that I know has already been solved.

2019-12-02 03:04:15
Answers: 4

What are you hoping to end up with at the end? The body, the subject, the sender, an attachment? You should spend some time with RFC 2822 to understand the format of the mail, but here are the simplest rules for well-formed email:

    HEADERS\n
    \n
    BODY

That is, the first blank line (double newline) is the separator between the HEADERS and the BODY. A HEADER looks like this:

    HSTRING:HTEXT

HSTRING always starts at the beginning of a line and doesn't contain any whitespace or colons. HTEXT can contain a wide variety of text, including newlines, as long as the newline character is followed by whitespace.

The "BODY" is really just any data that follows the first double newline. (There are different rules if you are transmitting mail via SMTP, but when processing it over a pipe you don't have to worry about that.)
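The rules above translate fairly directly into code. What follows is a minimal sketch, not a complete RFC 2822 parser: `parse_raw_email()` is a hypothetical name, header names are lower-cased for convenience, and folded lines are joined with a single space, which is a simplification.

```php
<?php
// Hypothetical helper: split a raw message into headers and body.
function parse_raw_email($raw)
{
    $raw = str_replace(array("\r\n", "\r"), "\n", $raw);

    // The first blank line separates HEADERS from BODY.
    $parts = explode("\n\n", $raw, 2);
    $headerBlock = $parts[0];
    $body = isset($parts[1]) ? $parts[1] : '';

    // Unfold continuations: a newline followed by whitespace belongs to
    // the header that started on the previous line.
    $headerBlock = preg_replace('/\n[ \t]+/', ' ', $headerBlock);

    $headers = array();
    foreach (explode("\n", $headerBlock) as $line) {
        // HSTRING has no whitespace or colons; everything after the
        // first colon is HTEXT. Lines that do not fit are skipped.
        if (!preg_match('/^([^\s:]+)[ \t]*:[ \t]*(.*)$/', $line, $m)) {
            continue;
        }
        // Headers such as Received can repeat, so keep a list per name.
        $headers[strtolower($m[1])][] = trim($m[2]);
    }

    return array('headers' => $headers, 'body' => $body);
}
```

Each header name maps to a list of values, so a single-valued header such as the subject is read as `$mail['headers']['subject'][0]`.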

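So, in really simple, circa-1982 RFC 822 terms, an email looks like this:

    HEADER: HEADER TEXT
    HEADER: MORE HEADER TEXT
      INCLUDING A LINE CONTINUATION
    HEADER: LAST HEADER

    THIS IS ANY
    ARBITRARY DATA
    (FOR THE MOST PART)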

Most modern email is more complex than that, though. Headers can be encoded for charsets or as RFC 2047 MIME words, or a ton of other things I'm not thinking of right now. The bodies are really hard to roll your own code for these days if you want them to be meaningful. Almost all email generated by an MUA will be MIME encoded. That might be uuencoded text, it might be HTML, it might be a uuencoded Excel spreadsheet.
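To give a flavour of the RFC 2047 part, here is a rough sketch of decoding encoded-words of the form `=?charset?B?...?=` or `=?charset?Q?...?=` in plain PHP. `decode_mime_words()` is a hypothetical name; the conversion to UTF-8 only happens if the iconv extension is available, which is an assumption, and otherwise the decoded bytes stay in their original charset.

```php
<?php
// Hypothetical helper: decode RFC 2047 encoded-words in a header value.
function decode_mime_words($value)
{
    // Whitespace between two adjacent encoded-words is not part of the text.
    $value = preg_replace('/(\?=)\s+(=\?)/', '$1$2', $value);

    return preg_replace_callback(
        '/=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=/',
        function ($m) {
            $charset = $m[1];
            if (strtoupper($m[2]) === 'B') {
                $bytes = base64_decode($m[3]);
            } else {
                // Q encoding is quoted-printable with "_" standing for a space.
                $bytes = quoted_printable_decode(str_replace('_', ' ', $m[3]));
            }
            // Assumption: convert to UTF-8 only when iconv is available.
            if (function_exists('iconv') && strcasecmp($charset, 'UTF-8') !== 0) {
                $converted = @iconv($charset, 'UTF-8//IGNORE', $bytes);
                if ($converted !== false) {
                    return $converted;
                }
            }
            return $bytes;
        },
        $value
    );
}
```

Text outside of encoded-words passes through unchanged, so the function can be applied to every header value.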

I hope this helps provide a framework for understanding some of the very elemental buckets of email. If you provide more background on what you are trying to do with the data, I (or someone else) might be able to provide better direction.

2019-12-03 04:44:47

You're probably not going to have much fun writing your own MIME parser. The reason you are finding "overdeveloped mail handling packages" is that MIME is a really complex set of rules/formats/encodings. MIME parts can be recursive, which is part of the fun. I think your best bet is to write the best MIME handler you can, parse a message, throw away everything that's not text/plain or text/html, and then force the command in the incoming string to be prefixed with COMMAND: or something similar so that you can find it in the muck. If you start with rules like that you have a decent chance of handling new providers, but you should be ready to tweak if a new provider comes along (or heck, if your current provider chooses to change their messaging architecture).
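A rough sketch of that approach follows: walk the parts recursively, keep only the text/plain and text/html leaves, then look for a prefixed line. All three function names are hypothetical, the depth limit of 10 is an arbitrary assumption, and the HTML handling is a crude tag strip rather than real HTML parsing.

```php
<?php
// Sketch only: split_entity(), collect_text_parts() and find_command() are
// hypothetical helper names, not functions from PHP or any library.

// Split one entity (a whole message or a single MIME part) into headers and body.
function split_entity($raw)
{
    $raw = str_replace(array("\r\n", "\r"), "\n", $raw);
    if ($raw !== '' && $raw[0] === "\n") {
        return array(array(), (string) substr($raw, 1)); // part without headers
    }
    $parts = explode("\n\n", $raw, 2);
    $unfolded = preg_replace('/\n[ \t]+/', ' ', $parts[0]);
    $headers = array();
    foreach (explode("\n", $unfolded) as $line) {
        if (preg_match('/^([^\s:]+)[ \t]*:[ \t]*(.*)$/', $line, $m)) {
            $headers[strtolower($m[1])] = $m[2]; // last value wins in this sketch
        }
    }
    return array($headers, isset($parts[1]) ? $parts[1] : '');
}

// Recursively walk the parts, keeping only text/plain and text/html leaves.
function collect_text_parts($raw, $depth = 0)
{
    list($headers, $body) = split_entity($raw);
    $type = isset($headers['content-type']) ? $headers['content-type'] : 'text/plain';
    $pieces = explode(';', $type, 2);
    $mime = strtolower(trim($pieces[0]));

    if (strpos($mime, 'multipart/') === 0) {
        $pattern = '/boundary\s*=\s*(?:"([^"]+)"|([^;\s]+))/i';
        if ($depth >= 10 || !preg_match($pattern, $type, $m)) {
            return array(); // nested too deeply or no boundary: skip this branch
        }
        $quoted = preg_quote($m[1] !== '' ? $m[1] : $m[2], '/');
        // Cut off the epilogue after the closing "--boundary--" line, then
        // split what is left on the "--boundary" delimiter lines.
        $beforeEnd = preg_split('/^--' . $quoted . '--[ \t]*$/m', $body, 2);
        $chunks = preg_split('/^--' . $quoted . '[ \t]*$/m', $beforeEnd[0]);
        array_shift($chunks); // text before the first delimiter is the preamble
        $found = array();
        foreach ($chunks as $chunk) {
            // Strip the newline after the delimiter and the one before the next.
            $chunk = preg_replace('/^\n|\n\z/', '', $chunk);
            $found = array_merge($found, collect_text_parts($chunk, $depth + 1));
        }
        return $found;
    }

    if ($mime !== 'text/plain' && $mime !== 'text/html') {
        return array(); // everything else is thrown away
    }
    $encoding = isset($headers['content-transfer-encoding'])
        ? strtolower(trim($headers['content-transfer-encoding']))
        : '7bit';
    if ($encoding === 'base64') {
        $body = base64_decode($body);
    } elseif ($encoding === 'quoted-printable') {
        $body = quoted_printable_decode($body);
    }
    return array(array('type' => $mime, 'body' => $body));
}

// Return the text after the first "COMMAND:" line found in any text part.
function find_command($raw, $prefix = 'COMMAND:')
{
    foreach (collect_text_parts($raw) as $part) {
        $text = $part['body'];
        if ($part['type'] === 'text/html') {
            $text = preg_replace('/<[^>]+>/', "\n", $text); // crude tag strip
        }
        $pattern = '/^[ \t]*' . preg_quote($prefix, '/') . '[ \t]*(.+)$/mi';
        if (preg_match($pattern, $text, $m)) {
            return trim($m[1]);
        }
    }
    return null;
}
```

Because `collect_text_parts()` calls itself for every multipart container, nested boundaries are handled by the same code path as the top-level message.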

2019-12-03 02:03:04

I'm not sure if this will be helpful to you (hope so), but it will surely help others interested in finding out more about email. Marcus Bointon gave one of the best presentations, entitled "Mail() and life after Mail()", at the PHP London conference in March this year, and the slides and MP3 are online. He speaks with some authority, having worked extensively with email and PHP at a deep level.

My perception is that you are in for a world of pain trying to write a truly generic parser.

EDIT - The files seem to have been removed from the PHP London site; I found the slides on Marcus' own site: Part 1 Part 2. Couldn't see the MP3 anywhere, though.

2019-12-03 01:56:14

Yeah, I've been able to write a basic parser, based off that RFC and some other basic tutorials. But it's the multipart MIME nested boundaries that keep messing me up.

I found out that MMS (not SMS) messages sent from my phone are just standard emails, so I have a system that reads the incoming email, checks the From (to only allow mail from my phone), and uses the body part to run different commands on my server. It's sort of like a remote control by email.
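The From check described here could look roughly like the sketch below. `extract_address()` and `is_allowed_sender()` are hypothetical names, and the regular expressions are a loose approximation of address syntax rather than a full RFC 2822 address parser.

```php
<?php
// Hypothetical helper: pull the bare address out of a From header value.
function extract_address($fromHeader)
{
    // Prefer the part inside <...>; otherwise take the first token that
    // looks like an address.
    if (preg_match('/<([^<>\s]+@[^<>\s]+)>/', $fromHeader, $m)
        || preg_match('/([^\s<>"(),;:]+@[^\s<>"(),;:]+)/', $fromHeader, $m)) {
        return strtolower($m[1]);
    }
    return null;
}

// Hypothetical helper: compare the sender against a list of allowed addresses.
// Note that a From header is supplied by the sender and can be forged.
function is_allowed_sender($fromHeader, array $allowed)
{
    $address = extract_address($fromHeader);
    return $address !== null
        && in_array($address, array_map('strtolower', $allowed), true);
}
```

The comparison is case-insensitive, so an address that arrives with different capitalisation still matches the list.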

Because the system is designed to send pictures, it's got a bunch of differently encoded parts: an mms.smil.txt part; a text/plain part (which is useless, it just says 'this is a html message'); an application/smil part (which is the part that phones would pick up on); a text/html part with an advertisement for my carrier, then my message, but all wrapped in HTML; and finally a text file attachment with my plain message (which is the part I use). If I shove an image into the message as an attachment, it's put at attachment 1, base64 encoded, and my text portion is attached as attachment 2.

I had it working with the exact mail format from my carrier, but when I ran a message from someone else's phone through it, it failed in a whole bunch of miserable ways.

I have other projects I'd like to extend this phone -> mail -> parse -> command system to, but I need a stable/solid/generic parser to get the different parts out of the mail before I can use it.

My end goal would be to have a function that I can feed the raw piped mail into and get back a big array with associative sub-arrays of header var:val pairs, and one for the body text as a whole string.

The more I search on this, the more I find the same thing: giant overdeveloped mail handling packages that do everything under the sun that's related to mail, or tutorials that are useless (to me, in this project).

I think I'm going to have to suck it up and just carefully write something myself.

2019-12-03 00:22:43