Why doesn’t HubSpot ingest incoming emails?

HubSpot Community Forums frequently contain posts from users bemoaning the fact that HubSpot does not natively ingest incoming emails[1] into “deals”. Of course, we are not privy to HubSpot’s vision, but one reason may be strategic – if HubSpot were to engineer native versions of all of the third-party solutions, it might lose the 3,000 third-party partners[2] that have helped make HubSpot so successful. It might also be a resource issue – better to concentrate engineering effort on making HubSpot the best CRM. But it is most likely to be a result of timescales and commercial viability. Allocating 100 developers to a task does not deliver a solution 100 times faster than one developer, and it would be difficult to find a better requirement than email ingestion to illustrate this.

As third-party developers of an email ingestion application for HubSpot, you might well expect us to say this, but we hope this blog explains the reasoning.

Threads was developed as what we call an intelligent message hub. The general concept is that Threads collects any form of digital message, mainly emails and VoIP phone calls, and stores them in a Cloud database that may be easily shared or searched by authorised users. The need for such a solution results primarily from the fact that:

  1. Most commercial messages are stored in private repositories (e.g. personal email or personal voicemail inboxes) which cannot be examined in the context of other company employees.
  2. There is a mass of useful strategic information that can be intelligently recovered by aggregating all of a company’s messages.

The body and metadata[3] of text-based (i.e. Plaintext/HTML, ASCII/Unicode) messages such as emails may be stored and searched as downloaded, whereas performing the same processes on digitised telephone calls is an entirely different matter. Converting digitised speech into text, known as Automatic Speech Recognition (ASR), is a highly complex process, but it is essential if calls are to be searched in the same way as emails – for example, to search all emails and phone calls that contain the word “Threads”. Indeed, it is one of the many reasons we describe Threads as an intelligent message hub.
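As a minimal sketch of the email half of this, the example below separates a message into body and metadata and then runs a simple word search over it. It uses Python's standard `email` package; the sample message and the function names `split_message` and `mentions` are invented for illustration and are not how Threads itself is implemented.

```python
from email import message_from_string
from email.policy import default

# A made-up message used only for this illustration.
RAW = """\
From: Alice Example <alice@example.com>
To: bob@example.org
Subject: Threads demo
Date: Mon, 01 Jan 2024 09:30:00 +0000
Message-ID: <demo-1@example.com>
Content-Type: text/plain; charset="utf-8"

Please call me about Threads.
"""


def split_message(raw):
    """Return (metadata, body) for a raw message given as text."""
    msg = message_from_string(raw, policy=default)
    metadata = {
        "from": str(msg["From"]),
        "to": str(msg["To"]),
        "subject": str(msg["Subject"]),
        "date": msg["Date"].datetime,  # parsed into a datetime object
        "message_id": str(msg["Message-ID"]),
    }
    # Prefer the plain-text part and fall back to HTML if that is all there is.
    part = msg.get_body(preferencelist=("plain", "html"))
    body = part.get_content() if part is not None else ""
    return metadata, body


def mentions(raw, word):
    """Case-insensitive search of the subject and body for a word."""
    metadata, body = split_message(raw)
    return word.lower() in (metadata["subject"] + "\n" + body).lower()
```

A well-formed message like this one parses cleanly; the difficulty discussed in the rest of this blog comes from messages that are not well formed.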

While collecting email messages might appear a trivial exercise, in reality it is anything but.
Email messages are mostly formatted according to an open standard (RFC 5322). As such, messages can be sent from any platform to any other platform provided the sender and receiver comply with the standard. As it turns out, the majority of Internet users use email servers hosted by one of the major international IT companies – i.e. Microsoft, Google and Apple – but a significant proportion are hosted by many other companies. The temptation is for companies providing email services to use closed (or proprietary) standards in the hope that they will achieve greater market share by forcing communicating users to adopt their closed standard and hence their applications or services. To some extent this may work, but in general, if an application or service is adopted on its intrinsic merits rather than its lack of interoperability, then it will achieve the market share it deserves. In the case of email and telephony, the open standards (such as IMAP and SIP) were well entrenched before the widespread explosion in Internet use, hence service providers have had little option but to conform to open standards.

However, and this is the rub, they don’t have to conform to the open standards 100%. So while messages (including phone calls) can be exchanged, they may not look the same to the receiver as they did to the sender. This can result in anything from wrongly formatted or encoded text, through inconsistent folder structures, to corrupted attachments and, of course, is not predictable. With so many variables in the creation of a message, and so many possible interpretations of the open standards, it may take months or even years of analysis to infer which services break the rules and how to overcome such breaches. This is one reason why ingesting email is not a straightforward engineering exercise[4]. After processing millions of emails over the last 10 years, we are still regularly informed of some exception or another. Not only is this impossible to predict, it is costly to support. No user wants to be told “sorry but it’s not our fault, your service provider has broken the rules” and so each incident has to be identified and, where possible, worked around. After 10 years, the number of users affected by email clients and servers which do not conform to the standards has dropped significantly. Unfortunately, it takes the same amount of resource to identify and fix the problem for one user as it does for 1,000 users.
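To give a flavour of what such a workaround can look like, here is a small sketch for one common breach: a message whose declared character set is missing, unknown or simply wrong. The function name `decode_body` and the choice of fallback character sets are our own assumptions for this example, not a description of the Threads code.

```python
def decode_body(payload, declared_charset, fallbacks=("utf-8", "cp1252")):
    """Decode message bytes, tolerating a missing or incorrect charset label.

    Returns (text, charset_used). The fallback order is an assumption;
    a real service would tune it to the mail it actually sees.
    """
    candidates = [declared_charset] if declared_charset else []
    candidates += [c for c in fallbacks if c not in candidates]
    for charset in candidates:
        try:
            return payload.decode(charset), charset
        except (LookupError, UnicodeDecodeError):
            # LookupError: the label is not a known charset.
            # UnicodeDecodeError: the bytes are not valid in that charset.
            continue
    # latin-1 maps every byte to a character, so this last resort cannot fail,
    # although the resulting text may still be wrong.
    return payload.decode("latin-1"), "latin-1"
```

For example, `decode_body("café".encode("cp1252"), "utf-8")` gives up on the declared UTF-8 label and falls back to cp1252. Each such rule is simple on its own; the cost lies in discovering which rule a particular sender needs.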

And message nonconformity is only part of the problem. In the last 5 years, account authentication has become an equal support burden. Such is the effect of phishing and other hacking crimes that service providers have strengthened account authorisation to the point where it is sometimes difficult for even the account owner to access their messages, let alone a third-party application. Many users simply do not understand processes such as two-factor authentication and app-specific passwords, let alone the need for Threads to work with them. Again, it is no good telling a customer that their email cannot be ingested because of some slip-up made in authenticating an account. It takes significant time and effort to solve many of these problems.

Last but not least, there is the issue of which emails get ingested. Most HubSpot subscribers want to ingest emails only relating to their existing employees with HubSpot accounts, so Threads must check for this before ingesting each email. Equally, it is important not to ingest emails already in HubSpot, thus creating duplicates. These are two essential processes which make manual ingestion impractical.
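The two checks can be sketched as follows. This is a simplified illustration only: the in-memory sets stand in for lookups that a real service would make against HubSpot, and the names `participants`, `should_ingest`, `allowed_addresses` and `seen_ids` are hypothetical.

```python
from email.utils import getaddresses


def participants(headers):
    """Collect the lower-cased addresses found in From, To and Cc."""
    fields = [headers[name] for name in ("From", "To", "Cc") if headers.get(name)]
    return {addr.lower() for _, addr in getaddresses(fields) if addr}


def should_ingest(headers, allowed_addresses, seen_ids):
    """Apply both checks to one message's headers.

    allowed_addresses: addresses of employees with HubSpot accounts
                       (assumed to be known in advance for this sketch).
    seen_ids:          Message-IDs already ingested; updated in place.
    """
    message_id = headers.get("Message-ID", "").strip()
    if message_id and message_id in seen_ids:
        return False  # already ingested, so skip to avoid a duplicate
    if not participants(headers) & allowed_addresses:
        return False  # no relevant employee is party to this message
    if message_id:
        seen_ids.add(message_id)
    return True
```

Using the Message-ID header to detect duplicates is itself an assumption: some messages arrive without one, which is why the sketch lets those through rather than rejecting them.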

One can see why HubSpot would not wish to support the email “can of worms”.

So the obvious question is why would Threads wish to immerse itself in the said can of worms? Indeed, it is a question we often ask ourselves, and in many respects it would make life easier for us if HubSpot did take on the responsibility. The reason is that email is just one type of digital channel that Threads supports, and since email represents a high proportion of business communications, it is not a channel we can ignore. It is the fact that Threads email processing is part of a much larger infrastructure that makes it a candidate to work with HubSpot. For sure, to develop the Threads email processing architecture for HubSpot as a stand-alone service would never have been a viable business proposition for Threads Ltd, so there is no reason to believe it is any more viable for HubSpot.

It seems that the most vocal of the HubSpot users who complain of the lack of native email processing are those on the free service who, understandably, begrudge paying for what is an enabling part of the service. There is not much we can do to help these users since there is a cost associated with hosting and supporting the service. Threads Ltd is not a charity. However, the overwhelming majority of Threads/HubSpot reviews rate our solution as great value, so we remain keen to support HubSpot users in integrating their outgoing and incoming email using Threads, even though it is but a small part of its functionality. Given time, we hope that HubSpot users will come to our conclusion that there is as much benefit in ingesting transcribed phone calls as there is in email.

Threads provides two distinct email ingestion services, historic and ongoing. Historic, as the name implies, is designed to get historic emails into HubSpot and is frequently used by new HubSpot subscribers to get their HubSpot database up to date. The ongoing service keeps HubSpot up to date once synchronised. The services are separated because not all HubSpot subscribers will necessarily require both, and the charging structure is, of necessity, different. However, to permit potential Threads subscribers to decide if the service is for them, we provide 30 days of email ingestion free of charge.

Some HubSpot users may feel that they would prefer to write their own code to ingest incoming emails into HubSpot. This is more practical for historic ingestion than ongoing ingestion because emails not conforming to the standards may be identified. But to summarise, these are the main subject areas requiring familiarisation:

  • The HubSpot API
  • The IMAP (and if required POP) protocols – in the case of IMAP, the definition of the folder structures for storing emails
  • Relevant user authentication methods – 2FA, app-specific passwords, etc
  • Message encoding schemes – plaintext, HTML, ASCII, Unicode (foreign language handling and language-specific character sets)
  • Handling of attachments
  • Scheduling of email ingestion – e.g. rate at which emails are ingested and from which accounts
  • Selection of email – accounts, date ranges, etc.
  • Handling of server errors – timeouts, throttling, etc.
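To illustrate just the last item in this list, the sketch below retries an operation with exponentially increasing delays when a server times out or throttles the request. The exception class `TransientServerError`, the helper `with_retries` and the default delays are placeholders invented for this example; a real integration would map them onto the errors actually raised by the mail server or the HubSpot API.

```python
import time


class TransientServerError(Exception):
    """Placeholder for a timeout or throttling response (hypothetical)."""


def with_retries(operation, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call operation(), retrying on transient errors with exponential backoff.

    Waits base_delay, then 2x, then 4x ... between attempts, and re-raises
    the error if the final attempt also fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientServerError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter exists only so that the waiting can be replaced in tests. Every other item in the list needs at least this much care, which is the point of the list.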

We hope this blog makes it clearer why Threads has become so popular with HubSpot subscribers and perhaps why email ingestion is not a native HubSpot feature.

  1. Outgoing emails are simply handled using a generic “CC” address, but incoming emails are outside of the HubSpot users’ control.
  2. According to ChatGPT, there are over 3,000 HubSpot partners worldwide. These partners are part of the HubSpot Partner Program, which includes agencies and consultants that offer services related to HubSpot’s marketing, sales, and CRM platforms. The program provides resources, training, and support to help partners deliver effective solutions to their clients using HubSpot’s tools. HubSpot partners play a crucial role in helping businesses leverage HubSpot software to grow and succeed in today’s digital landscape.
  3. The body of a message is the message itself and the metadata is information that relates to the message such as the address of sender and recipients, message subject, time of sending, etc.
  4. The official RFC 5322 (which specifies what is a valid email address and what isn’t) is so obsolete and complicated that even to this day there isn’t an official regular expression that can identify with 100% certainty whether an email address is valid. The closest one is here: https://emailregex.com/index.html but it is rarely used because it is quite expensive in CPU time.