WTF? Unicode Byte Order Mark

One of the benefits of being a freelance web developer is that I can easily help small businesses get on the Web. I have very little overhead (I don’t have employees), so my prices are very affordable.

Often my clients already own a domain and have a hosting account through some discount hosting company. I won’t name names, but I dislike a number of these companies because they nickel-and-dime customers for very basic features.

Case in point: ASP.NET support. You would think that if you bought a hosting plan on a Windows server, you could use ASP.NET to develop the website. That’s not an unreasonable assumption, but it turns out to be a wrong one.

If that weren’t interesting enough, I found that (in this particular case) PHP support was enabled by default. Yes, Windows servers can run PHP… but why is that the default option? Shouldn’t it be the other way around?

Long story short, I had to develop a recent project for a client in PHP. That’s not a problem; I’ve been writing PHP code for years (but I do prefer .NET). I wrote my code, uploaded it to the server…

…only to find that every file had a mysterious junk character added to the source code.

WTF?

I spent about 20 minutes cursing like a sailor at my computer screen. I called the web host every name in the book, none of which I really want to repeat.

After searching the internet for a few more minutes, I stumbled across this post, which explains that many IDEs insert a Byte Order Mark into each file.
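
For the curious, here is a minimal Python sketch of what that mark looks like on disk. In UTF-8 the BOM is the three bytes EF BB BF, and when something reads them as Windows-1252 they turn into three junk characters. The `starts_with_bom` helper and the sample file contents are my own illustration, not anything from the linked post.

```python
import codecs

# The UTF-8 byte order mark is the three bytes EF BB BF.
BOM = codecs.BOM_UTF8


def starts_with_bom(data: bytes) -> bool:
    """Return True if the raw bytes begin with the UTF-8 BOM."""
    return data.startswith(BOM)


# Hypothetical PHP file contents, as an editor might save them with a BOM.
saved = BOM + b"<?php echo 'hello'; ?>"

# What the BOM bytes become when misread as Windows-1252:
# three visible characters instead of one invisible mark.
junk = BOM.decode("cp1252")

print(starts_with_bom(saved))  # True
print(BOM.hex())               # efbbbf
print(len(junk))               # 3
```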

WTF?

I don’t know about you, but I find having to remove this very, VERY annoying.

As a public service, here’s how to remove it if you’re using Visual Studio (or Visual Web Developer):

  1. Create a new file. Write your code.
  2. Go to File > Save (file) As…
  3. Click the arrow next to the Save button. Select Save with Encoding…
  4. Confirm that you want to replace your file.
  5. From the “Encoding” drop-down, select Unicode (UTF-8 without signature), which is near the bottom of the list.

Magically, the byte order mark is gone.
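
If the files are already saved (or already uploaded) and you would rather not re-save each one by hand, the same cleanup can be scripted. This is only a rough Python sketch, assuming the files are UTF-8 and that you have a backup; `strip_bom` and `strip_bom_in_tree` are names I made up for the example, and the `*.php` pattern is just a default.

```python
import codecs
from pathlib import Path
from typing import List


def strip_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM from the file at `path`.

    Returns True if the file was rewritten, False if it had no BOM.
    """
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Write back everything after the three BOM bytes, untouched.
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True


def strip_bom_in_tree(root: Path, pattern: str = "*.php") -> List[Path]:
    """Strip the BOM from every matching file under `root`.

    Returns the list of files that were actually changed.
    """
    return [p for p in sorted(root.rglob(pattern)) if p.is_file() and strip_bom(p)]
```

Calling something like `strip_bom_in_tree(Path("site"))` (where `site` is a placeholder for your project folder) returns the files it changed. It works on raw bytes, so the rest of each file is left exactly as it was.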

About Arthur Kay

Arthur Kay is a long-time nerd and JavaScript enthusiast. He lives in the Chicago suburbs and is active in the local web development community. Arthur currently works for Sencha, Inc. as a Solutions Engineer. The thoughts, ideas, and opinions expressed on this website are Arthur's alone and do not represent his employer.
This entry was posted in Web Development.

3 Responses to WTF? Unicode Byte Order Mark

  1. Jorge says:

    Welcome to the club. I run into this all the time. Sometimes I just use Notepad++, which has a more accessible “Encode in UTF-8 without BOM” menu.

  2. Pim says:

    I would recommend against saving files as UTF-8 without BOM.
    BOMs are very handy to define files as being UTF-8 encoded. Without a BOM, a UTF-8 file looks an awful lot like Windows-1252 or any other 8-bit character set.

    If you really, absolutely don’t want anything to do with BOMs, just encode as Windows-1252 (or as Microsoft likes to call it, ANSI) and be BOM-free. In the case of HTML (or PHP or anything that outputs HTML) you can write unusual characters as character entities, so you won’t need Unicode anyway.

    Pim

  3. Milton says:

    Thank you! I am using Visual Studio to edit a simple little site for a friend using SSI and this was boggling my mind.
