Exception Driven Development

Life as usual with Zed Builds and Bugs

Not too long ago, Jeff Atwood over at codinghorror.com wrote an article on Exception Driven Development. It's an interesting post and well worth the read. As I read it, I had to smile and congratulate Jeff for discussing something that is near and dear to our hearts here at Hericus Software.

I won't comment on trying to create a new methodology for software development, but I will toot our horn a bit for something that we've had in the product for a while now. As I read the article, it was almost a clear recipe for a key feature that we have in Zed Builds and Bugs that really helps to focus our bug tracking and development time. We call it our Exception Log Forwarding system. I know, I know, never let developers name anything. We're just not good at names. :-)

That being said, we are good at writing code, and this is one of the more interesting pieces of code in Zed. From the very first day in the life of the Zed code base, the internal logging system has been at the heart of the product. We've spent a good amount of time capturing log messages, and the system we have in place works really well. It performs well, doesn't get in the way of everything else going on in the system, and allows us to be as verbose as we like with the log messages.

It wasn't long after we had the system "out in the wild" that we realized we had a gold mine of information that could help us make the product better. It just wasn't flowing back to our main system quickly enough. We had to rely on customer reports of strange behavior or error messages, and only then could we try to chase the problem down. Usually, though, we didn't have quite enough information at our disposal unless we could share a desktop session with the customer and watch what they were doing.

Enter the Exception Log Forwarding system. We realized that if we added the (optional) feature of allowing the Zed system to automatically forward any error message dumps that were triggered by an ERROR or PANIC level log message, we would immediately know about the problem, and we would have all of the diagnostic information we would need in order to proactively fix the problem. The message forwarding is optional, and it is easily turned off in the system properties. With even a small percentage of our customers leaving it turned on, though, we get a wealth of information about things that we need to fix in the system.
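To make the idea concrete, here is a minimal sketch in Python of what such a forwarding hook could look like. It is not Zed's actual code: the ForwardingHandler class, the send callable, and the enabled switch are all hypothetical names chosen for illustration, and Python's CRITICAL level stands in for PANIC.

```python
import logging


class ForwardingHandler(logging.Handler):
    """Hypothetical handler: passes ERROR-and-above records to a sender callable."""

    def __init__(self, send, enabled=True):
        # Only records at ERROR or above (CRITICAL stands in for PANIC) reach emit().
        super().__init__(level=logging.ERROR)
        self.send = send        # assumption: a callable that delivers the report
        self.enabled = enabled  # stands in for the system property that turns forwarding off

    def emit(self, record):
        if self.enabled:
            self.send(self.format(record))


def make_logger(name, send, enabled=True):
    """Build an isolated logger whose only handler is the forwarding hook."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.handlers.clear()
    logger.addHandler(ForwardingHandler(send, enabled))
    return logger
```

In a real deployment the send callable would presumably post the report to the vendor's server; here it can be anything, such as a list's append method, which makes the opt-out behavior easy to check.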

Don't take this the wrong way, though. We're not being flooded with these messages. Zed is a very stable product, and at the moment, there are 14 messages in our Exception Log queue. Those messages have come in over the past 2 weeks. Some of these actually point out places where we have been a bit overzealous in terms of marking log messages as ERROR level. Some of them really qualify as a simple WARN message that can be safely ignored.

The benefit, though, is the way that we can be proactive in fixing any issues that we see coming in from the field. Combine this with our automated updating system, and we have a way to find and fix errors often before most customers realize that there is a problem.

Zed should always work for you and be something that you can rely on. We're doing our best to ensure that this is always the case.

So thanks, Jeff, for the great article on Exception Driven Development. We're right there with you and have been for some time now.

Close to Our Customers

Discussion Forums

Have you ever had a frustrating experience with customer support? I don't think there's a single person out there who could honestly say that they've never been frustrated by interacting with customer support from some company somewhere in their experience. It may be a systemic thing; it may have been just a one-time event. It may have been completely unexpected on both sides. Let's hope it was the last of these.

Hericus is a software company, and software tends to get people frustrated from time to time. It won't do what you want. It won't behave the way you thought it should. It can be a frustrating experience all on its own. When you add another level of frustration because of a lack of support or a problem communicating with a company, you can really get hot under the collar.

Hericus Software has a philosophy about support. We talk about this right on our Support Web page. It's one of our main pages with a prominent link right at the top of the title bar along with the Home and Buy pages. It's that important to us. Hericus's support philosophy is to make it as easy as possible to find what you need to know and get in touch with us (and all the users of Zed Builds and Bugs Manager) when you need help.

From the ground up, we've built things into Zed Builds and Bugs Manager to make it easy for you to communicate among your team, to coordinate your efforts, and to make your group as efficient as possible. Right along with those features, we've also added ways for you to communicate with us in Hericus Support, so that you feel like we're just another part of your team.

The main feature that provides this open line of communication is built into our Zed Discussion Forums. No doubt you've seen the "Hericus Support" discussion forum as you've been working with and exploring Zed. This is the way that you can communicate with us and take advantage of the broader community of Zed users to share your experiences, ask and answer questions, and never leave the product itself. There is no special login required to access the Hericus Support forum. You can simply switch to the Discussion Forums and ask your question. Hericus Support will answer, and don't be surprised to see answers from other customers as well.

We like our customers. They are the reason we are in business, and they are the reason that we continue to write software like Zed Builds and Bugs Manager. We are developers, too. We want to put tools into the hands of developers to help you to be more productive, more organized, and communicate better within your own team. We want to help you to achieve that perfect Continuous Integration state that you strive for. We want to help you track all of your bugs--no matter what they are for. Most importantly, we want you to always feel like we're just across the hallway from you and that you can ask a question at any time.

If you haven't yet tried out the Hericus Support discussion forums, give them a try. They're available to you 24 hours a day, and we'll keep the answers flowing!

Logging

Why a mundane topic gets us excited

Logging is one of those things that someone is either passionate about, or they just don't care. There's usually not very much of a middle ground. For those who are passionate about logging, hopefully this post will give you some new ideas and start a conversation or two. For those who don't care, I'd ask you the questions:

  • Do you know what's going on in your systems right now?
  • Do you know who's logged in on your systems right now?
  • Do you know when the last problem happened with your system, and did you get enough details to do something useful about the problem?

Logging is all about communication. Our systems know what they are doing and know when things go wrong. The question is: do we know what our systems know? Is our system actually communicating with us in a timely and useful manner? Unfortunately, the answer is more often "no" than "yes." Why? Because logging is often put in as an afterthought:

You've built your system, you've suffered through the months of pain to get it working and ready to finally end up in someone else's hands, you let it out to a select few people who you trust to exercise it (but not overly abuse it) and let you know what they think. You wait for feedback. You wait for praise. You wait for condemnation. You wait...

Finally, you can wait no longer. You must know! So you pick up the phone or send out a few e-mails to find out. "Oh, yeah. It started fine, but then, it just stopped working after I did X. I was going to get back to you on that, but everything else has been kind of hectic." What!?!?! What happened? You want to know. "What went wrong?" you ask yourself. Laboriously, you walk the users through what they did, what they saw, and how the system reacted. You look at the code: "It should work..."

You finally get some trace output from one of your users, and it helps to narrow down the problem. You're stuck, though, because the traces you do have are not enough to help you zero in completely on the problem. You have a vague notion of where it is, but you need more details. Blast! Why isn't there more logging?!?

Here's the flip side, though, that can be even nastier. If you've built your system from the ground up with logging embedded everywhere you can think of, you can get yourself into a performance nightmare. Sure, you know exactly what your system is doing, exactly when, and which parts of the code are executing. However, when you look at overall system performance, it's spending 80% of its time preparing, parsing, and writing log messages, and the rest of the system is choked for resources.

Ok, sure. Just turn down the level of logging (you did include log levels, right?), and you won't spend so much time and resources, right? Well, sure. But what happens when something goes wrong? If you've turned your logging levels down to the point where system performance is acceptable, have you turned them down so far that they no longer have any value?

Well, that's just the difference between development systems and production systems, right? In dev, you leave lots of logging on so that you can catch the problems early. In production, you've fixed all the problems, so logging is no longer as necessary, and you can get away with turning the log levels down to just the bare minimum. Right?

There's a problem here. Go back to the story above, and read it again. In development, everything was working fine. Once the software is handed to others (put in production), strange things happen. Users tend to use our software in ways we didn't take into account. They tend to do things we didn't think of. If we don't capture enough logging when the system is in production, what will we do when there is a problem?

If the default production logging settings are useless for debugging problems, then they are useless, period.

Here at Hericus Software we've introduced a concept in logging that lets you have your cake and eat it too. The Zed Server includes sophisticated logging mechanisms that allow the server to run at highly optimized speeds while still maintaining a useful log of state and ongoing activities. The key to the logging implementation is a buffered approach that allows the server to capture logs to memory first, and, if and when required, write these logs physically to disk.

That's only half of it, though. Buffering disk writes has been around for a long time. That's not interesting any more. What's interesting is our mechanism to look back through the buffer when a new message comes in and write out "history" prior to the message.

Each log channel has two flags. The first is the Enabled flag. This flag indicates whether messages on this channel are always written to disk. The second is the Capture To Buffer flag. This flag indicates whether messages on this channel are written to the in-memory buffer. When any log message is written to disk, a certain quantity of previous messages that have been stored in the buffer will be written first, and then the log message will be written to disk. This gives each message written to disk a "history" of other messages that provide context for what was happening at the time leading up to the log message. Once a log message has been written to disk, the in-memory buffer is thrown away, and messages are again accumulated.
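The two flags and the "history" behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Zed Server's implementation: Channel, BufferedLog, sink, and history_size are hypothetical names, and the sink callable stands in for the physical disk writer.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Channel:
    name: str
    enabled: bool            # messages on this channel are always written to disk
    capture_to_buffer: bool  # messages on this channel are kept in memory as history


class BufferedLog:
    def __init__(self, sink, history_size=50):
        self.sink = sink  # assumption: a callable taking one line, standing in for the disk writer
        # A bounded deque silently drops the oldest entry once it is full.
        self.buffer = deque(maxlen=history_size)

    def log(self, channel, message):
        line = f"{channel.name}: {message}"
        if channel.enabled:
            # Write the buffered history first, then the message itself.
            for old in self.buffer:
                self.sink(old)
            self.buffer.clear()  # the history is thrown away after a disk write
            self.sink(line)
        elif channel.capture_to_buffer:
            self.buffer.append(line)
        # A channel with both flags off is dropped entirely.
```

The bounded deque is one simple way to get the "certain quantity of previous messages" behavior: old entries fall off the front on their own, so the cost of a buffered message is little more than an append.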

This allows the system to keep most logging enabled all of the time, with very little actual performance overhead incurred. Yet, any time a severe message is received, there will be enough context of what the server was doing leading up to the event to provide good debugging and tracing information.

For example, debug messages are usually not written to the disk, but they are captured to the buffer because they can contain useful information. However, they are only useful when something goes wrong. Most of the time they stay in the memory buffer for a short period and then get thrown away as the buffer cycles through. Every now and then, though, a debug message will be followed by an error message. When this happens, the error message causes the 50 (or so) prior debug, warning, and info messages to be physically written to the disk first, followed by the error message itself.
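If you work in Python, you can approximate this scenario with the standard logging module. The sketch below is our own illustration, not how Zed does it: it subclasses the standard library's MemoryHandler so that a full buffer discards its oldest records instead of flushing them. The HistoryHandler, ListHandler, and build names are hypothetical.

```python
import logging
from logging.handlers import MemoryHandler


class HistoryHandler(MemoryHandler):
    """Keeps only the newest `capacity` records; flushes them when a severe record arrives."""

    def shouldFlush(self, record):
        # Unlike the stock MemoryHandler, a full buffer does not trigger a flush.
        return record.levelno >= self.flushLevel

    def emit(self, record):
        super().emit(record)              # buffers the record, and flushes on ERROR or above
        del self.buffer[:-self.capacity]  # otherwise drop the oldest records


class ListHandler(logging.Handler):
    """Stand-in for a file handler so the example has no side effects."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(f"{record.levelname} {record.getMessage()}")


def build(history=50):
    """Return a logger wired to a HistoryHandler, plus the stand-in 'disk'."""
    disk = ListHandler()
    logger = logging.getLogger("zed.sketch")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.handlers.clear()
    logger.addHandler(
        HistoryHandler(capacity=history, flushLevel=logging.ERROR, target=disk)
    )
    return logger, disk
```

With a history of 3, ten debug messages followed by one error leave only the last three debug lines and the error itself on the "disk"; everything older has already cycled out of the buffer.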

Context! All of a sudden you have context. Your error message may be great, but what happened just before it? With the logging system in Zed, we can see the context leading up to an error message and gain a wealth of information about what the problem really is.

Without that context, sometimes even the best error messages still leave you in the dark.

Welcome to Hericus Software

Home of Zed Builds and Bugs Manager

Welcome to the Hericus Software Blog. We will be using this blog as a way to share a window into the activities that are going on within Hericus Software and as a way for us to talk about the technology behind Zed Builds and Bugs Manager.

Hericus Software was started in early 2008 with the vision to produce a single product that addressed a large number of the day-to-day needs of the software development community. There is a common subset of tools every software development group needs and uses. In our experience, these tools rarely work as well as they could, and they almost never work together. There is only one other company that we know of that has the integration philosophy that we do, and that's FogBugz. They also get it: everything a developer needs should be in a single tool and work out of the box. The one piece they are missing, though, is Continuous Integration.

Enter Hericus Software. We provide out of the box support for:

There is no need to buy 4 separate applications to get all of this functionality. You do not have to install 4 applications on your desktop to interact with our product. In fact, chances are you will not have to install anything. A one time installation of our standard server by your IT department or project lead will make the whole suite available to everyone on your team.

Using our advanced web workspace, you have access to all applications from your favorite browser. Features such as drag and drop, tabbed views, and flexible workspaces allow you to work the way that you are most comfortable. Our goal is for the application to adapt to you rather than forcing you to adapt to the application.

Check out our demo system, which is fully functional and gives you an immediate feel for how the application suite can work for you in your environment.

Browse through our documentation, and you will see the rich content that will help you feel at ease while you are learning the system.

Coming soon: video tutorials, common use-cases, and how-to guides to help you follow best practices for your development setup.