All Domestic Flights Grounded in the USA last night due to a Massive FAA NOTAM Computer Failure

Status
Not open for further replies.
Two other thoughts: technology moves forward but has to remain backwards compatible with whatever this NOTAM system is. Is that a problem?
Also, will the hardware needed to keep it running become a problem?

Any chance this was a hack?
 
Even if it was, the GOV/FAA will never admit it....
 
Scott, as our BITOG resident expert (you volunteered) on something like this, how does it get done? At some point it has to be updated, or does it? If they kick the can down the road far enough, will there be a complete system failure/shutdown?
Good question. If a company has chosen a platform and architecture with a lasting future, upgrading to a new system is relatively easy because these kinds of upgrades don't change the underlying platform, per se. Instead, these kinds of upgrades are for faster CPUs, I/O subsystems, etc. The underlying operating system and application software are unchanged. Easy peasy, relatively speaking - but still very stressful.

Migrating to a whole new architecture from a different vendor, one that may even require rewriting the customer's unique, custom-written application software in a completely different language....the horror! These are the problematic ones.

Sometimes the customer's development team needs to hire new engineering talent to support the change to a new programming language. These types of migrations can take YEARS, and even then they might never see the light of day. I have personally seen sales that never made it to completion simply because there was no viable migration plan given the mission critical nature of the customer's systems.

I suspect the FAA systems have origins that go back decades or more. There will be a time they must upgrade and migrate, but when that time comes I'm not sure how it can be done on a large scale without service interruptions or compromises of some sort.

When you're talking migrations from one vendor to another, the vendor on the way out has little motivation to cooperate with the new vendor. Like most things in life, computer sales are a zero sum game. For every vendor that wins, one loses.

Scott
 
No backup system???

At towered airports, yes, there essentially is. At least to safely complete a flight - but not necessarily to legally dispatch a flight.

The “ATIS” (Automatic Terminal Information Service) provides a recording (or a digital transmission to planes) of the hourly weather and airport status. The ATIS is produced by the ATC tower.

The ATIS will provide real-time updates to the NOTAM system because it’s made by people right at the airport.

For example, you could read a NOTAM before departure saying that a runway is closed. Five hours later, when the pilots listen to, or digitally request, the ATIS, they will see the current status of any runways that are still closed.

It won’t cover everything, but many real-time things it will. Taxiway closures, runway closures, nearby construction cranes, runway friction reports, etc.

As an airline pilot, my only beef with the current NOTAM system is the formatting. It’s hard to read. BUT, it’s obviously done this way to save space/paper, so I kinda “get it”. 10 pages of NOTAMs turn into 40 pages if you format them so they’re easy to read.
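To illustrate the space trade-off, here is a rough Python sketch that expands a few standard NOTAM contractions into plain words. The sample text is made up for illustration (it is not a real NOTAM), the `expand` helper and `CONTRACTIONS` table are hypothetical names, and a real decoder would need the full official contraction list.

```python
# A handful of standard NOTAM contractions. This is only a tiny subset,
# assumed here for illustration; the official list is much longer.
CONTRACTIONS = {
    "RWY": "runway",
    "TWY": "taxiway",
    "CLSD": "closed",
    "BTN": "between",
}


def expand(notam_text):
    """Replace each known contraction with its plain-English word."""
    words = notam_text.split()
    return " ".join(CONTRACTIONS.get(word, word) for word in words)


# Made-up sample text, not taken from any real NOTAM.
sample = "TWY B CLSD BTN RWY 09 AND TWY C"
expanded = expand(sample)
print(expanded)
print(len(sample), "characters before,", len(expanded), "after")
```

Even on one short line the expanded version is noticeably longer, which is the same effect as 10 pages growing to 40.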
 
I’ve seen some NOTAMs. They remind me of the old TTY system we used in the Coast Guard many moons ago with their abbreviated formats.
 
When I was in weather in the USAF in 1973, we would get the NOTAMs over the teletype and give them to Base Ops next to us. They would post them in the pilot briefing room.
 
Well, the Gov. was all over Southwest a few weeks ago for the delays. Let's see if the Gov. is all over the FAA for this, and will the Gov. pay all of the extra costs to the airlines and refund money if flights were cancelled because of this? I say NO. Double standard: do as I say, not as I do.
Do you seriously see no difference between the two incidents?
 
The FAA handled the name change from Notice To Airmen (NOTAM) to the more inclusive Notice To Air Missions (NOTAM) so well a year ago, it's hard to believe that they could screw up the relatively simpler task of changing out the computer system that supports it. After their good work a year ago, I'm giving them a pass on the system update and canceled flights.
 
All I can say is it was stressful. One time we had a disastrous migration to some new, fully custom software developed by the bank. It required a system shutdown on our part to support their extensive redesign.

People lost their jobs over this - literally fired within days. We're talking long time bank employees, senior engineers who had intimate knowledge of the bank's custom applications.

The fact that they planned and tried their best was not even considered. The fact that they had erasers on their pencils is what got them fired. The bank employees I'm referring to were some of the best engineers I ever worked with in my career.

Scott
Was this stuff all COBOL code by chance?

Similar story for a big company I worked for. In about 1998 they were going to throw away their home-grown business system and go to Oracle. They laid off all their in-house programmers. I knew a few of them - just through my mentor, I was in a totally different department.

About July 1999 the Oracle team said there was no way on earth they would be done. The company hired all the laid-off people back - for big money.

Y2K came and went without a hitch.
 
Our government has the money to upgrade the entire FAA IT system, network, security, etc…

No excuses !!!!!
 
Which is why ATIS isn’t a backup. It just isn’t.

You can’t get the 10 pages of NOTAMS for one airport into a repeating voice broadcast.
 
BTW the Canadian NOTAM system is now down. Canada says we don't really need it so keep flying, no restrictions.

I think someone is sending a message.
 
An IBM mainframe sysplex with several systems running the same set of applications can "spray" requests across the systems. You can IPL one system at a time and update all the systems over the course of a few hours with no disruption to service. And then you mirror all the disk activity to another mainframe sysplex a few hours away for disaster recovery. This technology has been available for large business since the mid 1990s.

The FAA may have more specialized applications, but it should have implemented the same type of clustering and disaster recovery.
 
In a mission-critical environment, taking hours to update backup systems to mirror the primary system would be completely unacceptable. The systems I worked on did it in the background in real time, even when geographically separated. Database integrity was accomplished via two-phase commits.

Our architecture could literally have one system disintegrate into thin air and the "backup" system would take over the workload seamlessly. For example, if you were in the process of getting money from an ATM connected to the "primary" system and it failed, the "backup" system would assume the workload so quickly you wouldn't even notice.

And I put "primary" and "backup" in quotes. On our systems there was no such thing. Each system would manage its own workload, both systems keeping an identical record of the database via checkpointing and two-phase commits. The "primary" datacenter explodes into flames? The "backup" system has a fully updated, identical copy of the database and would assume all the workload of the "primary" system, as well as continuing to do its own. There was no such thing as a "backup" system. Each system was a "primary" with its own workload, each system being a fully up-to-date real-time "backup" for the other.
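For readers who have not run into the term, a two-phase commit can be sketched roughly like this in Python. This is a toy model only, not how any bank's or the FAA's systems are actually built. The `Participant` class, the `two_phase_commit` function, the `fail_on_prepare` flag and the account key are all made-up names for illustration.

```python
class Participant:
    """One copy of the database taking part in a transaction (hypothetical)."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare  # simulate a site that cannot commit
        self.data = {}        # committed records
        self.pending = None   # change staged during phase 1

    def prepare(self, key, value):
        # Phase 1: stage the change and vote yes (True) or no (False).
        if self.fail_on_prepare:
            return False
        self.pending = (key, value)
        return True

    def commit(self):
        # Phase 2, success path: make the staged change permanent.
        key, value = self.pending
        self.data[key] = value
        self.pending = None

    def rollback(self):
        # Phase 2, failure path: throw the staged change away.
        self.pending = None


def two_phase_commit(participants, key, value):
    """Apply key=value on every participant, or on none of them."""
    votes = [p.prepare(key, value) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


site_a = Participant("site_a")
site_b = Participant("site_b")
ok = two_phase_commit([site_a, site_b], "acct_123", 500)

# A site that votes "no" causes the whole transaction to be abandoned.
site_c = Participant("site_c", fail_on_prepare=True)
failed = two_phase_commit([site_a, site_c], "acct_123", 900)
print(ok, failed, site_a.data, site_b.data)
```

The point of the two phases is that no site makes a change permanent until every site has agreed it can, which is how both copies stay identical. A real implementation also has to log each step durably and handle a coordinator crash, which this sketch leaves out.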

Scott
 
All sorts of banter on the FAA system, however it has had few known failures over time. It might be antiquated, however it seems to still work.

I have been migrating items off legacy systems into the cloud, and that brings a whole new set of problems. That being said, I love cloud software development.
 
I’m not saying it’s a full, duplicate replacement for the FAA NOTAM system.

I’m saying some of the data contained in the NOTAMs, at some airports, is “backed up” by the ATIS.

I guess there are different ways to interpret what a “backup” is in this case.
 
You are not understanding. The sentence you highlighted in bold was not a backup or disaster recovery scenario. It was members in a cluster in the same location sharing the same disk drives. All the systems are sharing the workload. You can pull out one system at a time to update it. The other systems in the cluster pick up the workload. When one system is returned to the cluster another one is pulled out of the cluster. Over a few hours all the systems have been pulled out, updated and started back up again. All the systems in the sysplex share the same set of disk drives/database. No movement of data within the sysplex.
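The one-system-at-a-time update described above can be sketched in a few lines of Python. This is only a toy model of the idea, not actual sysplex tooling. The `Node` class, the `rolling_update` function and the version strings are hypothetical names chosen for illustration.

```python
class Node:
    """One system in the cluster (hypothetical stand-in for a sysplex member)."""

    def __init__(self, name, version):
        self.name = name
        self.version = version
        self.in_service = True


def rolling_update(nodes, new_version):
    """Update nodes one at a time; return the fewest nodes ever left in service."""
    fewest_in_service = len(nodes)
    for node in nodes:
        node.in_service = False       # pull this system out of the cluster
        serving = sum(n.in_service for n in nodes)
        fewest_in_service = min(fewest_in_service, serving)
        node.version = new_version    # update and restart (the IPL step)
        node.in_service = True        # return it to the cluster
    return fewest_in_service


cluster = [Node(f"sys{i}", "v1") for i in range(1, 5)]
lowest = rolling_update(cluster, "v2")
print(lowest, [n.version for n in cluster])
```

With four systems, at least three stay in service at every step, so the workload always has somewhere to go. Because all members share the same disks, nothing in this loop has to copy data from one system to another.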
 