Securing Web Application Log Files

I am investigating ways to protect log files from our web applications.

We have several (about 10) servers running WebLogic behind a reverse proxy, each running the same set of web applications.
These applications use a common Java module to write log entries to XML files.
Currently the files are downloaded and converted into DB entries once a day. But many admins have access to the servers and could change log entries (although we don't see any direct reason why they would, it's a security issue).
We would like to improve the security of these files, and we need to retain the logs for 10 years. The data volume is about 1GB of uncompressed XML per server per day.
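For a sense of scale, here is a back-of-the-envelope calculation using only the figures above (about 10 servers, about 1GB of uncompressed XML per server per day, 10-year retention). It is a sketch under simple assumptions: 365-day years, no compression, no replicated copies and no growth. The class name is just for illustration.

```java
// Back-of-the-envelope sizing using the figures from the question.
// Assumptions: 365-day years, no compression, no replicas, no growth.
final class RetentionEstimate {

    /** Raw (uncompressed) storage in GB over the whole retention period. */
    static double totalGb(int servers, double gbPerServerPerDay, int years) {
        final int daysPerYear = 365; // ignores leap days
        return servers * gbPerServerPerDay * daysPerYear * years;
    }

    public static void main(String[] args) {
        int servers = 10;
        double gbPerServerPerDay = 1.0;
        int years = 10;
        double total = totalGb(servers, gbPerServerPerDay, years);
        System.out.printf("Per day: %.0f GB, over %d years: %.0f GB (about %.1f TB)%n",
                servers * gbPerServerPerDay, years, total, total / 1000.0);
    }
}
```

With those inputs it comes to about 36,500GB, roughly 36.5TB of raw XML over ten years, so the compression ratio you achieve and the number of copies you keep will largely determine the real storage requirement.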

Possible solutions we are thinking of:

* Using a SIEM tool such as ArcSight Logger, and having the log module or a collector send the data to its storage immediately. Is this possible? It probably is, but it is probably also expensive. Any comments or experiences in this area?

* Securing the original log files. Is it possible to protect them in some way? Could we check and control access to, and changes to, these files at runtime on WebLogic? Are there other tools that can be used? The admins need access to the server, but not necessarily to the log files.

* Other suggestions are welcome :)
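For the second option, one technique not mentioned above is to make the log tamper-evident rather than tamper-proof: each entry carries a SHA-256 hash computed over its own text and the previous entry's hash, so changing an earlier entry breaks every later link. A minimal sketch, assuming entries are plain strings; the class and method names are hypothetical, and this is not a feature of WebLogic or of the existing log module.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of a hash-chained (tamper-evident) log; all names are hypothetical.
final class HashChainedLog {

    /** One log message plus the hash that covers it and every earlier message. */
    static final class Entry {
        final String message;
        final String hash;

        Entry(String message, String hash) {
            this.message = message;
            this.hash = hash;
        }
    }

    // Fixed starting value for the chain.
    static final String GENESIS = "genesis";

    private final List<Entry> entries = new ArrayList<>();
    private String head = GENESIS;

    /** SHA-256 over the previous hash and the message, as lowercase hex. */
    static String chainHash(String previousHash, String message) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(previousHash.getBytes(StandardCharsets.UTF_8));
            md.update((byte) '\n'); // separator: the previous hash never contains a newline
            md.update(message.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
    }

    /** Appends a message and returns the new chain head (store a copy off the server). */
    String append(String message) {
        head = chainHash(head, message);
        entries.add(new Entry(message, head));
        return head;
    }

    List<Entry> entries() {
        return Collections.unmodifiableList(entries);
    }

    String head() {
        return head;
    }

    /** Recomputes the chain; returns the index of the first bad entry, or -1 if intact. */
    static int firstBrokenIndex(List<Entry> entries) {
        String previous = GENESIS;
        for (int i = 0; i < entries.size(); i++) {
            Entry e = entries.get(i);
            if (!chainHash(previous, e.message).equals(e.hash)) {
                return i;
            }
            previous = e.hash;
        }
        return -1;
    }
}
```

On its own the chain does not stop someone who can rewrite the whole file and recompute every hash, and deleting the most recent entries still leaves a valid chain. Comparing the recomputed head with a copy kept somewhere the server admins cannot change (for example the central DB) is what closes both gaps.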

Many thanks in advance for all your replies!

Pieter,

After you determine how you are going to output the data files to be logged, there are a few other considerations that you may need to review. You may need to consider tamper-proof media such as WORM storage; there are many choices today, and EMC Centera is one example. And although your volumes are small (about 1GB per server per day, so roughly 10GB per day across ten servers even before any growth), don't let that fool you into thinking that you don't need data protection. You probably need replicated storage and basic backup services at a minimum. I don't see that you would have a business continuity requirement for this type of data, but you will need sufficient temporary cache in case the LUNs representing the final storage targets become unavailable, so plan for temp cache space. Again, the data volume is not a big deal, but data protection is: you will want to make sure that the staging or temp disk is RAID 1 or better, to avoid a single point of failure between origination, staging and the final writes to permanent storage.
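A minimal sketch of the staging idea, assuming the staging area and the permanent targets are mounted as ordinary directories (a real WORM device would be written through its own API, which is not shown here). It copies a staged log file to each target, verifies every copy with a SHA-256 hash, and deletes the staged file only after all copies have verified, so a failure at any target leaves the staged copy in place for a retry. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

// Hypothetical staged commit: copy to every target, verify, then delete the staged file.
final class StagedCommit {

    /** SHA-256 of a file's contents, read in chunks so large files are not loaded whole. */
    static byte[] sha256(Path file) throws IOException {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            try (InputStream in = Files.newInputStream(file)) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    md.update(buf, 0, n);
                }
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true only if every target holds a verified copy and the staged file was removed. */
    static boolean commit(Path staged, List<Path> targetDirs) throws IOException {
        byte[] expected = sha256(staged);
        for (Path dir : targetDirs) {
            Files.createDirectories(dir);
            Path copy = dir.resolve(staged.getFileName());
            Files.copy(staged, copy, StandardCopyOption.REPLACE_EXISTING);
            if (!Arrays.equals(expected, sha256(copy))) {
                return false; // keep the staged file so a later retry can redo the copy
            }
        }
        Files.delete(staged); // safe: all targets hold verified copies
        return true;
    }
}
```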

Data accessibility is the next question: for what reasons are the log files being stored? Who will require access? Are different levels of security access required? Do you need to "log" who sees what, and when? Will the data have to be printed and/or extracted? If so, in what formats are the extracts expected?

What consequences, if any, follow from the differences between the format in which the system generates the log file and the format in which you archive it? What meta-data do you need to capture and index as key fields so that system users can find data?
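To make the key-field question concrete, here is a sketch that streams through XML log content and pulls out a few attributes per entry for indexing. The element name `entry` and the attributes `timestamp`, `user` and `app` are hypothetical, since the question doesn't show the log module's XML schema; substitute your own. It uses the JDK's StAX parser (`javax.xml.stream`), which reads the document as a stream rather than building a DOM tree for a 1GB file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical key-field extractor; element and attribute names are assumptions.
final class LogMetadataExtractor {

    static final List<String> KEY_FIELDS = List.of("timestamp", "user", "app");

    /** Returns one map of key fields per <entry> element, in document order. */
    static List<Map<String, String>> extract(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not process DTDs or external entities in log content.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<Map<String, String>> rows = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "entry".equals(reader.getLocalName())) {
                    Map<String, String> row = new LinkedHashMap<>();
                    for (String field : KEY_FIELDS) {
                        String value = reader.getAttributeValue(null, field);
                        if (value != null) {
                            row.put(field, value);
                        }
                    }
                    rows.add(row);
                }
            }
        } finally {
            reader.close();
        }
        return rows;
    }
}
```

For real files, passing `Files.newInputStream(path)` to `createXMLStreamReader` instead of a `StringReader` keeps memory use flat regardless of file size.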

Also, with a 10-year retention period you will need to plan for at least one physical and logical migration. This is where you need to consider what impact a relational DB server will have on the architecture. I would recommend that you plan on keeping the meta-data together with the data, using a flat-file database approach.

I’m sure that as you work through the requirement end to end, there will be a lot more questions to answer. But the basics to consider now are: What storage will you use? Do you need tamper-proof media? Remember that these are API-based, so you need to program the ‘puts’ and the ‘gets’. Many of these requirements come built in to real archival server technology.
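Because these stores are API-based, the application ends up coding against something like the interface below. It is a hypothetical sketch, not the API of Centera or of any particular archive product: `ArchiveStore`, `put`, `get` and `metadata` are made-up names, and the in-memory implementation only imitates write-once behaviour by refusing to overwrite an existing id. It also keeps each item's meta-data next to its content, in the spirit of storing the two together.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical archive API: the "puts" and "gets" the application must program. */
interface ArchiveStore {
    /** Stores content under a new id; fails if the id already exists (write once). */
    void put(String id, byte[] content, Map<String, String> metadata);

    Optional<byte[]> get(String id);

    Optional<Map<String, String>> metadata(String id);
}

/** In-memory stand-in for tests; a real store would be WORM media or an archive server. */
final class InMemoryArchiveStore implements ArchiveStore {

    private static final class Item {
        final byte[] content;
        final Map<String, String> metadata;

        Item(byte[] content, Map<String, String> metadata) {
            this.content = content.clone();            // defensive copy
            this.metadata = Map.copyOf(metadata);      // immutable copy
        }
    }

    private final Map<String, Item> items = new ConcurrentHashMap<>();

    @Override
    public void put(String id, byte[] content, Map<String, String> metadata) {
        Item previous = items.putIfAbsent(id, new Item(content, metadata));
        if (previous != null) {
            throw new IllegalStateException("write-once store: id already exists: " + id);
        }
    }

    @Override
    public Optional<byte[]> get(String id) {
        Item item = items.get(id);
        return item == null ? Optional.empty() : Optional.of(item.content.clone());
    }

    @Override
    public Optional<Map<String, String>> metadata(String id) {
        return Optional.ofNullable(items.get(id)).map(item -> item.metadata);
    }
}
```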

My company has a long-term digital archival solution that you may want to take a look at; the server runs on Linux, Solaris or Windows. The archive technology pre-processes raw data (your staged log files) and indexes selected data elements for later search and query access, provides web services for programmatic access, and offers Java and web-based clients for end-user access. Rights and security administration is built in, so you can control who sees what, down to the row or page level. The archive has built-in compression and also containerizes many files into single files for more efficient use of storage blocks. Instead of writing many small files to disk (which is not efficient), it collapses hundreds, thousands or millions of small files into single larger files that are then written to disk, and it keeps track of the offsets (hex locations) of each item, so specific pages or rows can be accessed from a container without opening and reading the entire file. It also handles multiple commits, so you can write to multiple storage targets and then delete the source data after all writes are successful.
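The general containerization technique (many small files packed into one large file, with an index of where each item starts) can be sketched with plain `RandomAccessFile`. This only illustrates the idea under simplifying assumptions and is not how the product described above is implemented: there is no compression, the offset index is returned in memory rather than persisted, and `LogContainer` and `Slot` are hypothetical names. Reading one item seeks straight to its offset instead of reading the whole container.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical container: items appended back to back, located via an offset index.
final class LogContainer {

    /** Where one packed item lives inside the container file. */
    static final class Slot {
        final long offset;
        final int length;

        Slot(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Writes every item into one file and returns the index (name -> slot). */
    static Map<String, Slot> pack(Path container, Map<String, byte[]> items) throws IOException {
        Map<String, Slot> index = new LinkedHashMap<>();
        try (RandomAccessFile out = new RandomAccessFile(container.toFile(), "rw")) {
            out.setLength(0); // start a fresh container
            for (Map.Entry<String, byte[]> e : items.entrySet()) {
                long offset = out.getFilePointer();
                out.write(e.getValue());
                index.put(e.getKey(), new Slot(offset, e.getValue().length));
            }
        }
        return index;
    }

    /** Reads a single item by seeking to its offset; the rest of the file is not read. */
    static byte[] read(Path container, Slot slot) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(container.toFile(), "r")) {
            byte[] buf = new byte[slot.length];
            in.seek(slot.offset);
            in.readFully(buf);
            return buf;
        }
    }
}
```

In a real archive the index itself would have to be stored durably (often alongside or inside the container), otherwise losing it makes the packed data very hard to navigate.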

It’s actually a very mature technology for your purpose. It’s called Report Archival, and it has been used on Wall St. by the big brokerage houses to archive millions of pages of ASCII, PCL5, Xerox-MetaCode and EBCDIC formatted data that arrive from open and mainframe systems every day, all day long.

This might actually work quite well for your purpose, considering that a formatted log file (e.g. syslog) or XML output is basically no different from a report.

Good Luck,
Peter
February 2009

* http://www.axsone.com/products_central.shtml
* http://www.emc.com/products/family/emc-centera-family.htm

