Data Format or Service

I have just read about the OpenCARE project, which converts any data format to EDXL. This is a good idea for integrating incompatible data formats that conceptually refer to the same thing. For example, organization A may store its people database in MySQL, organization B may store its people database in MS Access with a totally different schema, and organization C may keep its people database in MS Excel. It is practically impossible to force all organizations to use the same data format, so OpenCARE is designed to convert any data format to an XML-based format like EDXL, which is an OASIS standard.
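To make the idea concrete, here is a minimal sketch of how a row from one organization's people table might be mapped into a common XML fragment. The column names and XML elements below are invented for illustration only; a real converter would target the actual EDXL schema.

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Hypothetical example: map one row of organization A's "people" table
    // into a common XML fragment. Field names and elements are placeholders.
    public class PeopleRecordConverter {

        public static String toXml(Map<String, String> row) {
            StringBuilder xml = new StringBuilder();
            xml.append("<person>\n");
            // Only the shared "critical" fields are mapped; organization-specific
            // columns are simply dropped in this data-format approach.
            xml.append("  <name>").append(escape(row.get("full_name"))).append("</name>\n");
            xml.append("  <location>").append(escape(row.get("province"))).append("</location>\n");
            xml.append("  <status>").append(escape(row.get("status"))).append("</status>\n");
            xml.append("</person>");
            return xml.toString();
        }

        private static String escape(String value) {
            if (value == null) return "";
            return value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        }

        public static void main(String[] args) {
            Map<String, String> row = new LinkedHashMap<>();
            row.put("full_name", "Somchai J.");
            row.put("province", "Phang Nga");
            row.put("status", "missing");
            System.out.println(toXml(row));
        }
    }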

In order to share free-form raw data and information across organizations, there are roughly two approaches. The approach above unifies the data format at the last step instead of convincing all organizations to change their data storage. The other approach is for each organization to develop a service that exposes its raw data in RPC style; in other words, a web service. Which one is better?

The data format approach is easier to implement, but you may lose a lot of data, especially organization-specific data. The web service approach sounds better, but it requires more work from each organization. In my opinion, we should use both approaches depending on the time constraints: the data format approach is suitable for urgent projects that need to start data exchange as quickly as possible, and after that the web service approach can be layered on top of that data format.

The main advantage of a web service is that you don't need to care about how to parse a data markup language; the service always gives you the raw data. Each organization may therefore provide different data, some more and some less.
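A minimal sketch of what such a per-organization service contract could look like; the interface name and methods are purely hypothetical and not part of any existing OpenCARE specification.

    import java.util.List;
    import java.util.Map;

    // Hypothetical RPC-style contract that each organization would implement
    // over its own storage (MySQL, MS Access, Excel export, ...). The caller
    // receives raw records and never touches the native format directly.
    public interface PeopleDataService {

        /** Returns raw person records updated since the given timestamp (epoch millis). */
        List<Map<String, String>> findPeopleUpdatedSince(long sinceEpochMillis);

        /** Returns a single raw record by the organization's own identifier, or null. */
        Map<String, String> findPersonById(String nativeId);
    }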


OpenCARE data approach

Just found your posting -- where have I been? During the tsunami incident in late 2004, several system flaws stood out: insufficient bandwidth, wrong machine sizing, inconsistent/outdated information, the "portal mindset" of I-am-the-authority-if-you-want-to-know-more-come-here-often, etc. Unifying data formats doesn't work, particularly in times of crisis, and web services consume too much overhead. OpenCARE deals with structured data by putting everything in XML (EDXL). Unstructured data such as pictures can be MIME-encoded and encapsulated into EDXL as a payload; webboard-type data is deliberately ignored since there is no good approach to keeping such information up to date. Using this approach, incompatible systems may share critical information by means of field mapping from EDXL to native formats. There will be one plug-in for each native data format. It is not a goal to implement intergalactic any-to-any mapping; OpenCARE plug-ins map only critical data in EDXL standard fields, not the payload. In the end, we should have the latest information on "who suffers", "where is the situation", "what kind of help is needed", "who is doing what", "where is nothing being done", "where is my relative", etc.
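As a rough illustration of the payload idea, here is a minimal sketch, assuming Java 8+, of Base64-encoding a picture and carrying it alongside the mapped standard fields. The element names and file path are placeholders, not the actual EDXL content-object schema.

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Base64;

    // Hypothetical illustration: encode a picture as Base64 and carry it as a
    // payload next to the mapped standard fields. Element names are invented;
    // a real implementation would follow the EDXL specification.
    public class PayloadEncoder {

        public static String wrapPicture(String pathToImage) throws Exception {
            byte[] raw = Files.readAllBytes(Paths.get(pathToImage));
            String encoded = Base64.getEncoder().encodeToString(raw);
            return "<contentObject>\n"
                 + "  <mimeType>image/jpeg</mimeType>\n"
                 + "  <payload encoding=\"base64\">" + encoded + "</payload>\n"
                 + "</contentObject>";
        }

        public static void main(String[] args) throws Exception {
            // "damage-report.jpg" is a placeholder path for demonstration.
            System.out.println(wrapPicture("damage-report.jpg"));
        }
    }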
  • Specifications
  • blog
  • forums
opencare at gmail dot com

push or pull online or offline

In my opinion, the data format is not the only important part of the system. XML is a good candidate for structured data and for language-independent exchange. Still, the most important part is the exchange mechanism. I have read about Communication among Nodes, and it seems nobody is talking about this issue. It is just a design issue, but it is the most important thing in this useful project. I have a few notes.
  • OpenCARE should be truly open. I like Java as long as the code is written in pure Java and is open source. Some Java code is backed by native code, and some of it is free but closed. It would be better to choose underlying technology that is completely open. I don't want to have to find a big host to run Java and serve 100,000 visitors per hour.
  • Instead of focusing on technology, we should consider the model, algorithms, performance, and scalability. Most of all, high availability and fault tolerance must be taken into account.
  • Under perfect conditions, all nodes are connected to the network. An online push model should work fine, and we get the most recent data very fast.
  • If nodes are disconnected or the network malfunctions, offline push and online pull models should be applied to synchronize information as often as possible (a sketch follows this list).
  • P2P may be too complex, and information may become inconsistent. If you just want to use P2P for distributed lookups, Kademlia should help.
  • By the way, the biggest problem for OpenCARE is the frontend, which must serve a massive number of visitors searching for up-to-date information in the database.
  • I see two approaches: a single web cluster or geographically distributed web clusters. If you care about the bandwidth for exchanging data among clusters, BitTorrent should help a lot. Then again, I don't think there will be data at the scale that needs to be synchronized via BitTorrent.
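Here is a minimal sketch of the push/pull idea above. The node API is entirely hypothetical; it only shows the fallback from online push to queued offline delivery and a catch-up pull on reconnect.

    import java.util.ArrayDeque;
    import java.util.Queue;

    // Hypothetical node-to-node synchronization loop: push updates immediately
    // while the peer is reachable, queue them while offline, and flush the
    // queue (plus pull the peer's updates) whenever connectivity returns.
    public class SyncLoop {

        // Placeholder for a real transport; nothing here is OpenCARE-specified.
        interface Peer {
            boolean isReachable();
            void push(String edxlMessage);
            Iterable<String> pullUpdatesSince(long epochMillis);
        }

        private final Queue<String> offlineQueue = new ArrayDeque<>();
        private long lastPull = 0L;

        public void send(Peer peer, String edxlMessage) {
            if (peer.isReachable()) {
                peer.push(edxlMessage);            // online push: fastest path
            } else {
                offlineQueue.add(edxlMessage);     // offline: keep for later
            }
        }

        public void onReconnect(Peer peer) {
            while (!offlineQueue.isEmpty()) {
                peer.push(offlineQueue.poll());    // flush queued updates
            }
            for (String update : peer.pullUpdatesSince(lastPull)) {
                apply(update);                     // online pull: catch up
            }
            lastPull = System.currentTimeMillis();
        }

        private void apply(String edxlMessage) {
            // Merge the incoming EDXL message into local storage (not shown).
        }
    }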
As you might notice, I focus on the protocol. Sorry if I have misunderstood you on some points; my background is in systems programming.

OpenCARE design trade-off

Appreciate the comment. It would be nice if this discussion were carried on on the OpenCARE wiki so that latecomers could follow the ideas. The spec has been open for critical review for two months, and any input that can help improve OpenCARE is truly appreciated. It is a small thing that IT people with a volunteer spirit can do for mankind. It started out in Thailand and has naturally been discussed among Thai IT people, but OpenCARE is not positioned as a relief co-ordination system for any particular country. Mass disaster is a common threat to humanity; it affects all of us in one way or another. In a recent UNDP meeting at ESCAP Bangkok, many countries shared the same sentiment. The problem was that no one had started to tackle it. Since we are at the forefront in this space, we shouldn't take it for granted that there is an answer to every question. Apart from whether or not it is true open source down to the DNA level, which in my view is a religious issue, I concur with most of the suggestions. I feel sorry that I proposed OpenCARE at an international conference in May -- while it was well received, we didn't have a system running to mitigate the June flood in Uttaradit, nor for the July tsunami in Java. People are still suffering after the media coverage has stopped and public attention has faded away. How long should we wait to perfect our design?

In my view, an OpenCARE node should not be a presentation layer (web serving). This is one of the mistakes uncovered during the 2004 tsunami incident 20 months ago: almost all systems were designed in a monolithic manner, packing everything into one machine and making it too fat and too slow. The web part should be rendered by external systems which are kept current by a data feed plug-in from a nearby OpenCARE node; this approach frees the node up to handle the business logic (dissemination of information and co-ordination of relief activities). The idea of separating the web layer from the data engine has also been tested, as it has been throughout the world, during the distribution of picture archives of the 60th anniversary celebrations of HMK's accession to the throne; a mid-range desktop PC could pump out 2.2 TB of data in a day -- a sustained throughput of 214 Mbps over 24 hours (that's a hard limit of disk speed). It is also possible for an XML-literate client to fetch XML data from (another) plug-in and render the information the way it best serves the users. P2P might be overkill after all. Kademlia looks good. Or, if we have a true separation of the presentation layer, perhaps a simple ActiveMQ implementation might work, and then we have many additional transports and language bindings to extend our plug-in collection.

Part of the OpenCARE problem is myself. It is difficult to express the idea in words. Designing OpenCARE is like trying to solve a jigsaw puzzle; there are many possibilities, and you don't know exactly whether a piece is the right one. Therefore, expert opinions are much needed, particularly for design review. And, yes, OpenCARE will be open source. I wish it to be truly open from the ground up, maintained and accepted by the international community. It has been and will be a volunteer effort. Difficult! But that's the way it should be. Cheers, opencare at gmail dot com
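For the ActiveMQ thought above, a minimal sketch of a node publishing an EDXL message to a queue that external presentation systems could consume. The broker URL and queue name are placeholders; nothing here is part of the OpenCARE design.

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.Destination;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import org.apache.activemq.ActiveMQConnectionFactory;

    // Hypothetical illustration of the "simple ActiveMQ implementation" idea:
    // the data engine publishes EDXL messages to a queue, and external web
    // front-ends subscribe and render them, keeping presentation separate.
    public class EdxlFeedPublisher {

        public static void publish(String edxlXml) throws Exception {
            ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
            Connection connection = factory.createConnection();
            try {
                connection.start();
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                Destination queue = session.createQueue("opencare.feed");   // placeholder queue name
                MessageProducer producer = session.createProducer(queue);
                TextMessage message = session.createTextMessage(edxlXml);
                producer.send(message);
            } finally {
                connection.close();
            }
        }
    }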

you are a good researcher

You are a great researcher and engineer. It seems you have already chosen J2EE as the reference implementation platform. As you mentioned above, we can't wait any longer for the best design and solution, and I agree: people are waiting for OpenCARE. So what is the development life cycle of OpenCARE? Since it is open source, I expect to see a code repository with public read permission. Anyone interested in this project could then check it out and try it in action. When people find bugs or want to leave feature requests, they should be able to submit them to some kind of issue tracker. From that ground, you can build up a community. I have registered on the wiki and the forum. By the way, I don't know where to start. It would be nice to have a todo list for volunteers and guidance on how to contribute effectively. I don't want to mess up what you have done.

OpenCARE: next steps + situation explained

The decision on J2EE is a tough one for me; I'm not a Java programmer. But personal preference should not outweigh a design decision, and I picked J2EE primarily because of its versatile functionality. We need a jump start in order to bring up a prototype as soon as we can. I've looked at the development ecosystem and have already spotted GForge, but it is difficult to do all this alone. I have settled the application server part on IBM WebSphere Application Server - Community Edition (WASCE), without any customization so far. WASCE is a Geronimo-based J2EE 1.4 server with lots of useful components to choose from; we may leave the heavyweight components alone, like EJB, since they take too many system resources. The development platform is, naturally, Eclipse. This is not required; each developer has his or her own preference, and we can interact with WASCE via the command line. A prototype server has been set up at OpenCARE.org -- nothing is there at the moment. Accounts are available for anyone who wants to familiarize themselves with the system, or you can set up your own.

If OpenCARE is to be a sustainable project, we need more participation, particularly from the developer community. Everyone I talked to seemed excited about the idea, but when it comes to implementation, everyone seems to adopt a wait-and-see attitude. There is nothing to blame them for. The OpenCARE concept is so abstract that those without system architecture experience cannot grasp it easily. This situation was anticipated from the beginning, so I have tried my best to make OpenCARE as modular as possible. Then we can achieve a higher degree of parallelism: once the node/plug-in interface is well defined, the node people can focus on EDXL parsing/validation, internode communications, etc., while the plug-in people work on plug-in implementations -- each independent of the other. Even though the easy part in this picture is to sign up to write a plug-in, there is the question of which plug-in. A useful plug-in needs useful data, and much useful data is held by government agencies, which are sometimes skeptical about NGOs and the private sector. It may take some time to talk them into openness, trust, and equipping people with good information so that they can make better judgements. So you might see that recent activities are more about getting the government bought into the conceptual idea than about technical implementation. Once we have useful data, useful plug-ins, and working nodes, a complete picture materializes -- it becomes easier for people to understand its usefulness -- and I hope we get a snowball effect.

Another reason the state of the project appears like this is that there are only two people working on it. My colleague is helping on the government interface, and I do the rest. I admit that it may look crazy; this is a test of our determination. Last year, the webmaster association disclosed part of the precursor project to OpenCARE and its entire complexity. OpenCARE is not about money (ha ha ha), not about credit, not about who writes the code and whose name appears in history. It is about bringing up a system that is useful to the public. OpenCARE is greater than any of us, IMHO. The year I graduated, HMK's speeches were on the responsibilities of knowledgeable people. I'm trying to do my part. A clear benefit of taking part in OpenCARE, at least to me, is that I see many people who are thinking of others. Each of us is different, but we are smart enough to exploit these differences for the public interest.

We don't lose hope in goodness and righteousness. It's a small proof that we the people can burn our excess energy on something useful to others, even without personal gain. I'm trying to put together a document that describes the current state of OpenCARE. The monthly updates are good for those who have followed OpenCARE from the beginning, but it is difficult for people who step in later on to get started. At least this thread has produced something really good. I really enjoy it. Thank you. opencare at gmail dot com
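Since the node side is described above as focusing on EDXL parsing/validation, here is a minimal sketch using standard JAXP. The schema file is a placeholder; this is not OpenCARE's actual validation code.

    import java.io.File;
    import javax.xml.XMLConstants;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;
    import org.w3c.dom.Document;

    // Hypothetical node-side step: parse an incoming message as namespace-aware
    // XML and validate it against an EDXL schema file before processing it.
    public class EdxlValidator {

        public static Document parseAndValidate(File edxlMessage, File edxlSchema) throws Exception {
            SchemaFactory schemaFactory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = schemaFactory.newSchema(new StreamSource(edxlSchema));

            DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
            builderFactory.setNamespaceAware(true);
            builderFactory.setSchema(schema);

            // Parsing throws on malformed XML; schema violations are reported to the
            // builder's ErrorHandler (the default only prints them), so a production
            // node would install a handler that rejects invalid messages.
            return builderFactory.newDocumentBuilder().parse(edxlMessage);
        }
    }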

good update

I have read the current status and found that you gave low priority to the bug tracking system. In my opinion, this is the most important piece right now if you want to find volunteers, build up a community, and track progress. GForge may be a more than sufficient solution for OpenCARE, which is a big project made up of small components like plug-ins. Trac should also fit here, and it is very simple to set up and run.

Indeed! OpenCARE is such a

Indeed! OpenCARE is such a big project compared to the tsunami victim search. Being a low-priority item doesn't mean it is not important at all; it simply means that I am overwhelmed. Any help you or anyone else can provide will be highly appreciated. opencare at gmail dot com
