Programming :: Choice Of Programming Solutions For Collated Document Repository?

Feb 18, 2011

We have documents on multiple workstations and want to collate them into a single repository to provide text search and download. So far we have implemented rsync to copy files from each workstation under a directory for each workstation on a server (incidentally providing a backup) and have set up text search using Xapian with Omega; users access it via a web browser. Still to do is to set up a system to copy files from each workstation's area on the server to the repository.

Many files are duplicated. In these cases we want to preserve the names but keep a single copy of the file; hard links can be used for that. For each file to be copied from a workstation's area into the collated area, we need to check whether it is a duplicate (compare file size and, if the sizes match, the MD5 sum) and, if so, create a hard link to the original rather than a copy.

A system to detect and replace duplicates in the collated area has been written using Ruby and PostgreSQL, but the developer cannot commit to continuing this work. It does mean we have a PostgreSQL database populated with "fingerprints" of files in the collated area.

My first priority is to get the system working; in the longer term, whatever is developed must be maintainable. I do not yet know which language skills are available locally.
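For illustration, the size-then-MD5 check described above could be sketched in Python roughly as follows. This is only a sketch under assumptions: the names `md5sum` and `collate_file` are hypothetical, the in-memory `by_size` and `md5_cache` dictionaries stand in for the PostgreSQL fingerprint table, and it assumes Python 3 with the workstation areas and the collated area on the same filesystem (hard links cannot cross filesystems).

```python
import hashlib
import os
import shutil


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collate_file(src, dest, by_size, md5_cache):
    """Place src at dest in the collated area.

    by_size maps a file size to the paths of that size already collated;
    md5_cache maps a collated path to its MD5 digest. Both are hypothetical
    stand-ins for the fingerprint database. Returns "linked" or "copied".
    """
    os.makedirs(os.path.dirname(os.path.abspath(dest)), exist_ok=True)
    size = os.path.getsize(src)
    candidates = by_size.setdefault(size, [])
    src_md5 = None
    if candidates:
        # Only hash when at least one collated file has the same size.
        src_md5 = md5sum(src)
        for existing in candidates:
            if existing not in md5_cache:
                md5_cache[existing] = md5sum(existing)
            if md5_cache[existing] == src_md5:
                # Duplicate content: keep the name, share the data.
                os.link(existing, dest)
                return "linked"
    # No duplicate found: make a real copy and record it.
    shutil.copy2(src, dest)
    candidates.append(dest)
    if src_md5 is not None:
        md5_cache[dest] = src_md5
    return "copied"
```

Calling `collate_file` once per file from each workstation's area, with the same two dictionaries passed every time, would leave one copy of each distinct file in the collated area and hard links for the rest. A real version would query and update the fingerprint table instead of the dictionaries.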

I am fluent in bash and competent with awk. Ruby looks nice, but I have started to learn Python and do not think it prudent to learn both at the same time. Python's PostgreSQL capabilities are not settled but may be fine for the simple usage required. What to do? A bash solution would run very slowly but could be developed quickly. Language knowledge aside, I have found it difficult to install Ruby on the server (CentOS 5.5): I installed rvm, but "gem" is still not installed, and it seems a very complex system with its own package management.
