[–] jsprogrammer [S] 7 points (+7|-0) ago

I think the best initial approach will be to leave storage to the current major players (e.g., YouTube). The first iteration would be a database of metadata and links to the content. This would drastically reduce the cost and complexity of the project, while keeping the benefit of highly available, existing systems.
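To make the "metadata and links" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the table names (`recording`, `link`), the columns, and the example URLs are hypothetical, and SQLite from the Python standard library stands in for whatever database would actually be chosen.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE recording (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    recorded_on TEXT                 -- ISO 8601 date string
);
CREATE TABLE link (
    recording_id INTEGER NOT NULL REFERENCES recording(id),
    host         TEXT NOT NULL,      -- external host, e.g. "youtube"
    url          TEXT NOT NULL UNIQUE
);
"""


def open_catalog():
    """Create an in-memory catalog that stores metadata, never media."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_recording(conn, title, recorded_on, links):
    """Insert one recording plus every (host, url) pair that points to it."""
    cur = conn.execute(
        "INSERT INTO recording (title, recorded_on) VALUES (?, ?)",
        (title, recorded_on),
    )
    rec_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO link (recording_id, host, url) VALUES (?, ?, ?)",
        [(rec_id, host, url) for host, url in links],
    )
    return rec_id


def links_for(conn, rec_id):
    """Return all known copies of a recording as (host, url) tuples."""
    rows = conn.execute(
        "SELECT host, url FROM link WHERE recording_id = ? ORDER BY host",
        (rec_id,),
    )
    return rows.fetchall()
```

Keeping links in their own table means one recording can point at several hosts, so a removed copy only costs one row rather than the whole entry.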

I have not been working on this specific project for three years. I have prototyped out some of the interface and worked on some data models. I have also been spending time working on other, not necessarily related, projects that I feel are also important. Many of these projects share common problems (mainly, the entire stack; from the back-end services, to network engineering, to front-end development). I have a pretty decent base project for web front-end work built up now, where I can easily begin development on new projects or prototypes in a near state-of-the-art environment. Additionally, I have been working on tools to launch and run everything necessary to host these projects on any of the cloud providers or, really, any collection of networked Linux machines.

> A feature that would be nice is searching speeches, so you could query John Boehner and the word "boner" to hear about Boehner's boner from Boehner, or something like that.

I think searching is a must. C-SPAN already provides some transcripts and I think YouTube does too, but I think human-editable transcripts will also be required. I know that similar systems already exist. Shows like The Daily Show record television channels 24/7 and search through the recordings to find their content. The awesome thing about what we can do on the web now is that anyone with a web browser will effectively be able to run their own Daily Show, incorporating existing media and live streams...all in a browser...broadcast all over the world.

[–] ashekchum 7 points (+7|-0) ago

Such a system would require the host to be impartial, and hasn't YouTube proven that assumption incorrect recently? If they removed a linked video, you would need a backup to re-upload it. That backup doesn't need to be web-linked, but it still means you would need access to some storage. Running a search algorithm for terms would likely require, at the least, the transcripts to be hosted on your site for processing speed.

Really what you need is funding, maybe Kickstarter/GoFundMe? But you would need a demo system working before then.

[–] jsprogrammer [S] 1 point (+1|-0) ago

Yes, that is a potentially huge issue. I think that there would be huge pressure on anyone hosting a file to keep it. Also, if YouTube is the only place with a recording of a particular event, there isn't much choice. I think that a huge concern of the community would be ensuring that there are sufficient copies available so that nothing is lost.

I would consider transcripts to be metadata, and yes, they would need to be quickly searchable. Fortunately, text is much, much smaller than audio and video, so the storage requirements are far less demanding.
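As a rough sketch of why locally stored transcript text is cheap to search, here is a tiny in-memory inverted index. The class name `TranscriptIndex`, the tokenizer, and the recording IDs are all hypothetical; a real deployment would more likely lean on the database's own full-text search.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TranscriptIndex:
    """Maps each word to the set of recording IDs whose transcript has it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, recording_id, transcript):
        """Index every word of one transcript under its recording ID."""
        for word in tokenize(transcript):
            self._postings[word].add(recording_id)

    def search(self, query):
        """Return IDs of recordings containing every word in the query."""
        words = tokenize(query)
        if not words:
            return set()
        # Copy the first posting set so the index itself is never mutated.
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result
```

Usage would look something like `index.add(1, transcript_text)` followed by `index.search("speaker floor")`, which returns the IDs of recordings whose transcripts contain both words. Edited transcripts would need re-indexing, which this sketch does not handle.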

> Really what you need is funding, maybe Kickstarter/GoFundMe? But you would need a demo system working before then.

This is true, but there are constraints on what kinds of projects can be offered. From past research I have done on Kickstarter, I believe you must eventually produce a "finished product", so I believe that would rule out a "hosted website". I do think it would be possible to frame the project as just producing the source code to run such a system, however.

[–] WhiteRonin 2 points (+2|-0) ago

So what is your stack? And what languages are you using?

I'm a practical type of guy and hate Google-like interviews ;-)

[–] jsprogrammer [S] 2 points (+2|-0) ago

Well, I have not made any final decisions. There are a lot of interesting things going on at all levels, it seems.

My prototyping has been in ES6/7/8/2015/2016 on the front-end, with an Angular 1 base project I have. [I have some issues with React and Angular 2, but I have also been looking at and playing with other alternatives (cycle.js, riot, and many others that I can't remember off the top of my head).] For styling I use LESS, and in HTML I try to write custom, semantic tags.

For databases, I am partial to PostgreSQL...and I think it could work at some scale. I have also experimented with some of my own designs for a log-based (with full records) mutable store. I prefer "proven and simple", so I am leaning towards Postgres, but it's possible that something else would be better suited at some point.

On the back-end, it's looking like CoreOS with either Kubernetes or my own scheduler (not built yet). I have a WebRTC signalling solution written in ES2015 [so I can run it on the server and on the client :)], so Node will probably come into play at some point. nginx has been my favorite for HTTPS and asset serving. I've also considered places for C++ or possibly Haskell, but I have not employed them in any of this work so far.

[–] [deleted] ago 

[Deleted]

[–] jsprogrammer [S] ago 

This has been mentioned in one or two comments, and IPFS specifically. I do think this (decentralization) is the larger goal, but I'm not sure what, if anything, is up to this challenge yet. I do think that relying on external hosts provides some level of decentralization, as they become a dependency and take on, at the least, social obligations. Additionally, since a large service like YouTube already hosts this kind of content alongside tons of other, unrelated content, it would be much harder to shut down outright. I do think the media files need to be duplicated widely though, and I would be very interested to read more about how IPFS or another distributed storage solution could work for this project.